[Qemu-devel] [RFC 0/6] enable numa configuration before machine


* [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
@ 2017-10-16 16:22 Igor Mammedov
  2017-10-16 16:22 ` [Qemu-devel] [RFC 1/6] numa: postpone options post-processing till machine_run_board_init() Igor Mammedov
                   ` (6 more replies)
  0 siblings, 7 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-16 16:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eblake, armbru, ehabkost, pkrempa, david, peter.maydell,
	pbonzini, cohuck

This series allows NUMA mapping to be configured at runtime through the
QMP/HMP interface. To make that possible, it introduces a new '-paused' CLI
option, which pauses QEMU before machine_init() is run, and adds new
set-numa-node HMP/QMP commands which, in conjunction with
info hotpluggable-cpus/query-hotpluggable-cpus, allow the NUMA mapping
for CPUs to be configured.
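
The pause/continue handshake behind '-paused preconf' can be sketched as a
small state machine. This is a simplified model, not the vl.c code: the
PRECONFIG_* names come from patch 4/6, while the Startup struct and the
startup_*() helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the three states that patch 4/6 adds to vl.c. */
typedef enum {
    PRECONFIG_CONT = 0, /* 'cont' was issued, or no pause was requested */
    PRECONFIG_PAUSE,    /* '-paused preconf': stay in the monitor loop */
    PRECONFIG_SKIP,     /* preconfig stage is over, do not re-enter it */
} PreconfigState;

/* Hypothetical container; the patch keeps this in a file-scope static. */
typedef struct {
    PreconfigState preconfig;
    bool machine_initialized;
} Startup;

/* What 'cont' does while the machine has not been created yet. */
void startup_cont(Startup *s)
{
    if (!s->machine_initialized) {
        s->preconfig = PRECONFIG_CONT;
    }
}

/* One check of the early monitor loop; true means "leave the loop". */
bool startup_preconfig_should_exit(Startup *s)
{
    assert(!s->machine_initialized);
    if (s->preconfig == PRECONFIG_CONT) {
        s->preconfig = PRECONFIG_SKIP;
        return true;
    }
    return false;
}

/* Leave preconfig and create the machine. */
void startup_init_machine(Startup *s)
{
    assert(s->preconfig == PRECONFIG_SKIP);
    s->machine_initialized = true;
}
```

Starting from PRECONFIG_PAUSE, the loop keeps running until startup_cont()
is called; starting from PRECONFIG_CONT (no '-paused preconf'), it exits on
the first check, so the default startup path is unchanged.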

An HMP configuration session for the CLI '-smp 1,maxcpus=2' would look like:

(qemu) info hotpluggable-cpus 
Hotpluggable CPUs:
  type: "qemu64-x86_64-cpu"
  vcpus_count: "1"
  CPUInstance Properties:
    socket-id: "1"
    core-id: "0"
    thread-id: "0"
  type: "qemu64-x86_64-cpu"
  vcpus_count: "1"
  qom_path: "/machine/unattached/device[0]"
  CPUInstance Properties:
    socket-id: "0"
    core-id: "0"
    thread-id: "0"
(qemu) set-numa-node node,nodeid=0
(qemu) set-numa-node node,nodeid=1
(qemu) set-numa-node cpu,socket-id=0,node-id=0
(qemu) set-numa-node cpu,socket-id=1,node-id=1
(qemu) info hotpluggable-cpus 
Hotpluggable CPUs:
  type: "qemu64-x86_64-cpu"
  vcpus_count: "1"
  CPUInstance Properties:
    node-id: "1"
    socket-id: "1"
    core-id: "0"
    thread-id: "0"
  type: "qemu64-x86_64-cpu"
  vcpus_count: "1"
  CPUInstance Properties:
    node-id: "0"
    socket-id: "0"
    core-id: "0"
    thread-id: "0"
(qemu) cont

git tree for testing:
  https://github.com/imammedo/qemu qmp_preconfig_rfc


CC: eblake@redhat.com
CC: armbru@redhat.com
CC: ehabkost@redhat.com
CC: pkrempa@redhat.com
CC: david@gibson.dropbear.id.au
CC: peter.maydell@linaro.org
CC: pbonzini@redhat.com
CC: cohuck@redhat.com

Igor Mammedov (6):
  numa: postpone options post-processing till machine_run_board_init()
  numa: split out NumaOptions parsing into parse_NumaOptions()
  possible_cpus: add CPUArchId::type field
  CLI: add -paused option
  HMP: add set-numa-node command
  QMP: add set-numa-node command

 hmp.h                      |  1 +
 include/hw/boards.h        |  2 ++
 include/sysemu/numa.h      |  2 ++
 include/sysemu/sysemu.h    |  1 +
 hmp-commands.hx            | 13 ++++++++
 hmp.c                      | 23 ++++++++++++++
 hw/arm/virt.c              |  3 +-
 hw/core/machine.c          | 18 ++++++-----
 hw/i386/pc.c               |  4 ++-
 hw/ppc/spapr.c             | 13 +++++---
 hw/s390x/s390-virtio-ccw.c |  1 +
 numa.c                     | 79 ++++++++++++++++++++++++++++++++++------------
 qapi-schema.json           | 13 ++++++++
 qemu-options.hx            | 15 +++++++++
 qmp.c                      |  5 +++
 vl.c                       | 54 ++++++++++++++++++++++++++++++-
 16 files changed, 210 insertions(+), 37 deletions(-)

-- 
2.7.4


* [Qemu-devel] [RFC 1/6] numa: postpone options post-processing till machine_run_board_init()
  2017-10-16 16:22 [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Igor Mammedov
@ 2017-10-16 16:22 ` Igor Mammedov
  2017-10-17  5:49   ` David Gibson
  2017-10-16 16:22 ` [Qemu-devel] [RFC 2/6] numa: split out NumaOptions parsing into parse_NumaOptions() Igor Mammedov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-16 16:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eblake, armbru, ehabkost, pkrempa, david, peter.maydell,
	pbonzini, cohuck

In preparation for NUMA options being handled via QMP before
machine_run_board_init(), move the final NUMA configuration checks
and processing to machine_run_board_init() so that they can take into
account both CLI (via parse_numa_opts()) and QMP input.
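
The split into "collect options" and "complete configuration" can be sketched
in isolation. This is a minimal model under stated assumptions, not the numa.c
code: NumaConfig, numa_add_node() and numa_complete() are hypothetical names,
MAX_NODES is a small illustrative limit, and the 'complete' flag only
anticipates the finalization state that patch 5/6 introduces.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8 /* illustrative limit, not QEMU's value */

typedef struct {
    bool present[MAX_NODES]; /* node IDs declared so far */
    int max_nodeid;          /* highest declared node ID, plus one */
    bool complete;           /* set once the final checks have passed */
} NumaConfig;

/*
 * Phase 1: record one node. Callable from any input source (CLI parsing
 * or a monitor command). Returns false if the ID is out of range, is a
 * duplicate, or arrives after the configuration was completed.
 */
bool numa_add_node(NumaConfig *cfg, int nodeid)
{
    if (cfg->complete || nodeid < 0 || nodeid >= MAX_NODES ||
        cfg->present[nodeid]) {
        return false;
    }
    cfg->present[nodeid] = true;
    if (nodeid + 1 > cfg->max_nodeid) {
        cfg->max_nodeid = nodeid + 1;
    }
    return true;
}

/*
 * Phase 2: run once, after all input sources have been consumed.
 * Rejects sparse node IDs, then freezes the configuration.
 */
bool numa_complete(NumaConfig *cfg)
{
    int i;

    assert(cfg->max_nodeid <= MAX_NODES);
    for (i = 0; i < cfg->max_nodeid; i++) {
        if (!cfg->present[i]) {
            return false;
        }
    }
    cfg->complete = true;
    return true;
}
```

Because the sparse-ID check runs only in phase 2, nodes may be declared in any
order and from any source before the machine is initialized.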

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/sysemu/numa.h |  1 +
 hw/core/machine.c     |  5 +++--
 numa.c                | 13 ++++++++-----
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 5c6df28..c19e456 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -31,6 +31,7 @@ struct NumaNodeMem {
 
 extern NodeInfo numa_info[MAX_NODES];
 void parse_numa_opts(MachineState *ms);
+void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[]);
 extern QemuOptsList qemu_numa_opts;
 void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 80647ed..f482211 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -701,7 +701,7 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
     return g_string_free(s, false);
 }
 
-static void machine_numa_finish_init(MachineState *machine)
+static void machine_numa_finish_cpu_init(MachineState *machine)
 {
     int i;
     bool default_mapping;
@@ -756,7 +756,8 @@ void machine_run_board_init(MachineState *machine)
     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
     if (nb_numa_nodes) {
-        machine_numa_finish_init(machine);
+        numa_complete_configuration(machine);
+        machine_numa_finish_cpu_init(machine);
     }
     machine_class->init(machine);
 }
diff --git a/numa.c b/numa.c
index 8d78d95..18af4ff 100644
--- a/numa.c
+++ b/numa.c
@@ -424,15 +424,11 @@ void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
     nodes[i].node_mem = size - usedmem;
 }
 
-void parse_numa_opts(MachineState *ms)
+void numa_complete_configuration(MachineState *ms)
 {
     int i;
     MachineClass *mc = MACHINE_GET_CLASS(ms);
 
-    if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
-        exit(1);
-    }
-
     assert(max_numa_nodeid <= MAX_NODES);
 
     /* No support for sparse NUMA node IDs yet: */
@@ -508,6 +504,13 @@ void parse_numa_opts(MachineState *ms)
     }
 }
 
+void parse_numa_opts(MachineState *ms)
+{
+    if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
+        exit(1);
+    }
+}
+
 void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
 {
     int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
-- 
2.7.4


* [Qemu-devel] [RFC 2/6] numa: split out NumaOptions parsing into parse_NumaOptions()
  2017-10-16 16:22 [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Igor Mammedov
  2017-10-16 16:22 ` [Qemu-devel] [RFC 1/6] numa: postpone options post-processing till machine_run_board_init() Igor Mammedov
@ 2017-10-16 16:22 ` Igor Mammedov
  2017-10-18  3:27   ` David Gibson
  2017-10-16 16:22 ` [Qemu-devel] [RFC 3/6] possible_cpus: add CPUArchId::type field Igor Mammedov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-16 16:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eblake, armbru, ehabkost, pkrempa, david, peter.maydell,
	pbonzini, cohuck

This allows parse_NumaOptions() to be reused for parsing
configuration commands received via the QMP interface.
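
The shape of the refactoring (a text front end that decodes options and a
shared back end that acts on the decoded object) can be sketched as follows.
All names here (NumaOpt, NumaState, apply_numa_opt(), parse_numa_text()) are
hypothetical, and the sscanf()-based decoding merely stands in for the
OptsVisitor used by the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { OPT_NODE, OPT_CPU } NumaOptType; /* simplified, hypothetical */

typedef struct {
    NumaOptType type;
    int node_id;
    int socket_id; /* used only for OPT_CPU */
} NumaOpt;

typedef struct {
    int nodes_seen;
    int cpus_mapped;
} NumaState;

/* Shared back end: acts on an already-decoded option. */
bool apply_numa_opt(NumaState *st, const NumaOpt *opt)
{
    assert(st != NULL && opt != NULL);
    if (opt->node_id < 0) {
        return false;
    }
    switch (opt->type) {
    case OPT_NODE:
        st->nodes_seen++;
        return true;
    case OPT_CPU:
        if (opt->socket_id < 0) {
            return false;
        }
        st->cpus_mapped++;
        return true;
    }
    return false;
}

/* Text front end: decodes "node,nodeid=N" or "cpu,socket-id=S,node-id=N". */
bool parse_numa_text(NumaState *st, const char *s)
{
    NumaOpt opt = { OPT_NODE, -1, -1 };
    char tail; /* catches trailing garbage after the last number */

    if (sscanf(s, "node,nodeid=%d%c", &opt.node_id, &tail) == 1) {
        opt.type = OPT_NODE;
    } else if (sscanf(s, "cpu,socket-id=%d,node-id=%d%c",
                      &opt.socket_id, &opt.node_id, &tail) == 2) {
        opt.type = OPT_CPU;
    } else {
        return false;
    }
    return apply_numa_opt(st, &opt);
}
```

A caller that already holds a decoded NumaOpt, as a QMP handler would, can
call apply_numa_opt() directly and skip the text decoding step.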

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/sysemu/numa.h |  1 +
 numa.c                | 48 +++++++++++++++++++++++++++++-------------------
 2 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index c19e456..aad4230 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -30,6 +30,7 @@ struct NumaNodeMem {
 };
 
 extern NodeInfo numa_info[MAX_NODES];
+int parse_numa(void *opaque, QemuOpts *opts, Error **errp);
 void parse_numa_opts(MachineState *ms);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[]);
diff --git a/numa.c b/numa.c
index 18af4ff..d8e7dc0 100644
--- a/numa.c
+++ b/numa.c
@@ -254,28 +254,11 @@ static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
     have_numa_distance = true;
 }
 
-static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
+static
+void parse_NumaOptions(MachineState *ms, NumaOptions *object, Error **errp)
 {
-    NumaOptions *object = NULL;
-    MachineState *ms = opaque;
     Error *err = NULL;
 
-    {
-        Visitor *v = opts_visitor_new(opts);
-        visit_type_NumaOptions(v, NULL, &object, &err);
-        visit_free(v);
-    }
-
-    if (err) {
-        goto end;
-    }
-
-    /* Fix up legacy suffix-less format */
-    if ((object->type == NUMA_OPTIONS_TYPE_NODE) && object->u.node.has_mem) {
-        const char *mem_str = qemu_opt_get(opts, "mem");
-        qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem);
-    }
-
     switch (object->type) {
     case NUMA_OPTIONS_TYPE_NODE:
         parse_numa_node(ms, &object->u.node, &err);
@@ -310,6 +293,33 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
     }
 
 end:
+    if (err) {
+        error_propagate(errp, err);
+    }
+}
+
+int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
+{
+    NumaOptions *object = NULL;
+    MachineState *ms = MACHINE(opaque);
+    Error *err = NULL;
+    Visitor *v = opts_visitor_new(opts);
+
+    visit_type_NumaOptions(v, NULL, &object, &err);
+    visit_free(v);
+    if (err) {
+        goto end;
+    }
+
+    /* Fix up legacy suffix-less format */
+    if ((object->type == NUMA_OPTIONS_TYPE_NODE) && object->u.node.has_mem) {
+        const char *mem_str = qemu_opt_get(opts, "mem");
+        qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem);
+    }
+
+    parse_NumaOptions(ms, object, &err);
+
+end:
     qapi_free_NumaOptions(object);
     if (err) {
         error_report_err(err);
-- 
2.7.4


* [Qemu-devel] [RFC 3/6] possible_cpus: add CPUArchId::type field
  2017-10-16 16:22 [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Igor Mammedov
  2017-10-16 16:22 ` [Qemu-devel] [RFC 1/6] numa: postpone options post-processing till machine_run_board_init() Igor Mammedov
  2017-10-16 16:22 ` [Qemu-devel] [RFC 2/6] numa: split out NumaOptions parsing into parse_NumaOptions() Igor Mammedov
@ 2017-10-16 16:22 ` Igor Mammedov
  2017-10-18 11:12   ` [Qemu-devel] [RFC v2 " Igor Mammedov
  2017-10-16 16:22 ` [Qemu-devel] [RFC 4/6] CLI: add -paused option Igor Mammedov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-16 16:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eblake, armbru, ehabkost, pkrempa, david, peter.maydell,
	pbonzini, cohuck

To enable early CPU-to-NUMA-node configuration at runtime,
qmp_query_hotpluggable_cpus() should provide a list of available
CPU slots at an early stage, before machine_init() is called and
the first CPU is created, so that management software can call it
and use the output to set the NUMA mapping.
Use the MachineClass::possible_cpu_arch_ids() callback to set
CPU type info, along with the rest of the possible CPU properties,
to let the machine define which CPU type* will be used.

* For SPAPR it will be a spapr core type, and for ARM/s390x/x86
  the respective descendant of CPUClass.
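
The idea of storing the type in each slot, so the list can be queried before
any CPU object exists, can be sketched as below. This is a reduced model and
not the boards.h definitions: CpuSlot, CpuSlotList, Machine and
machine_possible_cpus() are hypothetical names, and the one-socket-per-slot
numbering is an assumption made only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    const char *type; /* class name the slot will be created with */
    int socket_id;
    void *cpu;        /* NULL until a CPU object is actually created */
} CpuSlot;

typedef struct {
    size_t len;
    CpuSlot *slots;
} CpuSlotList;

typedef struct {
    const char *cpu_type;
    size_t max_cpus;
    CpuSlotList *possible; /* built lazily, then cached */
} Machine;

/* Build the slot list on first use; later calls return the cached list. */
const CpuSlotList *machine_possible_cpus(Machine *m)
{
    size_t i;

    assert(m->max_cpus > 0);
    if (m->possible) {
        return m->possible;
    }
    m->possible = calloc(1, sizeof(*m->possible));
    if (!m->possible) {
        return NULL;
    }
    m->possible->slots = calloc(m->max_cpus, sizeof(CpuSlot));
    if (!m->possible->slots) {
        free(m->possible);
        m->possible = NULL;
        return NULL;
    }
    m->possible->len = m->max_cpus;
    for (i = 0; i < m->possible->len; i++) {
        /* The type is known from configuration alone, no CPU needed. */
        m->possible->slots[i].type = m->cpu_type;
        m->possible->slots[i].socket_id = (int)i;
    }
    return m->possible;
}
```

A query issued before machine init then reads the type from the slot itself
instead of asking the boot CPU object, which does not exist yet at that point.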

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/hw/boards.h        |  2 ++
 hw/arm/virt.c              |  3 ++-
 hw/core/machine.c          | 12 ++++++------
 hw/i386/pc.c               |  4 +++-
 hw/ppc/spapr.c             | 13 ++++++++-----
 hw/s390x/s390-virtio-ccw.c |  1 +
 6 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 156e0a5..2c3e958 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
  * CPUArchId:
  * @arch_id - architecture-dependent CPU ID of present or possible CPU
  * @cpu - pointer to corresponding CPU object if it's present on NULL otherwise
+ * @type - QOM class name of possible @cpu object
  * @props - CPU object properties, initialized by board
  * #vcpus_count - number of threads provided by @cpu object
  */
@@ -88,6 +89,7 @@ typedef struct {
     int64_t vcpus_count;
     CpuInstanceProperties props;
     Object *cpu;
+    const char *type;
 } CPUArchId;
 
 /**
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9e18b41..88319db 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1357,7 +1357,7 @@ static void machvirt_init(MachineState *machine)
             break;
         }
 
-        cpuobj = object_new(machine->cpu_type);
+        cpuobj = object_new(possible_cpus->cpus[n].type);
         object_property_set_int(cpuobj, possible_cpus->cpus[n].arch_id,
                                 "mp-affinity", NULL);
 
@@ -1573,6 +1573,7 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
                                   sizeof(CPUArchId) * max_cpus);
     ms->possible_cpus->len = max_cpus;
     for (n = 0; n < ms->possible_cpus->len; n++) {
+        ms->possible_cpus->cpus[n].type = ms->cpu_type;
         ms->possible_cpus->cpus[n].arch_id =
             virt_cpu_mp_affinity(vms, n);
         ms->possible_cpus->cpus[n].props.has_thread_id = true;
diff --git a/hw/core/machine.c b/hw/core/machine.c
index f482211..1e1fca5 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -363,18 +363,18 @@ static void machine_init_notify(Notifier *notifier, void *data)
 HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine)
 {
     int i;
-    Object *cpu;
     HotpluggableCPUList *head = NULL;
-    const char *cpu_type;
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+
+    /* force board to initialize possible_cpus if it hasn't been done yet */
+    mc->possible_cpu_arch_ids(machine);
 
-    cpu = machine->possible_cpus->cpus[0].cpu;
-    assert(cpu); /* Boot cpu is always present */
-    cpu_type = object_get_typename(cpu);
     for (i = 0; i < machine->possible_cpus->len; i++) {
+        Object *cpu;
         HotpluggableCPUList *list_item = g_new0(typeof(*list_item), 1);
         HotpluggableCPU *cpu_item = g_new0(typeof(*cpu_item), 1);
 
-        cpu_item->type = g_strdup(cpu_type);
+        cpu_item->type = g_strdup(machine->possible_cpus->cpus[i].type);
         cpu_item->vcpus_count = machine->possible_cpus->cpus[i].vcpus_count;
         cpu_item->props = g_memdup(&machine->possible_cpus->cpus[i].props,
                                    sizeof(*cpu_item->props));
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8e307f7..99afb2f1 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1147,7 +1147,8 @@ void pc_cpus_init(PCMachineState *pcms)
     pcms->apic_id_limit = x86_cpu_apic_id_from_index(max_cpus - 1) + 1;
     possible_cpus = mc->possible_cpu_arch_ids(ms);
     for (i = 0; i < smp_cpus; i++) {
-        pc_new_cpu(ms->cpu_type, possible_cpus->cpus[i].arch_id, &error_fatal);
+        pc_new_cpu(possible_cpus->cpus[i].type, possible_cpus->cpus[i].arch_id,
+                   &error_fatal);
     }
 }
 
@@ -2269,6 +2270,7 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
     for (i = 0; i < ms->possible_cpus->len; i++) {
         X86CPUTopoInfo topo;
 
+        ms->possible_cpus->cpus[i].type = ms->cpu_type;
         ms->possible_cpus->cpus[i].vcpus_count = 1;
         ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(i);
         x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id,
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 29de845..9f455e8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2129,11 +2129,6 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
     int boot_cores_nr = smp_cpus / smp_threads;
     int i;
 
-    if (!type) {
-        error_report("Unable to find sPAPR CPU Core definition");
-        exit(1);
-    }
-
     possible_cpus = mc->possible_cpu_arch_ids(machine);
     if (mc->has_hotpluggable_cpus) {
         if (smp_cpus % smp_threads) {
@@ -3419,6 +3414,7 @@ static int64_t spapr_get_default_cpu_node_id(const MachineState *ms, int idx)
 static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
 {
     int i;
+    const char *core_type;
     int spapr_max_cores = max_cpus / smp_threads;
     MachineClass *mc = MACHINE_GET_CLASS(machine);
 
@@ -3430,12 +3426,19 @@ static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
         return machine->possible_cpus;
     }
 
+    core_type = spapr_get_cpu_core_type(machine->cpu_type);
+    if (!core_type) {
+        error_report("Unable to find sPAPR CPU Core definition");
+        exit(1);
+    }
+
     machine->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
                              sizeof(CPUArchId) * spapr_max_cores);
     machine->possible_cpus->len = spapr_max_cores;
     for (i = 0; i < machine->possible_cpus->len; i++) {
         int core_id = i * smp_threads;
 
+        machine->possible_cpus->cpus[i].type = core_type;
         machine->possible_cpus->cpus[i].vcpus_count = smp_threads;
         machine->possible_cpus->cpus[i].arch_id = core_id;
         machine->possible_cpus->cpus[i].props.has_core_id = true;
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index f64db51..ae73fb6 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -385,6 +385,7 @@ static const CPUArchIdList *s390_possible_cpu_arch_ids(MachineState *ms)
                                   sizeof(CPUArchId) * max_cpus);
     ms->possible_cpus->len = max_cpus;
     for (i = 0; i < ms->possible_cpus->len; i++) {
+        ms->possible_cpus->cpus[i].type = ms->cpu_type;
         ms->possible_cpus->cpus[i].vcpus_count = 1;
         ms->possible_cpus->cpus[i].arch_id = i;
         ms->possible_cpus->cpus[i].props.has_core_id = true;
-- 
2.7.4


* [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-16 16:22 [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Igor Mammedov
                   ` (2 preceding siblings ...)
  2017-10-16 16:22 ` [Qemu-devel] [RFC 3/6] possible_cpus: add CPUArchId::type field Igor Mammedov
@ 2017-10-16 16:22 ` Igor Mammedov
  2017-10-16 16:35   ` Daniel P. Berrange
  2017-10-16 16:59   ` Eduardo Habkost
  2017-10-16 16:22 ` [Qemu-devel] [RFC 5/6] HMP: add set-numa-node command Igor Mammedov
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-16 16:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eblake, armbru, ehabkost, pkrempa, david, peter.maydell,
	pbonzini, cohuck

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/sysemu/sysemu.h |  1 +
 qemu-options.hx         | 15 ++++++++++++++
 qmp.c                   |  5 +++++
 vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index b213696..3feb94f 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -66,6 +66,7 @@ typedef enum WakeupReason {
     QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+void qemu_exit_preconfig_request(void);
 void qemu_system_reset_request(ShutdownCause reason);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
diff --git a/qemu-options.hx b/qemu-options.hx
index 39225ae..bd44db8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3498,6 +3498,21 @@ STEXI
 Run the emulation in single step mode.
 ETEXI
 
+DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
+    "-paused [state=]postconf|preconf\n"
+    "                postconf: pause QEMU after machine is initialized\n"
+    "                preconf: pause QEMU before machine is initialized\n",
+    QEMU_ARCH_ALL)
+STEXI
+@item -paused
+@findex -paused
+If set, enables interactive configuration stages before machine emulation starts.
+'postconf' option value mimics -S option behaviour where machine is created
+but emulation isn't started. 'preconf' option value pauses QEMU before machine
+is created, which allows to query and configure properties affecting machine
+initialization. Use the monitor/QMP command 'cont' to exit the paused state.
+ETEXI
+
 DEF("S", 0, QEMU_OPTION_S, \
     "-S              freeze CPU at startup (use 'c' to start execution)\n",
     QEMU_ARCH_ALL)
diff --git a/qmp.c b/qmp.c
index e8c3031..49e9a5c 100644
--- a/qmp.c
+++ b/qmp.c
@@ -167,6 +167,11 @@ void qmp_cont(Error **errp)
     BlockBackend *blk;
     Error *local_err = NULL;
 
+    if (runstate_check(RUN_STATE_PRELAUNCH)) {
+        qemu_exit_preconfig_request();
+        return;
+    }
+
     /* if there is a dump in background, we should wait until the dump
      * finished */
     if (dump_in_progress()) {
diff --git a/vl.c b/vl.c
index 3fed457..30631fd 100644
--- a/vl.c
+++ b/vl.c
@@ -555,6 +555,20 @@ static QemuOptsList qemu_fw_cfg_opts = {
     },
 };
 
+static QemuOptsList qemu_paused_opts = {
+    .name = "paused",
+    .implied_opt_name = "state",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_paused_opts.head),
+    .desc = {
+        {
+            .name = "state",
+            .type = QEMU_OPT_STRING,
+            .help = "Pause state of QEMU on startup",
+        },
+        { /* end of list */ }
+    },
+};
+
 /**
  * Get machine options
  *
@@ -1689,6 +1703,11 @@ static pid_t shutdown_pid;
 static int powerdown_requested;
 static int debug_requested;
 static int suspend_requested;
+static enum {
+    PRECONFIG_CONT = 0,
+    PRECONFIG_PAUSE,
+    PRECONFIG_SKIP,
+} preconfig_requested;
 static WakeupReason wakeup_reason;
 static NotifierList powerdown_notifiers =
     NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
@@ -1773,6 +1792,11 @@ static int qemu_debug_requested(void)
     return r;
 }
 
+void qemu_exit_preconfig_request(void)
+{
+    preconfig_requested = PRECONFIG_CONT;
+}
+
 /*
  * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
  */
@@ -1939,6 +1963,12 @@ static bool main_loop_should_exit(void)
     RunState r;
     ShutdownCause request;
 
+    if (runstate_check(RUN_STATE_PRELAUNCH)) {
+        if (preconfig_requested == PRECONFIG_CONT) {
+            preconfig_requested = PRECONFIG_SKIP;
+            return true;
+        }
+    }
     if (qemu_debug_requested()) {
         vm_stop(RUN_STATE_DEBUG);
     }
@@ -3177,6 +3207,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_icount_opts);
     qemu_add_opts(&qemu_semihosting_config_opts);
     qemu_add_opts(&qemu_fw_cfg_opts);
+    qemu_add_opts(&qemu_paused_opts);
     module_call_init(MODULE_INIT_OPTS);
 
     runstate_init();
@@ -3845,6 +3876,26 @@ int main(int argc, char **argv, char **envp)
                     exit(1);
                 }
                 break;
+            case QEMU_OPTION_paused:
+                {
+                    const char *value;
+
+                    opts = qemu_opts_parse_noisily(qemu_find_opts("paused"),
+                                                   optarg, true);
+                    if (opts == NULL) {
+                        exit(1);
+                    }
+                    value = qemu_opt_get(opts, "state");
+                    if (!g_strcmp0(value, "postconf")) {
+                        autostart = 0;
+                    } else if (!g_strcmp0(value, "preconf")) {
+                        preconfig_requested = PRECONFIG_PAUSE;
+                    } else {
+                        error_report("incomplete '-paused' option");
+                        exit(1);
+                    }
+                    break;
+                }
             case QEMU_OPTION_enable_kvm:
                 olist = qemu_find_opts("machine");
                 qemu_opts_parse_noisily(olist, "accel=kvm", false);
@@ -4731,7 +4782,6 @@ int main(int argc, char **argv, char **envp)
     current_machine->boot_order = boot_order;
     current_machine->cpu_model = cpu_model;
 
-
     /* parse features once if machine provides default cpu_type */
     if (machine_class->default_cpu_type) {
         current_machine->cpu_type = machine_class->default_cpu_type;
@@ -4741,6 +4791,8 @@ int main(int argc, char **argv, char **envp)
         }
     }
 
+    main_loop(); /* do monitor/qmp handling at preconfig state if requested */
+
     machine_run_board_init(current_machine);
 
     realtime_init();
-- 
2.7.4


* [Qemu-devel] [RFC 5/6] HMP: add set-numa-node command
  2017-10-16 16:22 [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Igor Mammedov
                   ` (3 preceding siblings ...)
  2017-10-16 16:22 ` [Qemu-devel] [RFC 4/6] CLI: add -paused option Igor Mammedov
@ 2017-10-16 16:22 ` Igor Mammedov
  2017-10-16 16:22 ` [Qemu-devel] [RFC 6/6] QMP: " Igor Mammedov
  2017-10-16 16:36 ` [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Daniel P. Berrange
  6 siblings, 0 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-16 16:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eblake, armbru, ehabkost, pkrempa, david, peter.maydell,
	pbonzini, cohuck

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hmp.h           |  1 +
 hmp-commands.hx | 13 +++++++++++++
 hmp.c           | 23 +++++++++++++++++++++++
 numa.c          | 19 +++++++++++++++++++
 4 files changed, 56 insertions(+)

diff --git a/hmp.h b/hmp.h
index 3605003..6e87f46 100644
--- a/hmp.h
+++ b/hmp.h
@@ -146,5 +146,6 @@ void hmp_info_ramblock(Monitor *mon, const QDict *qdict);
 void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
 void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
 void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict);
+void hmp_set_numa_node(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 1941e19..1f95b3f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1837,6 +1837,19 @@ Print QOM properties of object at location @var{path}
 ETEXI
 
     {
+        .name       = "set-numa-node",
+        .args_type  = "numa:O",
+        .params     = "see -numa CLI option for possible options",
+        .help       = "assign CPU to numa node",
+        .cmd        = hmp_set_numa_node,
+    },
+
+STEXI
+@item set-numa-node @var{options}
+Assign CPUs to NUMA nodes; see the -numa CLI option for possible @var{options}
+ETEXI
+
+    {
         .name       = "qom-set",
         .args_type  = "path:s,property:s,value:s",
         .params     = "path property value",
diff --git a/hmp.c b/hmp.c
index 739d330..69bae5b 100644
--- a/hmp.c
+++ b/hmp.c
@@ -43,6 +43,7 @@
 #include "hw/intc/intc.h"
 #include "migration/snapshot.h"
 #include "migration/misc.h"
+#include "sysemu/numa.h"
 
 #ifdef CONFIG_SPICE
 #include <spice/enums.h>
@@ -2896,3 +2897,25 @@ void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict)
     }
     hmp_handle_error(mon, &err);
 }
+
+void hmp_set_numa_node(Monitor *mon, const QDict *qdict)
+{
+    QemuOpts *opts;
+    Error *err = NULL;
+    MachineState *ms = MACHINE(qdev_get_machine());
+
+    opts = qemu_opts_from_qdict(qemu_find_opts("numa"), qdict, &err);
+    if (err) {
+        goto end;
+    }
+
+    parse_numa(ms, opts, &err);
+    if (err) {
+        goto end;
+    }
+
+end:
+    if (err) {
+        hmp_handle_error(mon, &err);
+    }
+}
diff --git a/numa.c b/numa.c
index d8e7dc0..a530d9c 100644
--- a/numa.c
+++ b/numa.c
@@ -47,6 +47,12 @@ QemuOptsList qemu_numa_opts = {
     .desc = { { 0 } } /* validated with OptsVisitor */
 };
 
+static enum {
+    NUMA_DISABLED, /* no numa was configured */
+    NUMA_ENABLED,  /* numa configuration is in process */
+    NUMA_COMPLETE, /* configuration is complete and can't be altered */
+} numa_is_configured;
+
 static int have_memdevs = -1;
 static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
                              * For all nodes, nodeid < max_numa_nodeid
@@ -259,6 +265,18 @@ void parse_NumaOptions(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
 
+    if (numa_is_configured == NUMA_COMPLETE) {
+        error_setg(&err, "NUMA configuration is finalized and can't be changed,"
+                   " use CLI option or set-numa-node HMP/QMP command at"
+                   " preconfig stage");
+        goto end;
+    } else if (runstate_check(RUN_STATE_PRELAUNCH)) {
+        numa_is_configured = NUMA_ENABLED;
+    } else {
+        error_setg(&err, "NUMA is not enabled at start/preconfig stage");
+        goto end;
+    }
+
     switch (object->type) {
     case NUMA_OPTIONS_TYPE_NODE:
         parse_numa_node(ms, &object->u.node, &err);
@@ -512,6 +530,7 @@ void numa_complete_configuration(MachineState *ms)
     } else {
         numa_set_mem_node_id(0, ram_size, 0);
     }
+    numa_is_configured = NUMA_COMPLETE;
 }
 
 void parse_numa_opts(MachineState *ms)
-- 
2.7.4


* [Qemu-devel] [RFC 6/6] QMP: add set-numa-node command
  2017-10-16 16:22 [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Igor Mammedov
                   ` (4 preceding siblings ...)
  2017-10-16 16:22 ` [Qemu-devel] [RFC 5/6] HMP: add set-numa-node command Igor Mammedov
@ 2017-10-16 16:22 ` Igor Mammedov
  2017-10-16 16:36 ` [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Daniel P. Berrange
  6 siblings, 0 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-16 16:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eblake, armbru, ehabkost, pkrempa, david, peter.maydell,
	pbonzini, cohuck

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/core/machine.c |  1 +
 numa.c            |  5 +++++
 qapi-schema.json  | 13 +++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 1e1fca5..def9b9a 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -22,6 +22,7 @@
 #include "qemu/cutils.h"
 #include "sysemu/numa.h"
 #include "sysemu/qtest.h"
+#include "qmp-commands.h"
 
 static char *machine_get_accel(Object *obj, Error **errp)
 {
diff --git a/numa.c b/numa.c
index a530d9c..1c99fca 100644
--- a/numa.c
+++ b/numa.c
@@ -540,6 +540,11 @@ void parse_numa_opts(MachineState *ms)
     }
 }
 
+void qmp_set_numa_node(NumaOptions *cmd, Error **errp)
+{
+    parse_NumaOptions(MACHINE(qdev_get_machine()), cmd, errp);
+}
+
 void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
 {
     int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
diff --git a/qapi-schema.json b/qapi-schema.json
index a9dd043..600f87b 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3200,3 +3200,16 @@
 # Since: 2.11
 ##
 { 'command': 'watchdog-set-action', 'data' : {'action': 'WatchdogAction'} }
+
+##
+# @set-numa-node:
+#
+# Runtime equivalent of '-numa' CLI option, available at
+# preconfigure stage to configure numa mapping before initializing
+# machine.
+#
+# Since: 2.11
+##
+{ 'command': 'set-numa-node', 'boxed': true,
+  'data': 'NumaOptions'
+}
-- 
2.7.4


* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-16 16:22 ` [Qemu-devel] [RFC 4/6] CLI: add -paused option Igor Mammedov
@ 2017-10-16 16:35   ` Daniel P. Berrange
  2017-10-17  8:17     ` Igor Mammedov
  2017-10-20 15:38     ` Eduardo Habkost
  2017-10-16 16:59   ` Eduardo Habkost
  1 sibling, 2 replies; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-16 16:35 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, peter.maydell, pkrempa, ehabkost, cohuck, armbru,
	pbonzini, david

On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:

This really needs to have a commit message that provides justification
for why this option is needed when we already have -S that is used
to allow configuration before the guest starts.

> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  include/sysemu/sysemu.h |  1 +
>  qemu-options.hx         | 15 ++++++++++++++
>  qmp.c                   |  5 +++++
>  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 74 insertions(+), 1 deletion(-)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index b213696..3feb94f 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -66,6 +66,7 @@ typedef enum WakeupReason {
>      QEMU_WAKEUP_REASON_OTHER,
>  } WakeupReason;
>  
> +void qemu_exit_preconfig_request(void);
>  void qemu_system_reset_request(ShutdownCause reason);
>  void qemu_system_suspend_request(void);
>  void qemu_register_suspend_notifier(Notifier *notifier);
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 39225ae..bd44db8 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3498,6 +3498,21 @@ STEXI
>  Run the emulation in single step mode.
>  ETEXI
>  
> +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> +    "-paused [state=]postconf|preconf\n"
> +    "                postconf: pause QEMU after machine is initialized\n"
> +    "                preconf: pause QEMU before machine is initialized\n",
> +    QEMU_ARCH_ALL)
> +STEXI
> +@item -paused
> +@findex -paused
> +if set enabled interactive configuration stages before machine emulation starts.
> +'postconf' option value mimics -S option behaviour where machine is created
> +but emulation isn't started. 'preconf' option value pauses QEMU before machine
> +is created, which allows to query and configure properties affecting machine
> +initialization. Use monitor/QMP command 'cont' to go to exit paused state.
> +ETEXI

To me it feels rather unpleasant to be exposing this kind of detailed knowledge
about the steps QEMU goes through when constructing the machine and expecting
the mgmt application to synchronize certain monitor actions against this.

> +
>  DEF("S", 0, QEMU_OPTION_S, \
>      "-S              freeze CPU at startup (use 'c' to start execution)\n",
>      QEMU_ARCH_ALL)
> diff --git a/qmp.c b/qmp.c
> index e8c3031..49e9a5c 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -167,6 +167,11 @@ void qmp_cont(Error **errp)
>      BlockBackend *blk;
>      Error *local_err = NULL;
>  
> +    if (runstate_check(RUN_STATE_PRELAUNCH)) {
> +        qemu_exit_preconfig_request();
> +        return;
> +    }
> +
>      /* if there is a dump in background, we should wait until the dump
>       * finished */
>      if (dump_in_progress()) {
> diff --git a/vl.c b/vl.c
> index 3fed457..30631fd 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -555,6 +555,20 @@ static QemuOptsList qemu_fw_cfg_opts = {
>      },
>  };
>  
> +static QemuOptsList qemu_paused_opts = {
> +    .name = "paused",
> +    .implied_opt_name = "state",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_paused_opts.head),
> +    .desc = {
> +        {
> +            .name = "state",
> +            .type = QEMU_OPT_STRING,
> +            .help = "Pause state of QEMU on startup",
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +
>  /**
>   * Get machine options
>   *
> @@ -1689,6 +1703,11 @@ static pid_t shutdown_pid;
>  static int powerdown_requested;
>  static int debug_requested;
>  static int suspend_requested;
> +static enum {
> +    PRECONFIG_CONT = 0,
> +    PRECONFIG_PAUSE,
> +    PRECONFIG_SKIP,
> +} preconfig_requested;
>  static WakeupReason wakeup_reason;
>  static NotifierList powerdown_notifiers =
>      NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
> @@ -1773,6 +1792,11 @@ static int qemu_debug_requested(void)
>      return r;
>  }
>  
> +void qemu_exit_preconfig_request(void)
> +{
> +    preconfig_requested = PRECONFIG_CONT;
> +}
> +
>  /*
>   * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
>   */
> @@ -1939,6 +1963,12 @@ static bool main_loop_should_exit(void)
>      RunState r;
>      ShutdownCause request;
>  
> +    if (runstate_check(RUN_STATE_PRELAUNCH)) {
> +        if (preconfig_requested == PRECONFIG_CONT) {
> +            preconfig_requested = PRECONFIG_SKIP;
> +            return true;
> +        }
> +    }
>      if (qemu_debug_requested()) {
>          vm_stop(RUN_STATE_DEBUG);
>      }
> @@ -3177,6 +3207,7 @@ int main(int argc, char **argv, char **envp)
>      qemu_add_opts(&qemu_icount_opts);
>      qemu_add_opts(&qemu_semihosting_config_opts);
>      qemu_add_opts(&qemu_fw_cfg_opts);
> +    qemu_add_opts(&qemu_paused_opts);
>      module_call_init(MODULE_INIT_OPTS);
>  
>      runstate_init();
> @@ -3845,6 +3876,26 @@ int main(int argc, char **argv, char **envp)
>                      exit(1);
>                  }
>                  break;
> +            case QEMU_OPTION_paused:
> +                {
> +                    const char *value;
> +
> +                    opts = qemu_opts_parse_noisily(qemu_find_opts("paused"),
> +                                                   optarg, true);
> +                    if (opts == NULL) {
> +                        exit(1);
> +                    }
> +                    value = qemu_opt_get(opts, "state");
> +                    if (!strcmp(value, "postconf")) {
> +                        autostart = 0;
> +                    } else if (!strcmp(value, "preconf")) {
> +                        preconfig_requested = PRECONFIG_PAUSE;
> +                    } else {
> +                        error_report("incomplete '-paused' option\n");
> +                        exit(1);
> +                    }
> +                    break;
> +                }
>              case QEMU_OPTION_enable_kvm:
>                  olist = qemu_find_opts("machine");
>                  qemu_opts_parse_noisily(olist, "accel=kvm", false);
> @@ -4731,7 +4782,6 @@ int main(int argc, char **argv, char **envp)
>      current_machine->boot_order = boot_order;
>      current_machine->cpu_model = cpu_model;
>  
> -
>      /* parse features once if machine provides default cpu_type */
>      if (machine_class->default_cpu_type) {
>          current_machine->cpu_type = machine_class->default_cpu_type;
> @@ -4741,6 +4791,8 @@ int main(int argc, char **argv, char **envp)
>          }
>      }
>  
> +    main_loop(); /* do monitor/qmp handling at preconfig state if requested */
> +
>      machine_run_board_init(current_machine);
>  
>      realtime_init();
> -- 
> 2.7.4
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-16 16:22 [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Igor Mammedov
                   ` (5 preceding siblings ...)
  2017-10-16 16:22 ` [Qemu-devel] [RFC 6/6] QMP: " Igor Mammedov
@ 2017-10-16 16:36 ` Daniel P. Berrange
  2017-10-16 17:05   ` Eduardo Habkost
  2017-10-17  7:27   ` Igor Mammedov
  6 siblings, 2 replies; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-16 16:36 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, peter.maydell, pkrempa, ehabkost, cohuck, armbru,
	pbonzini, david

On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> Series allows to configure NUMA mapping at runtime using QMP/HMP
> interface. For that to happen it introduces a new '-paused' CLI option
> which allows to pause QEMU before machine_init() is run and
> adds new set-numa-node HMP/QMP commands which in conjuction with
> info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> NUMA mapping for cpus.

What's the problem we're seeking to solve here compared to what we currently
do for NUMA configuration?

> 
> HMP configuration session for CLI '-smp 1,maxcpus=2' would look like:
> 
> (qemu) info hotpluggable-cpus 
> Hotpluggable CPUs:
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   CPUInstance Properties:
>     socket-id: "1"
>     core-id: "0"
>     thread-id: "0"
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   qom_path: "/machine/unattached/device[0]"
>   CPUInstance Properties:
>     socket-id: "0"
>     core-id: "0"
>     thread-id: "0"
> (qemu) set-numa-node node,nodeid=0
> (qemu) set-numa-node node,nodeid=1
> (qemu) set-numa-node cpu,socket-id=0,node-id=0
> (qemu) set-numa-node cpu,socket-id=1,node-id=1
> (qemu) info hotpluggable-cpus 
> Hotpluggable CPUs:
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   CPUInstance Properties:
>     node-id: "1"
>     socket-id: "1"
>     core-id: "0"
>     thread-id: "0"
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   CPUInstance Properties:
>     node-id: "0"
>     socket-id: "0"
>     core-id: "0"
>     thread-id: "0"
> (qemu) cont
> 
> git tree for testing:
>   https://github.com/imammedo/qemu qmp_preconfig_rfc
> 
> 
> CC: eblake@redhat.com
> CC: armbru@redhat.com
> CC: ehabkost@redhat.com
> CC: pkrempa@redhat.com
> CC: david@gibson.dropbear.id.au
> CC: peter.maydell@linaro.org
> CC: pbonzini@redhat.com
> CC: cohuck@redhat.com
> 
> Igor Mammedov (6):
>   numa: postpone options post-processing till machine_run_board_init()
>   numa: split out NumaOptions parsing into parse_NumaOptions()
>   possible_cpus: add CPUArchId::type field
>   CLI: add -paused option
>   HMP: add set-numa-node command
>   QMP: add set-numa-node command
> 
>  hmp.h                      |  1 +
>  include/hw/boards.h        |  2 ++
>  include/sysemu/numa.h      |  2 ++
>  include/sysemu/sysemu.h    |  1 +
>  hmp-commands.hx            | 13 ++++++++
>  hmp.c                      | 23 ++++++++++++++
>  hw/arm/virt.c              |  3 +-
>  hw/core/machine.c          | 18 ++++++-----
>  hw/i386/pc.c               |  4 ++-
>  hw/ppc/spapr.c             | 13 +++++---
>  hw/s390x/s390-virtio-ccw.c |  1 +
>  numa.c                     | 79 ++++++++++++++++++++++++++++++++++------------
>  qapi-schema.json           | 13 ++++++++
>  qemu-options.hx            | 15 +++++++++
>  qmp.c                      |  5 +++
>  vl.c                       | 54 ++++++++++++++++++++++++++++++-
>  16 files changed, 210 insertions(+), 37 deletions(-)
> 
> -- 
> 2.7.4
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-16 16:22 ` [Qemu-devel] [RFC 4/6] CLI: add -paused option Igor Mammedov
  2017-10-16 16:35   ` Daniel P. Berrange
@ 2017-10-16 16:59   ` Eduardo Habkost
  2017-10-16 17:01     ` Paolo Bonzini
                       ` (2 more replies)
  1 sibling, 3 replies; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-16 16:59 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, eblake, armbru, pkrempa, david, peter.maydell,
	pbonzini, cohuck

On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  include/sysemu/sysemu.h |  1 +
>  qemu-options.hx         | 15 ++++++++++++++
>  qmp.c                   |  5 +++++
>  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 74 insertions(+), 1 deletion(-)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index b213696..3feb94f 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -66,6 +66,7 @@ typedef enum WakeupReason {
>      QEMU_WAKEUP_REASON_OTHER,
>  } WakeupReason;
>  
> +void qemu_exit_preconfig_request(void);
>  void qemu_system_reset_request(ShutdownCause reason);
>  void qemu_system_suspend_request(void);
>  void qemu_register_suspend_notifier(Notifier *notifier);
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 39225ae..bd44db8 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3498,6 +3498,21 @@ STEXI
>  Run the emulation in single step mode.
>  ETEXI
>  
> +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> +    "-paused [state=]postconf|preconf\n"
> +    "                postconf: pause QEMU after machine is initialized\n"
> +    "                preconf: pause QEMU before machine is initialized\n",
> +    QEMU_ARCH_ALL)

I would like to allow pausing before machine-type is selected, so
management could run query-machines before choosing a
machine-type.  Would that need a third "-paused" mode, or will we
be able to change "preconf" to pause before select_machine() is
called?

The same probably applies to other things initialized before
machine_run_board_init() that could be configurable using QMP,
including but not limited to:
* Accelerator configuration
* Registering global properties
* RAM size
* SMP/CPU configuration


> +STEXI
> +@item -paused
> +@findex -paused
> +if set enabled interactive configuration stages before machine emulation starts.
> +'postconf' option value mimics -S option behaviour where machine is created
> +but emulation isn't started. 'preconf' option value pauses QEMU before machine
> +is created, which allows to query and configure properties affecting machine
> +initialization. Use monitor/QMP command 'cont' to go to exit paused state.

What if "-S" is used at the same time?  Will "cont" only
initialize the machine and wait for another "cont" command to
start the VCPUs, or will it unpause everything?
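
To make the question concrete, here is a toy model of the handshake that
the quoted qmp_cont() and main_loop_should_exit() hunks add. It is a
sketch, not QEMU code: struct vm_model and the model_*() functions are
hypothetical names, and the "prelaunch" flag merely stands in for
runstate_check(RUN_STATE_PRELAUNCH).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the handshake in the quoted hunks; this is not QEMU code. */
enum preconfig_state {
    PRECONFIG_CONT = 0, /* default value: leave the preconfig loop */
    PRECONFIG_PAUSE,    /* set by '-paused preconf': keep looping */
    PRECONFIG_SKIP,     /* preconfig loop has already been left */
};

struct vm_model {
    enum preconfig_state preconfig_requested;
    bool prelaunch;     /* stands in for runstate_check(RUN_STATE_PRELAUNCH) */
    bool vcpus_started; /* stands in for the rest of qmp_cont() */
};

/* Mirrors the hunk added to qmp_cont(). */
static void model_cont(struct vm_model *vm)
{
    assert(vm != NULL);
    if (vm->prelaunch) {
        vm->preconfig_requested = PRECONFIG_CONT;
        return; /* early return: the normal resume path is not reached */
    }
    vm->vcpus_started = true;
}

/* Mirrors the hunk added to main_loop_should_exit(). */
static bool model_should_exit(struct vm_model *vm)
{
    assert(vm != NULL);
    if (vm->prelaunch && vm->preconfig_requested == PRECONFIG_CONT) {
        vm->preconfig_requested = PRECONFIG_SKIP;
        return true;
    }
    return false;
}
```

In this model, a "cont" issued while "prelaunch" is still true always
takes the early return, including a second "cont" after the preconfig
loop has been left. Whether real QEMU is still in the prelaunch run state
at that point when "-S" is given depends on code outside the quoted diff,
so the model only illustrates why the question matters.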


> +ETEXI
> +
>  DEF("S", 0, QEMU_OPTION_S, \
>      "-S              freeze CPU at startup (use 'c' to start execution)\n",
>      QEMU_ARCH_ALL)
> diff --git a/qmp.c b/qmp.c
> index e8c3031..49e9a5c 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -167,6 +167,11 @@ void qmp_cont(Error **errp)
>      BlockBackend *blk;
>      Error *local_err = NULL;
>  
> +    if (runstate_check(RUN_STATE_PRELAUNCH)) {
> +        qemu_exit_preconfig_request();
> +        return;
> +    }
> +
>      /* if there is a dump in background, we should wait until the dump
>       * finished */
>      if (dump_in_progress()) {
> diff --git a/vl.c b/vl.c
> index 3fed457..30631fd 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -555,6 +555,20 @@ static QemuOptsList qemu_fw_cfg_opts = {
>      },
>  };
>  
> +static QemuOptsList qemu_paused_opts = {
> +    .name = "paused",
> +    .implied_opt_name = "state",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_paused_opts.head),
> +    .desc = {
> +        {
> +            .name = "state",
> +            .type = QEMU_OPT_STRING,
> +            .help = "Pause state of QEMU on startup",
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +
>  /**
>   * Get machine options
>   *
> @@ -1689,6 +1703,11 @@ static pid_t shutdown_pid;
>  static int powerdown_requested;
>  static int debug_requested;
>  static int suspend_requested;
> +static enum {
> +    PRECONFIG_CONT = 0,
> +    PRECONFIG_PAUSE,
> +    PRECONFIG_SKIP,
> +} preconfig_requested;
>  static WakeupReason wakeup_reason;
>  static NotifierList powerdown_notifiers =
>      NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
> @@ -1773,6 +1792,11 @@ static int qemu_debug_requested(void)
>      return r;
>  }
>  
> +void qemu_exit_preconfig_request(void)
> +{
> +    preconfig_requested = PRECONFIG_CONT;
> +}
> +
>  /*
>   * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
>   */
> @@ -1939,6 +1963,12 @@ static bool main_loop_should_exit(void)
>      RunState r;
>      ShutdownCause request;
>  
> +    if (runstate_check(RUN_STATE_PRELAUNCH)) {
> +        if (preconfig_requested == PRECONFIG_CONT) {
> +            preconfig_requested = PRECONFIG_SKIP;
> +            return true;
> +        }
> +    }
>      if (qemu_debug_requested()) {
>          vm_stop(RUN_STATE_DEBUG);
>      }
> @@ -3177,6 +3207,7 @@ int main(int argc, char **argv, char **envp)
>      qemu_add_opts(&qemu_icount_opts);
>      qemu_add_opts(&qemu_semihosting_config_opts);
>      qemu_add_opts(&qemu_fw_cfg_opts);
> +    qemu_add_opts(&qemu_paused_opts);
>      module_call_init(MODULE_INIT_OPTS);
>  
>      runstate_init();
> @@ -3845,6 +3876,26 @@ int main(int argc, char **argv, char **envp)
>                      exit(1);
>                  }
>                  break;
> +            case QEMU_OPTION_paused:
> +                {
> +                    const char *value;
> +
> +                    opts = qemu_opts_parse_noisily(qemu_find_opts("paused"),
> +                                                   optarg, true);
> +                    if (opts == NULL) {
> +                        exit(1);
> +                    }
> +                    value = qemu_opt_get(opts, "state");
> +                    if (!strcmp(value, "postconf")) {
> +                        autostart = 0;
> +                    } else if (!strcmp(value, "preconf")) {
> +                        preconfig_requested = PRECONFIG_PAUSE;
> +                    } else {
> +                        error_report("incomplete '-paused' option\n");
> +                        exit(1);
> +                    }
> +                    break;
> +                }
>              case QEMU_OPTION_enable_kvm:
>                  olist = qemu_find_opts("machine");
>                  qemu_opts_parse_noisily(olist, "accel=kvm", false);
> @@ -4731,7 +4782,6 @@ int main(int argc, char **argv, char **envp)
>      current_machine->boot_order = boot_order;
>      current_machine->cpu_model = cpu_model;
>  
> -
>      /* parse features once if machine provides default cpu_type */
>      if (machine_class->default_cpu_type) {
>          current_machine->cpu_type = machine_class->default_cpu_type;
> @@ -4741,6 +4791,8 @@ int main(int argc, char **argv, char **envp)
>          }
>      }
>  
> +    main_loop(); /* do monitor/qmp handling at preconfig state if requested */
> +

I'm impressed by the simplicity of the implementation.  I thought
this would involve moving everything between this line and the
next main_loop() call outside main(), so they would be called by
qmp_cont().

Does any expert on GLib's event loop see any gotchas in this method?

I would like to do a careful review of main_loop_wait() and
main_loop_should_exit(), to ensure those functions don't depend
on anything that's initialized after this line.  Probably a few
existing QMP commands can crash if machine is not initialized
yet?

The rules and expectations on initialization ordering are very
subtle, I suggest including test code for the new feature to
ensure nothing crashes or breaks in the future.


>      machine_run_board_init(current_machine);
>  
>      realtime_init();
> -- 
> 2.7.4
> 

-- 
Eduardo


* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-16 16:59   ` Eduardo Habkost
@ 2017-10-16 17:01     ` Paolo Bonzini
  2017-10-16 17:17       ` Eduardo Habkost
  2017-10-17 14:48       ` Daniel P. Berrange
  2017-10-17  9:10     ` Igor Mammedov
  2017-10-19 10:42     ` David Gibson
  2 siblings, 2 replies; 93+ messages in thread
From: Paolo Bonzini @ 2017-10-16 17:01 UTC (permalink / raw)
  To: Eduardo Habkost, Igor Mammedov
  Cc: qemu-devel, eblake, armbru, pkrempa, david, peter.maydell, cohuck

On 16/10/2017 18:59, Eduardo Habkost wrote:
>> +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
>> +    "-paused [state=]postconf|preconf\n"
>> +    "                postconf: pause QEMU after machine is initialized\n"
>> +    "                preconf: pause QEMU before machine is initialized\n",
>> +    QEMU_ARCH_ALL)
> I would like to allow pausing before machine-type is selected, so
> management could run query-machines before choosing a
> machine-type.  Would that need a third "-paused" mode, or will we
> be able to change "preconf" to pause before select_machine() is
> called?
> 
> The same probably applies to other things initialized before
> machine_run_board_init() that could be configurable using QMP,
> including but not limited to:
> * Accelerator configuration
> * Registering global properties
> * RAM size
> * SMP/CPU configuration

Should (or could) "-M none" be changed in a backwards-compatible way to
allow such preconfiguration?  For example

  qemu -M none -monitor stdio
  (qemu) machine-set-options pc,accel=kvm
  (qemu) c

Paolo


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-16 16:36 ` [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Daniel P. Berrange
@ 2017-10-16 17:05   ` Eduardo Habkost
  2017-10-17  7:27   ` Igor Mammedov
  1 sibling, 0 replies; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-16 17:05 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Igor Mammedov, qemu-devel, peter.maydell, pkrempa, cohuck,
	armbru, pbonzini, david, Laine Stump

On Mon, Oct 16, 2017 at 05:36:36PM +0100, Daniel P. Berrange wrote:
> On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > interface. For that to happen it introduces a new '-paused' CLI option
> > which allows to pause QEMU before machine_init() is run and
> > adds new set-numa-node HMP/QMP commands which in conjuction with
> > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > NUMA mapping for cpus.
> 
> What's the problem we're seeking to solve here compared to what we currently
> do for NUMA configuration?

I don't understand completely what exactly Igor is trying to
solve, but this new mode would be very helpful to address the
issues mentioned at:

http://www.linux-kvm.org/images/4/46/03x06A-Eduardo_HabkostMachine-type_Introspection_and_Configuration_Where_Are_We_Going.pdf
(starting on slide 12)

> 
> > 
> > HMP configuration session for CLI '-smp 1,maxcpus=2' would look like:
> > 
> > (qemu) info hotpluggable-cpus 
> > Hotpluggable CPUs:
> >   type: "qemu64-x86_64-cpu"
> >   vcpus_count: "1"
> >   CPUInstance Properties:
> >     socket-id: "1"
> >     core-id: "0"
> >     thread-id: "0"
> >   type: "qemu64-x86_64-cpu"
> >   vcpus_count: "1"
> >   qom_path: "/machine/unattached/device[0]"
> >   CPUInstance Properties:
> >     socket-id: "0"
> >     core-id: "0"
> >     thread-id: "0"
> > (qemu) set-numa-node node,nodeid=0
> > (qemu) set-numa-node node,nodeid=1
> > (qemu) set-numa-node cpu,socket-id=0,node-id=0
> > (qemu) set-numa-node cpu,socket-id=1,node-id=1
> > (qemu) info hotpluggable-cpus 
> > Hotpluggable CPUs:
> >   type: "qemu64-x86_64-cpu"
> >   vcpus_count: "1"
> >   CPUInstance Properties:
> >     node-id: "1"
> >     socket-id: "1"
> >     core-id: "0"
> >     thread-id: "0"
> >   type: "qemu64-x86_64-cpu"
> >   vcpus_count: "1"
> >   CPUInstance Properties:
> >     node-id: "0"
> >     socket-id: "0"
> >     core-id: "0"
> >     thread-id: "0"
> > (qemu) cont
> > 
> > git tree for testing:
> >   https://github.com/imammedo/qemu qmp_preconfig_rfc
> > 
> > 
> > CC: eblake@redhat.com
> > CC: armbru@redhat.com
> > CC: ehabkost@redhat.com
> > CC: pkrempa@redhat.com
> > CC: david@gibson.dropbear.id.au
> > CC: peter.maydell@linaro.org
> > CC: pbonzini@redhat.com
> > CC: cohuck@redhat.com
> > 
> > Igor Mammedov (6):
> >   numa: postpone options post-processing till machine_run_board_init()
> >   numa: split out NumaOptions parsing into parse_NumaOptions()
> >   possible_cpus: add CPUArchId::type field
> >   CLI: add -paused option
> >   HMP: add set-numa-node command
> >   QMP: add set-numa-node command
> > 
> >  hmp.h                      |  1 +
> >  include/hw/boards.h        |  2 ++
> >  include/sysemu/numa.h      |  2 ++
> >  include/sysemu/sysemu.h    |  1 +
> >  hmp-commands.hx            | 13 ++++++++
> >  hmp.c                      | 23 ++++++++++++++
> >  hw/arm/virt.c              |  3 +-
> >  hw/core/machine.c          | 18 ++++++-----
> >  hw/i386/pc.c               |  4 ++-
> >  hw/ppc/spapr.c             | 13 +++++---
> >  hw/s390x/s390-virtio-ccw.c |  1 +
> >  numa.c                     | 79 ++++++++++++++++++++++++++++++++++------------
> >  qapi-schema.json           | 13 ++++++++
> >  qemu-options.hx            | 15 +++++++++
> >  qmp.c                      |  5 +++
> >  vl.c                       | 54 ++++++++++++++++++++++++++++++-
> >  16 files changed, 210 insertions(+), 37 deletions(-)
> > 
> > -- 
> > 2.7.4
> > 
> > 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

-- 
Eduardo


* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-16 17:01     ` Paolo Bonzini
@ 2017-10-16 17:17       ` Eduardo Habkost
  2017-10-17  8:47         ` Paolo Bonzini
  2017-10-17 14:48       ` Daniel P. Berrange
  1 sibling, 1 reply; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-16 17:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Igor Mammedov, qemu-devel, eblake, armbru, pkrempa, david,
	peter.maydell, cohuck

On Mon, Oct 16, 2017 at 07:01:01PM +0200, Paolo Bonzini wrote:
> On 16/10/2017 18:59, Eduardo Habkost wrote:
> >> +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> >> +    "-paused [state=]postconf|preconf\n"
> >> +    "                postconf: pause QEMU after machine is initialized\n"
> >> +    "                preconf: pause QEMU before machine is initialized\n",
> >> +    QEMU_ARCH_ALL)
> > I would like to allow pausing before machine-type is selected, so
> > management could run query-machines before choosing a
> > machine-type.  Would that need a third "-paused" mode, or will we
> > be able to change "preconf" to pause before select_machine() is
> > called?
> > 
> > The same probably applies to other things initialized before
> > machine_run_board_init() that could be configurable using QMP,
> > including but not limited to:
> > * Accelerator configuration
> > * Registering global properties
> > * RAM size
> > * SMP/CPU configuration
> 
> Should (or could) "-M none" be changed in a backwards-compatible way to
> allow such preconfiguration?  For example
> 
>   qemu -M none -monitor stdio
>   (qemu) machine-set-options pc,accel=kvm
>   (qemu) c

Sounds like an interesting idea.  It would require ensuring it's
really safe to destroy current_machine/accel (and other global
state) and replace them with another object on the fly (which is
probably a nice goal by itself).

-- 
Eduardo


* Re: [Qemu-devel] [RFC 1/6] numa: postpone options post-processing till machine_run_board_init()
  2017-10-16 16:22 ` [Qemu-devel] [RFC 1/6] numa: postpone options post-processing till machine_run_board_init() Igor Mammedov
@ 2017-10-17  5:49   ` David Gibson
  0 siblings, 0 replies; 93+ messages in thread
From: David Gibson @ 2017-10-17  5:49 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, eblake, armbru, ehabkost, pkrempa, peter.maydell,
	pbonzini, cohuck


On Mon, Oct 16, 2017 at 06:22:51PM +0200, Igor Mammedov wrote:
> in preparation for numa options to being handled via QMP before
> machine_run_board_init(), move final numa configuration checks
> and processing to machine_run_board_init() so it could take into
> account both CLI (via parse_numa_opts()) and QMP input
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/sysemu/numa.h |  1 +
>  hw/core/machine.c     |  5 +++--
>  numa.c                | 13 ++++++++-----
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 5c6df28..c19e456 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -31,6 +31,7 @@ struct NumaNodeMem {
>  
>  extern NodeInfo numa_info[MAX_NODES];
>  void parse_numa_opts(MachineState *ms);
> +void numa_complete_configuration(MachineState *ms);
>  void query_numa_node_mem(NumaNodeMem node_mem[]);
>  extern QemuOptsList qemu_numa_opts;
>  void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 80647ed..f482211 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -701,7 +701,7 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
>      return g_string_free(s, false);
>  }
>  
> -static void machine_numa_finish_init(MachineState *machine)
> +static void machine_numa_finish_cpu_init(MachineState *machine)
>  {
>      int i;
>      bool default_mapping;
> @@ -756,7 +756,8 @@ void machine_run_board_init(MachineState *machine)
>      MachineClass *machine_class = MACHINE_GET_CLASS(machine);
>  
>      if (nb_numa_nodes) {
> -        machine_numa_finish_init(machine);
> +        numa_complete_configuration(machine);
> +        machine_numa_finish_cpu_init(machine);
>      }
>      machine_class->init(machine);
>  }
> diff --git a/numa.c b/numa.c
> index 8d78d95..18af4ff 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -424,15 +424,11 @@ void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
>      nodes[i].node_mem = size - usedmem;
>  }
>  
> -void parse_numa_opts(MachineState *ms)
> +void numa_complete_configuration(MachineState *ms)
>  {
>      int i;
>      MachineClass *mc = MACHINE_GET_CLASS(ms);
>  
> -    if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
> -        exit(1);
> -    }
> -
>      assert(max_numa_nodeid <= MAX_NODES);
>  
>      /* No support for sparse NUMA node IDs yet: */
> @@ -508,6 +504,13 @@ void parse_numa_opts(MachineState *ms)
>      }
>  }
>  
> +void parse_numa_opts(MachineState *ms)
> +{
> +    if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
> +        exit(1);
> +    }
> +}
> +
>  void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
>  {
>      int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-16 16:36 ` [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Daniel P. Berrange
  2017-10-16 17:05   ` Eduardo Habkost
@ 2017-10-17  7:27   ` Igor Mammedov
  2017-10-17 15:07     ` Daniel P. Berrange
  1 sibling, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-17  7:27 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: qemu-devel, peter.maydell, pkrempa, ehabkost, cohuck, armbru,
	pbonzini, david

On Mon, 16 Oct 2017 17:36:36 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > interface. For that to happen it introduces a new '-paused' CLI option
> > which allows to pause QEMU before machine_init() is run and
> > adds new set-numa-node HMP/QMP commands which in conjuction with
> > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > NUMA mapping for cpus.  
> 
> What's the problem we're seeking solve here compared to what we currently
> do for NUMA configuration ?
From RHBZ1382425
"
The current -numa CLI interface is quite limited in how it allows mapping CPUs to NUMA nodes, as it requires providing cpu_index values, which are non-obvious and depend on machine/arch. As a result, libvirt has to assume/re-implement the cpu_index allocation logic to provide valid values for the -numa cpus=... QEMU CLI option.

QEMU now has in place a generic CPU hotplug interface and the ability to query the possible CPU layout (with the QMP command query-hotpluggable-cpus);
however, it requires running QEMU once for each machine type and topology configuration (-M & -smp combination), which would be too taxing for the mgmt layer to do.
The currently proposed idea to solve the issue is to do NUMA mapping at runtime:
 1. start QEMU in stopped mode with needed -M & -smp configuration
    but leave out "-numa cpus" options
 2. query possible cpus layout (query-hotpluggable-cpus)
 3. use new QMP command to map CPUs to NUMA node in terms of generic CPU
    hotplug interface (socket/core/thread)

    commit (419fcde numa: add '-numa cpu,...' option for property based node mapping)
    added CLI option for topology based

...
 4. continue VM execution
"
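
Step 3 above can be sketched from the mgmt side as follows. This is only
an illustration, not code from the series: CpuSlot and
format_set_numa_node() are hypothetical names, and the command syntax is
the one used in the HMP session from the cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical view of one entry reported by query-hotpluggable-cpus. */
typedef struct {
    int socket_id;
    int core_id;
    int thread_id;
} CpuSlot;

/*
 * Format the HMP command that maps the CPUs of one socket to a NUMA node,
 * following the "set-numa-node cpu,socket-id=N,node-id=M" form shown in
 * the cover letter. Returns 0 on success, -1 if the buffer is too small.
 */
static int format_set_numa_node(char *buf, size_t len,
                                const CpuSlot *slot, int node_id)
{
    int n;

    assert(buf != NULL && slot != NULL && len > 0);
    n = snprintf(buf, len, "set-numa-node cpu,socket-id=%d,node-id=%d",
                 slot->socket_id, node_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The point is that the mgmt layer only echoes back properties it got from
QEMU and never has to guess cpu_index values.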

> > 
> > HMP configuration session for CLI '-smp 1,maxcpus=2' would look like:
> > 
> > (qemu) info hotpluggable-cpus 
> > Hotpluggable CPUs:
> >   type: "qemu64-x86_64-cpu"
> >   vcpus_count: "1"
> >   CPUInstance Properties:
> >     socket-id: "1"
> >     core-id: "0"
> >     thread-id: "0"
> >   type: "qemu64-x86_64-cpu"
> >   vcpus_count: "1"
> >   qom_path: "/machine/unattached/device[0]"
> >   CPUInstance Properties:
> >     socket-id: "0"
> >     core-id: "0"
> >     thread-id: "0"
> > (qemu) set-numa-node node,nodeid=0
> > (qemu) set-numa-node node,nodeid=1
> > (qemu) set-numa-node cpu,socket-id=0,node-id=0
> > (qemu) set-numa-node cpu,socket-id=1,node-id=1
> > (qemu) info hotpluggable-cpus 
> > Hotpluggable CPUs:
> >   type: "qemu64-x86_64-cpu"
> >   vcpus_count: "1"
> >   CPUInstance Properties:
> >     node-id: "1"
> >     socket-id: "1"
> >     core-id: "0"
> >     thread-id: "0"
> >   type: "qemu64-x86_64-cpu"
> >   vcpus_count: "1"
> >   CPUInstance Properties:
> >     node-id: "0"
> >     socket-id: "0"
> >     core-id: "0"
> >     thread-id: "0"
> > (qemu) cont
> > 
> > git tree for testing:
> >   https://github.com/imammedo/qemu qmp_preconfig_rfc
> > 
> > 
> > CC: eblake@redhat.com
> > CC: armbru@redhat.com
> > CC: ehabkost@redhat.com
> > CC: pkrempa@redhat.com
> > CC: david@gibson.dropbear.id.au
> > CC: peter.maydell@linaro.org
> > CC: pbonzini@redhat.com
> > CC: cohuck@redhat.com
> > 
> > Igor Mammedov (6):
> >   numa: postpone options post-processing till machine_run_board_init()
> >   numa: split out NumaOptions parsing into parse_NumaOptions()
> >   possible_cpus: add CPUArchId::type field
> >   CLI: add -paused option
> >   HMP: add set-numa-node command
> >   QMP: add set-numa-node command
> > 
> >  hmp.h                      |  1 +
> >  include/hw/boards.h        |  2 ++
> >  include/sysemu/numa.h      |  2 ++
> >  include/sysemu/sysemu.h    |  1 +
> >  hmp-commands.hx            | 13 ++++++++
> >  hmp.c                      | 23 ++++++++++++++
> >  hw/arm/virt.c              |  3 +-
> >  hw/core/machine.c          | 18 ++++++-----
> >  hw/i386/pc.c               |  4 ++-
> >  hw/ppc/spapr.c             | 13 +++++---
> >  hw/s390x/s390-virtio-ccw.c |  1 +
> >  numa.c                     | 79 ++++++++++++++++++++++++++++++++++------------
> >  qapi-schema.json           | 13 ++++++++
> >  qemu-options.hx            | 15 +++++++++
> >  qmp.c                      |  5 +++
> >  vl.c                       | 54 ++++++++++++++++++++++++++++++-
> >  16 files changed, 210 insertions(+), 37 deletions(-)
> > 
> > -- 
> > 2.7.4
> > 
> >   
> 
> Regards,
> Daniel

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-16 16:35   ` Daniel P. Berrange
@ 2017-10-17  8:17     ` Igor Mammedov
  2017-10-17 10:56       ` Laszlo Ersek
  2017-10-20 15:38     ` Eduardo Habkost
  1 sibling, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-17  8:17 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: qemu-devel, peter.maydell, pkrempa, ehabkost, cohuck, armbru,
	pbonzini, david

On Mon, 16 Oct 2017 17:35:15 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:
> 
> This really needs to have a commit message that provides justification
> for why this option is needed when we already have -S that is used
> to allow configuration before the guest starts.
Sorry, I should have added here what I tried to describe in the cover letter.

-S pauses the machine too late: the machine is already created by the time
it's paused, so trying to reconfigure it might require the machine to be recreated.
In the case of NUMA options it might be possible to hack the x86 target to
rebuild/override acpi/fw_cfg so that it reflects the new settings set
this late, but I wouldn't expect that to work in general.

The cleanest way to configure it is to pause and configure the numa mapping
before the machine is built.


> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  include/sysemu/sysemu.h |  1 +
> >  qemu-options.hx         | 15 ++++++++++++++
> >  qmp.c                   |  5 +++++
> >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  4 files changed, 74 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index b213696..3feb94f 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> >      QEMU_WAKEUP_REASON_OTHER,
> >  } WakeupReason;
> >  
> > +void qemu_exit_preconfig_request(void);
> >  void qemu_system_reset_request(ShutdownCause reason);
> >  void qemu_system_suspend_request(void);
> >  void qemu_register_suspend_notifier(Notifier *notifier);
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 39225ae..bd44db8 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -3498,6 +3498,21 @@ STEXI
> >  Run the emulation in single step mode.
> >  ETEXI
> >  
> > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > +    "-paused [state=]postconf|preconf\n"
> > +    "                postconf: pause QEMU after machine is initialized\n"
> > +    "                preconf: pause QEMU before machine is initialized\n",
> > +    QEMU_ARCH_ALL)
> > +STEXI
> > +@item -paused
> > +@findex -paused
> > +if set enabled interactive configuration stages before machine emulation starts.
> > +'postconf' option value mimics -S option behaviour where machine is created
> > +but emulation isn't started. 'preconf' option value pauses QEMU before machine
> > +is created, which allows to query and configure properties affecting machine
> > +initialization. Use monitor/QMP command 'cont' to go to exit paused state.
> > +ETEXI  
> 
> To me it feels rather unpleasant to be exposing this kind of detailed knowledge
> about the steps QEMU goes through when consttructing the machine and expecting
> the mgmt application to synchronize certain monitor actions against this.
Well, so far the alternative seems to be unacceptable as well, i.e.
start qemu twice:
  #1 to get the cpu layout from the given '-M -smp' options
  #2 to add -numa options that map the cpus provided at #1 to numa nodes


> > +
> >  DEF("S", 0, QEMU_OPTION_S, \
> >      "-S              freeze CPU at startup (use 'c' to start execution)\n",
> >      QEMU_ARCH_ALL)
> > diff --git a/qmp.c b/qmp.c
> > index e8c3031..49e9a5c 100644
> > --- a/qmp.c
> > +++ b/qmp.c
> > @@ -167,6 +167,11 @@ void qmp_cont(Error **errp)
> >      BlockBackend *blk;
> >      Error *local_err = NULL;
> >  
> > +    if (runstate_check(RUN_STATE_PRELAUNCH)) {
> > +        qemu_exit_preconfig_request();
> > +        return;
> > +    }
> > +
> >      /* if there is a dump in background, we should wait until the dump
> >       * finished */
> >      if (dump_in_progress()) {
> > diff --git a/vl.c b/vl.c
> > index 3fed457..30631fd 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -555,6 +555,20 @@ static QemuOptsList qemu_fw_cfg_opts = {
> >      },
> >  };
> >  
> > +static QemuOptsList qemu_paused_opts = {
> > +    .name = "paused",
> > +    .implied_opt_name = "state",
> > +    .head = QTAILQ_HEAD_INITIALIZER(qemu_paused_opts.head),
> > +    .desc = {
> > +        {
> > +            .name = "state",
> > +            .type = QEMU_OPT_STRING,
> > +            .help = "Pause state of QEMU on startup",
> > +        },
> > +        { /* end of list */ }
> > +    },
> > +};
> > +
> >  /**
> >   * Get machine options
> >   *
> > @@ -1689,6 +1703,11 @@ static pid_t shutdown_pid;
> >  static int powerdown_requested;
> >  static int debug_requested;
> >  static int suspend_requested;
> > +static enum {
> > +    PRECONFIG_CONT = 0,
> > +    PRECONFIG_PAUSE,
> > +    PRECONFIG_SKIP,
> > +} preconfig_requested;
> >  static WakeupReason wakeup_reason;
> >  static NotifierList powerdown_notifiers =
> >      NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
> > @@ -1773,6 +1792,11 @@ static int qemu_debug_requested(void)
> >      return r;
> >  }
> >  
> > +void qemu_exit_preconfig_request(void)
> > +{
> > +    preconfig_requested = PRECONFIG_CONT;
> > +}
> > +
> >  /*
> >   * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
> >   */
> > @@ -1939,6 +1963,12 @@ static bool main_loop_should_exit(void)
> >      RunState r;
> >      ShutdownCause request;
> >  
> > +    if (runstate_check(RUN_STATE_PRELAUNCH)) {
> > +        if (preconfig_requested == PRECONFIG_CONT) {
> > +            preconfig_requested = PRECONFIG_SKIP;
> > +            return true;
> > +        }
> > +    }
> >      if (qemu_debug_requested()) {
> >          vm_stop(RUN_STATE_DEBUG);
> >      }
> > @@ -3177,6 +3207,7 @@ int main(int argc, char **argv, char **envp)
> >      qemu_add_opts(&qemu_icount_opts);
> >      qemu_add_opts(&qemu_semihosting_config_opts);
> >      qemu_add_opts(&qemu_fw_cfg_opts);
> > +    qemu_add_opts(&qemu_paused_opts);
> >      module_call_init(MODULE_INIT_OPTS);
> >  
> >      runstate_init();
> > @@ -3845,6 +3876,26 @@ int main(int argc, char **argv, char **envp)
> >                      exit(1);
> >                  }
> >                  break;
> > +            case QEMU_OPTION_paused:
> > +                {
> > +                    const char *value;
> > +
> > +                    opts = qemu_opts_parse_noisily(qemu_find_opts("paused"),
> > +                                                   optarg, true);
> > +                    if (opts == NULL) {
> > +                        exit(1);
> > +                    }
> > +                    value = qemu_opt_get(opts, "state");
> > +                    if (!strcmp(value, "postconf")) {
> > +                        autostart = 0;
> > +                    } else if (!strcmp(value, "preconf")) {
> > +                        preconfig_requested = PRECONFIG_PAUSE;
> > +                    } else {
> > +                        error_report("incomplete '-paused' option\n");
> > +                        exit(1);
> > +                    }
> > +                    break;
> > +                }
> >              case QEMU_OPTION_enable_kvm:
> >                  olist = qemu_find_opts("machine");
> >                  qemu_opts_parse_noisily(olist, "accel=kvm", false);
> > @@ -4731,7 +4782,6 @@ int main(int argc, char **argv, char **envp)
> >      current_machine->boot_order = boot_order;
> >      current_machine->cpu_model = cpu_model;
> >  
> > -
> >      /* parse features once if machine provides default cpu_type */
> >      if (machine_class->default_cpu_type) {
> >          current_machine->cpu_type = machine_class->default_cpu_type;
> > @@ -4741,6 +4791,8 @@ int main(int argc, char **argv, char **envp)
> >          }
> >      }
> >  
> > +    main_loop(); /* do monitor/qmp handling at preconfig state if requested */
> > +
> >      machine_run_board_init(current_machine);
> >  
> >      realtime_init();
> > -- 
> > 2.7.4
> > 
> >   
> 
> Regards,
> Daniel

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-16 17:17       ` Eduardo Habkost
@ 2017-10-17  8:47         ` Paolo Bonzini
  2017-10-17  9:25           ` Igor Mammedov
  0 siblings, 1 reply; 93+ messages in thread
From: Paolo Bonzini @ 2017-10-17  8:47 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Igor Mammedov, qemu-devel, eblake, armbru, pkrempa, david,
	peter.maydell, cohuck

On 16/10/2017 19:17, Eduardo Habkost wrote:
>> Should (or could) "-M none" be changed in a backwards-compatible way to
>> allow such preconfiguration?  For example
>>
>>   qemu -M none -monitor stdio
>>   (qemu) machine-set-options pc,accel=kvm
>>   (qemu) c
> Sounds like an interesting idea.  It would require ensuring it's
> really safe to destroy current_machine/accel (and other global
> state) and replace them with another object on the fly (which is
> probably a nice goal by itself).

It is but, alternatively, you could delay creating the "none" machine
until the last second.  The important part, in my opinion, is having a
good command-line interface that we can freeze even if the
implementation below leaves something to be desired.

Paolo

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-16 16:59   ` Eduardo Habkost
  2017-10-16 17:01     ` Paolo Bonzini
@ 2017-10-17  9:10     ` Igor Mammedov
  2017-10-19 10:42     ` David Gibson
  2 siblings, 0 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-17  9:10 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: qemu-devel, eblake, armbru, pkrempa, david, peter.maydell,
	pbonzini, cohuck

On Mon, 16 Oct 2017 14:59:16 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  include/sysemu/sysemu.h |  1 +
> >  qemu-options.hx         | 15 ++++++++++++++
> >  qmp.c                   |  5 +++++
> >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  4 files changed, 74 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index b213696..3feb94f 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> >      QEMU_WAKEUP_REASON_OTHER,
> >  } WakeupReason;
> >  
> > +void qemu_exit_preconfig_request(void);
> >  void qemu_system_reset_request(ShutdownCause reason);
> >  void qemu_system_suspend_request(void);
> >  void qemu_register_suspend_notifier(Notifier *notifier);
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 39225ae..bd44db8 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -3498,6 +3498,21 @@ STEXI
> >  Run the emulation in single step mode.
> >  ETEXI
> >  
> > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > +    "-paused [state=]postconf|preconf\n"
> > +    "                postconf: pause QEMU after machine is initialized\n"
> > +    "                preconf: pause QEMU before machine is initialized\n",
> > +    QEMU_ARCH_ALL)  
> 
> I would like to allow pausing before machine-type is selected, so
> management could run query-machines before choosing a
> machine-type.  Would that need a third "-pause" mode, or will we
> be able to change "preconf" to pause before select_machine() is
> called?
> The same probably applies to other things initialized before
> machine_run_board_init() that could be configurable using QMP,
> including but not limited to:
> * Accelerator configuration
> * Registering global properties
> * RAM size
> * SMP/CPU configuration
My goal is/was much narrower and reachable without rewriting the whole
of qemu again (well, I had to do a bit of necessary preparatory refactoring
for that to happen: default_cpu + generalizing cpu_model parsing).

This series is focused on querying the cpu layout defined by f("-M foo -smp ...")
and configuring the numa mapping for the resulting layout. So it needs the machine
object to exist by the time it's paused, which means that the -M and -smp
options have to be parsed by that time.

Allowing a pause basically before the machine is created would, I'd guess, be
a lot of additional refactoring (beyond this series' scope). I can't
say for sure whether a new pause mode is needed for it or whether 'preconf'
could be moved to an earlier stage later.
I'd speculate that for generic handling we would need
  - a CLI options dependency tree
  - a new QMP/HMP command (process-cli-option);
       make it actionable, i.e. "process-cli-option -M foo" would create
       the machine and the user would be allowed to use other options
       that have a machine dependency (for example -smp and after that -numa)
I wouldn't like to go down that bottomless pit right now,
but probably we could add "process-cli-option" now and allow only
the -numa option for now, so we would have the external interface in place
and could extend it later.
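
A toy sketch of the dependency idea, purely to illustrate the ordering
constraint (machine, then smp, then numa). Nothing here exists in QEMU:
CliOption, the table layout and this process_cli_option() are assumptions
made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dependency entry: each option names its one prerequisite. */
typedef struct {
    const char *name;
    const char *depends_on;   /* NULL: no prerequisite */
    bool processed;
} CliOption;

static CliOption *find_option(CliOption *opts, size_t n, const char *name)
{
    assert(opts != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(opts[i].name, name) == 0) {
            return &opts[i];
        }
    }
    return NULL;
}

/*
 * Toy stand-in for a "process-cli-option" command: accept an option only
 * once the option it depends on has been processed.
 */
static bool process_cli_option(CliOption *opts, size_t n, const char *name)
{
    CliOption *opt = find_option(opts, n, name);

    if (!opt) {
        return false;               /* unknown option */
    }
    if (opt->depends_on) {
        CliOption *dep = find_option(opts, n, opt->depends_on);

        if (!dep || !dep->processed) {
            return false;           /* prerequisite not processed yet */
        }
    }
    opt->processed = true;
    return true;
}
```

With a table of { machine, smp -> machine, numa -> smp }, a numa request
is refused until machine and smp have been handled, which is the ordering
described above.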

> > +STEXI
> > +@item -paused
> > +@findex -paused
> > +if set enabled interactive configuration stages before machine emulation starts.
> > +'postconf' option value mimics -S option behaviour where machine is created
> > +but emulation isn't started. 'preconf' option value pauses QEMU before machine
> > +is created, which allows to query and configure properties affecting machine
> > +initialization. Use monitor/QMP command 'cont' to go to exit paused state.  
> 
> What if "-S" is used at the same time"?  Will "cont" only
> initialize the machine and wait for another "cont" command to
> start the VCPUs, or will it unpause everything?
In the current impl. the first 'cont' will exit the preconfig loop and continue to
work as it used to, i.e. -S will cause a second pause right before
the vcpus are started, and a second 'cont' will be needed to run the machine.
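
As a toy model of that behaviour (not the vl.c code from the patch; the
Stage enum and helper names are made up for illustration), the number of
'cont' commands needed can be counted like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stages, modelling only the behaviour described above. */
typedef enum {
    STAGE_PRECONFIG,        /* paused before machine_run_board_init() */
    STAGE_MACHINE_PAUSED,   /* machine created, vcpus not started (-S) */
    STAGE_RUNNING,
} Stage;

/* Stage QEMU sits in once startup has gone as far as it can unattended. */
static Stage stage_at_startup(bool preconf, bool opt_S)
{
    if (preconf) {
        return STAGE_PRECONFIG;
    }
    return opt_S ? STAGE_MACHINE_PAUSED : STAGE_RUNNING;
}

/* Effect of a single 'cont' issued in the given stage. */
static Stage stage_after_cont(Stage stage, bool opt_S)
{
    if (stage == STAGE_PRECONFIG && opt_S) {
        return STAGE_MACHINE_PAUSED;    /* -S still pauses before vcpus run */
    }
    return STAGE_RUNNING;
}

/* Number of 'cont' commands needed until the vcpus run. */
static int conts_needed(bool preconf, bool opt_S)
{
    Stage stage = stage_at_startup(preconf, opt_S);
    int count = 0;

    while (stage != STAGE_RUNNING) {
        stage = stage_after_cont(stage, opt_S);
        count++;
    }
    assert(count <= 2);
    return count;
}
```

So 'preconf' together with -S needs two 'cont' commands, and either one
alone needs a single 'cont'.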

> 
> > +ETEXI
> > +
> >  DEF("S", 0, QEMU_OPTION_S, \
> >      "-S              freeze CPU at startup (use 'c' to start execution)\n",
> >      QEMU_ARCH_ALL)
> > diff --git a/qmp.c b/qmp.c
> > index e8c3031..49e9a5c 100644
> > --- a/qmp.c
> > +++ b/qmp.c
> > @@ -167,6 +167,11 @@ void qmp_cont(Error **errp)
> >      BlockBackend *blk;
> >      Error *local_err = NULL;
> >  
> > +    if (runstate_check(RUN_STATE_PRELAUNCH)) {
> > +        qemu_exit_preconfig_request();
> > +        return;
> > +    }
> > +
> >      /* if there is a dump in background, we should wait until the dump
> >       * finished */
> >      if (dump_in_progress()) {
> > diff --git a/vl.c b/vl.c
> > index 3fed457..30631fd 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -555,6 +555,20 @@ static QemuOptsList qemu_fw_cfg_opts = {
> >      },
> >  };
> >  
> > +static QemuOptsList qemu_paused_opts = {
> > +    .name = "paused",
> > +    .implied_opt_name = "state",
> > +    .head = QTAILQ_HEAD_INITIALIZER(qemu_paused_opts.head),
> > +    .desc = {
> > +        {
> > +            .name = "state",
> > +            .type = QEMU_OPT_STRING,
> > +            .help = "Pause state of QEMU on startup",
> > +        },
> > +        { /* end of list */ }
> > +    },
> > +};
> > +
> >  /**
> >   * Get machine options
> >   *
> > @@ -1689,6 +1703,11 @@ static pid_t shutdown_pid;
> >  static int powerdown_requested;
> >  static int debug_requested;
> >  static int suspend_requested;
> > +static enum {
> > +    PRECONFIG_CONT = 0,
> > +    PRECONFIG_PAUSE,
> > +    PRECONFIG_SKIP,
> > +} preconfig_requested;
> >  static WakeupReason wakeup_reason;
> >  static NotifierList powerdown_notifiers =
> >      NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
> > @@ -1773,6 +1792,11 @@ static int qemu_debug_requested(void)
> >      return r;
> >  }
> >  
> > +void qemu_exit_preconfig_request(void)
> > +{
> > +    preconfig_requested = PRECONFIG_CONT;
> > +}
> > +
> >  /*
> >   * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
> >   */
> > @@ -1939,6 +1963,12 @@ static bool main_loop_should_exit(void)
> >      RunState r;
> >      ShutdownCause request;
> >  
> > +    if (runstate_check(RUN_STATE_PRELAUNCH)) {
> > +        if (preconfig_requested == PRECONFIG_CONT) {
> > +            preconfig_requested = PRECONFIG_SKIP;
> > +            return true;
> > +        }
> > +    }
> >      if (qemu_debug_requested()) {
> >          vm_stop(RUN_STATE_DEBUG);
> >      }
> > @@ -3177,6 +3207,7 @@ int main(int argc, char **argv, char **envp)
> >      qemu_add_opts(&qemu_icount_opts);
> >      qemu_add_opts(&qemu_semihosting_config_opts);
> >      qemu_add_opts(&qemu_fw_cfg_opts);
> > +    qemu_add_opts(&qemu_paused_opts);
> >      module_call_init(MODULE_INIT_OPTS);
> >  
> >      runstate_init();
> > @@ -3845,6 +3876,26 @@ int main(int argc, char **argv, char **envp)
> >                      exit(1);
> >                  }
> >                  break;
> > +            case QEMU_OPTION_paused:
> > +                {
> > +                    const char *value;
> > +
> > +                    opts = qemu_opts_parse_noisily(qemu_find_opts("paused"),
> > +                                                   optarg, true);
> > +                    if (opts == NULL) {
> > +                        exit(1);
> > +                    }
> > +                    value = qemu_opt_get(opts, "state");
> > +                    if (!strcmp(value, "postconf")) {
> > +                        autostart = 0;
> > +                    } else if (!strcmp(value, "preconf")) {
> > +                        preconfig_requested = PRECONFIG_PAUSE;
> > +                    } else {
> > +                        error_report("incomplete '-paused' option\n");
> > +                        exit(1);
> > +                    }
> > +                    break;
> > +                }
> >              case QEMU_OPTION_enable_kvm:
> >                  olist = qemu_find_opts("machine");
> >                  qemu_opts_parse_noisily(olist, "accel=kvm", false);
> > @@ -4731,7 +4782,6 @@ int main(int argc, char **argv, char **envp)
> >      current_machine->boot_order = boot_order;
> >      current_machine->cpu_model = cpu_model;
> >  
> > -
> >      /* parse features once if machine provides default cpu_type */
> >      if (machine_class->default_cpu_type) {
> >          current_machine->cpu_type = machine_class->default_cpu_type;
> > @@ -4741,6 +4791,8 @@ int main(int argc, char **argv, char **envp)
> >          }
> >      }
> >  
> > +    main_loop(); /* do monitor/qmp handling at preconfig state if requested */
> > +  
> 
> I'm impressed by the simplicity of the implementation.  I though
> this would involve moving everything between this line and the
> next main_loop() call outside main(), so they would be called by
> qmp_cont().
> 
> Any expert on GLib's Event Loop sees any gotcha in this method?
> 
> I would like to do a careful review of main_loop_wait() and
> main_loop_should_exit(), to ensure those functions don't depend
> on anything that's initialized after this line.  Probably a few
> existing QMP commands can crash if machine is not initialized
> yet?
Some HMP/QMP commands will crash for sure; any idea on how to
handle the issue (i.e. prevent disallowed commands from running) is welcome.
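
One possible direction, sketched only as an illustration: tag each command
with whether it is safe before the machine exists and reject the rest in
the dispatcher. The table, MonitorCmd and check_command() are hypothetical,
and marking device_add as unsafe is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-command flag: may it run before machine init? */
typedef struct {
    const char *name;
    bool allowed_in_preconfig;
} MonitorCmd;

static const MonitorCmd cmd_table[] = {
    { "query-hotpluggable-cpus", true },
    { "set-numa-node",           true },
    { "cont",                    true },
    { "device_add",              false },  /* assumed to need a machine */
};

typedef enum { CMD_OK, CMD_UNKNOWN, CMD_NOT_ALLOWED } CmdStatus;

/* Decide whether a command may be dispatched in the current state. */
static CmdStatus check_command(const char *name, bool in_preconfig)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++) {
        if (strcmp(cmd_table[i].name, name) == 0) {
            if (in_preconfig && !cmd_table[i].allowed_in_preconfig) {
                return CMD_NOT_ALLOWED;
            }
            return CMD_OK;
        }
    }
    return CMD_UNKNOWN;
}
```

A rejected command would then return an error to the monitor instead of
touching machine state that is not initialized yet.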

> The rules and expectations on initialization ordering are very
> subtle, I suggest including test code for the new feature to
> ensure nothing crashes or breaks in the future.
This is only an RFC; I've omitted the testing part as the approach
to be used isn't certain yet, but yep, I plan on adding tests
for the features that are expected to work with this.


> 
> >      machine_run_board_init(current_machine);
> >  
> >      realtime_init();
> > -- 
> > 2.7.4
> >   
> 

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-17  8:47         ` Paolo Bonzini
@ 2017-10-17  9:25           ` Igor Mammedov
  0 siblings, 0 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-17  9:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Eduardo Habkost, qemu-devel, eblake, armbru, pkrempa, david,
	peter.maydell, cohuck

On Tue, 17 Oct 2017 10:47:40 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 16/10/2017 19:17, Eduardo Habkost wrote:
> >> Should (or could) "-M none" be changed in a backwards-compatible way to
> >> allow such preconfiguration?  For example
> >>
> >>   qemu -M none -monitor stdio
> >>   (qemu) machine-set-options pc,accel=kvm
> >>   (qemu) c  
> > Sounds like an interesting idea.  It would require ensuring it's
> > really safe to destroy current_machine/accel (and other global
> > state) and replace them with another object on the fly (which is
> > probably a nice goal by itself).  
> 
> It is but, alternatively, you could delay creating the "none" machine
> until the last second.  The important part, in my opinion, is having a
> good command-line interface that we can freeze even if the
> implementation below leaves something to be desired.
I sort of don't get how '-M none' could be used to build a usable
machine (at least currently).

Do we really need "-M none" for dynamic configuration?
I'd imagine doing the following instead:
  qemu -monitor stdio -dynconfig
  (qemu) query-machines
  ...
  (qemu) set-option machine pc,accel=kvm
    # machine object is created
  (qemu) set-option smp 1,maxcpus
  (qemu) info hotpluggable-cpus
  ...
  (qemu) set-option numa node
  (qemu) set-option numa cpu,node-id=0,socket=0
  (qemu) set-option numa cpu,node-id=0,socket=1
  (qemu) c

I'd start by making it work from 'info hotpluggable-cpus',
as it's close to my current project of making cpu-hotplug/numa
work nicely together, and we can expand the same interface to work
at earlier stages on top of that.

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-17  8:17     ` Igor Mammedov
@ 2017-10-17 10:56       ` Laszlo Ersek
  2017-10-17 11:11         ` Peter Krempa
  0 siblings, 1 reply; 93+ messages in thread
From: Laszlo Ersek @ 2017-10-17 10:56 UTC (permalink / raw)
  To: Igor Mammedov, Daniel P. Berrange
  Cc: peter.maydell, pkrempa, ehabkost, cohuck, qemu-devel, armbru,
	pbonzini, david

On 10/17/17 10:17, Igor Mammedov wrote:
> On Mon, 16 Oct 2017 17:35:15 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
>> On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:
>>
>> This really needs to have a commit message that provides justification
>> for why this option is needed when we already have -S that is used
>> to allow configuration before the guest starts.
> Sorry, I've should have added here what I've tried to describe in cover letter.
> 
> -S pauses machine too late as machine is already created by the time
> it's paused so trying to reconfigure it might require machine to be recreated.
> In case of NUMA options it might be possible to hack x86 target to
> rebuild/override acpi/fw_cfg so it would reflect the new settings set
> this late but I wouldn't expect that it would work in general.
> 
> The cleanest way to configure it is pausing and configuring numa mapping
> before machine is build.

Asking from the sideline: if the NUMA mapping has to be configured so
early, why can't it be done on the QEMU command line?

(I asked myself the same question when I first saw your patches -- I
couldn't find an explanation in the blurb --, so I assumed it was
obvious and/or others would ask the same question.)

Again, I'm just curious.

Thanks!
Laszlo

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-17 10:56       ` Laszlo Ersek
@ 2017-10-17 11:11         ` Peter Krempa
  0 siblings, 0 replies; 93+ messages in thread
From: Peter Krempa @ 2017-10-17 11:11 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Igor Mammedov, Daniel P. Berrange, peter.maydell, ehabkost,
	cohuck, qemu-devel, armbru, pbonzini, david

On Tue, Oct 17, 2017 at 12:56:28 +0200, Laszlo Ersek wrote:
> On 10/17/17 10:17, Igor Mammedov wrote:
> > On Mon, 16 Oct 2017 17:35:15 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > 
> >> On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:
> >>
> >> This really needs to have a commit message that provides justification
> >> for why this option is needed when we already have -S that is used
> >> to allow configuration before the guest starts.
> > Sorry, I've should have added here what I've tried to describe in cover letter.
> > 
> > -S pauses machine too late as machine is already created by the time
> > it's paused so trying to reconfigure it might require machine to be recreated.
> > In case of NUMA options it might be possible to hack x86 target to
> > rebuild/override acpi/fw_cfg so it would reflect the new settings set
> > this late but I wouldn't expect that it would work in general.
> > 
> > The cleanest way to configure it is pausing and configuring numa mapping
> > before machine is build.
> 
> Asking from the sideline: if the NUMA mapping has to be configured so
> early, why can't it be done on the QEMU command line?
> 
> (I asked myself the same question when I first saw your patches -- I
> couldn't find an explanation in the blurb --, so I assumed it was
> obvious and/or others would ask the same question.)

Because libvirt needs to be able to query qemu before setting stuff up.
As we already established, it's not okay to run a throwaway qemu process
to do so, so we are getting into the chicken/egg problem zone.

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-16 17:01     ` Paolo Bonzini
  2017-10-16 17:17       ` Eduardo Habkost
@ 2017-10-17 14:48       ` Daniel P. Berrange
  2017-10-17 15:21         ` Laszlo Ersek
  1 sibling, 1 reply; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-17 14:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Eduardo Habkost, Igor Mammedov, peter.maydell, pkrempa, cohuck,
	qemu-devel, armbru, david

On Mon, Oct 16, 2017 at 07:01:01PM +0200, Paolo Bonzini wrote:
> On 16/10/2017 18:59, Eduardo Habkost wrote:
> >> +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> >> +    "-paused [state=]postconf|preconf\n"
> >> +    "                postconf: pause QEMU after machine is initialized\n"
> >> +    "                preconf: pause QEMU before machine is initialized\n",
> >> +    QEMU_ARCH_ALL)
> > I would like to allow pausing before machine-type is selected, so
> > management could run query-machines before choosing a
> > machine-type.  Would that need a third "-pause" mode, or will we
> > be able to change "preconf" to pause before select_machine() is
> > called?
> > 
> > The same probably applies to other things initialized before
> > machine_run_board_init() that could be configurable using QMP,
> > including but not limited to:
> > * Accelerator configuration
> > * Registering global properties
> > * RAM size
> > * SMP/CPU configuration
> 
> Should (or could) "-M none" be changed in a backwards-compatible way to
> allow such preconfiguration?  For example
> 
>   qemu -M none -monitor stdio
>   (qemu) machine-set-options pc,accel=kvm
>   (qemu) c

Going down this route has pretty major implications for the way libvirt
manages QEMU, and support / debugging of it. When you look at the QEMU
command line libvirt uses it will be almost devoid of any useful info.
So it will be a more involved job to figure out just how QEMU is configured.
This also means it is difficult to replicate the config that libvirt has
used, outside of libvirt, for the sake of debugging.

I also think it will have pretty significant performance implications
for QEMU startup. To configure a guest via the monitor is going to
require a huge number of monitor commands to be executed to replicate
what we traditionally configured via ARGV. While each monitor command
is not massively slow, the round-trip time of each command will quickly
add up to several hundred milliseconds, perhaps even seconds in the
case of very large configs.
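
To make the scaling concrete, here is a rough back-of-the-envelope sketch.
It assumes commands are issued strictly one after another and that each
costs the same round trip; the command counts and the 0.5 ms figure are
made-up inputs for illustration, not measurements of QEMU or libvirt.

```python
def monitor_setup_ms(num_commands: int, round_trip_ms: float) -> float:
    """Time spent waiting on monitor commands issued one after another."""
    return num_commands * round_trip_ms


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show how the total scales.
    for num_commands in (100, 1000, 10000):
        total = monitor_setup_ms(num_commands, round_trip_ms=0.5)
        print(f"{num_commands} commands -> {total:.0f} ms")
```

The cost is linear in the number of commands, so anything that batches
or pipelines commands would change the picture.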

Maybe we ultimately have no choice and this is inevitable, but I am
pretty wary of going in the direction of launching bare QEMU and
configuring everything via a huge number of monitor calls.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-17  7:27   ` Igor Mammedov
@ 2017-10-17 15:07     ` Daniel P. Berrange
  2017-10-17 15:24       ` Laszlo Ersek
                         ` (2 more replies)
  0 siblings, 3 replies; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-17 15:07 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, peter.maydell, pkrempa, ehabkost, cohuck, armbru,
	pbonzini, david

On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> On Mon, 16 Oct 2017 17:36:36 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > interface. For that to happen it introduces a new '-paused' CLI option
> > > which allows to pause QEMU before machine_init() is run and
> > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > NUMA mapping for cpus.  
> > 
> > What's the problem we're seeking solve here compared to what we currently
> > do for NUMA configuration ?
> From RHBZ1382425
> "
> Current -numa CLI interface is quite limited in terms that allow map
> CPUs to NUMA nodes as it requires to provide cpu_index values which 
> are non obvious and depend on machine/arch. As result libvirt has to
> assume/re-implement cpu_index allocation logic to provide valid 
> values for -numa cpus=... QEMU CLI option.

In broad terms, this problem applies to every device / object libvirt
asks QEMU to create. For everything else libvirt is able to assign a
"id" string, which it can then use to identify the thing later. The
CPU stuff is different because libvirt isn't able to provide 'id'
strings for each CPU - QEMU generates a pseudo-id internally which
libvirt has to infer. The latter is the same problem we had with
devices before '-device' was introduced allowing 'id' naming.

IMHO we should take the same approach with CPUs and start modelling 
the individual CPUs as something we can explicitly create with -object
or -device. That way libvirt can assign names and does not have to 
care about CPU index values, and it all works just the same way as
any other device / object we create.

i.e. instead of:

  -smp 8,sockets=4,cores=2,threads=1
  -numa node,nodeid=0,cpus=0-3
  -numa node,nodeid=1,cpus=4-7

we could do:

  -object numa-node,id=numa0
  -object numa-node,id=numa1
  -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
  -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
  -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
  -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
  -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
  -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
  -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
  -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0

(perhaps -device instead of -object above, but that's a minor detail)
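
As a sketch of how a management app might generate such a command line
from a topology: the 'numa-node' and 'cpu' object types are the syntax
proposed above, not an existing QEMU interface, and the helper name and
the sockets_per_node parameter are invented for this illustration.

```python
from itertools import product


def cpu_object_args(sockets: int, cores: int, threads: int,
                    sockets_per_node: int) -> list[str]:
    """Build the proposed (hypothetical) -object arguments for a topology."""
    # Round up so a partially filled last node still gets created.
    num_nodes = (sockets + sockets_per_node - 1) // sockets_per_node
    args = [f"-object numa-node,id=numa{n}" for n in range(num_nodes)]
    topology = product(range(sockets), range(cores), range(threads))
    for index, (socket, core, thread) in enumerate(topology):
        node = socket // sockets_per_node  # whole sockets map to one node
        args.append(
            f"-object cpu,id=cpu{index},node=numa{node},"
            f"socket={socket},core={core},thread={thread}"
        )
    return args


if __name__ == "__main__":
    # Same shape as the example above: 4 sockets, 2 cores, 1 thread.
    for arg in cpu_object_args(4, 2, 1, sockets_per_node=2):
        print(arg)
```

The point is that the caller chooses every 'id' itself, so nothing has to
be inferred from a cpu_index allocated inside QEMU.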

Regards,
Daniel

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-17 14:48       ` Daniel P. Berrange
@ 2017-10-17 15:21         ` Laszlo Ersek
  2017-10-17 15:35           ` Daniel P. Berrange
  0 siblings, 1 reply; 93+ messages in thread
From: Laszlo Ersek @ 2017-10-17 15:21 UTC (permalink / raw)
  To: Daniel P. Berrange, Paolo Bonzini
  Cc: peter.maydell, pkrempa, Eduardo Habkost, cohuck, qemu-devel,
	armbru, Igor Mammedov, david

On 10/17/17 16:48, Daniel P. Berrange wrote:
> On Mon, Oct 16, 2017 at 07:01:01PM +0200, Paolo Bonzini wrote:
>> On 16/10/2017 18:59, Eduardo Habkost wrote:
>>>> +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
>>>> +    "-paused [state=]postconf|preconf\n"
>>>> +    "                postconf: pause QEMU after machine is initialized\n"
>>>> +    "                preconf: pause QEMU before machine is initialized\n",
>>>> +    QEMU_ARCH_ALL)
>>> I would like to allow pausing before machine-type is selected, so
>>> management could run query-machines before choosing a
>>> machine-type.  Would that need a third "-pause" mode, or will we
>>> be able to change "preconf" to pause before select_machine() is
>>> called?
>>>
>>> The same probably applies to other things initialized before
>>> machine_run_board_init() that could be configurable using QMP,
>>> including but not limited to:
>>> * Accelerator configuration
>>> * Registering global properties
>>> * RAM size
>>> * SMP/CPU configuration
>>
>> Should (or could) "-M none" be changed in a backwards-compatible way to
>> allow such preconfiguration?  For example
>>
>>   qemu -M none -monitor stdio
>>   (qemu) machine-set-options pc,accel=kvm
>>   (qemu) c
> 
> Going down this route has pretty major implications for the way libvirt
> manages QEMU, and support / debugging of it. When you look at the QEMU
> command line libvirt uses it will be almost devoid of any useful info.
> So it will be more involved job to figure out just how QEMU is configured.
> This also means it is difficult to replicate the config that libvirt has
> used, outside of libvirt for sake of debugging.
> 
> I also think it will have pretty significant performance implications
> for QEMU startup. To configure a guest via the monitor is going to
> require a huge number of monitor commands to be executed to replicate
> what we traditionally configured via ARGV. While each monitor command
> is not massively slow, the round-trip time of each command will quickly
> add up to several 100 milliseconds, perhaps even seconds in the the
> case of very large configs. 
> 
> Maybe we ultimately have no choice and this is inevitable, but I am
> pretty wary of going in the direction of launching bare QEMU and
> configuring everything via a huge number of monitor calls.

Where's the sweet spot between
- configuring everything dynamically, over QMP,
- and invoking QEMU separately, for querying capabilities etc?

Thanks,
Laszlo

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-17 15:07     ` Daniel P. Berrange
@ 2017-10-17 15:24       ` Laszlo Ersek
  2017-10-17 16:06       ` Igor Mammedov
  2017-10-18 12:19       ` Paolo Bonzini
  2 siblings, 0 replies; 93+ messages in thread
From: Laszlo Ersek @ 2017-10-17 15:24 UTC (permalink / raw)
  To: Daniel P. Berrange, Igor Mammedov
  Cc: peter.maydell, pkrempa, ehabkost, cohuck, qemu-devel, armbru,
	pbonzini, david

On 10/17/17 17:07, Daniel P. Berrange wrote:
> On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
>> On Mon, 16 Oct 2017 17:36:36 +0100
>> "Daniel P. Berrange" <berrange@redhat.com> wrote:
>>
>>> On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
>>>> Series allows to configure NUMA mapping at runtime using QMP/HMP
>>>> interface. For that to happen it introduces a new '-paused' CLI option
>>>> which allows to pause QEMU before machine_init() is run and
>>>> adds new set-numa-node HMP/QMP commands which in conjuction with
>>>> info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
>>>> NUMA mapping for cpus.  
>>>
>>> What's the problem we're seeking solve here compared to what we currently
>>> do for NUMA configuration ?
>> From RHBZ1382425
>> "
>> Current -numa CLI interface is quite limited in terms that allow map
>> CPUs to NUMA nodes as it requires to provide cpu_index values which 
>> are non obvious and depend on machine/arch. As result libvirt has to
>> assume/re-implement cpu_index allocation logic to provide valid 
>> values for -numa cpus=... QEMU CLI option.
> 
> In broad terms, this problem applies to every device / object libvirt
> asks QEMU to create. For everything else libvirt is able to assign a
> "id" string, which is can then use to identify the thing later. The
> CPU stuff is different because libvirt isn't able to provide 'id'
> strings for each CPU - QEMU generates a psuedo-id internally which
> libvirt has to infer.

Oh. This is the critical bit I've been missing.

Sorry about the noise I've made!

Thanks!
Laszlo


> The latter is the same problem we had with
> devices before '-device' was introduced allowing 'id' naming.
> 
> IMHO we should take the same approach with CPUs and start modelling 
> the individual CPUs as something we can explicitly create with -object
> or -device. That way libvirt can assign names and does not have to 
> care about CPU index values, and it all works just the same way as
> any other devices / object we create
> 
> ie instead of:
> 
>   -smp 8,sockets=4,cores=2,threads=1
>   -numa node,nodeid=0,cpus=0-3
>   -numa node,nodeid=1,cpus=4-7
> 
> we could do:
> 
>   -object numa-node,id=numa0
>   -object numa-node,id=numa1
>   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
>   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
>   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
>   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
>   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
>   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
>   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
>   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> 
> (perhaps -device instead of -object above, but that's a minor detail)
> 
> Regards,
> Daniel
> 

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-17 15:21         ` Laszlo Ersek
@ 2017-10-17 15:35           ` Daniel P. Berrange
  2017-10-17 15:42             ` Laszlo Ersek
  2017-10-17 15:47             ` Igor Mammedov
  0 siblings, 2 replies; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-17 15:35 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Paolo Bonzini, peter.maydell, pkrempa, Eduardo Habkost, cohuck,
	qemu-devel, armbru, Igor Mammedov, david

On Tue, Oct 17, 2017 at 05:21:13PM +0200, Laszlo Ersek wrote:
> On 10/17/17 16:48, Daniel P. Berrange wrote:
> > On Mon, Oct 16, 2017 at 07:01:01PM +0200, Paolo Bonzini wrote:
> >> On 16/10/2017 18:59, Eduardo Habkost wrote:
> >>>> +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> >>>> +    "-paused [state=]postconf|preconf\n"
> >>>> +    "                postconf: pause QEMU after machine is initialized\n"
> >>>> +    "                preconf: pause QEMU before machine is initialized\n",
> >>>> +    QEMU_ARCH_ALL)
> >>> I would like to allow pausing before machine-type is selected, so
> >>> management could run query-machines before choosing a
> >>> machine-type.  Would that need a third "-pause" mode, or will we
> >>> be able to change "preconf" to pause before select_machine() is
> >>> called?
> >>>
> >>> The same probably applies to other things initialized before
> >>> machine_run_board_init() that could be configurable using QMP,
> >>> including but not limited to:
> >>> * Accelerator configuration
> >>> * Registering global properties
> >>> * RAM size
> >>> * SMP/CPU configuration
> >>
> >> Should (or could) "-M none" be changed in a backwards-compatible way to
> >> allow such preconfiguration?  For example
> >>
> >>   qemu -M none -monitor stdio
> >>   (qemu) machine-set-options pc,accel=kvm
> >>   (qemu) c
> > 
> > Going down this route has pretty major implications for the way libvirt
> > manages QEMU, and support / debugging of it. When you look at the QEMU
> > command line libvirt uses it will be almost devoid of any useful info.
> > So it will be more involved job to figure out just how QEMU is configured.
> > This also means it is difficult to replicate the config that libvirt has
> > used, outside of libvirt for sake of debugging.
> > 
> > I also think it will have pretty significant performance implications
> > for QEMU startup. To configure a guest via the monitor is going to
> > require a huge number of monitor commands to be executed to replicate
> > what we traditionally configured via ARGV. While each monitor command
> > is not massively slow, the round-trip time of each command will quickly
> > add up to several 100 milliseconds, perhaps even seconds in the the
> > case of very large configs. 
> > 
> > Maybe we ultimately have no choice and this is inevitable, but I am
> > pretty wary of going in the direction of launching bare QEMU and
> > configuring everything via a huge number of monitor calls.
> 
> Where's the sweet spot between
> - configuring everything dynamically, over QMP,
> - and invoking QEMU separately, for querying capabilities etc?

The key with the way we currently invoke & query QEMU over QMP to detect
capabilities is that this is not tied to a specific VM launch process.
We can query capabilities and cache them until such time as we detect
a QEMU binary change. So this never impacts on the startup performance
of individual VMs. The caching is critical, because querying capabilities
is actually quite time intensive already, taking many seconds to query
capabilities on all the different target binaries we have.

Regards,
Daniel

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-17 15:35           ` Daniel P. Berrange
@ 2017-10-17 15:42             ` Laszlo Ersek
  2017-10-17 15:47               ` Daniel P. Berrange
  2017-10-17 15:47             ` Igor Mammedov
  1 sibling, 1 reply; 93+ messages in thread
From: Laszlo Ersek @ 2017-10-17 15:42 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Paolo Bonzini, peter.maydell, pkrempa, Eduardo Habkost, cohuck,
	qemu-devel, armbru, Igor Mammedov, david

On 10/17/17 17:35, Daniel P. Berrange wrote:
> On Tue, Oct 17, 2017 at 05:21:13PM +0200, Laszlo Ersek wrote:
>> On 10/17/17 16:48, Daniel P. Berrange wrote:
>>> On Mon, Oct 16, 2017 at 07:01:01PM +0200, Paolo Bonzini wrote:
>>>> On 16/10/2017 18:59, Eduardo Habkost wrote:
>>>>>> +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
>>>>>> +    "-paused [state=]postconf|preconf\n"
>>>>>> +    "                postconf: pause QEMU after machine is initialized\n"
>>>>>> +    "                preconf: pause QEMU before machine is initialized\n",
>>>>>> +    QEMU_ARCH_ALL)
>>>>> I would like to allow pausing before machine-type is selected, so
>>>>> management could run query-machines before choosing a
>>>>> machine-type.  Would that need a third "-pause" mode, or will we
>>>>> be able to change "preconf" to pause before select_machine() is
>>>>> called?
>>>>>
>>>>> The same probably applies to other things initialized before
>>>>> machine_run_board_init() that could be configurable using QMP,
>>>>> including but not limited to:
>>>>> * Accelerator configuration
>>>>> * Registering global properties
>>>>> * RAM size
>>>>> * SMP/CPU configuration
>>>>
>>>> Should (or could) "-M none" be changed in a backwards-compatible way to
>>>> allow such preconfiguration?  For example
>>>>
>>>>   qemu -M none -monitor stdio
>>>>   (qemu) machine-set-options pc,accel=kvm
>>>>   (qemu) c
>>>
>>> Going down this route has pretty major implications for the way libvirt
>>> manages QEMU, and support / debugging of it. When you look at the QEMU
>>> command line libvirt uses it will be almost devoid of any useful info.
>>> So it will be more involved job to figure out just how QEMU is configured.
>>> This also means it is difficult to replicate the config that libvirt has
>>> used, outside of libvirt for sake of debugging.
>>>
>>> I also think it will have pretty significant performance implications
>>> for QEMU startup. To configure a guest via the monitor is going to
>>> require a huge number of monitor commands to be executed to replicate
>>> what we traditionally configured via ARGV. While each monitor command
>>> is not massively slow, the round-trip time of each command will quickly
>>> add up to several 100 milliseconds, perhaps even seconds in the the
>>> case of very large configs. 
>>>
>>> Maybe we ultimately have no choice and this is inevitable, but I am
>>> pretty wary of going in the direction of launching bare QEMU and
>>> configuring everything via a huge number of monitor calls.
>>
>> Where's the sweet spot between
>> - configuring everything dynamically, over QMP,
>> - and invoking QEMU separately, for querying capabilities etc?
> 
> The key with the way we currently invoke & query QEMU over QMP to detect
> capabilities is that this is not tied to a specific VM launch process.
> We can query capabilities and cache them until such time as we detect
> a QEMU binary change. So this never impacts on the startup performance
> of individual VMs. The caching is critical, because querying capabilities
> is actually quite time intensive already, taking many seconds to query
> capabilities on all the different target binaries we have.

(Sorry about hijacking the thread, but I can't stop asking :) )

This looks very smart -- for my own education, how does libvirtd detect
a QEMU binary change? Based on executable mtime, size, checksum? Are
perhaps the <emulator> elements of individual domains involved?

Thanks!
Laszlo

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-17 15:35           ` Daniel P. Berrange
  2017-10-17 15:42             ` Laszlo Ersek
@ 2017-10-17 15:47             ` Igor Mammedov
  2017-10-17 15:52               ` Daniel P. Berrange
  1 sibling, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-17 15:47 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Laszlo Ersek, Paolo Bonzini, peter.maydell, pkrempa,
	Eduardo Habkost, cohuck, qemu-devel, armbru, david

On Tue, 17 Oct 2017 16:35:15 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Tue, Oct 17, 2017 at 05:21:13PM +0200, Laszlo Ersek wrote:
> > On 10/17/17 16:48, Daniel P. Berrange wrote:  
> > > On Mon, Oct 16, 2017 at 07:01:01PM +0200, Paolo Bonzini wrote:  
> > >> On 16/10/2017 18:59, Eduardo Habkost wrote:  
> > >>>> +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > >>>> +    "-paused [state=]postconf|preconf\n"
> > >>>> +    "                postconf: pause QEMU after machine is initialized\n"
> > >>>> +    "                preconf: pause QEMU before machine is initialized\n",
> > >>>> +    QEMU_ARCH_ALL)  
> > >>> I would like to allow pausing before machine-type is selected, so
> > >>> management could run query-machines before choosing a
> > >>> machine-type.  Would that need a third "-pause" mode, or will we
> > >>> be able to change "preconf" to pause before select_machine() is
> > >>> called?
> > >>>
> > >>> The same probably applies to other things initialized before
> > >>> machine_run_board_init() that could be configurable using QMP,
> > >>> including but not limited to:
> > >>> * Accelerator configuration
> > >>> * Registering global properties
> > >>> * RAM size
> > >>> * SMP/CPU configuration  
> > >>
> > >> Should (or could) "-M none" be changed in a backwards-compatible way to
> > >> allow such preconfiguration?  For example
> > >>
> > >>   qemu -M none -monitor stdio
> > >>   (qemu) machine-set-options pc,accel=kvm
> > >>   (qemu) c  
> > > 
> > > Going down this route has pretty major implications for the way libvirt
> > > manages QEMU, and support / debugging of it. When you look at the QEMU
> > > command line libvirt uses it will be almost devoid of any useful info.
> > > So it will be more involved job to figure out just how QEMU is configured.
> > > This also means it is difficult to replicate the config that libvirt has
> > > used, outside of libvirt for sake of debugging.
> > > 
> > > I also think it will have pretty significant performance implications
> > > for QEMU startup. To configure a guest via the monitor is going to
> > > require a huge number of monitor commands to be executed to replicate
> > > what we traditionally configured via ARGV. While each monitor command
> > > is not massively slow, the round-trip time of each command will quickly
> > > add up to several 100 milliseconds, perhaps even seconds in the the
> > > case of very large configs. 
> > > 
> > > Maybe we ultimately have no choice and this is inevitable, but I am
> > > pretty wary of going in the direction of launching bare QEMU and
> > > configuring everything via a huge number of monitor calls.  
> > 
> > Where's the sweet spot between
> > - configuring everything dynamically, over QMP,
> > - and invoking QEMU separately, for querying capabilities etc?  
> 
> The key with the way we currently invoke & query QEMU over QMP to detect
> capabilities is that this is not tied to a specific VM launch process.
> We can query capabilities and cache them until such time as we detect
> a QEMU binary change. So this never impacts on the startup performance
> of individual VMs. The caching is critical, because querying capabilities
> is actually quite time intensive already, taking many seconds to query
> capabilities on all the different target binaries we have.
Is there another alternative for the use case where the value of one option
(-numa cpu) depends on the values of other options (-M + -smp + -cpu)?
So far we have 2 options on the table:
   1: do the configuration at runtime, as in this series
   2: start QEMU 2 times:
        1st to query the CPU layout, and
        2nd to add -numa options using data from the 1st step
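
A minimal sketch of the second step of option 2, assuming the first run
returned the query-hotpluggable-cpus reply as a list of entries, each
carrying a "props" dictionary with socket-id/core-id/thread-id. The
helper name and the socket-to-node table are hypothetical; only the
'-numa cpu,node-id=...' option form is taken from QEMU itself.

```python
def numa_cpu_args(hotpluggable_cpus: list[dict],
                  node_for_socket: dict[int, int]) -> list[str]:
    """Turn a CPU layout from a first QEMU run into -numa cpu options."""
    args = []
    for cpu in hotpluggable_cpus:
        props = cpu["props"]
        # Policy decision made by the caller: which node each socket joins.
        node = node_for_socket[props["socket-id"]]
        opts = [f"node-id={node}"]
        # Only pass topology keys the machine type actually reported.
        for key in ("socket-id", "core-id", "thread-id"):
            if key in props:
                opts.append(f"{key}={props[key]}")
        args += ["-numa", "cpu," + ",".join(opts)]
    return args


if __name__ == "__main__":
    # Layout shaped like the '-smp 1,maxcpus=2' example in the cover letter.
    layout = [
        {"props": {"socket-id": 1, "core-id": 0, "thread-id": 0}},
        {"props": {"socket-id": 0, "core-id": 0, "thread-id": 0}},
    ]
    print(numa_cpu_args(layout, node_for_socket={0: 0, 1: 1}))
```

The same mapping could be fed to set-numa-node in option 1; the
difference is only whether QEMU is started once or twice.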

> Regards,
> Daniel

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-17 15:42             ` Laszlo Ersek
@ 2017-10-17 15:47               ` Daniel P. Berrange
  0 siblings, 0 replies; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-17 15:47 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Paolo Bonzini, peter.maydell, pkrempa, Eduardo Habkost, cohuck,
	qemu-devel, armbru, Igor Mammedov, david

On Tue, Oct 17, 2017 at 05:42:19PM +0200, Laszlo Ersek wrote:
> On 10/17/17 17:35, Daniel P. Berrange wrote:
> > On Tue, Oct 17, 2017 at 05:21:13PM +0200, Laszlo Ersek wrote:
> >> On 10/17/17 16:48, Daniel P. Berrange wrote:
> >>> On Mon, Oct 16, 2017 at 07:01:01PM +0200, Paolo Bonzini wrote:
> >>>> On 16/10/2017 18:59, Eduardo Habkost wrote:
> >>>>>> +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> >>>>>> +    "-paused [state=]postconf|preconf\n"
> >>>>>> +    "                postconf: pause QEMU after machine is initialized\n"
> >>>>>> +    "                preconf: pause QEMU before machine is initialized\n",
> >>>>>> +    QEMU_ARCH_ALL)
> >>>>> I would like to allow pausing before machine-type is selected, so
> >>>>> management could run query-machines before choosing a
> >>>>> machine-type.  Would that need a third "-pause" mode, or will we
> >>>>> be able to change "preconf" to pause before select_machine() is
> >>>>> called?
> >>>>>
> >>>>> The same probably applies to other things initialized before
> >>>>> machine_run_board_init() that could be configurable using QMP,
> >>>>> including but not limited to:
> >>>>> * Accelerator configuration
> >>>>> * Registering global properties
> >>>>> * RAM size
> >>>>> * SMP/CPU configuration
> >>>>
> >>>> Should (or could) "-M none" be changed in a backwards-compatible way to
> >>>> allow such preconfiguration?  For example
> >>>>
> >>>>   qemu -M none -monitor stdio
> >>>>   (qemu) machine-set-options pc,accel=kvm
> >>>>   (qemu) c
> >>>
> >>> Going down this route has pretty major implications for the way libvirt
> >>> manages QEMU, and support / debugging of it. When you look at the QEMU
> >>> command line libvirt uses it will be almost devoid of any useful info.
> >>> So it will be more involved job to figure out just how QEMU is configured.
> >>> This also means it is difficult to replicate the config that libvirt has
> >>> used, outside of libvirt for sake of debugging.
> >>>
> >>> I also think it will have pretty significant performance implications
> >>> for QEMU startup. To configure a guest via the monitor is going to
> >>> require a huge number of monitor commands to be executed to replicate
> >>> what we traditionally configured via ARGV. While each monitor command
> >>> is not massively slow, the round-trip time of each command will quickly
> >>> add up to several 100 milliseconds, perhaps even seconds in the the
> >>> case of very large configs. 
> >>>
> >>> Maybe we ultimately have no choice and this is inevitable, but I am
> >>> pretty wary of going in the direction of launching bare QEMU and
> >>> configuring everything via a huge number of monitor calls.
> >>
> >> Where's the sweet spot between
> >> - configuring everything dynamically, over QMP,
> >> - and invoking QEMU separately, for querying capabilities etc?
> > 
> > The key with the way we currently invoke & query QEMU over QMP to detect
> > capabilities is that this is not tied to a specific VM launch process.
> > We can query capabilities and cache them until such time as we detect
> > a QEMU binary change. So this never impacts on the startup performance
> > of individual VMs. The caching is critical, because querying capabilities
> > is actually quite time intensive already, taking many seconds to query
> > capabilities on all the different target binaries we have.
> 
> (Sorry about hijacking the thread, but I can't stop asking :) )
> 
> This looks very smart -- for my own education, how does libvirtd detect
> a QEMU binary change? Based on executable mtime, size, checksum? Are
> perhaps the <emulator> elements of individual domains involved?

We store the capabilities info in an XML file in /var, and this contains
the ctime of libvirtd and/or QEMU, as well as the libvirt version number. If
any of those change, the cache is invalidated.
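
Roughly, the check amounts to comparing a stored stamp against a freshly
computed one. This is only a sketch of the idea and not libvirt's actual
code: the class, function names and paths are invented for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CapsStamp:
    """Values recorded alongside the cached capabilities."""
    libvirtd_ctime: float
    qemu_ctime: float
    libvirt_version: str


def current_stamp(libvirtd_path: str, qemu_path: str,
                  libvirt_version: str) -> CapsStamp:
    """Read the ctimes of both binaries as they are on disk right now."""
    return CapsStamp(
        libvirtd_ctime=os.stat(libvirtd_path).st_ctime,
        qemu_ctime=os.stat(qemu_path).st_ctime,
        libvirt_version=libvirt_version,
    )


def cache_is_valid(cached: CapsStamp, current: CapsStamp) -> bool:
    """The cache survives only if none of the recorded values changed."""
    return cached == current
```

When cache_is_valid() returns False the capabilities would be probed
again and the stamp rewritten, so the cost is paid once per binary
change and not once per VM start.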

Regards,
Daniel

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-17 15:47             ` Igor Mammedov
@ 2017-10-17 15:52               ` Daniel P. Berrange
  0 siblings, 0 replies; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-17 15:52 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Laszlo Ersek, Paolo Bonzini, peter.maydell, pkrempa,
	Eduardo Habkost, cohuck, qemu-devel, armbru, david

On Tue, Oct 17, 2017 at 05:47:03PM +0200, Igor Mammedov wrote:
> On Tue, 17 Oct 2017 16:35:15 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Tue, Oct 17, 2017 at 05:21:13PM +0200, Laszlo Ersek wrote:
> > > On 10/17/17 16:48, Daniel P. Berrange wrote:  
> > > > On Mon, Oct 16, 2017 at 07:01:01PM +0200, Paolo Bonzini wrote:  
> > > >> On 16/10/2017 18:59, Eduardo Habkost wrote:  
> > > >>>> +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > >>>> +    "-paused [state=]postconf|preconf\n"
> > > >>>> +    "                postconf: pause QEMU after machine is initialized\n"
> > > >>>> +    "                preconf: pause QEMU before machine is initialized\n",
> > > >>>> +    QEMU_ARCH_ALL)  
> > > >>> I would like to allow pausing before machine-type is selected, so
> > > >>> management could run query-machines before choosing a
> > > >>> machine-type.  Would that need a third "-pause" mode, or will we
> > > >>> be able to change "preconf" to pause before select_machine() is
> > > >>> called?
> > > >>>
> > > >>> The same probably applies to other things initialized before
> > > >>> machine_run_board_init() that could be configurable using QMP,
> > > >>> including but not limited to:
> > > >>> * Accelerator configuration
> > > >>> * Registering global properties
> > > >>> * RAM size
> > > >>> * SMP/CPU configuration  
> > > >>
> > > >> Should (or could) "-M none" be changed in a backwards-compatible way to
> > > >> allow such preconfiguration?  For example
> > > >>
> > > >>   qemu -M none -monitor stdio
> > > >>   (qemu) machine-set-options pc,accel=kvm
> > > >>   (qemu) c  
> > > > 
> > > > Going down this route has pretty major implications for the way libvirt
> > > > manages QEMU, and support / debugging of it. When you look at the QEMU
> > > > command line libvirt uses it will be almost devoid of any useful info.
> > > > So it will be more involved job to figure out just how QEMU is configured.
> > > > This also means it is difficult to replicate the config that libvirt has
> > > > used, outside of libvirt for sake of debugging.
> > > > 
> > > > I also think it will have pretty significant performance implications
> > > > for QEMU startup. To configure a guest via the monitor is going to
> > > > require a huge number of monitor commands to be executed to replicate
> > > > what we traditionally configured via ARGV. While each monitor command
> > > > is not massively slow, the round-trip time of each command will quickly
> > > > add up to several 100 milliseconds, perhaps even seconds in the the
> > > > case of very large configs. 
> > > > 
> > > > Maybe we ultimately have no choice and this is inevitable, but I am
> > > > pretty wary of going in the direction of launching bare QEMU and
> > > > configuring everything via a huge number of monitor calls.  
> > > 
> > > Where's the sweet spot between
> > > - configuring everything dynamically, over QMP,
> > > - and invoking QEMU separately, for querying capabilities etc?  
> > 
> > The key with the way we currently invoke & query QEMU over QMP to detect
> > capabilities is that this is not tied to a specific VM launch process.
> > We can query capabilities and cache them until such time as we detect
> > a QEMU binary change. So this never impacts on the startup performance
> > of individual VMs. The caching is critical, because querying capabilities
> > is actually quite time intensive already, taking many seconds to query
> > capabilities on all the different target binaries we have.
> is there another alternative for usecase where one option values depends (-numa cpu)
> on values of another option values (-M + -smp + -cpu)?
>  so far we have 2 options on the table:
>    1: do configuration at runtime like in this series
>    2: start qemu 2 times
>         1st to query cpu layout and
>         2nd add -numa options using data from the 1st step

Conceptually the problem occurs in places where libvirt does not fully
specify the object being created, leaving QEMU to do some config internally.
The elephant in the room in this regard is '-machine', since the machine
baseboard implies creation of a variety of embedded devices. Libvirt has
embedded knowledge about what device buses are associated with each machine
type (i.e. ISA, PCI, PCI-X, etc).  In theory this information could be
introspectable ahead of time because the info about what controllers are
associated with 'pc' or 'q35' is static. In practical terms though, the
QEMU code for populating machines is not structured in a way that would
allow such introspection without instantiating the machine type.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-17 15:07     ` Daniel P. Berrange
  2017-10-17 15:24       ` Laszlo Ersek
@ 2017-10-17 16:06       ` Igor Mammedov
  2017-10-17 16:09         ` Daniel P. Berrange
  2017-10-18 15:30         ` Daniel P. Berrange
  2017-10-18 12:19       ` Paolo Bonzini
  2 siblings, 2 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-17 16:06 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: peter.maydell, pkrempa, ehabkost, cohuck, qemu-devel, armbru,
	pbonzini, david

On Tue, 17 Oct 2017 16:07:59 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > On Mon, 16 Oct 2017 17:36:36 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> >   
> > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > which allows to pause QEMU before machine_init() is run and
> > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > NUMA mapping for cpus.    
> > > 
> > > What's the problem we're seeking solve here compared to what we currently
> > > do for NUMA configuration ?  
> > From RHBZ1382425
> > "
> > Current -numa CLI interface is quite limited in terms that allow map
> > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > are non obvious and depend on machine/arch. As result libvirt has to
> > assume/re-implement cpu_index allocation logic to provide valid 
> > values for -numa cpus=... QEMU CLI option.  
> 
> In broad terms, this problem applies to every device / object libvirt
> asks QEMU to create. For everything else libvirt is able to assign a
> "id" string, which is can then use to identify the thing later. The
> CPU stuff is different because libvirt isn't able to provide 'id'
> strings for each CPU - QEMU generates a psuedo-id internally which
> libvirt has to infer. The latter is the same problem we had with
> devices before '-device' was introduced allowing 'id' naming.
> 
> IMHO we should take the same approach with CPUs and start modelling 
> the individual CPUs as something we can explicitly create with -object
> or -device. That way libvirt can assign names and does not have to 
> care about CPU index values, and it all works just the same way as
> any other devices / object we create
> 
> ie instead of:
> 
>   -smp 8,sockets=4,cores=2,threads=1
>   -numa node,nodeid=0,cpus=0-3
>   -numa node,nodeid=1,cpus=4-7
> 
> we could do:
> 
>   -object numa-node,id=numa0
>   -object numa-node,id=numa1
>   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
>   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
>   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
>   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
>   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
>   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
>   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
>   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
The follow-up question would be where "socket=3,core=1,thread=0"
come from. Currently these options are a function of
(-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
runtime after QEMU parses the -M and -smp options.

Either mgmt asks QEMU for the values, or it duplicates each board's
logic (including compat hacks per machine version) to be able to
generate the values/properties on its own.
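
To illustrate the kind of logic mgmt would have to duplicate, here is a
minimal sketch that derives socket/core/thread IDs from a linear CPU index.
It assumes threads vary fastest, then cores, then sockets; real boards may
order or encode the IDs differently, which is exactly the difficulty.
SmpLayout, CpuTopoIds and topo_ids_from_index() are hypothetical names, not
QEMU code.

```c
#include <assert.h>

/* Hypothetical stand-in for the values given to -smp. */
typedef struct {
    unsigned sockets;
    unsigned cores;     /* cores per socket */
    unsigned threads;   /* threads per core */
} SmpLayout;

/* Hypothetical result type, loosely modelled on the CPU properties. */
typedef struct {
    unsigned socket_id;
    unsigned core_id;
    unsigned thread_id;
} CpuTopoIds;

/*
 * Map a linear CPU index to topology IDs.
 * Assumption: threads vary fastest, then cores, then sockets.
 */
static CpuTopoIds topo_ids_from_index(const SmpLayout *smp, unsigned index)
{
    CpuTopoIds ids;

    assert(smp->sockets && smp->cores && smp->threads);
    assert(index < smp->sockets * smp->cores * smp->threads);

    ids.thread_id = index % smp->threads;
    ids.core_id = (index / smp->threads) % smp->cores;
    ids.socket_id = index / (smp->threads * smp->cores);
    return ids;
}
```

With sockets=4,cores=2,threads=1 this maps index 7 to socket 3, core 1,
thread 0, but only under the ordering assumed above.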


> (perhaps -device instead of -object above, but that's a minor detail)
> 
> Regards,
> Daniel


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-17 16:06       ` Igor Mammedov
@ 2017-10-17 16:09         ` Daniel P. Berrange
  2017-10-17 16:18           ` Igor Mammedov
  2017-10-18 15:30         ` Daniel P. Berrange
  1 sibling, 1 reply; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-17 16:09 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, pkrempa, ehabkost, cohuck, qemu-devel, armbru,
	pbonzini, david

On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> On Tue, 17 Oct 2017 16:07:59 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > >   
> > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > which allows to pause QEMU before machine_init() is run and
> > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > NUMA mapping for cpus.    
> > > > 
> > > > What's the problem we're seeking solve here compared to what we currently
> > > > do for NUMA configuration ?  
> > > From RHBZ1382425
> > > "
> > > Current -numa CLI interface is quite limited in terms that allow map
> > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > are non obvious and depend on machine/arch. As result libvirt has to
> > > assume/re-implement cpu_index allocation logic to provide valid 
> > > values for -numa cpus=... QEMU CLI option.  
> > 
> > In broad terms, this problem applies to every device / object libvirt
> > asks QEMU to create. For everything else libvirt is able to assign a
> > "id" string, which is can then use to identify the thing later. The
> > CPU stuff is different because libvirt isn't able to provide 'id'
> > strings for each CPU - QEMU generates a psuedo-id internally which
> > libvirt has to infer. The latter is the same problem we had with
> > devices before '-device' was introduced allowing 'id' naming.
> > 
> > IMHO we should take the same approach with CPUs and start modelling 
> > the individual CPUs as something we can explicitly create with -object
> > or -device. That way libvirt can assign names and does not have to 
> > care about CPU index values, and it all works just the same way as
> > any other devices / object we create
> > 
> > ie instead of:
> > 
> >   -smp 8,sockets=4,cores=2,threads=1
> >   -numa node,nodeid=0,cpus=0-3
> >   -numa node,nodeid=1,cpus=4-7
> > 
> > we could do:
> > 
> >   -object numa-node,id=numa0
> >   -object numa-node,id=numa1
> >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> the follow up question would be where do "socket=3,core=1,thread=0"
> come from, currently these options are the function of
> (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> runtime after qemu parses -M and -smp options.

The sockets/cores/threads topology of CPUs is something that comes from
the libvirt guest XML config

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-17 16:09         ` Daniel P. Berrange
@ 2017-10-17 16:18           ` Igor Mammedov
  2017-10-18 12:59             ` Eduardo Habkost
  0 siblings, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-17 16:18 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: peter.maydell, pkrempa, ehabkost, cohuck, qemu-devel, armbru,
	pbonzini, david

On Tue, 17 Oct 2017 17:09:26 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > On Tue, 17 Oct 2017 16:07:59 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> >   
> > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:  
> > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > >     
> > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:    
> > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > NUMA mapping for cpus.      
> > > > > 
> > > > > What's the problem we're seeking solve here compared to what we currently
> > > > > do for NUMA configuration ?    
> > > > From RHBZ1382425
> > > > "
> > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > assume/re-implement cpu_index allocation logic to provide valid 
> > > > values for -numa cpus=... QEMU CLI option.    
> > > 
> > > In broad terms, this problem applies to every device / object libvirt
> > > asks QEMU to create. For everything else libvirt is able to assign a
> > > "id" string, which is can then use to identify the thing later. The
> > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > libvirt has to infer. The latter is the same problem we had with
> > > devices before '-device' was introduced allowing 'id' naming.
> > > 
> > > IMHO we should take the same approach with CPUs and start modelling 
> > > the individual CPUs as something we can explicitly create with -object
> > > or -device. That way libvirt can assign names and does not have to 
> > > care about CPU index values, and it all works just the same way as
> > > any other devices / object we create
> > > 
> > > ie instead of:
> > > 
> > >   -smp 8,sockets=4,cores=2,threads=1
> > >   -numa node,nodeid=0,cpus=0-3
> > >   -numa node,nodeid=1,cpus=4-7
> > > 
> > > we could do:
> > > 
> > >   -object numa-node,id=numa0
> > >   -object numa-node,id=numa1
> > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0  
> > the follow up question would be where do "socket=3,core=1,thread=0"
> > come from, currently these options are the function of
> > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > runtime after qemu parses -M and -smp options.  
> 
> The sockets/cores/threads topology of CPUs is something that comes from
> the libvirt guest XML config
In this case libvirt would need to know the following details:
   1: which machine/machine version supports which set of attributes
   2: valid values for these properties depending on machine/machine version/cpu type
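
As a rough sketch of point 1, mgmt would end up maintaining a table like
the one below and keeping it in sync with QEMU. The property sets per
machine are illustrative assumptions only (they can differ by machine
version), and MachineCpuProps, cpu_props_table and machine_supports_props()
are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit flags for the topology properties a machine accepts. */
enum {
    PROP_SOCKET_ID = 1 << 0,
    PROP_CORE_ID   = 1 << 1,
    PROP_THREAD_ID = 1 << 2
};

typedef struct {
    const char *machine;   /* machine type name */
    unsigned props;        /* OR of PROP_* flags */
} MachineCpuProps;

/* Illustrative entries only; not an authoritative list. */
static const MachineCpuProps cpu_props_table[] = {
    { "pc",              PROP_SOCKET_ID | PROP_CORE_ID | PROP_THREAD_ID },
    { "virt",            PROP_THREAD_ID },
    { "pseries",         PROP_CORE_ID },
    { "s390-ccw-virtio", PROP_CORE_ID }
};

/* Return 1 if every property in 'wanted' is accepted by 'machine'. */
static int machine_supports_props(const char *machine, unsigned wanted)
{
    size_t i;
    size_t n = sizeof(cpu_props_table) / sizeof(cpu_props_table[0]);

    assert(machine != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(cpu_props_table[i].machine, machine) == 0) {
            return (cpu_props_table[i].props & wanted) == wanted;
        }
    }
    return 0;   /* unknown machine: nothing can be assumed */
}
```

Every new machine version or compat change would need a matching update
on the mgmt side, which is the duplication this series tries to avoid.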


> Regards,
> Daniel


* Re: [Qemu-devel] [RFC 2/6] numa: split out NumaOptions parsing into parse_NumaOptions()
  2017-10-16 16:22 ` [Qemu-devel] [RFC 2/6] numa: split out NumaOptions parsing into parse_NumaOptions() Igor Mammedov
@ 2017-10-18  3:27   ` David Gibson
  2017-10-18 14:53     ` Eric Blake
  0 siblings, 1 reply; 93+ messages in thread
From: David Gibson @ 2017-10-18  3:27 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, eblake, armbru, ehabkost, pkrempa, peter.maydell,
	pbonzini, cohuck


On Mon, Oct 16, 2017 at 06:22:52PM +0200, Igor Mammedov wrote:
> it will allow to reuse parse_NumaOptions() for parsing
> configuration commands received via QMP interface
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/sysemu/numa.h |  1 +
>  numa.c                | 48 +++++++++++++++++++++++++++++-------------------
>  2 files changed, 30 insertions(+), 19 deletions(-)
> 
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index c19e456..aad4230 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -30,6 +30,7 @@ struct NumaNodeMem {
>  };
>  
>  extern NodeInfo numa_info[MAX_NODES];
> +int parse_numa(void *opaque, QemuOpts *opts, Error **errp);
>  void parse_numa_opts(MachineState *ms);
>  void numa_complete_configuration(MachineState *ms);
>  void query_numa_node_mem(NumaNodeMem node_mem[]);
> diff --git a/numa.c b/numa.c
> index 18af4ff..d8e7dc0 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -254,28 +254,11 @@ static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
>      have_numa_distance = true;
>  }
>  
> -static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
> +static
> +void parse_NumaOptions(MachineState *ms, NumaOptions *object, Error **errp)
>  {
> -    NumaOptions *object = NULL;
> -    MachineState *ms = opaque;
>      Error *err = NULL;
>  
> -    {
> -        Visitor *v = opts_visitor_new(opts);
> -        visit_type_NumaOptions(v, NULL, &object, &err);
> -        visit_free(v);
> -    }
> -
> -    if (err) {
> -        goto end;
> -    }
> -
> -    /* Fix up legacy suffix-less format */
> -    if ((object->type == NUMA_OPTIONS_TYPE_NODE) && object->u.node.has_mem) {
> -        const char *mem_str = qemu_opt_get(opts, "mem");
> -        qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem);
> -    }
> -
>      switch (object->type) {
>      case NUMA_OPTIONS_TYPE_NODE:
>          parse_numa_node(ms, &object->u.node, &err);
> @@ -310,6 +293,33 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
>      }
>  
>  end:
> +    if (err) {
> +        error_propagate(errp, err);
> +    }
> +}
> +
> +int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
> +{
> +    NumaOptions *object = NULL;
> +    MachineState *ms = MACHINE(opaque);
> +    Error *err = NULL;
> +    Visitor *v = opts_visitor_new(opts);
> +
> +    visit_type_NumaOptions(v, NULL, &object, &err);
> +    visit_free(v);
> +    if (err) {
> +        goto end;
> +    }
> +
> +    /* Fix up legacy suffix-less format */
> +    if ((object->type == NUMA_OPTIONS_TYPE_NODE) && object->u.node.has_mem) {
> +        const char *mem_str = qemu_opt_get(opts, "mem");
> +        qemu_strtosz_MiB(mem_str, NULL, &object->u.node.mem);
> +    }
> +
> +    parse_NumaOptions(ms, object, &err);
> +
> +end:
>      qapi_free_NumaOptions(object);
>      if (err) {
>          error_report_err(err);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-10-16 16:22 ` [Qemu-devel] [RFC 3/6] possible_cpus: add CPUArchId::type field Igor Mammedov
@ 2017-10-18 11:12   ` Igor Mammedov
  2017-10-19  6:31     ` David Gibson
  0 siblings, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-18 11:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, pkrempa, ehabkost, pbonzini, david, drjones, cohuck

To enable early CPU to NUMA node configuration at runtime,
qmp_query_hotpluggable_cpus() should provide a list of available
CPU slots at an early stage, before machine_init() is called and
the first CPU is created, so that mgmt can call it and use the
output to set the NUMA mapping.
Use the MachineClass::possible_cpu_arch_ids() callback to set
CPU type info, along with the rest of the possible CPU properties,
to let the machine define which CPU type* will be used.

* for SPAPR it will be a spapr core type and for ARM/s390x/x86
  a respective descendant of CPUClass.

Move parse_numa_opts() in vl.c after cpu_model is parsed into
cpu_type so that possible_cpu_arch_ids() would know which
cpu_type to use during layout initialization.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
  v2:
     - fix NULL dereference caused by uninitialized
       MachineState::cpu_type at the time parse_numa_opts()
       was called
---
 include/hw/boards.h        |  2 ++
 hw/arm/virt.c              |  3 ++-
 hw/core/machine.c          | 12 ++++++------
 hw/i386/pc.c               |  4 +++-
 hw/ppc/spapr.c             | 13 ++++++++-----
 hw/s390x/s390-virtio-ccw.c |  1 +
 vl.c                       |  3 +--
 7 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 191a5b3..fa21758 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
  * CPUArchId:
  * @arch_id - architecture-dependent CPU ID of present or possible CPU
  * @cpu - pointer to corresponding CPU object if it's present on NULL otherwise
+ * @type - QOM class name of possible @cpu object
  * @props - CPU object properties, initialized by board
  * #vcpus_count - number of threads provided by @cpu object
  */
@@ -88,6 +89,7 @@ typedef struct {
     int64_t vcpus_count;
     CpuInstanceProperties props;
     Object *cpu;
+    const char *type;
 } CPUArchId;
 
 /**
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9e18b41..88319db 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1357,7 +1357,7 @@ static void machvirt_init(MachineState *machine)
             break;
         }
 
-        cpuobj = object_new(machine->cpu_type);
+        cpuobj = object_new(possible_cpus->cpus[n].type);
         object_property_set_int(cpuobj, possible_cpus->cpus[n].arch_id,
                                 "mp-affinity", NULL);
 
@@ -1573,6 +1573,7 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
                                   sizeof(CPUArchId) * max_cpus);
     ms->possible_cpus->len = max_cpus;
     for (n = 0; n < ms->possible_cpus->len; n++) {
+        ms->possible_cpus->cpus[n].type = ms->cpu_type;
         ms->possible_cpus->cpus[n].arch_id =
             virt_cpu_mp_affinity(vms, n);
         ms->possible_cpus->cpus[n].props.has_thread_id = true;
diff --git a/hw/core/machine.c b/hw/core/machine.c
index df46275..42cea7c 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -363,18 +363,18 @@ static void machine_init_notify(Notifier *notifier, void *data)
 HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine)
 {
     int i;
-    Object *cpu;
     HotpluggableCPUList *head = NULL;
-    const char *cpu_type;
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+
+    /* force board to initialize possible_cpus if it hasn't been done yet */
+    mc->possible_cpu_arch_ids(machine);
 
-    cpu = machine->possible_cpus->cpus[0].cpu;
-    assert(cpu); /* Boot cpu is always present */
-    cpu_type = object_get_typename(cpu);
     for (i = 0; i < machine->possible_cpus->len; i++) {
+        Object *cpu;
         HotpluggableCPUList *list_item = g_new0(typeof(*list_item), 1);
         HotpluggableCPU *cpu_item = g_new0(typeof(*cpu_item), 1);
 
-        cpu_item->type = g_strdup(cpu_type);
+        cpu_item->type = g_strdup(machine->possible_cpus->cpus[i].type);
         cpu_item->vcpus_count = machine->possible_cpus->cpus[i].vcpus_count;
         cpu_item->props = g_memdup(&machine->possible_cpus->cpus[i].props,
                                    sizeof(*cpu_item->props));
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8e307f7..99afb2f1 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1147,7 +1147,8 @@ void pc_cpus_init(PCMachineState *pcms)
     pcms->apic_id_limit = x86_cpu_apic_id_from_index(max_cpus - 1) + 1;
     possible_cpus = mc->possible_cpu_arch_ids(ms);
     for (i = 0; i < smp_cpus; i++) {
-        pc_new_cpu(ms->cpu_type, possible_cpus->cpus[i].arch_id, &error_fatal);
+        pc_new_cpu(possible_cpus->cpus[i].type, possible_cpus->cpus[i].arch_id,
+                   &error_fatal);
     }
 }
 
@@ -2269,6 +2270,7 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
     for (i = 0; i < ms->possible_cpus->len; i++) {
         X86CPUTopoInfo topo;
 
+        ms->possible_cpus->cpus[i].type = ms->cpu_type;
         ms->possible_cpus->cpus[i].vcpus_count = 1;
         ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(i);
         x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id,
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d682f01..9d63477 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2135,11 +2135,6 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
     int boot_cores_nr = smp_cpus / smp_threads;
     int i;
 
-    if (!type) {
-        error_report("Unable to find sPAPR CPU Core definition");
-        exit(1);
-    }
-
     possible_cpus = mc->possible_cpu_arch_ids(machine);
     if (mc->has_hotpluggable_cpus) {
         if (smp_cpus % smp_threads) {
@@ -3438,6 +3433,7 @@ static int64_t spapr_get_default_cpu_node_id(const MachineState *ms, int idx)
 static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
 {
     int i;
+    const char *core_type;
     int spapr_max_cores = max_cpus / smp_threads;
     MachineClass *mc = MACHINE_GET_CLASS(machine);
 
@@ -3449,12 +3445,19 @@ static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
         return machine->possible_cpus;
     }
 
+    core_type = spapr_get_cpu_core_type(machine->cpu_type);
+    if (!core_type) {
+        error_report("Unable to find sPAPR CPU Core definition");
+        exit(1);
+    }
+
     machine->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
                              sizeof(CPUArchId) * spapr_max_cores);
     machine->possible_cpus->len = spapr_max_cores;
     for (i = 0; i < machine->possible_cpus->len; i++) {
         int core_id = i * smp_threads;
 
+        machine->possible_cpus->cpus[i].type = core_type;
         machine->possible_cpus->cpus[i].vcpus_count = smp_threads;
         machine->possible_cpus->cpus[i].arch_id = core_id;
         machine->possible_cpus->cpus[i].props.has_core_id = true;
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index e593c71..75084a8 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -385,6 +385,7 @@ static const CPUArchIdList *s390_possible_cpu_arch_ids(MachineState *ms)
                                   sizeof(CPUArchId) * max_cpus);
     ms->possible_cpus->len = max_cpus;
     for (i = 0; i < ms->possible_cpus->len; i++) {
+        ms->possible_cpus->cpus[i].type = ms->cpu_type;
         ms->possible_cpus->cpus[i].vcpus_count = 1;
         ms->possible_cpus->cpus[i].arch_id = i;
         ms->possible_cpus->cpus[i].props.has_core_id = true;
diff --git a/vl.c b/vl.c
index 0723835..598217a 100644
--- a/vl.c
+++ b/vl.c
@@ -4677,8 +4677,6 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
     default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
-    parse_numa_opts(current_machine);
-
     if (qemu_opts_foreach(qemu_find_opts("mon"),
                           mon_init_func, NULL, NULL)) {
         exit(1);
@@ -4737,6 +4735,7 @@ int main(int argc, char **argv, char **envp)
                 cpu_parse_cpu_model(machine_class->default_cpu_type, cpu_model);
         }
     }
+    parse_numa_opts(current_machine);
 
     machine_run_board_init(current_machine);
 
-- 
2.7.4


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-17 15:07     ` Daniel P. Berrange
  2017-10-17 15:24       ` Laszlo Ersek
  2017-10-17 16:06       ` Igor Mammedov
@ 2017-10-18 12:19       ` Paolo Bonzini
  2017-10-18 12:27         ` Daniel P. Berrange
  2 siblings, 1 reply; 93+ messages in thread
From: Paolo Bonzini @ 2017-10-18 12:19 UTC (permalink / raw)
  To: Daniel P. Berrange, Igor Mammedov
  Cc: qemu-devel, peter.maydell, pkrempa, ehabkost, cohuck, armbru, david

On 17/10/2017 17:07, Daniel P. Berrange wrote:
> ie instead of:
> 
>   -smp 8,sockets=4,cores=2,threads=1
>   -numa node,nodeid=0,cpus=0-3
>   -numa node,nodeid=1,cpus=4-7
> 
> we could do:
> 
>   -object numa-node,id=numa0
>   -object numa-node,id=numa1
>   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
>   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
>   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
>   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
>   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
>   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
>   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
>   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> 
> (perhaps -device instead of -object above, but that's a minor detail)

I understand that this is just an example, but wasn't this what is solved by

  -smp 8,sockets=4,cores=2,threads=1
  -numa node,nodeid=0 -numa cpu,node-id=0,socket-id=0-1
  -numa node,nodeid=1 -numa cpu,node-id=1,socket-id=2-3

?

Paolo


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 12:19       ` Paolo Bonzini
@ 2017-10-18 12:27         ` Daniel P. Berrange
  2017-10-18 12:33           ` Paolo Bonzini
  2017-10-18 14:21           ` Igor Mammedov
  0 siblings, 2 replies; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-18 12:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Igor Mammedov, qemu-devel, peter.maydell, pkrempa, ehabkost,
	cohuck, armbru, david

On Wed, Oct 18, 2017 at 02:19:54PM +0200, Paolo Bonzini wrote:
> On 17/10/2017 17:07, Daniel P. Berrange wrote:
> > ie instead of:
> > 
> >   -smp 8,sockets=4,cores=2,threads=1
> >   -numa node,nodeid=0,cpus=0-3
> >   -numa node,nodeid=1,cpus=4-7
> > 
> > we could do:
> > 
> >   -object numa-node,id=numa0
> >   -object numa-node,id=numa1
> >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > 
> > (perhaps -device instead of -object above, but that's a minor detail)
> 
> I understand that this is just an example, but wasn't this what is solved by
> 
>   -smp 8,sockets=4,cores=2,thread=1
>   -numa node,nodeid=0 -numa cpu,node-id=0,socket-id=0-1
>   -numa node,nodeid=1 -numa cpu,node-id=0,socket-id=2-3

IIUC, that lets you associate CPUs with NUMA nodes without having to know
the internal QEMU indexes. It won't help you with any monitor commands you
need to run later that expect the CPU index as input value.  My example
lets you assign IDs to each CPU, which can then be used for monitor
commands too - I should have illustrated that bit of it too.
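
A small sketch of what that would mean on the consumer side: once every
CPU has a mgmt-assigned ID, a later command can resolve it by name instead
of by index. NamedCpu and find_cpu_by_id() are hypothetical names for
illustration and do not correspond to an existing QEMU interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for a CPU created with an explicit id. */
typedef struct {
    const char *id;
    int node;
    int socket;
    int core;
    int thread;
} NamedCpu;

/* Look a CPU up by its assigned id; returns NULL if no such id exists. */
static const NamedCpu *find_cpu_by_id(const NamedCpu *cpus, size_t n,
                                      const char *id)
{
    size_t i;

    assert(cpus != NULL && id != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(cpus[i].id, id) == 0) {
            return &cpus[i];
        }
    }
    return NULL;
}
```

The caller never needs to know which internal index "cpu6" ended up with.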


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 12:27         ` Daniel P. Berrange
@ 2017-10-18 12:33           ` Paolo Bonzini
  2017-10-18 14:26             ` Igor Mammedov
  2017-10-18 14:21           ` Igor Mammedov
  1 sibling, 1 reply; 93+ messages in thread
From: Paolo Bonzini @ 2017-10-18 12:33 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Igor Mammedov, qemu-devel, peter.maydell, pkrempa, ehabkost,
	cohuck, armbru, david

On 18/10/2017 14:27, Daniel P. Berrange wrote:
>>>   -object numa-node,id=numa0
>>>   -object numa-node,id=numa1
>>>   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
>>>   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
>>>   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
>>>   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
>>>   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
>>>   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
>>>   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
>>>   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
>>>
>>> (perhaps -device instead of -object above, but that's a minor detail)
>> I understand that this is just an example, but wasn't this what is solved by
>>
>>   -smp 8,sockets=4,cores=2,thread=1
>>   -numa node,nodeid=0 -numa cpu,node-id=0,socket-id=0-1
>>   -numa node,nodeid=1 -numa cpu,node-id=0,socket-id=2-3
> IIUC, that lets you associate CPUs with NUMA nodes without having to know
> the internal QEMU indexes. It won't help you with any monitor commands you
> need to run later that expect the CPU index as input value.  My example
> where lets you assign IDs to each CPU, which can then be used for montor
> commands too - i should have illustrated that bit of it too.

I guess query-hotpluggable-cpus could also grow a first-cpu-index in the
returned data.
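
Roughly, such a field could be filled as a running sum over vcpus_count,
as in the sketch below. This assumes cpu_index values are handed out
contiguously in slot order, which may not hold for every machine;
"first-cpu-index" and fill_first_cpu_index() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * For each slot, store the index of its first vCPU.
 * Assumption: indexes are allocated contiguously, in slot order.
 */
static void fill_first_cpu_index(const long *vcpus_count, long *first_index,
                                 size_t n)
{
    size_t i;
    long next = 0;

    assert(vcpus_count != NULL && first_index != NULL);
    for (i = 0; i < n; i++) {
        first_index[i] = next;
        next += vcpus_count[i];
    }
}
```

For three slots of 4 threads each this yields 0, 4 and 8.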

Paolo


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-17 16:18           ` Igor Mammedov
@ 2017-10-18 12:59             ` Eduardo Habkost
  2017-10-18 14:44               ` Igor Mammedov
  0 siblings, 1 reply; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-18 12:59 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Daniel P. Berrange, peter.maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david

On Tue, Oct 17, 2017 at 06:18:59PM +0200, Igor Mammedov wrote:
> On Tue, 17 Oct 2017 17:09:26 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > >   
> > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:  
> > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > >     
> > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:    
> > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > NUMA mapping for cpus.      
> > > > > > 
> > > > > > What's the problem we're seeking solve here compared to what we currently
> > > > > > do for NUMA configuration ?    
> > > > > From RHBZ1382425
> > > > > "
> > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > assume/re-implement cpu_index allocation logic to provide valid 
> > > > > values for -numa cpus=... QEMU CLI option.    
> > > > 
> > > > In broad terms, this problem applies to every device / object libvirt
> > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > "id" string, which is can then use to identify the thing later. The
> > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > libvirt has to infer. The latter is the same problem we had with
> > > > devices before '-device' was introduced allowing 'id' naming.
> > > > 
> > > > IMHO we should take the same approach with CPUs and start modelling 
> > > > the individual CPUs as something we can explicitly create with -object
> > > > or -device. That way libvirt can assign names and does not have to 
> > > > care about CPU index values, and it all works just the same way as
> > > > any other devices / object we create
> > > > 
> > > > ie instead of:
> > > > 
> > > >   -smp 8,sockets=4,cores=2,threads=1
> > > >   -numa node,nodeid=0,cpus=0-3
> > > >   -numa node,nodeid=1,cpus=4-7
> > > > 
> > > > we could do:
> > > > 
> > > >   -object numa-node,id=numa0
> > > >   -object numa-node,id=numa1
> > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0  
> > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > come from, currently these options are the function of
> > > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > > runtime after qemu parses -M and -smp options.  
> > 

Also, note that in the case of NUMA, having identifiers for CPU
objects themselves won't be enough. NUMA settings need
identifiers for CPU slots (even if they are still empty), and
those slots are provided by the machine, not created by the user.


> > The sockets/cores/threads topology of CPUs is something that comes from
> > the libvirt guest XML config
> in this case things for libvirt to implement would be to know following details:
>    1: which machine/machine version support which set of attributes
>    2: valid values for these properties depending on machine/machine version/cpu type

The big assumption in this series is that libvirt doesn't know in
advance what the possible slots for CPUs will look like on each
machine-type, and needs to query them using
query-hotpluggable-cpus.

But if this assumption were really true, it would be impossible
for the user to even decide what the NUMA topology will look like,
wouldn't it?

Igor, are you able to give one example of what the user input
(libvirt XML) for configuring NUMA CPU binding could look like if
the user didn't know yet what the available sockets/cores/threads
are?

-- 
Eduardo

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 12:27         ` Daniel P. Berrange
  2017-10-18 12:33           ` Paolo Bonzini
@ 2017-10-18 14:21           ` Igor Mammedov
  1 sibling, 0 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-18 14:21 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Paolo Bonzini, qemu-devel, peter.maydell, pkrempa, ehabkost,
	cohuck, armbru, david

On Wed, 18 Oct 2017 13:27:15 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Wed, Oct 18, 2017 at 02:19:54PM +0200, Paolo Bonzini wrote:
> > On 17/10/2017 17:07, Daniel P. Berrange wrote:  
> > > ie instead of:
> > > 
> > >   -smp 8,sockets=4,cores=2,threads=1
> > >   -numa node,nodeid=0,cpus=0-3
> > >   -numa node,nodeid=1,cpus=4-7
> > > 
> > > we could do:
> > > 
> > >   -object numa-node,id=numa0
> > >   -object numa-node,id=numa1
> > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > 
> > > (perhaps -device instead of -object above, but that's a minor detail)  
> > 
> > I understand that this is just an example, but wasn't this what is solved by
> > 
> >   -smp 8,sockets=4,cores=2,thread=1
> >   -numa node,nodeid=0 -numa cpu,node-id=0,socket-id=0-1
> >   -numa node,nodeid=1 -numa cpu,node-id=0,socket-id=2-3  
> 
> IIUC, that lets you associate CPUs with NUMA nodes without having to know
> the internal QEMU indexes. 
yep, with -numa cpu the user doesn't need to know cpu_index values anymore,
but the next things the user needs to know are:
  1: the set of properties the machine supports
      x86: "{socket|core|thread}-id"
      spapr: "core-id"
      s390: ...
  2: what values to use for the above properties;
     they might be a 0..n interval but might also be sparse
     (spapr for example). In other words, the values are owned by the
     machine and might also depend on its version.

The above 2 points were the reason why query-hotpluggable-cpus
was introduced: so that mgmt could query qemu for the valid set
of properties/values for a given set of options and
compose a valid device_add command for hotplug.
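
A minimal sketch of that query-then-compose flow, applied to NUMA mapping
rather than hotplug. It assumes the reply has the shape of a
query-hotpluggable-cpus result (a list of entries with a "props" dict);
the sample data, the numa_cpu_options() helper and the
"node = socket-id" policy are hypothetical and only illustrate that mgmt
can reuse whatever property names the machine reports without knowing
them in advance:

```python
# Hypothetical helper: build "-numa cpu,..." option strings from entries
# shaped like a query-hotpluggable-cpus reply. Not part of QEMU or libvirt.

def numa_cpu_options(slots, assign_node):
    """Return one '-numa cpu' option string per reported CPU slot."""
    options = []
    for slot in slots:
        props = slot["props"]
        node = assign_node(props)
        # Reuse exactly the property names the machine reported
        # (socket-id/core-id/thread-id on x86, core-id on spapr, ...).
        keys = sorted(k for k in props if k != "node-id")
        prop_str = ",".join(f"{k}={props[k]}" for k in keys)
        options.append(f"-numa cpu,node-id={node},{prop_str}")
    return options


# Sample data (assumed), mirroring '-smp 1,maxcpus=2' on an x86 machine.
sample_reply = [
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "props": {"socket-id": 1, "core-id": 0, "thread-id": 0}},
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "props": {"socket-id": 0, "core-id": 0, "thread-id": 0}},
]

# Assumed policy for the example: one NUMA node per socket.
opts = numa_cpu_options(sample_reply, lambda props: props["socket-id"])
for opt in opts:
    print(opt)
```

The helper never hardcodes "socket-id" or "core-id" itself; only the
example policy does, and on spapr that policy would have to key on
"core-id" instead.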

I'm not opposed to adding
  -smp 0 -device foo-cpu,id=cpuX,...
as Daniel suggests, but libvirt would have to implement the logic that
produces #1+#2 (which probably means duplicating it from qemu).


> It won't help you with any monitor commands you
> need to run later that expect the CPU index as input value.  My example
> where lets you assign IDs to each CPU, which can then be used for montor
> commands too - i should have illustrated that bit of it too.
monitor commands that take cpu-index are a separate, unrelated story though.
They need to be reworked to use socket|core|thread-id where it makes
sense.
(
For ex: a spapr core can't be used as the address for the 'cpu' command, as
that command expects a thread-level object, and other commands would operate
on/expect a CPUState being pointed to. Introducing an explicit ID set by mgmt
won't be of use here either, as it would be set on the core object while the
child threads would stay nameless.
In such use cases we can use the qom path of the thread of interest, which
can be queried at runtime with query-cpus, which gives a thread-level view.
)
 
> Regards,
> Daniel

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 12:33           ` Paolo Bonzini
@ 2017-10-18 14:26             ` Igor Mammedov
  2017-10-18 14:29               ` Paolo Bonzini
  0 siblings, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-18 14:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Daniel P. Berrange, peter.maydell, pkrempa, ehabkost, cohuck,
	qemu-devel, armbru, david

On Wed, 18 Oct 2017 14:33:49 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 18/10/2017 14:27, Daniel P. Berrange wrote:
> >>>   -object numa-node,id=numa0
> >>>   -object numa-node,id=numa1
> >>>   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> >>>   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> >>>   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> >>>   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> >>>   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> >>>   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> >>>   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> >>>   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> >>>
> >>> (perhaps -device instead of -object above, but that's a minor detail)  
> >> I understand that this is just an example, but wasn't this what is solved by
> >>
> >>   -smp 8,sockets=4,cores=2,thread=1
> >>   -numa node,nodeid=0 -numa cpu,node-id=0,socket-id=0-1
> >>   -numa node,nodeid=1 -numa cpu,node-id=0,socket-id=2-3  
> > IIUC, that lets you associate CPUs with NUMA nodes without having to know
> > the internal QEMU indexes. It won't help you with any monitor commands you
> > need to run later that expect the CPU index as input value.  My example
> > where lets you assign IDs to each CPU, which can then be used for montor
> > commands too - i should have illustrated that bit of it too.  
> 
> I guess query-hotpluggable-cpus could also grow a first-cpu-index in the
> returned data.
I guess query-cpus can/does provide cpu-index already;
for query-hotpluggable-cpus it would depend on what's shown there
(it would work for x86/arm/s390 as they publish CPUState based objects there,
but spapr puts cores there which themselves do not have a cpu-index,
though their children do)


> 
> Paolo
> 

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 14:26             ` Igor Mammedov
@ 2017-10-18 14:29               ` Paolo Bonzini
  2017-10-18 14:54                 ` Igor Mammedov
  0 siblings, 1 reply; 93+ messages in thread
From: Paolo Bonzini @ 2017-10-18 14:29 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Daniel P. Berrange, peter.maydell, pkrempa, ehabkost, cohuck,
	qemu-devel, armbru, david

On 18/10/2017 16:26, Igor Mammedov wrote:
>> I guess query-hotpluggable-cpus could also grow a first-cpu-index in the
>> returned data.
> 
> I guess query-cpus can/does provide cpu-index already,
> for query-hotpluggable-cpus it would depend in what's shown there
> (would work fro x86/arm/s390 as they publish there CPUState based objects,
> but spapr puts cores there which themselves do not have cpu-index,
> their children do though)

Yeah, that's why I put "first-cpu-index".  The idea is that indices go
from first-cpu-index to first-cpu-index + vcpus-count - 1.
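
A minimal sketch of that arithmetic. Note that "first-cpu-index" is only
the field proposed above, not something query-hotpluggable-cpus returns
today; the entry below and the cpu_indexes() helper are hypothetical:

```python
# Hypothetical: expand one query-hotpluggable-cpus style entry into the
# cpu-index values it would cover if a "first-cpu-index" field existed.

def cpu_indexes(entry):
    first = entry["first-cpu-index"]   # proposed field, assumed here
    count = entry["vcpus-count"]
    # Indices run from first to first + count - 1, inclusive.
    return list(range(first, first + count))


# Assumed example: a spapr core entry that owns 8 threads.
spapr_core = {
    "type": "power8_v2.0-spapr-cpu-core",
    "vcpus-count": 8,
    "first-cpu-index": 8,
    "props": {"core-id": 8},
}

print(cpu_indexes(spapr_core))
```

This covers the spapr case raised above: the core entry has no cpu-index
of its own, but the range still identifies each of its child threads.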

Paolo

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 12:59             ` Eduardo Habkost
@ 2017-10-18 14:44               ` Igor Mammedov
  2017-10-18 14:49                 ` Daniel P. Berrange
  0 siblings, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-18 14:44 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Daniel P. Berrange, peter.maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david

On Wed, 18 Oct 2017 10:59:11 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, Oct 17, 2017 at 06:18:59PM +0200, Igor Mammedov wrote:
> > On Tue, 17 Oct 2017 17:09:26 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> >   
> > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:  
> > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > >     
> > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:    
> > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > >       
> > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:      
> > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > NUMA mapping for cpus.        
> > > > > > > 
> > > > > > > What's the problem we're seeking solve here compared to what we currently
> > > > > > > do for NUMA configuration ?      
> > > > > > From RHBZ1382425
> > > > > > "
> > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > assume/re-implement cpu_index allocation logic to provide valid 
> > > > > > values for -numa cpus=... QEMU CLI option.      
> > > > > 
> > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > > "id" string, which is can then use to identify the thing later. The
> > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > 
> > > > > IMHO we should take the same approach with CPUs and start modelling 
> > > > > the individual CPUs as something we can explicitly create with -object
> > > > > or -device. That way libvirt can assign names and does not have to 
> > > > > care about CPU index values, and it all works just the same way as
> > > > > any other devices / object we create
> > > > > 
> > > > > ie instead of:
> > > > > 
> > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > >   -numa node,nodeid=0,cpus=0-3
> > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > 
> > > > > we could do:
> > > > > 
> > > > >   -object numa-node,id=numa0
> > > > >   -object numa-node,id=numa1
> > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0    
> > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > come from, currently these options are the function of
> > > > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > > > runtime after qemu parses -M and -smp options.    
> > >   
> 
> Also, note that in the case of NUMA, having identifiers for CPU
> objects themselves won't be enough. NUMA settings need
> identifiers for CPU slots (even if they are still empty), and
> those slots are provided by the machine, not created by the user.
> 
> 
> > > The sockets/cores/threads topology of CPUs is something that comes from
> > > the libvirt guest XML config  
> > in this case things for libvirt to implement would be to know following details:
> >    1: which machine/machine version support which set of attributes
> >    2: valid values for these properties depending on machine/machine version/cpu type  
> 
> The big assumption in this series is that libvirt doesn't know in
> advance how the possible slots for CPUs will look like on each
> machine-type, and need to query them using
> query-hotpluggable-cpus.
yep, that's true, and it started with the introduction of 'device_add cpu',
where libvirt didn't know what to specify as options for a new cpu,
hence query-hotpluggable-cpus was added to provide that information.


> But if this assumption was really true, it would be impossible
> for the user to even decide how the NUMA topology will look like,
> wouldn't it?
> 
> Igor, are you able to give one example of how the user input
> (libvirt XML) for configuring NUMA CPU binding could look like if
> the user didn't know yet what the available sockets/cores/threads
> are?
not sure I parse the question, but looking at libvirt's domain docs,
they mention
  <numa>
    <cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
    <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/>
  </numa>

here libvirt assumes that there are cpus with cpu-index in the range 0-7
/and probably duplicates the logic that calculates cpu-index/
If libvirt were to continue duplicating that logic, we could skip
implementing early runtime QMP in QEMU and also drop support for
query-hotpluggable-cpus, as libvirt would be able to compute
properties/values on its own.

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 14:44               ` Igor Mammedov
@ 2017-10-18 14:49                 ` Daniel P. Berrange
  2017-10-18 15:24                   ` Igor Mammedov
  0 siblings, 1 reply; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-18 14:49 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Eduardo Habkost, peter.maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david

On Wed, Oct 18, 2017 at 04:44:35PM +0200, Igor Mammedov wrote:
> On Wed, 18 Oct 2017 10:59:11 -0200
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Tue, Oct 17, 2017 at 06:18:59PM +0200, Igor Mammedov wrote:
> > > On Tue, 17 Oct 2017 17:09:26 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > >   
> > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:  
> > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > >     
> > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:    
> > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > >       
> > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:      
> > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > NUMA mapping for cpus.        
> > > > > > > > 
> > > > > > > > What's the problem we're seeking solve here compared to what we currently
> > > > > > > > do for NUMA configuration ?      
> > > > > > > From RHBZ1382425
> > > > > > > "
> > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > assume/re-implement cpu_index allocation logic to provide valid 
> > > > > > > values for -numa cpus=... QEMU CLI option.      
> > > > > > 
> > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > > > "id" string, which is can then use to identify the thing later. The
> > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > 
> > > > > > IMHO we should take the same approach with CPUs and start modelling 
> > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > or -device. That way libvirt can assign names and does not have to 
> > > > > > care about CPU index values, and it all works just the same way as
> > > > > > any other devices / object we create
> > > > > > 
> > > > > > ie instead of:
> > > > > > 
> > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > 
> > > > > > we could do:
> > > > > > 
> > > > > >   -object numa-node,id=numa0
> > > > > >   -object numa-node,id=numa1
> > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0    
> > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > come from, currently these options are the function of
> > > > > (-M foo -smp ...) and can be queried vi query-hotpluggble-cpus at
> > > > > runtime after qemu parses -M and -smp options.    
> > > >   
> > 
> > Also, note that in the case of NUMA, having identifiers for CPU
> > objects themselves won't be enough. NUMA settings need
> > identifiers for CPU slots (even if they are still empty), and
> > those slots are provided by the machine, not created by the user.
> > 
> > 
> > > > The sockets/cores/threads topology of CPUs is something that comes from
> > > > the libvirt guest XML config  
> > > in this case things for libvirt to implement would be to know following details:
> > >    1: which machine/machine version support which set of attributes
> > >    2: valid values for these properties depending on machine/machine version/cpu type  
> > 
> > The big assumption in this series is that libvirt doesn't know in
> > advance how the possible slots for CPUs will look like on each
> > machine-type, and need to query them using
> > query-hotpluggable-cpus.
> yep, that's true and it started with introduction of 'device_add cpu'
> where libvirt didn't new what to specify as options for new cpu,
> hence query-hotpluggable-cpus were added to provide that information.
> 
> 
> > But if this assumption was really true, it would be impossible
> > for the user to even decide how the NUMA topology will look like,
> > wouldn't it?
> > 
> > Igor, are you able to give one example of how the user input
> > (libvirt XML) for configuring NUMA CPU binding could look like if
> > the user didn't know yet what the available sockets/cores/threads
> > are?
> not sure I parse question but looking at libvirt's domain docs
> it mentions
>   <numa>
>     <cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
>     <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/>
>   </numa>
> 
> here libvirt assumes that there are cpus with cpu-index in range 0-7
> /and probably duplicates logic that calculates cpu-index/
> If libvirt would continue to duplicate logic we could skip on
> implementing early runtime QMP in QEMU and also drop support for
> query-hotpluggable-cpus as libvirt would be able to compute
> properties/values on it's own.

From the POV of the XML, these CPU numbers are *not* required to be
the same as any QEMU CPU index. This is just saying that we've got
a <vcpu>8</vcpu> element, and we want the first 4 CPUs in one node
and the second 4 in the second node.

If QEMU assigns CPU indexes 70-77 internally, that's not relevant to
the XML POV, which uses 0-7 regardless. If there ever was such a
disjoint representation of CPU indexes, libvirt would have to remap
what's in the XML to match what's in QEMU.
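
A minimal sketch of such a remap, assuming the n-th CPU in the XML
corresponds to the n-th smallest index QEMU reports. That ordering rule,
and the parse_cpu_range() and remap_cells() helpers, are hypothetical and
not libvirt code:

```python
# Hypothetical sketch: map XML-level CPU numbers (0..N-1) onto whatever
# CPU indexes QEMU assigned internally.

def parse_cpu_range(spec):
    """Expand a cpus='0-3,8' style string into a list of integers."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def remap_cells(cells, qemu_indexes):
    """Translate each NUMA cell's XML CPU numbers to QEMU indexes."""
    # Assumption: XML CPU n is the n-th smallest QEMU index.
    ordered = sorted(qemu_indexes)
    return {
        cell_id: [ordered[n] for n in parse_cpu_range(spec)]
        for cell_id, spec in cells.items()
    }


# The two <cell> elements from the XML example, keyed by cell id.
xml_cells = {0: "0-3", 1: "4-7"}
remapped = remap_cells(xml_cells, range(70, 78))
print(remapped)
```

With that in place the XML keeps using 0-7 while the generated QEMU
configuration uses 70-77.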

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [Qemu-devel] [RFC 2/6] numa: split out NumaOptions parsing into parse_NumaOptions()
  2017-10-18  3:27   ` David Gibson
@ 2017-10-18 14:53     ` Eric Blake
  0 siblings, 0 replies; 93+ messages in thread
From: Eric Blake @ 2017-10-18 14:53 UTC (permalink / raw)
  To: David Gibson, Igor Mammedov
  Cc: qemu-devel, armbru, ehabkost, pkrempa, peter.maydell, pbonzini, cohuck

On 10/17/2017 10:27 PM, David Gibson wrote:
> On Mon, Oct 16, 2017 at 06:22:52PM +0200, Igor Mammedov wrote:
>> it will allow to reuse parse_NumaOptions() for parsing
>> configuration commands received via QMP interface
>>
>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> 
> Revieed-by: David Gibson <david@gibson.dropbear.id.au>

Does it still count as R-b with the typo? :)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 14:29               ` Paolo Bonzini
@ 2017-10-18 14:54                 ` Igor Mammedov
  0 siblings, 0 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-18 14:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Daniel P. Berrange, peter.maydell, pkrempa, ehabkost, cohuck,
	qemu-devel, armbru, david

On Wed, 18 Oct 2017 16:29:38 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 18/10/2017 16:26, Igor Mammedov wrote:
> >> I guess query-hotpluggable-cpus could also grow a first-cpu-index in the
> >> returned data.  
> > 
> > I guess query-cpus can/does provide cpu-index already,
> > for query-hotpluggable-cpus it would depend in what's shown there
> > (would work fro x86/arm/s390 as they publish there CPUState based objects,
> > but spapr puts cores there which themselves do not have cpu-index,
> > their children do though)  
> 
> Yeah, that's why I put "first-cpu-index".  The idea is that indices go
> from first-cpu-index to first-cpu-index + vcpus-count - 1.
yep, so far that's the case.

we can also add optional extra entries there for each
thread like this:
  
Hotpluggable CPUs:
  type: "power8_v2.0-spapr-cpu-core"
  vcpus_count: "1"
  qom_path: "/machine/unattached/device[0]"
  children threads:
           /machine/unattached/device[0]/thread[0]
           /machine/unattached/device[0]/thread[1]
  CPUInstance Properties:
    core-id: "0"

or ignore the high-level query-hotpluggable-cpus and use the existing
query-cpus, which already provides the qom path to threads,

and replace cpu-index based monitor commands with qom path
based ones (though it won't change the fact that both are owned by QEMU).

> Paolo

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 14:49                 ` Daniel P. Berrange
@ 2017-10-18 15:24                   ` Igor Mammedov
  2017-10-18 15:27                     ` Daniel P. Berrange
  0 siblings, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-18 15:24 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Eduardo Habkost, peter.maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david

On Wed, 18 Oct 2017 15:49:36 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Wed, Oct 18, 2017 at 04:44:35PM +0200, Igor Mammedov wrote:
> > On Wed, 18 Oct 2017 10:59:11 -0200
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Tue, Oct 17, 2017 at 06:18:59PM +0200, Igor Mammedov wrote:  
> > > > On Tue, 17 Oct 2017 17:09:26 +0100
> > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > >     
> > > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:    
> > > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > >       
> > > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:      
> > > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > > >         
> > > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:        
> > > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > > NUMA mapping for cpus.          
> > > > > > > > > 
> > > > > > > > > What's the problem we're seeking solve here compared to what we currently
> > > > > > > > > do for NUMA configuration ?        
> > > > > > > > From RHBZ1382425
> > > > > > > > "
> > > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > > assume/re-implement cpu_index allocation logic to provide valid 
> > > > > > > > values for -numa cpus=... QEMU CLI option.        
> > > > > > > 
> > > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > > > > "id" string, which is can then use to identify the thing later. The
> > > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > > 
> > > > > > > IMHO we should take the same approach with CPUs and start modelling 
> > > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > > or -device. That way libvirt can assign names and does not have to 
> > > > > > > care about CPU index values, and it all works just the same way as
> > > > > > > any other devices / object we create
> > > > > > > 
> > > > > > > ie instead of:
> > > > > > > 
> > > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > > 
> > > > > > > we could do:
> > > > > > > 
> > > > > > >   -object numa-node,id=numa0
> > > > > > >   -object numa-node,id=numa1
> > > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0      
> > > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > > come from, currently these options are the function of
> > > > > > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > > > > > runtime after qemu parses -M and -smp options.      
> > > > >     
> > > 
> > > Also, note that in the case of NUMA, having identifiers for CPU
> > > objects themselves won't be enough. NUMA settings need
> > > identifiers for CPU slots (even if they are still empty), and
> > > those slots are provided by the machine, not created by the user.
> > > 
> > >   
> > > > > The sockets/cores/threads topology of CPUs is something that comes from
> > > > > the libvirt guest XML config    
> > > > in this case things for libvirt to implement would be to know following details:
> > > >    1: which machine/machine version support which set of attributes
> > > >    2: valid values for these properties depending on machine/machine version/cpu type    
> > > 
> > > The big assumption in this series is that libvirt doesn't know in
> > > advance what the possible slots for CPUs will look like on each
> > > machine-type, and needs to query them using
> > > query-hotpluggable-cpus.
> > yep, that's true and it started with introduction of 'device_add cpu'
> > where libvirt didn't know what to specify as options for new cpu,
> > hence query-hotpluggable-cpus was added to provide that information.
> > 
> >   
> > > But if this assumption was really true, it would be impossible
> > > for the user to even decide what the NUMA topology will look like,
> > > wouldn't it?
> > > 
> > > Igor, are you able to give one example of what the user input
> > > (libvirt XML) for configuring NUMA CPU binding could look like if
> > > the user didn't know yet what the available sockets/cores/threads
> > > are?
> > not sure I parse question but looking at libvirt's domain docs
> > it mentions
> >   <numa>
> >     <cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
> >     <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/>
> >   </numa>
> > 
> > here libvirt assumes that there are cpus with cpu-index in range 0-7
> > (and probably duplicates the logic that calculates cpu-index).
> > If libvirt continued to duplicate that logic we could skip
> > implementing early runtime QMP in QEMU and also drop support for
> > query-hotpluggable-cpus, as libvirt would be able to compute
> > properties/values on its own.
> 
> From the POV of the XML, these CPU numbers are *not* required to be
> the same as any QEMU CPU index. This is just saying that we've got
> a <vcpu>8</vcpu> element, and we want the first 4 CPUs in one node
> and the second 4 in the second node.
> 
> If QEMU assigns CPU indexes 70-77 internally, that's not relevant to
> the XML POV, which uses 0-7 regardless. If there ever was such a
> disjoint representation of CPU indexes libvirt would have to remap
> what's in the XML to match what's in QEMU
that's what I'm saying, libvirt has to know which cpu-indexes are valid
to use so that it is able to build a CLI which works:
  "-numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7"
and if the algorithm that assigns cpu-indexes changed on the QEMU side
it would break libvirt.

now to the newer interface
  "-numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1"
libvirt would have to know that socket-id and values 0-1 are valid,
now moving to spapr
  "-numa cpu,node-id=0,core-id=0 -numa cpu,node-id=1,core-id=8"
here valid values are not so obvious, core-id values are a function
of "-smp"

this series was written so that mgmt won't have to duplicate logic
to match the same logic in qemu, as libvirt didn't want to maintain
it, I'd assume because it's fragile. If libvirt would make up valid
properties/values on its own we can forget about this series.
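
To make the intent concrete, here is a minimal Python sketch (an
illustration only, not code from this series or from libvirt) of what the
mgmt side is meant to do: take the slot properties exactly as QEMU reports
them and pass them back in '-numa cpu' options. The function name and the
sample slot lists are assumptions; the sample values simply mirror the two
command lines above.

```python
# Sketch only: the slot lists below are made-up sample data shaped like the
# "props" members of a query-hotpluggable-cpus reply; real values depend on
# -M and -smp and must come from QEMU.

def numa_cpu_options(slots, node_of_slot):
    """Return '-numa cpu,...' option strings, one per CPU slot.

    slots        -- list of dicts such as {"socket-id": 0} or {"core-id": 8}
    node_of_slot -- list of NUMA node ids, same length as slots
    """
    if len(slots) != len(node_of_slot):
        raise ValueError("need exactly one node id per slot")
    options = []
    for props, node in zip(slots, node_of_slot):
        # Pass the properties through untouched; the mgmt side never
        # computes socket-id/core-id values itself.
        keys = ",".join(f"{name}={value}" for name, value in props.items())
        options.append(f"-numa cpu,node-id={node},{keys}")
    return options


# x86-like slots are identified by socket-id, spapr-like ones by core-id.
pc_slots = [{"socket-id": 0}, {"socket-id": 1}]
spapr_slots = [{"core-id": 0}, {"core-id": 8}]

print(numa_cpu_options(pc_slots, [0, 1]))
print(numa_cpu_options(spapr_slots, [0, 1]))
```

The point of the sketch is that nothing in it knows whether 0-1 or 0,8
are the valid values; that knowledge stays in QEMU.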

> Regards,
> Daniel

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 15:24                   ` Igor Mammedov
@ 2017-10-18 15:27                     ` Daniel P. Berrange
  2017-10-18 20:11                       ` Eduardo Habkost
  0 siblings, 1 reply; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-18 15:27 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Eduardo Habkost, peter.maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david

On Wed, Oct 18, 2017 at 05:24:12PM +0200, Igor Mammedov wrote:
> On Wed, 18 Oct 2017 15:49:36 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Wed, Oct 18, 2017 at 04:44:35PM +0200, Igor Mammedov wrote:
> > > not sure I parse question but looking at libvirt's domain docs
> > > it mentions
> > >   <numa>
> > >     <cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
> > >     <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/>
> > >   </numa>
> > > 
> > > > here libvirt assumes that there are cpus with cpu-index in range 0-7
> > > > (and probably duplicates the logic that calculates cpu-index).
> > > > If libvirt continued to duplicate that logic we could skip
> > > > implementing early runtime QMP in QEMU and also drop support for
> > > > query-hotpluggable-cpus, as libvirt would be able to compute
> > > > properties/values on its own.
> > > 
> > > From the POV of the XML, these CPU numbers are *not* required to be
> > > the same as any QEMU CPU index. This is just saying that we've got
> > > a <vcpu>8</vcpu> element, and we want the first 4 CPUs in one node
> > > and the second 4 in the second node.
> > > 
> > > If QEMU assigns CPU indexes 70-77 internally, that's not relevant to
> > > the XML POV, which uses 0-7 regardless. If there ever was such a
> > > disjoint representation of CPU indexes libvirt would have to remap
> > > what's in the XML to match what's in QEMU
> > that's what I'm saying, libvirt has to know which cpu-indexes are valid
> > to use so that it is able to build a CLI which works:
> >   "-numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7"
> > and if the algorithm that assigns cpu-indexes changed on the QEMU side
> > it would break libvirt.

That's why I think QEMU should let libvirt assign 'id' values to each
CPU, just like we do for other devices/objects. That way QEMU can
have whatever CPU index numbering scheme it likes and it has no
effect on the mgmt app.

> now to the newer interface
>   "-numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1"
> libvirt would have to know that socket-id and values 0-1 are valid,
> now moving to spapr
>   "-numa cpu,node-id=0,core-id=0 -numa cpu,node-id=1,core-id=8"
> here valid values are not so obvious, core-id values are a function
> of "-smp"
> 
> this series was written so that mgmt won't have to duplicate logic
> to match the same logic in qemu, as libvirt didn't want to maintain
> it, I'd assume because it's fragile. If libvirt would make up valid
> properties/values on its own we can forget about this series.

From libvirt POV all we want to say is: have N sockets, each with M
cores, each with O threads. That is architecture agnostic and what I
was trying to illustrate with my earlier proposed CLI syntax.
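
As a rough illustration of that enumeration, the Python sketch below walks
N sockets x M cores x O threads and emits one line per CPU. Note the
assumptions: the '-object numa-node' / '-object cpu' syntax is only the
hypothetical CLI proposed earlier in this thread (QEMU does not accept it
today), the function name is invented, and CPUs are simply split evenly
across the nodes in enumeration order.

```python
def proposed_cpu_objects(sockets, cores, threads, nodes):
    """Enumerate every socket/core/thread position as a '-object cpu' line.

    The option syntax is the hypothetical one proposed in this thread,
    not an existing QEMU interface.
    """
    total = sockets * cores * threads
    if total % nodes:
        raise ValueError("CPUs do not divide evenly across NUMA nodes")
    per_node = total // nodes
    lines = [f"-object numa-node,id=numa{n}" for n in range(nodes)]
    index = 0
    for socket in range(sockets):
        for core in range(cores):
            for thread in range(threads):
                # Plain 0-based positions, not machine-specific IDs.
                node = index // per_node
                lines.append(
                    f"-object cpu,id=cpu{index},node=numa{node},"
                    f"socket={socket},core={core},thread={thread}"
                )
                index += 1
    return lines


for line in proposed_cpu_objects(sockets=4, cores=2, threads=1, nodes=2):
    print(line)
```

With sockets=4, cores=2, threads=1 and two nodes this reproduces the ten
lines of the earlier example, without any architecture-specific input.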

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-17 16:06       ` Igor Mammedov
  2017-10-17 16:09         ` Daniel P. Berrange
@ 2017-10-18 15:30         ` Daniel P. Berrange
  2017-10-18 20:22           ` Eduardo Habkost
  2017-10-19 15:21           ` Igor Mammedov
  1 sibling, 2 replies; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-18 15:30 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, pkrempa, ehabkost, cohuck, qemu-devel, armbru,
	pbonzini, david

On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> On Tue, 17 Oct 2017 16:07:59 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > >   
> > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > adds new set-numa-node HMP/QMP commands which in conjunction with
> > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > NUMA mapping for cpus.    
> > > > 
> > > > > What's the problem we're seeking to solve here compared to what we currently
> > > > do for NUMA configuration ?  
> > > From RHBZ1382425
> > > "
> > > Current -numa CLI interface is quite limited in terms that allow map
> > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > are non obvious and depend on machine/arch. As result libvirt has to
> > > assume/re-implement cpu_index allocation logic to provide valid 
> > > values for -numa cpus=... QEMU CLI option.  
> > 
> > In broad terms, this problem applies to every device / object libvirt
> > asks QEMU to create. For everything else libvirt is able to assign a
> > "id" string, which is can then use to identify the thing later. The
> > CPU stuff is different because libvirt isn't able to provide 'id'
> > strings for each CPU - QEMU generates a psuedo-id internally which
> > libvirt has to infer. The latter is the same problem we had with
> > devices before '-device' was introduced allowing 'id' naming.
> > 
> > IMHO we should take the same approach with CPUs and start modelling 
> > the individual CPUs as something we can explicitly create with -object
> > or -device. That way libvirt can assign names and does not have to 
> > care about CPU index values, and it all works just the same way as
> > any other devices / object we create
> > 
> > ie instead of:
> > 
> >   -smp 8,sockets=4,cores=2,threads=1
> >   -numa node,nodeid=0,cpus=0-3
> >   -numa node,nodeid=1,cpus=4-7
> > 
> > we could do:
> > 
> >   -object numa-node,id=numa0
> >   -object numa-node,id=numa1
> >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> the follow up question would be where do "socket=3,core=1,thread=0"
> come from, currently these options are the function of
> > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> runtime after qemu parses -M and -smp options.

NB, I realize my example was open to misinterpretation. The values I'm
illustrating here for socket=3,core=1,thread=0 are *not* ID values, they
are a plain enumeration of values, i.e. this is saying the 4th socket, the
2nd core and the 1st thread.  Internally QEMU might have the 2nd core
with a core-id of 8, or 7038 or whatever architecture-specific numbering
scheme makes sense, but that's not what the mgmt app gives at the CLI
level.
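
To show the distinction, here is a small Python sketch of the translation
that would then happen inside QEMU: 0-based positions in, machine-specific
IDs out. Everything in it is an assumption for illustration: the function
name, the slot table, and the invented machine whose core-ids advance in
steps of 8.

```python
def resolve_ordinal(slots, socket, core, thread):
    """Map 0-based ordinal (socket, core, thread) positions to machine IDs.

    slots is a list of dicts carrying machine-specific "socket-id",
    "core-id" and "thread-id" values (hypothetical sample data here).
    """
    def nth_distinct(values, n, what):
        # The n-th smallest distinct ID is the n-th socket/core/thread.
        ordered = sorted(set(values))
        if not 0 <= n < len(ordered):
            raise IndexError(f"no {what} at position {n}")
        return ordered[n]

    socket_id = nth_distinct([s["socket-id"] for s in slots], socket, "socket")
    in_socket = [s for s in slots if s["socket-id"] == socket_id]
    core_id = nth_distinct([s["core-id"] for s in in_socket], core, "core")
    in_core = [s for s in in_socket if s["core-id"] == core_id]
    thread_id = nth_distinct([s["thread-id"] for s in in_core], thread, "thread")
    return {"socket-id": socket_id, "core-id": core_id, "thread-id": thread_id}


# Invented machine: one socket, core-ids advance in steps of 8.
slots = [
    {"socket-id": 0, "core-id": c, "thread-id": t}
    for c in (0, 8, 16)
    for t in range(2)
]
print(resolve_ordinal(slots, socket=0, core=1, thread=0))
```

Here "core=1" (the 2nd core) resolves to core-id 8 on the invented
machine; the mgmt app only ever sees the ordinal.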


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 15:27                     ` Daniel P. Berrange
@ 2017-10-18 20:11                       ` Eduardo Habkost
  0 siblings, 0 replies; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-18 20:11 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Igor Mammedov, peter.maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david

On Wed, Oct 18, 2017 at 04:27:47PM +0100, Daniel P. Berrange wrote:
> On Wed, Oct 18, 2017 at 05:24:12PM +0200, Igor Mammedov wrote:
> > On Wed, 18 Oct 2017 15:49:36 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > 
> > > On Wed, Oct 18, 2017 at 04:44:35PM +0200, Igor Mammedov wrote:
> > > > not sure I parse question but looking at libvirt's domain docs
> > > > it mentions
> > > >   <numa>
> > > >     <cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
> > > >     <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/>
> > > >   </numa>
> > > > 
> > > > here libvirt assumes that there are cpus with cpu-index in range 0-7
> > > > (and probably duplicates the logic that calculates cpu-index).
> > > > If libvirt continued to duplicate that logic we could skip
> > > > implementing early runtime QMP in QEMU and also drop support for
> > > > query-hotpluggable-cpus, as libvirt would be able to compute
> > > > properties/values on its own.
> > > 
> > > From the POV of the XML, these CPU numbers are *not* required to be
> > > the same as any QEMU CPU index. This is just saying that we've got
> > > a <vcpu>8</vcpu> element, and we want the first 4 CPUs in one node
> > > and the second 4 in the second node.
> > > 
> > > If QEMU assigns CPU indexes 70-77 internally, that's not relevant to
> > > the XML POV, which uses 0-7 regardless. If there ever was such a
> > > disjoint representation of CPU indexes libvirt would have to remap
> > > what's in the XML to match what's in QEMU
> > that's what I'm saying, libvirt has to know which cpu-indexes are valid
> > to use so that it is able to build a CLI which works:
> >   "-numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7"
> > and if the algorithm that assigns cpu-indexes changed on the QEMU side
> > it would break libvirt.
> 
> That's why I think QEMU should let libvirt assign 'id' values to each
> CPU, just like we do for other devices/objects. That way QEMU can
> have whatever CPU index numbering scheme it likes and it has no
> effect on the mgmt app.

Adding an intermediate ID doesn't seem to address the problem
at all: you would still need to tell QEMU which
socket/core/thread combination corresponds to which ID, and the
set of valid socket/core/thread IDs is defined by the
machine-type.

> 
> > now to the newer interface
> >   "-numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1"
> > libvirt would have to know that socket-id and values 0-1 are valid,
> > now moving to spapr
> >   "-numa cpu,node-id=0,core-id=0 -numa cpu,node-id=1,core-id=8"
> > here valid values are not so obvious, core-id values are a function
> > of "-smp"
> > 
> > this series was written so that mgmt won't have to duplicate logic
> > to match the same logic in qemu, as libvirt didn't want to maintain
> > it, I'd assume because it's fragile. If libvirt would make up valid
> > properties/values on its own we can forget about this series.
> 
> From libvirt POV all we want to say is: have N sockets, each with M
> cores, each with O threads. That is architecture agnostic and what I
> was trying to illustrate with my earlier proposed CLI syntax.

The set of valid socket/core/thread IDs accepted by QEMU is
currently machine-dependent.  libvirt shouldn't expect them to be
architecture agnostic.

Defining architecture agnostic rules for them to avoid the need
for query-hotpluggable-cpus would still be a valid proposal, but
it needs to be written down instead of being just an implicit
assumption from the libvirt side.

-- 
Eduardo

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 15:30         ` Daniel P. Berrange
@ 2017-10-18 20:22           ` Eduardo Habkost
  2017-10-19 11:49             ` David Gibson
  2017-10-19 15:21           ` Igor Mammedov
  1 sibling, 1 reply; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-18 20:22 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Igor Mammedov, peter.maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david

On Wed, Oct 18, 2017 at 04:30:10PM +0100, Daniel P. Berrange wrote:
> On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > On Tue, 17 Oct 2017 16:07:59 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > 
> > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > >   
> > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > adds new set-numa-node HMP/QMP commands which in conjunction with
> > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > NUMA mapping for cpus.    
> > > > > 
> > > > > > What's the problem we're seeking to solve here compared to what we currently
> > > > > do for NUMA configuration ?  
> > > > From RHBZ1382425
> > > > "
> > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > assume/re-implement cpu_index allocation logic to provide valid 
> > > > values for -numa cpus=... QEMU CLI option.  
> > > 
> > > In broad terms, this problem applies to every device / object libvirt
> > > asks QEMU to create. For everything else libvirt is able to assign a
> > > "id" string, which is can then use to identify the thing later. The
> > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > libvirt has to infer. The latter is the same problem we had with
> > > devices before '-device' was introduced allowing 'id' naming.
> > > 
> > > IMHO we should take the same approach with CPUs and start modelling 
> > > the individual CPUs as something we can explicitly create with -object
> > > or -device. That way libvirt can assign names and does not have to 
> > > care about CPU index values, and it all works just the same way as
> > > any other devices / object we create
> > > 
> > > ie instead of:
> > > 
> > >   -smp 8,sockets=4,cores=2,threads=1
> > >   -numa node,nodeid=0,cpus=0-3
> > >   -numa node,nodeid=1,cpus=4-7
> > > 
> > > we could do:
> > > 
> > >   -object numa-node,id=numa0
> > >   -object numa-node,id=numa1
> > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > the follow up question would be where do "socket=3,core=1,thread=0"
> > come from, currently these options are the function of
> > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > runtime after qemu parses -M and -smp options.
> 
> NB, I realize my example was open to misinterpretation. The values I'm
> illustrating here for socket=3,core=1,thread=0 are *not* ID values, they
> are a plain enumeration of values, i.e. this is saying the 4th socket, the
> 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> with a core-id of 8, or 7038 or whatever architecture-specific numbering
> scheme makes sense, but that's not what the mgmt app gives at the CLI
> level.

I believe we have been trying to avoid index numbers to identify
entities as a reaction to the bad experience we had with the
cpu_index/apic_id mess in the past.

An interface using arch-independent socket/core/thread indexes
(not arch-dependent IDs) like you propose in the paragraph above
could be a solution, as long as it is documented very clearly
(and we include automated testing for those constraints).  But
note that this is _not_ how the socket/core/thread IDs on the
"-device *-cpu" and -numa command-line options work today.

Also, this might solve the problem for CPU socket/core/thread
identification, but might not be enough for the messy device
address assignment rules that libvirt needs to duplicate in
src/qemu/qemu_domain_address.c today.

-- 
Eduardo

* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-10-18 11:12   ` [Qemu-devel] [RFC v2 " Igor Mammedov
@ 2017-10-19  6:31     ` David Gibson
  2017-10-31 14:01       ` Igor Mammedov
  0 siblings, 1 reply; 93+ messages in thread
From: David Gibson @ 2017-10-19  6:31 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, peter.maydell, pkrempa, ehabkost, pbonzini, drjones, cohuck

On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:
> For enabling early cpu to numa node configuration at runtime
> qmp_query_hotpluggable_cpus() should provide a list of available
> cpu slots at early stage, before machine_init() is called and
> the 1st cpu is created, so that mgmt might be able to call it
> and use output to set numa mapping.
> Use MachineClass::possible_cpu_arch_ids() callback to set
> cpu type info, along with the rest of possible cpu properties,
> to let machine define which cpu type* will be used.
> 
> * for SPAPR it will be a spapr core type and for ARM/s390x/x86
>   a respective descendant of CPUClass.
> 
> Move parse_numa_opts() in vl.c after cpu_model is parsed into
> cpu_type so that possible_cpu_arch_ids() would know which
> cpu_type to use during layout initialization.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>   v2:
>      - fix NULL dereference caused by not initialized
>        MachineState::cpu_type at the time parse_numa_opts()
>        were called
> ---
>  include/hw/boards.h        |  2 ++
>  hw/arm/virt.c              |  3 ++-
>  hw/core/machine.c          | 12 ++++++------
>  hw/i386/pc.c               |  4 +++-
>  hw/ppc/spapr.c             | 13 ++++++++-----
>  hw/s390x/s390-virtio-ccw.c |  1 +
>  vl.c                       |  3 +--
>  7 files changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 191a5b3..fa21758 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>   * CPUArchId:
>   * @arch_id - architecture-dependent CPU ID of present or possible CPU

I know this isn't really in scope for this patch, but is @arch_id here
supposed to have meaning defined by the target, or by the machine?

If it's the machine, it could do with a rename - "arch" means target
to most people (thanks to Linux).

If it's the target, it's kind of bogus, because it doesn't necessarily
have a clear meaning per target - get_arch_id in CPUClass has the same
problem, which is probably one reason it's basically only used by the
x86 code at present.

e.g. for target/ppc, what do we use?  There's the PIR, which is in the
CPU.. but only on some cpu models, not all.  There will generally be
some kind of master PIC id, but there are different PIC models on
different boards.  What goes in the devicetree?  Well only some
machines use devicetree, and they might define the cpu reg 
differently.

Board designs will generally try to make some if not all of those
possible values equal for simplicity, but there's still no real way of
defining a sensible arch_id independent of machine / board.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-16 16:59   ` Eduardo Habkost
  2017-10-16 17:01     ` Paolo Bonzini
  2017-10-17  9:10     ` Igor Mammedov
@ 2017-10-19 10:42     ` David Gibson
  2017-10-20  0:15       ` Eduardo Habkost
  2 siblings, 1 reply; 93+ messages in thread
From: David Gibson @ 2017-10-19 10:42 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Igor Mammedov, qemu-devel, eblake, armbru, pkrempa,
	peter.maydell, pbonzini, cohuck

On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:
> On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  include/sysemu/sysemu.h |  1 +
> >  qemu-options.hx         | 15 ++++++++++++++
> >  qmp.c                   |  5 +++++
> >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  4 files changed, 74 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index b213696..3feb94f 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> >      QEMU_WAKEUP_REASON_OTHER,
> >  } WakeupReason;
> >  
> > +void qemu_exit_preconfig_request(void);
> >  void qemu_system_reset_request(ShutdownCause reason);
> >  void qemu_system_suspend_request(void);
> >  void qemu_register_suspend_notifier(Notifier *notifier);
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 39225ae..bd44db8 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -3498,6 +3498,21 @@ STEXI
> >  Run the emulation in single step mode.
> >  ETEXI
> >  
> > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > +    "-paused [state=]postconf|preconf\n"
> > +    "                postconf: pause QEMU after machine is initialized\n"
> > +    "                preconf: pause QEMU before machine is initialized\n",
> > +    QEMU_ARCH_ALL)
> 
> I would like to allow pausing before machine-type is selected, so
> management could run query-machines before choosing a
> > machine-type.  Would that need a third "-paused" mode, or will we
> be able to change "preconf" to pause before select_machine() is
> called?
> 
> The same probably applies to other things initialized before
> machine_run_board_init() that could be configurable using QMP,
> including but not limited to:
> * Accelerator configuration
> * Registering global properties
> * RAM size
> * SMP/CPU configuration

Yeah.. having a bunch of different possible pause stages to select
doesn't sound great.  Could we avoid this by instead changing -S to
pause at the earliest possible spot, but having any monitor commands
that require a later stage automatically "fast forwarding" to the
right phase?
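
Roughly what I have in mind, as a Python sketch rather than anything
resembling real QEMU code: each monitor command declares the earliest
phase it needs, and startup is advanced on demand, never backwards. The
phase names and the command-to-phase table are hypothetical, chosen only
to illustrate the idea.

```python
from enum import IntEnum


class Phase(IntEnum):
    # Hypothetical startup phases, ordered from earliest to latest.
    PRE_MACHINE = 0    # nothing selected yet
    PRECONFIG = 1      # machine type chosen, board not initialized
    MACHINE_READY = 2  # board initialized, vCPUs not running
    RUNNING = 3


# Hypothetical table: earliest phase in which each command makes sense.
REQUIRED_PHASE = {
    "query-machines": Phase.PRE_MACHINE,
    "query-hotpluggable-cpus": Phase.PRECONFIG,
    "device_add": Phase.MACHINE_READY,
    "cont": Phase.RUNNING,
}


class Startup:
    def __init__(self):
        self.phase = Phase.PRE_MACHINE
        self.log = []

    def advance_to(self, target):
        # Run each intermediate init step exactly once, in order.
        while self.phase < target:
            self.phase = Phase(self.phase + 1)
            self.log.append(f"enter {self.phase.name}")

    def execute(self, command):
        # Fast-forward if the command needs a later phase; never go back.
        self.advance_to(REQUIRED_PHASE[command])
        self.log.append(f"run {command}")
        return self.phase


vm = Startup()
vm.execute("query-machines")
vm.execute("device_add")
print(vm.log)
```

The user then never has to name a pause stage; issuing a command that
needs the machine is what moves startup past machine init.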

> 
> 
> > +STEXI
> > +@item -paused
> > +@findex -paused
> > +if set enabled interactive configuration stages before machine emulation starts.
> > +'postconf' option value mimics -S option behaviour where machine is created
> > +but emulation isn't started. 'preconf' option value pauses QEMU before machine
> > +is created, which allows to query and configure properties affecting machine
> > +initialization. Use monitor/QMP command 'cont' to go to exit paused state.
> 
> > What if "-S" is used at the same time?  Will "cont" only
> initialize the machine and wait for another "cont" command to
> start the VCPUs, or will it unpause everything?
> 
> 
> > +ETEXI
> > +
> >  DEF("S", 0, QEMU_OPTION_S, \
> >      "-S              freeze CPU at startup (use 'c' to start execution)\n",
> >      QEMU_ARCH_ALL)
> > diff --git a/qmp.c b/qmp.c
> > index e8c3031..49e9a5c 100644
> > --- a/qmp.c
> > +++ b/qmp.c
> > @@ -167,6 +167,11 @@ void qmp_cont(Error **errp)
> >      BlockBackend *blk;
> >      Error *local_err = NULL;
> >  
> > +    if (runstate_check(RUN_STATE_PRELAUNCH)) {
> > +        qemu_exit_preconfig_request();
> > +        return;
> > +    }
> > +
> >      /* if there is a dump in background, we should wait until the dump
> >       * finished */
> >      if (dump_in_progress()) {
> > diff --git a/vl.c b/vl.c
> > index 3fed457..30631fd 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -555,6 +555,20 @@ static QemuOptsList qemu_fw_cfg_opts = {
> >      },
> >  };
> >  
> > +static QemuOptsList qemu_paused_opts = {
> > +    .name = "paused",
> > +    .implied_opt_name = "state",
> > +    .head = QTAILQ_HEAD_INITIALIZER(qemu_paused_opts.head),
> > +    .desc = {
> > +        {
> > +            .name = "state",
> > +            .type = QEMU_OPT_STRING,
> > +            .help = "Pause state of QEMU on startup",
> > +        },
> > +        { /* end of list */ }
> > +    },
> > +};
> > +
> >  /**
> >   * Get machine options
> >   *
> > @@ -1689,6 +1703,11 @@ static pid_t shutdown_pid;
> >  static int powerdown_requested;
> >  static int debug_requested;
> >  static int suspend_requested;
> > +static enum {
> > +    PRECONFIG_CONT = 0,
> > +    PRECONFIG_PAUSE,
> > +    PRECONFIG_SKIP,
> > +} preconfig_requested;
> >  static WakeupReason wakeup_reason;
> >  static NotifierList powerdown_notifiers =
> >      NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
> > @@ -1773,6 +1792,11 @@ static int qemu_debug_requested(void)
> >      return r;
> >  }
> >  
> > +void qemu_exit_preconfig_request(void)
> > +{
> > +    preconfig_requested = PRECONFIG_CONT;
> > +}
> > +
> >  /*
> >   * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
> >   */
> > @@ -1939,6 +1963,12 @@ static bool main_loop_should_exit(void)
> >      RunState r;
> >      ShutdownCause request;
> >  
> > +    if (runstate_check(RUN_STATE_PRELAUNCH)) {
> > +        if (preconfig_requested == PRECONFIG_CONT) {
> > +            preconfig_requested = PRECONFIG_SKIP;
> > +            return true;
> > +        }
> > +    }
> >      if (qemu_debug_requested()) {
> >          vm_stop(RUN_STATE_DEBUG);
> >      }
> > @@ -3177,6 +3207,7 @@ int main(int argc, char **argv, char **envp)
> >      qemu_add_opts(&qemu_icount_opts);
> >      qemu_add_opts(&qemu_semihosting_config_opts);
> >      qemu_add_opts(&qemu_fw_cfg_opts);
> > +    qemu_add_opts(&qemu_paused_opts);
> >      module_call_init(MODULE_INIT_OPTS);
> >  
> >      runstate_init();
> > @@ -3845,6 +3876,26 @@ int main(int argc, char **argv, char **envp)
> >                      exit(1);
> >                  }
> >                  break;
> > +            case QEMU_OPTION_paused:
> > +                {
> > +                    const char *value;
> > +
> > +                    opts = qemu_opts_parse_noisily(qemu_find_opts("paused"),
> > +                                                   optarg, true);
> > +                    if (opts == NULL) {
> > +                        exit(1);
> > +                    }
> > +                    value = qemu_opt_get(opts, "state");
> > +                    if (!g_strcmp0(value, "postconf")) {
> > +                        autostart = 0;
> > +                    } else if (!g_strcmp0(value, "preconf")) {
> > +                        preconfig_requested = PRECONFIG_PAUSE;
> > +                    } else {
> > +                        error_report("incomplete '-paused' option");
> > +                        exit(1);
> > +                    }
> > +                    break;
> > +                }
> >              case QEMU_OPTION_enable_kvm:
> >                  olist = qemu_find_opts("machine");
> >                  qemu_opts_parse_noisily(olist, "accel=kvm", false);
> > @@ -4731,7 +4782,6 @@ int main(int argc, char **argv, char **envp)
> >      current_machine->boot_order = boot_order;
> >      current_machine->cpu_model = cpu_model;
> >  
> > -
> >      /* parse features once if machine provides default cpu_type */
> >      if (machine_class->default_cpu_type) {
> >          current_machine->cpu_type = machine_class->default_cpu_type;
> > @@ -4741,6 +4791,8 @@ int main(int argc, char **argv, char **envp)
> >          }
> >      }
> >  
> > +    main_loop(); /* do monitor/qmp handling at preconfig state if requested */
> > +
> 
> I'm impressed by the simplicity of the implementation.  I thought
> this would involve moving everything between this line and the
> next main_loop() call outside main(), so they would be called by
> qmp_cont().
> 
> Does any expert on GLib's event loop see any gotchas in this method?
> 
> I would like to do a careful review of main_loop_wait() and
> main_loop_should_exit(), to ensure those functions don't depend
> on anything that's initialized after this line.  Probably a few
> existing QMP commands can crash if the machine is not initialized
> yet?
> 
> The rules and expectations on initialization ordering are very
> subtle; I suggest including test code for the new feature to
> ensure nothing crashes or breaks in the future.
> 
> 
> >      machine_run_board_init(current_machine);
> >  
> >      realtime_init();
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 20:22           ` Eduardo Habkost
@ 2017-10-19 11:49             ` David Gibson
  2017-10-19 12:23               ` Paolo Bonzini
  0 siblings, 1 reply; 93+ messages in thread
From: David Gibson @ 2017-10-19 11:49 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Daniel P. Berrange, Igor Mammedov, peter.maydell, pkrempa,
	cohuck, qemu-devel, armbru, pbonzini

On Wed, Oct 18, 2017 at 06:22:40PM -0200, Eduardo Habkost wrote:
> On Wed, Oct 18, 2017 at 04:30:10PM +0100, Daniel P. Berrange wrote:
> > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > 
> > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > >   
> > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > interface. For that to happen it introduces a new '-paused' CLI option
> > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > NUMA mapping for cpus.    
> > > > > > 
> > > > > > What's the problem we're seeking solve here compared to what we currently
> > > > > > do for NUMA configuration ?  
> > > > > From RHBZ1382425
> > > > > "
> > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which 
> > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > assume/re-implement cpu_index allocation logic to provide valid 
> > > > > values for -numa cpus=... QEMU CLI option.  
> > > > 
> > > > In broad terms, this problem applies to every device / object libvirt
> > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > "id" string, which is can then use to identify the thing later. The
> > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > libvirt has to infer. The latter is the same problem we had with
> > > > devices before '-device' was introduced allowing 'id' naming.
> > > > 
> > > > IMHO we should take the same approach with CPUs and start modelling 
> > > > the individual CPUs as something we can explicitly create with -object
> > > > or -device. That way libvirt can assign names and does not have to 
> > > > care about CPU index values, and it all works just the same way as
> > > > any other devices / object we create
> > > > 
> > > > ie instead of:
> > > > 
> > > >   -smp 8,sockets=4,cores=2,threads=1
> > > >   -numa node,nodeid=0,cpus=0-3
> > > >   -numa node,nodeid=1,cpus=4-7
> > > > 
> > > > we could do:
> > > > 
> > > >   -object numa-node,id=numa0
> > > >   -object numa-node,id=numa1
> > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > come from, currently these options are the function of
> > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > > runtime after qemu parses -M and -smp options.
> > 
> > NB, I realize my example was open to mis-interpretation. The values I'm
> > illustrating here for socket=3,core=1,thread=0 are *not* ID values, they
> > are a plain enumeration of values. ie this is saying the 4th socket, the
> > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > level
> 
> I believe we have been trying to avoid index numbers to identify
> entities as a reaction to the bad experience we had with the
> cpu_index/apic_id mess in the past.
> 
> An interface using arch-independent socket/core/thread indexes
> (not arch-dependent IDs) like you propose in the paragraph above
> could be a solution, as long as it is documented very clearly
> (and we include automated testing for those constraints).  But
> note that this is _not_ how the socket/core/thread IDs on the
> "-device *-cpu" and -numa command-line options work today.
> 
> Also, this might solve the problem for CPU socket/core/thread
> identification, but might not be enough for the messy device
> address assignment rules that libvirt needs to duplicate in
> src/qemu/qemu_domain_address.c today.

Note that describing socket/core/thread tuples as arch independent (or
even machine independent) is... debatable.  I mean it's flexible enough
that most platforms can be fit to that scheme without too much
straining.  But there's no arch independent way of defining what each
level means in terms of its properties.

So, for example, on spapr - being paravirt - there's no real
distinction between cores and sockets; how you divide them up is
completely arbitrary.  I don't think we have any implemented, but it's
easy to imagine modelling a big server type machine with more than 3
natural layers of hierarchy (say, thread, core, chip,
multi-chip-module, big-honkin-drawer-of-processors, ...).

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-19 11:49             ` David Gibson
@ 2017-10-19 12:23               ` Paolo Bonzini
  2017-10-20  1:21                 ` David Gibson
  0 siblings, 1 reply; 93+ messages in thread
From: Paolo Bonzini @ 2017-10-19 12:23 UTC (permalink / raw)
  To: David Gibson, Eduardo Habkost
  Cc: Daniel P. Berrange, Igor Mammedov, peter.maydell, pkrempa,
	cohuck, qemu-devel, armbru

On 19/10/2017 13:49, David Gibson wrote:
> Note that describing socket/core/thread tuples as arch independent (or
> even machine independent) is... debatable.  I mean it's flexible enough
> that most platforms can be fit to that scheme without too much
> straining.  But there's no arch independent way of defining what each
> level means in terms of its properties.
> 
> So, for example, on spapr - being paravirt - there's no real
> distinction between cores and sockets; how you divide them up is
> completely arbitrary.

Same on x86, actually.

It's _common_ that cores on the same socket share L3 cache and that a
socket spans an integer number of NUMA nodes, but it doesn't have to be
that way.

QEMU currently enforces the former (if it tells the guest at all that
there is an L3 cache), but not the latter.

Paolo



* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-18 15:30         ` Daniel P. Berrange
  2017-10-18 20:22           ` Eduardo Habkost
@ 2017-10-19 15:21           ` Igor Mammedov
  2017-10-19 15:28             ` Daniel P. Berrange
  1 sibling, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-19 15:21 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: peter maydell, pkrempa, ehabkost, cohuck, qemu-devel, armbru,
	pbonzini, david

----- Original Message -----
> From: "Daniel P. Berrange" <berrange@redhat.com>
> To: "Igor Mammedov" <imammedo@redhat.com>
> Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> Sent: Wednesday, October 18, 2017 5:30:10 PM
> Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> 
> On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > On Tue, 17 Oct 2017 16:07:59 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > 
> > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > >   
> > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > option
> > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > NUMA mapping for cpus.
> > > > > 
> > > > > What's the problem we're seeking solve here compared to what we
> > > > > currently
> > > > > do for NUMA configuration ?
> > > > From RHBZ1382425
> > > > "
> > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > values for -numa cpus=... QEMU CLI option.
> > > 
> > > In broad terms, this problem applies to every device / object libvirt
> > > asks QEMU to create. For everything else libvirt is able to assign a
> > > "id" string, which is can then use to identify the thing later. The
> > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > libvirt has to infer. The latter is the same problem we had with
> > > devices before '-device' was introduced allowing 'id' naming.
> > > 
> > > IMHO we should take the same approach with CPUs and start modelling
> > > the individual CPUs as something we can explicitly create with -object
> > > or -device. That way libvirt can assign names and does not have to
> > > care about CPU index values, and it all works just the same way as
> > > any other devices / object we create
> > > 
> > > ie instead of:
> > > 
> > >   -smp 8,sockets=4,cores=2,threads=1
> > >   -numa node,nodeid=0,cpus=0-3
> > >   -numa node,nodeid=1,cpus=4-7
> > > 
> > > we could do:
> > > 
> > >   -object numa-node,id=numa0
> > >   -object numa-node,id=numa1
> > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > the follow up question would be where do "socket=3,core=1,thread=0"
> > come from, currently these options are the function of
> > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > runtime after qemu parses -M and -smp options.
> 
> NB, I realize my example was open to mis-interpretation. The values I'm
> illustrating here for socket=3,core=1,thread=0 are *not* ID values, they
> are a plain enumeration of values. ie this is saying the 4th socket, the
> 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> with a core-id of 8, or 7038 or whatever architecture specific numbering
> scheme makes sense, but that's not what the mgmt app gives at the CLI
> level
The simplicity of fixed properties/values is tempting, and it might even
work for what we have implemented in qemu currently (well, SPAPR would need
refactoring (if possible) to meet the requirements, plus compat stuff for
current machines with sparse IDs).
But I have to disagree here and try to oppose it.

QEMU models concrete platforms/hw with certain non-abstract properties,
and it's libvirt's domain to translate platform-specific devices into
'spherical' devices with abstract properties.

Now back to cpus and the suggestion to fix the set of 'address' properties
and their values into a contiguous enumeration range [0..N). That would
  1. put the burden of hiding platform/device details on QEMU
      (which is already bad, as QEMU's job is to emulate them)
  2. with abstract 'address' properties and values, the user won't have
     a clue as to where a device is being attached (as qemu would magically
     remap that to fit specific machine needs)
  2.1. if 'address' properties and values are abstract, we can do away with
     socket/core/thread/whatnot, since they won't mean the same thing when
     considered from the platform point of view; we could just drop all this
     nonsense and go back to cpu-index, which has all the properties you've
     suggested (abstract, [0..N)).
  3. we currently stopped at the socket|core|thread-id properties, as they are
     applicable to machines that support -device cpu, but it's up to the machine
     to pick which of these to use (x86: uses all, spapr: uses core-id only),
     and the current property set is open for extension if the need arises,
     without having to redefine the interface. So a fixed list of properties
     [even ignoring the impact on values] doesn't scale.
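
To make point 2 concrete, here is a minimal Python sketch of the kind of
remapping QEMU would have to do behind the user's back if the CLI took plain
enumeration values. The stride-based core-id layout is an assumption made up
for illustration (it only mimics a machine with sparse IDs), and
platform_core_id() is a hypothetical helper, not a QEMU function.

```python
def platform_core_id(core_index, threads_per_core):
    """Map a contiguous core index [0..N) to a sparse, machine-specific core-id.

    Assumed layout (illustration only): core-ids advance in steps of
    threads_per_core, so 4 cores with 8 threads each get ids 0, 8, 16, 24.
    """
    if core_index < 0 or threads_per_core < 1:
        raise ValueError("core_index must be >= 0 and threads_per_core >= 1")
    return core_index * threads_per_core


# The user asks for "the 2nd core" (index 1); the machine actually sees
# core-id 8, which the abstract value gives no hint of.
sparse_ids = [platform_core_id(i, threads_per_core=8) for i in range(4)]
print(sparse_ids)
```

With such a mapping hidden inside QEMU, the value given on the CLI and the
value the guest platform sees diverge, which is the loss of information
described in point 2.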

We even have the cpu-add command, which takes cpu-index as its argument, and
the -numa node,cpus=0..X CLI option; good luck figuring out which cpu goes
where and whether it makes any sense from the platform point of view.

That's why, when designing hot plug for the 'device_add cpu' interface, we
ended up with the new query-hotpluggable-cpus QMP command, which is currently
used by libvirt for hot-plug.

This approach allows
   1: the machine to publish properties/values that make sense from the
      emulated platform's point of view but are still understandable by the
      user of the given hw.
   2: the user to use them as opaque mandatory properties to create a cpu
      device if he/she doesn't care about where it's plugged.
   3: the user who cares about which cpu goes where to get that info from the
      properties defined by the machine, from the emulated hw point of view,
      including platform-specific details.
   4: the set of properties/values to be extended easily if the need arises,
      without breaking users (provided the user puts them all in
      -device/device_add options, as is expected)
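
As a rough illustration of points 2 and 4, a management client can copy the
"props" of each free slot verbatim into device_add without interpreting them.
This is only a sketch: the sample reply is modelled on the
'-smp 1,maxcpus=2' session from the cover letter, the "cpuN" id naming is my
own assumption, and device_add_for_free_slots() is a hypothetical helper.

```python
def device_add_for_free_slots(hotpluggable_cpus):
    """Build one device_add command per CPU slot that is still unplugged.

    Each entry is expected to look like a query-hotpluggable-cpus result:
    "type", "vcpus-count" and "props", plus "qom-path" when a CPU is
    already present in that slot.
    """
    commands = []
    for n, slot in enumerate(hotpluggable_cpus):
        if "qom-path" in slot:
            continue  # slot already occupied, nothing to plug
        # "id" naming is an assumption; props are passed through opaquely,
        # so new properties added by a machine are picked up automatically.
        args = {"driver": slot["type"], "id": "cpu%d" % n}
        args.update(slot["props"])
        commands.append({"execute": "device_add", "arguments": args})
    return commands


# Sample reply, modelled on the HMP output for -smp 1,maxcpus=2.
reply = [
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "props": {"socket-id": 1, "core-id": 0, "thread-id": 0}},
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "qom-path": "/machine/unattached/device[0]",
     "props": {"socket-id": 0, "core-id": 0, "thread-id": 0}},
]
for cmd in device_add_for_free_slots(reply):
    print(cmd)
```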

But the current approach has a drawback: to call query-hotpluggable-cpus, the
machine has to be started first, which is fine for hot plug but not for
specifying CLI options.

Currently that could be solved by starting qemu twice when 'defining domain',
where on the first run mgmt queries the board layout and caches it for all
subsequent starts of the defined machine (a change in
machine/version/-smp/-cpu will invalidate the cache).
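
One possible way for mgmt to implement that invalidation rule is to key the
cached layout on exactly the inputs listed above. The following is a sketch
under that assumption; layout_cache_key() and its parameters are hypothetical
and not taken from libvirt.

```python
import hashlib
import json


def layout_cache_key(machine_type, smp, cpu_model):
    """Derive a cache key from the options that determine the CPU layout.

    machine_type is assumed to include the version (e.g. a versioned
    machine name), smp is the raw -smp argument, cpu_model the -cpu one.
    Any change in one of them yields a different key, i.e. a cache miss.
    """
    blob = json.dumps(
        {"machine": machine_type, "smp": smp, "cpu": cpu_model},
        sort_keys=True,
    )
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


key_a = layout_cache_key("pc-i440fx-2.10", "1,maxcpus=2", "qemu64")
key_b = layout_cache_key("pc-i440fx-2.10", "1,maxcpus=4", "qemu64")
print(key_a != key_b)
```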

This series avoids that extra first start: when creating a domain for
the first time, mgmt can query the layout and then specify the numa mapping
without restarting. It can cache the defined mapping, as the commands exactly
match the corresponding CLI options, and reuse the cached options on
subsequent domain starts.
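
The preconfig session could be driven roughly as sketched below. Note the
assumptions: the QMP argument layout of set-numa-node is guessed from the HMP
form in the cover letter ("node,nodeid=0" and "cpu,socket-id=0,node-id=0"),
and numa_commands() and node_of are hypothetical names for illustration.

```python
def numa_commands(hotpluggable_cpus, node_of):
    """Turn a queried CPU layout into a list of QMP commands.

    node_of is a policy callback supplied by mgmt: it receives the opaque
    props of one CPU slot and returns the NUMA node number for it.
    """
    cpu_cmds = []
    nodes = set()
    for slot in hotpluggable_cpus:
        props = dict(slot["props"])
        props.pop("node-id", None)  # drop any mapping reported earlier
        node = node_of(props)
        nodes.add(node)
        args = {"type": "cpu", "node-id": node}
        args.update(props)
        cpu_cmds.append({"execute": "set-numa-node", "arguments": args})
    # Nodes must be declared before CPUs are mapped to them.
    node_cmds = [
        {"execute": "set-numa-node", "arguments": {"type": "node", "nodeid": n}}
        for n in sorted(nodes)
    ]
    # 'cont' leaves the preconfig state, as in the HMP session.
    return node_cmds + cpu_cmds + [{"execute": "cont"}]


layout = [
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "props": {"socket-id": 1, "core-id": 0, "thread-id": 0}},
    {"type": "qemu64-x86_64-cpu", "vcpus-count": 1,
     "props": {"socket-id": 0, "core-id": 0, "thread-id": 0}},
]
# Example policy (x86-specific assumption): one node per socket.
for cmd in numa_commands(layout, lambda props: props["socket-id"]):
    print(cmd)
```

The resulting command list is also what mgmt would cache and translate back
into -numa CLI options for later starts.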

This approach could be extended further with the "device_add cpu" command,
so it would be possible to start qemu with -smp 0,... and allow mgmt to
create cpus with explicit IDs controlled by mgmt; again, mgmt may cache
these commands and reuse them on the CLI the next time the machine is started.

I think Eduardo's work on query-slots is a superset of query-hotpluggable-cpus,
working towards the same goal: to allow mgmt to discover which hw is provided
by a specific machine and where/which hw could be plugged (like which slot
supports which kind of device and which 'address' should be used to attach
a device (socket|core... - for cpus, bus/function - for pci, ...)).
 
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-19 15:21           ` Igor Mammedov
@ 2017-10-19 15:28             ` Daniel P. Berrange
  2017-10-19 19:56               ` Eduardo Habkost
  0 siblings, 1 reply; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-19 15:28 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter maydell, pkrempa, ehabkost, cohuck, qemu-devel, armbru,
	pbonzini, david

On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:
> ----- Original Message -----
> > From: "Daniel P. Berrange" <berrange@redhat.com>
> > To: "Igor Mammedov" <imammedo@redhat.com>
> > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > 
> > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > 
> > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > >   
> > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > option
> > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > NUMA mapping for cpus.
> > > > > > 
> > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > currently
> > > > > > do for NUMA configuration ?
> > > > > From RHBZ1382425
> > > > > "
> > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > values for -numa cpus=... QEMU CLI option.
> > > > 
> > > > In broad terms, this problem applies to every device / object libvirt
> > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > "id" string, which is can then use to identify the thing later. The
> > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > libvirt has to infer. The latter is the same problem we had with
> > > > devices before '-device' was introduced allowing 'id' naming.
> > > > 
> > > > IMHO we should take the same approach with CPUs and start modelling
> > > > the individual CPUs as something we can explicitly create with -object
> > > > or -device. That way libvirt can assign names and does not have to
> > > > care about CPU index values, and it all works just the same way as
> > > > any other devices / object we create
> > > > 
> > > > ie instead of:
> > > > 
> > > >   -smp 8,sockets=4,cores=2,threads=1
> > > >   -numa node,nodeid=0,cpus=0-3
> > > >   -numa node,nodeid=1,cpus=4-7
> > > > 
> > > > we could do:
> > > > 
> > > >   -object numa-node,id=numa0
> > > >   -object numa-node,id=numa1
> > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > come from, currently these options are the function of
> > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > > runtime after qemu parses -M and -smp options.
> > 
> > NB, I realize my example was open to mis-interpretation. The values I'm
> > illustrating here for socket=3,core=1,thread=0 are *not* ID values, they
> > are a plain enumeration of values. ie this is saying the 4th socket, the
> > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > level
> The simplicity of fixed properties/values is tempting, and it might even
> work for what we have implemented in qemu currently (well, SPAPR would need
> refactoring (if possible) to meet the requirements, plus compat stuff for
> current machines with sparse IDs).
> But I have to disagree here and try to oppose it.
> 
> QEMU models concrete platforms/hw with certain non-abstract properties,
> and it's libvirt's domain to translate platform-specific devices into
> 'spherical' devices with abstract properties.
> 
> Now back to cpus and the suggestion to fix the set of 'address' properties
> and their values into a contiguous enumeration range [0..N). That would
>   1. put the burden of hiding platform/device details on QEMU
>       (which is already bad, as QEMU's job is to emulate them)
>   2. with abstract 'address' properties and values, the user won't have
>      a clue as to where a device is being attached (as qemu would magically
>      remap that to fit specific machine needs)
>   2.1. if 'address' properties and values are abstract, we can do away with
>      socket/core/thread/whatnot, since they won't mean the same thing when
>      considered from the platform point of view; we could just drop all this
>      nonsense and go back to cpu-index, which has all the properties you've
>      suggested (abstract, [0..N)).
>   3. we currently stopped at the socket|core|thread-id properties, as they are
>      applicable to machines that support -device cpu, but it's up to the machine
>      to pick which of these to use (x86: uses all, spapr: uses core-id only),
>      and the current property set is open for extension if the need arises,
>      without having to redefine the interface. So a fixed list of properties
>      [even ignoring the impact on values] doesn't scale.

Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
guest XML; we just provide an overall count of sockets/cores/threads, which is
portable. The only arch-specific thing we would have to do is express
constraints about the ratios of these, e.g. indicate in some way that ppc
doesn't allow multiple threads per core.
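
For reference, a small Python sketch of how such overall counts expand into
the plain enumeration used in my earlier example. The "cpuN"/"numaN" naming
and the even, contiguous split across nodes are assumptions of the sketch, and
enumerate_cpus() is a hypothetical helper, not libvirt code.

```python
from itertools import product


def enumerate_cpus(sockets, cores, threads, nodes):
    """Expand socket/core/thread counts into per-CPU enumeration tuples.

    The values are positions (4th socket, 2nd core, ...), not platform
    IDs. CPUs are split evenly and contiguously across the NUMA nodes.
    """
    total = sockets * cores * threads
    if nodes < 1 or total % nodes:
        raise ValueError("CPU count must divide evenly across NUMA nodes")
    per_node = total // nodes
    cpus = []
    topology = product(range(sockets), range(cores), range(threads))
    for i, (s, c, t) in enumerate(topology):
        cpus.append({
            "id": "cpu%d" % i,
            "node": "numa%d" % (i // per_node),
            "socket": s, "core": c, "thread": t,
        })
    return cpus


# Same shape as -smp 8,sockets=4,cores=2,threads=1 with two NUMA nodes.
cpus = enumerate_cpus(sockets=4, cores=2, threads=1, nodes=2)
for cpu in cpus:
    print(cpu)
```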

> We even have the cpu-add command, which takes cpu-index as its argument, and
> the -numa node,cpus=0..X CLI option; good luck figuring out which cpu goes
> where and whether it makes any sense from the platform point of view.
> 
> That's why, when designing hot plug for the 'device_add cpu' interface, we
> ended up with the new query-hotpluggable-cpus QMP command, which is currently
> used by libvirt for hot-plug.
> 
> This approach allows
>    1: the machine to publish properties/values that make sense from the
>       emulated platform's point of view but are still understandable by the
>       user of the given hw.
>    2: the user to use them as opaque mandatory properties to create a cpu
>       device if he/she doesn't care about where it's plugged.
>    3: the user who cares about which cpu goes where to get that info from the
>       properties defined by the machine, from the emulated hw point of view,
>       including platform-specific details.
>    4: the set of properties/values to be extended easily if the need arises,
>       without breaking users (provided the user puts them all in
>       -device/device_add options, as is expected)
> 
> But the current approach has a drawback: to call query-hotpluggable-cpus, the
> machine has to be started first, which is fine for hot plug but not for
> specifying CLI options.
> 
> Currently that could be solved by starting qemu twice when 'defining domain',
> where on the first run mgmt queries the board layout and caches it for all
> subsequent starts of the defined machine (a change in
> machine/version/-smp/-cpu will invalidate the cache).
> 
> This series avoids that extra first start: when creating a domain for
> the first time, mgmt can query the layout and then specify the numa mapping
> without restarting. It can cache the defined mapping, as the commands exactly
> match the corresponding CLI options, and reuse the cached options on
> subsequent domain starts.
> 
> This approach could be extended further with the "device_add cpu" command,
> so it would be possible to start qemu with -smp 0,... and allow mgmt to
> create cpus with explicit IDs controlled by mgmt; again, mgmt may cache
> these commands and reuse them on the CLI the next time the machine is started.
> 
> I think Eduardo's work on query-slots is a superset of query-hotpluggable-cpus,
> working towards the same goal: to allow mgmt to discover which hw is provided
> by a specific machine and where/which hw could be plugged (like which slot
> supports which kind of device and which 'address' should be used to attach
> a device (socket|core... - for cpus, bus/function - for pci, ...)).

As mentioned elsewhere in the thread, the approach of defining the VM config
incrementally via the monitor has significant downsides: it makes the config
invisible in any logs of the ARGV, and it likely has a performance impact when
starting up QEMU, particularly if it is used for more things going forward. To
me these downsides are enough to make the suggested approach for CPUs
impractical for libvirt to use.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-19 15:28             ` Daniel P. Berrange
@ 2017-10-19 19:56               ` Eduardo Habkost
  2017-10-20  9:07                 ` Daniel P. Berrange
  0 siblings, 1 reply; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-19 19:56 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Igor Mammedov, peter maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david, Laine Stump, libvir-list

On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:
> On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:
> > ----- Original Message -----
> > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > 
> > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > 
> > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > >   
> > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > option
> > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > adds new set-numa-node HMP/QMP commands which in conjuction with
> > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > NUMA mapping for cpus.
> > > > > > > 
> > > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > > currently
> > > > > > > do for NUMA configuration ?
> > > > > > From RHBZ1382425
> > > > > > "
> > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > values for -numa cpus=... QEMU CLI option.
> > > > > 
> > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > asks QEMU to create. For everything else libvirt is able to assign an
> > > > > > "id" string, which it can then use to identify the thing later. The
> > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > strings for each CPU - QEMU generates a pseudo-id internally which
> > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > 
> > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > the individual CPUs as something we can explicitly create with -object
> > > > > or -device. That way libvirt can assign names and does not have to
> > > > > care about CPU index values, and it all works just the same way as
> > > > > any other devices / object we create
> > > > > 
> > > > > ie instead of:
> > > > > 
> > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > >   -numa node,nodeid=0,cpus=0-3
> > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > 
> > > > > we could do:
> > > > > 
> > > > >   -object numa-node,id=numa0
> > > > >   -object numa-node,id=numa1
> > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > come from, currently these options are the function of
> > > > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > > > runtime after qemu parses -M and -smp options.
> > > 
> > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > > illustrating here for socket=3,core=1,thread=0 are *not* ID values, they
> > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > level
> > Even though fixed properties/values simplicity is tempting and it might even
> > work for what we have implemented in qemu currently (well, SPAPR will need
> > refactoring (if possible) to meet requirements + compat stuff for current
> > machines with sparse IDs).
> > But I have to disagree here and try to oppose it.
> > 
> > QEMU models concrete platforms/hw with certain non abstract properties
> > and it's libvirt's domain to translate platform specific devices into
> > 'spherical' devices with abstract properties.
> > 
> > Now back to cpus and suggestion to fix the set of 'address' properties
> > and their values into continuous enumeration range [0..N). That would
> >   1. put a burden of hiding platform/device details on QEMU
> >       (which is already bad as QEMU's job is to emulate it)
> >   2. with abstract 'address' properties and values, user won't have
> >      a clue as to where device is being attached (as qemu would magically
> >      remap that to fit specific machine needs)
> >   2.1. if abstract 'address' properties and values we can do away with
> >      socket/core/thread/whatnot since they won't mean the same when considered
> >      from platform point of view, so we can just drop all these nonsense
> >      and go back to cpu-index that has all the properties you've suggested
> >      /abstract, [0..N]/.
> >   3. we currently stopped with socket|core|thread-id properties as they are
> >      applicable to machines that support -device cpu, but it's up to machine
> > >      to pick which of these to use (x86: uses all, spapr: uses core-id only),
> >      but current property set is open for extension if need arises without
> >      need to redefine interface. So fixed list of properties [even ignoring
> >      values impact] doesn't scale.
> 
> Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> guest XML, we just provide an overall count of sockets/cores/threads which is
> > portable. The only arch specific thing we would have to do is express constraints
> > about ratios of these - eg indicate in some way that ppc doesn't allow multiple
> threads per core for example.
> 
> > We even have cpu-add command which takes cpu-index as argument and
> > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > where and if it makes any sense from platform point of view.
> > 
> > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > > with new query-hotpluggable-cpus QMP command, which is currently used by libvirt
> > for hot-plug:
> > 
> > Approach allows 
> >    1: machine to publish properties/values that make sense from emulated
> >       platform point of view but still understandable by user of given hw.
> >    2: user may use them as opaque mandatory properties to create cpu device if
> >       he/she doesn't care about where it's plugged.
> >    3: if user cares about which cpu goes where, properties defined by machine
> >       provide that info from emulated hw point of view including platform specific
> >       details.
> >    4: it's easy to extend set of properties/values if need arises without
> >       breaking users (provided user will put them all in -device/device_add
> >       options as it's supposed to)
> > 
> > > But current approach has drawback, to call query-hotpluggable-cpus, machine has to
> > be started first, which is fine for hot plug but not for specifying CLI options.
> > 
> > Currently that could be solved by starting qemu twice when 'defining domain',
> > where on the first run mgmt queries board layout and caches it for all the next
> > times the defined machine is started (change in machine/version/-smp/-cpu will
> > > invalidate the cache).
> > 
> > This series allows to avoid this 1st time restart, when creating domain for
> > the first time, mgmt can query layout and then specify numa mapping without
> > restarting, it can cache defined mapping as commands exactly match corresponding
> > CLI options and reuse cached options on the next domain starts.
> > 
> > This approach could be extended further with "device_add cpu" command
> > so it would be possible to start qemu with -smp 0,... and allow mgmt to
> > create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> > these commands and reuse them on CLI next time machine is started
> > 
> > > I think Eduardo's work on query-slots is superset of query-hotpluggable-cpus,
> > but working to the same goal to allow mgmt discover which hw is provided by
> > specific machine and where/which hw could be plugged (like which slot supports
> > which kind of device and which 'address' should be used to attach device
> > > (socket|core... - for cpus, bus/function - for pci, ...))
> 
> As mentioned elsewhere in the thread, the approach of defining the VM config
> incrementally via the monitor has significant downsides, by making the config
> invisible in any logs of the ARGV, and has likely performance impact when
> starting up QEMU, particularly if it is used for more things going forward. To
> me these downsides are enough to make the suggested approach for CPUs impractical
> for libvirt to use.

Those downsides do exist, but we should weigh them against the
downsides of not allowing any information at all to flow from
QEMU to libvirt when starting a VM.

I believe the code in libvirt/src/qemu/qemu_domain_address.c is
a good illustration of those downsides.

-- 
Eduardo

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-19 10:42     ` David Gibson
@ 2017-10-20  0:15       ` Eduardo Habkost
  2017-10-20  1:19         ` David Gibson
  2017-10-23  9:30         ` Alex Bennée
  0 siblings, 2 replies; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-20  0:15 UTC (permalink / raw)
  To: David Gibson
  Cc: Igor Mammedov, qemu-devel, eblake, armbru, pkrempa,
	peter.maydell, pbonzini, cohuck

On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:
> On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:
> > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > >  include/sysemu/sysemu.h |  1 +
> > >  qemu-options.hx         | 15 ++++++++++++++
> > >  qmp.c                   |  5 +++++
> > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > index b213696..3feb94f 100644
> > > --- a/include/sysemu/sysemu.h
> > > +++ b/include/sysemu/sysemu.h
> > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > >      QEMU_WAKEUP_REASON_OTHER,
> > >  } WakeupReason;
> > >  
> > > +void qemu_exit_preconfig_request(void);
> > >  void qemu_system_reset_request(ShutdownCause reason);
> > >  void qemu_system_suspend_request(void);
> > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > index 39225ae..bd44db8 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -3498,6 +3498,21 @@ STEXI
> > >  Run the emulation in single step mode.
> > >  ETEXI
> > >  
> > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > +    "-paused [state=]postconf|preconf\n"
> > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > +    QEMU_ARCH_ALL)
> > 
> > I would like to allow pausing before machine-type is selected, so
> > management could run query-machines before choosing a
> > machine-type.  Would that need a third "-paused" mode, or will we
> > be able to change "preconf" to pause before select_machine() is
> > called?
> > 
> > The same probably applies to other things initialized before
> > machine_run_board_init() that could be configurable using QMP,
> > including but not limited to:
> > * Accelerator configuration
> > * Registering global properties
> > * RAM size
> > * SMP/CPU configuration
> 
> Yeah.. having a bunch of different possible pause stages to select
> doesn't sound great.

I agree.  The number of externally visible pause states should be
as small as possible.


>                       Could we avoid this by instead changing -S to
> pause at the earliest possible spot, but having any monitor commands
> that require a later stage automatically "fast forwarding" to the
> right phase?

That would hide the internal details from the outside.  Sounds
nice, but adding new machine/device configuration QMP commands
while hiding the QEMU state from the outside sounds impossible.

For example, if we use -S today, this works:

  $ qemu-system-x86_64 -S -qmp stdio
  <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
  -> {"execute":"qmp_capabilities"}
  <- {"return": {}}
  -> {"execute":"query-cpus"}
  <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}

This means "query-cpus" needs to fast-forward to the CPU creation
stage if we want to keep compatibility.

Now, assume we add a set-numa-node command like the one in this
series.  e.g.:

  $ qemu-system-x86_64 -S -qmp stdio
  <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
  -> {"execute":"qmp_capabilities"}
  <- {"return": {}}
  -> {"execute":"set-numa-node" ... }
  <- {"return": ...}

The command will work only if machine initialization has not run
yet.

But now an innocent-looking query command would change QEMU state
in an unexpected way:

  $ qemu-system-x86_64 -S -qmp stdio
  <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
  -> {"execute":"qmp_capabilities"}
  <- {"return": {}}
  -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]
  <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
  -> {"execute":"set-numa-node" ... }
  <- {"error": ...}  [the command will fail because the machine was already created]

This means we do have an externally visible "too late to use
set-numa-node" QEMU state, and query-cpus will have an externally
visible side effect.  Every QMP command would need to document
how it affects QEMU state in an externally visible way.

If QEMU pause state is still going to be externally visible this
way, I would prefer to let the client explicitly tell QEMU which
state it wants QEMU to be in, instead of making QEMU change
state silently as a side effect of QMP commands.
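
To make the ordering problem concrete, here is a minimal sketch in
Python.  It is a toy model only: the phase names, the ToyMonitor class
and the implicit_fast_forward switch are made up for illustration and
do not correspond to any existing QEMU code or QMP API.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Hypothetical startup phases; names are illustrative, not QEMU's."""
    PRECONFIG = 0      # before machine initialization
    MACHINE_READY = 1  # machine and CPUs created, guest not running yet


class QmpError(Exception):
    """Stand-in for a QMP error reply."""


class ToyMonitor:
    """Toy model of a monitor whose commands depend on the startup phase."""

    def __init__(self, implicit_fast_forward):
        self.phase = Phase.PRECONFIG
        self.implicit_fast_forward = implicit_fast_forward
        self.numa_nodes = []

    def advance(self, target):
        # Phases only move forward; there is no way back.
        if target > self.phase:
            self.phase = target

    def query_cpus(self):
        if self.phase < Phase.MACHINE_READY:
            if not self.implicit_fast_forward:
                raise QmpError("query-cpus: machine not initialized yet")
            # Silent side effect: a "query" changes the startup phase.
            self.advance(Phase.MACHINE_READY)
        return [{"CPU": 0}]

    def set_numa_node(self, nodeid):
        if self.phase != Phase.PRECONFIG:
            raise QmpError("set-numa-node: machine already created")
        self.numa_nodes.append(nodeid)


def run_session(implicit_fast_forward):
    """Issue query-cpus, then set-numa-node; record which ones succeed."""
    mon = ToyMonitor(implicit_fast_forward)
    log = []
    for name, call in (("query-cpus", mon.query_cpus),
                       ("set-numa-node", lambda: mon.set_numa_node(0))):
        try:
            call()
            log.append((name, "ok"))
        except QmpError:
            log.append((name, "error"))
    return log
```

In this model, run_session(True) has query-cpus succeed and the later
set-numa-node fail, while run_session(False) rejects query-cpus up
front and set-numa-node still works, so the client always knows which
state it is in.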

> 
[...]


-- 
Eduardo

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-20  0:15       ` Eduardo Habkost
@ 2017-10-20  1:19         ` David Gibson
  2017-10-20 14:21           ` Eduardo Habkost
  2017-10-23  9:30         ` Alex Bennée
  1 sibling, 1 reply; 93+ messages in thread
From: David Gibson @ 2017-10-20  1:19 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Igor Mammedov, qemu-devel, eblake, armbru, pkrempa,
	peter.maydell, pbonzini, cohuck

On Thu, Oct 19, 2017 at 10:15:48PM -0200, Eduardo Habkost wrote:
> On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:
> > On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:
> > > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:
> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > ---
> > > >  include/sysemu/sysemu.h |  1 +
> > > >  qemu-options.hx         | 15 ++++++++++++++
> > > >  qmp.c                   |  5 +++++
> > > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > > index b213696..3feb94f 100644
> > > > --- a/include/sysemu/sysemu.h
> > > > +++ b/include/sysemu/sysemu.h
> > > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > > >      QEMU_WAKEUP_REASON_OTHER,
> > > >  } WakeupReason;
> > > >  
> > > > +void qemu_exit_preconfig_request(void);
> > > >  void qemu_system_reset_request(ShutdownCause reason);
> > > >  void qemu_system_suspend_request(void);
> > > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > index 39225ae..bd44db8 100644
> > > > --- a/qemu-options.hx
> > > > +++ b/qemu-options.hx
> > > > @@ -3498,6 +3498,21 @@ STEXI
> > > >  Run the emulation in single step mode.
> > > >  ETEXI
> > > >  
> > > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > > +    "-paused [state=]postconf|preconf\n"
> > > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > > +    QEMU_ARCH_ALL)
> > > 
> > > I would like to allow pausing before machine-type is selected, so
> > > management could run query-machines before choosing a
> > > machine-type.  Would that need a third "-paused" mode, or will we
> > > be able to change "preconf" to pause before select_machine() is
> > > called?
> > > 
> > > The same probably applies to other things initialized before
> > > machine_run_board_init() that could be configurable using QMP,
> > > including but not limited to:
> > > * Accelerator configuration
> > > * Registering global properties
> > > * RAM size
> > > * SMP/CPU configuration
> > 
> > Yeah.. having a bunch of different possible pause stages to select
> > doesn't sound great.
> 
> I agree.  The number of externally visible pause states should be
> as small as possible.
> 
> 
> >                       Could we avoid this by instead changing -S to
> > pause at the earliest possible spot, but having any monitor commands
> > that require a later stage automatically "fast forwarding" to the
> > right phase?
> 
> That would hide the internal details from the outside.  Sounds
> nice, but adding new machine/device configuration QMP commands
> while hiding the QEMU state from the outside sounds impossible.
> 
> For example, if we use -S today, this works:
> 
>   $ qemu-system-x86_64 -S -qmp stdio
>   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
>   -> {"execute":"qmp_capabilities"}
>   <- {"return": {}}
>   -> {"execute":"query-cpus"}
>   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
> 
> This means "query-cpus" needs to fast-forward to the CPU creation
> stage if we want to keep compatibility.
> 
> Now, assume we add a set-numa-node command like the one in this
> series.  e.g.:
> 
>   $ qemu-system-x86_64 -S -qmp stdio
>   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
>   -> {"execute":"qmp_capabilities"}
>   <- {"return": {}}
>   -> {"execute":"set-numa-node" ... }
>   <- {"return": ...}
> 
> The command will work only if machine initialization has not run
> yet.
> 
> But now an innocent-looking query command would change QEMU state
> in an unexpected way:
> 
>   $ qemu-system-x86_64 -S -qmp stdio
>   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
>   -> {"execute":"qmp_capabilities"}
>   <- {"return": {}}
>   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]
>   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
>   -> {"execute":"set-numa-node" ... }
>   <- {"error": ...}  [the command will fail because the machine was already created]
> 
> This means we do have an externally visible "too late to use
> set-numa-node" QEMU state, and query-cpus will have an externally
> visible side effect.  Every QMP command would need to document
> how it affects QEMU state in an externally visible way.
> 
> If QEMU pause state is still going to be externally visible this
> way, I would prefer to let the client explicitly tell QEMU which
> state it wants QEMU to be in, instead of making QEMU change
> state silently as a side effect of QMP commands.

Yeah, good point.  My proposal would just have changed explicitly
exposed ugly internal state to subtly exposed ugly internal state,
which is probably worse :(.


Ok.. next possibly bad idea..

What about a "re-exec" monitor command; it would take what's
essentially a new command line, and basically restart qemu from the
beginning, reparsing this new command line, but without actually 

Pro:
  * Mitigates Daniel Berrange's concern about lots of qemu
    configuration being buried in the qmp session - if libvirt logged
    its last "re-exec" that would have what is generally needed.
  * Lets libvirt do assorted investigation of options, then rewind to
    choose what it actually wants

Con:
  * Would require a bunch of auditing of structures/state to make sure
    they can be re-initialized cleanly
  * Would it be fast enough for libvirt to use?  Do we know if the
    slowness which makes multiple qemu invocations by libvirt
    unattractive is from the kernel/libc/ldso overhead, or from qemu's
    internal start up processing?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-19 12:23               ` Paolo Bonzini
@ 2017-10-20  1:21                 ` David Gibson
  2017-10-20 19:53                   ` Eduardo Habkost
  0 siblings, 1 reply; 93+ messages in thread
From: David Gibson @ 2017-10-20  1:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Eduardo Habkost, Daniel P. Berrange, Igor Mammedov,
	peter.maydell, pkrempa, cohuck, qemu-devel, armbru

On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:
> On 19/10/2017 13:49, David Gibson wrote:
> > Note that describing socket/core/thread tuples as arch independent (or
> > even machine independent) is.. debatable.  I mean it's flexible enough
> > that most platforms can be fit to that scheme without too much
> > straining.  But, there's no arch independent way of defining what each
> > level means in terms of its properties.
> > 
> > So, for example, on spapr - being paravirt - there's no real
> > distinction between cores and sockets, how you divide them up is
> > completely arbitrary.
> 
> Same on x86, actually.
> 
> It's _common_ that cores on the same socket share L3 cache and that a
> socket spans an integer number of NUMA nodes, but it doesn't have to be
> that way.
> 
> QEMU currently enforces the former (if it tells the guest at all that
> there is an L3 cache), but not the latter.

Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
architecture in terms of this thread/core/socket hierarchy?  That's
not true for PAPR, where the NUMA topology is described in an
independent set of (potentially arbitrarily nested) nodes.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-19 19:56               ` Eduardo Habkost
@ 2017-10-20  9:07                 ` Daniel P. Berrange
  2017-10-20 20:07                   ` Eduardo Habkost
  2017-10-23 10:04                   ` Igor Mammedov
  0 siblings, 2 replies; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-20  9:07 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Igor Mammedov, peter maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david, Laine Stump, libvir-list

On Thu, Oct 19, 2017 at 05:56:49PM -0200, Eduardo Habkost wrote:
> On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:
> > On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:
> > > ----- Original Message -----
> > > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > > 
> > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > 
> > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > >   
> > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > > option
> > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjunction with
> > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > NUMA mapping for cpus.
> > > > > > > > 
> > > > > > > > > What's the problem we're seeking to solve here compared to what we
> > > > > > > > currently
> > > > > > > > do for NUMA configuration ?
> > > > > > > From RHBZ1382425
> > > > > > > "
> > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > > values for -numa cpus=... QEMU CLI option.
> > > > > > 
> > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > > asks QEMU to create. For everything else libvirt is able to assign an
> > > > > > > "id" string, which it can then use to identify the thing later. The
> > > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > > strings for each CPU - QEMU generates a pseudo-id internally which
> > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > 
> > > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > or -device. That way libvirt can assign names and does not have to
> > > > > > care about CPU index values, and it all works just the same way as
> > > > > > any other devices / object we create
> > > > > > 
> > > > > > ie instead of:
> > > > > > 
> > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > 
> > > > > > we could do:
> > > > > > 
> > > > > >   -object numa-node,id=numa0
> > > > > >   -object numa-node,id=numa1
> > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > come from, currently these options are the function of
> > > > > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > > > > runtime after qemu parses -M and -smp options.
> > > > 
> > > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > > > illustrating here for socket=3,core=1,thread=0 are *not* ID values, they
> > > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > > level
> > > Even though fixed properties/values simplicity is tempting and it might even
> > > work for what we have implemented in qemu currently (well, SPAPR will need
> > > refactoring (if possible) to meet requirements + compat stuff for current
> > > machines with sparse IDs).
> > > But I have to disagree here and try to oppose it.
> > > 
> > > QEMU models concrete platforms/hw with certain non abstract properties
> > > and it's libvirt's domain to translate platform specific devices into
> > > 'spherical' devices with abstract properties.
> > > 
> > > Now back to cpus and suggestion to fix the set of 'address' properties
> > > and their values into continuous enumeration range [0..N). That would
> > >   1. put a burden of hiding platform/device details on QEMU
> > >       (which is already bad as QEMU's job is to emulate it)
> > >   2. with abstract 'address' properties and values, user won't have
> > >      a clue as to where device is being attached (as qemu would magically
> > >      remap that to fit specific machine needs)
> > >   2.1. if abstract 'address' properties and values we can do away with
> > >      socket/core/thread/whatnot since they won't mean the same when considered
> > >      from platform point of view, so we can just drop all these nonsense
> > >      and go back to cpu-index that has all the properties you've suggested
> > >      /abstract, [0..N]/.
> > >   3. we currently stopped with socket|core|thread-id properties as they are
> > >      applicable to machines that support -device cpu, but it's up to machine
> > > >      to pick which of these to use (x86: uses all, spapr: uses core-id only),
> > >      but current property set is open for extension if need arises without
> > >      need to redefine interface. So fixed list of properties [even ignoring
> > >      values impact] doesn't scale.
> > 
> > Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> > guest XML, we just provide an overall count of sockets/cores/threads which is
> > > portable. The only arch specific thing we would have to do is express constraints
> > > about ratios of these - eg indicate in some way that ppc doesn't allow multiple
> > threads per core for example.
> > 
> > > We even have cpu-add command which takes cpu-index as argument and
> > > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > > where and if it makes any sense from platform point of view.
> > > 
> > > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > > > with new query-hotpluggable-cpus QMP command, which is currently used by libvirt
> > > for hot-plug:
> > > 
> > > Approach allows 
> > >    1: machine to publish properties/values that make sense from emulated
> > >       platform point of view but still understandable by user of given hw.
> > >    2: user may use them as opaque mandatory properties to create cpu device if
> > >       he/she doesn't care about where it's plugged.
> > >    3: if user cares about which cpu goes where, properties defined by machine
> > >       provide that info from emulated hw point of view including platform specific
> > >       details.
> > >    4: it's easy to extend set of properties/values if need arises without
> > >       breaking users (provided user will put them all in -device/device_add
> > >       options as it's supposed to)
> > > 
> > > > But current approach has drawback, to call query-hotpluggable-cpus, machine has to
> > > be started first, which is fine for hot plug but not for specifying CLI options.
> > > 
> > > Currently that could be solved by starting qemu twice when 'defining domain',
> > > where on the first run mgmt queries board layout and caches it for all the next
> > > times the defined machine is started (change in machine/version/-smp/-cpu will
> > > > invalidate the cache).
> > > 
> > > This series allows to avoid this 1st time restart, when creating domain for
> > > the first time, mgmt can query layout and then specify numa mapping without
> > > restarting, it can cache defined mapping as commands exactly match corresponding
> > > CLI options and reuse cached options on the next domain starts.
> > > 
> > > This approach could be extended further with "device_add cpu" command
> > > so it would be possible to start qemu with -smp 0,... and allow mgmt to
> > > create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> > > these commands and reuse them on CLI next time machine is started
> > > 
> > > I think Eduardo's work on query-slots is superset of query-hotpluggable-cpus,
> > > but working to the same goal to allow mgmt discover which hw is provided by
> > > specific machine and where/which hw could be plugged (like which slot supports
> > > which kind of device and which 'address' should be used to attach device
> > > (socket|core... - for cpus, bus/function - for pci, ...))
> > 
> > As mentioned elsewhere in the thread, the approach of defining the VM config
> > incrementally via the monitor has significant downsides, by making the config
> > invisible in any logs of the ARGV, and has likely performance impact when
> > starting up QEMU, particularly if it is used for more things going forward. To
> > me these downsides are enough to make the suggested approach for CPUs impractical
> > for libvirt to use.
> 
> > Those downsides do exist, but we should weigh them against the
> downsides of not allowing any information at all to flow from
> QEMU to libvirt when starting a VM.
> 
> I believe the code in libvirt/src/qemu/qemu_domain_address.c is
> a good illustration of those downsides.

Right, but for this NUMA / CPU scenario I don't think we're going to end up
with complexity like this. I still believe we are able to come up with a
way to represent it at the CLI without so much architecture specific
knowledge.

Even if that is not possible though, from libvirt POV the extra complexity
is worth it, if that is what we need to preserve fast startup time. The
time to start a guest is very important to apps like libguestfs and libvirt
sandbox, so going down a direction which is likely to add 100's or even 1000's
of milliseconds to the startup time is not desirable, even if it makes libvirt
simpler.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-20  1:19         ` David Gibson
@ 2017-10-20 14:21           ` Eduardo Habkost
  2017-10-23  9:49             ` Igor Mammedov
  0 siblings, 1 reply; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-20 14:21 UTC (permalink / raw)
  To: David Gibson
  Cc: Igor Mammedov, qemu-devel, eblake, armbru, pkrempa,
	peter.maydell, pbonzini, cohuck

On Fri, Oct 20, 2017 at 12:19:17PM +1100, David Gibson wrote:
> On Thu, Oct 19, 2017 at 10:15:48PM -0200, Eduardo Habkost wrote:
> > On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:
> > > On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:
> > > > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:
> > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > ---
> > > > >  include/sysemu/sysemu.h |  1 +
> > > > >  qemu-options.hx         | 15 ++++++++++++++
> > > > >  qmp.c                   |  5 +++++
> > > > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > > > index b213696..3feb94f 100644
> > > > > --- a/include/sysemu/sysemu.h
> > > > > +++ b/include/sysemu/sysemu.h
> > > > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > > > >      QEMU_WAKEUP_REASON_OTHER,
> > > > >  } WakeupReason;
> > > > >  
> > > > > +void qemu_exit_preconfig_request(void);
> > > > >  void qemu_system_reset_request(ShutdownCause reason);
> > > > >  void qemu_system_suspend_request(void);
> > > > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > > index 39225ae..bd44db8 100644
> > > > > --- a/qemu-options.hx
> > > > > +++ b/qemu-options.hx
> > > > > @@ -3498,6 +3498,21 @@ STEXI
> > > > >  Run the emulation in single step mode.
> > > > >  ETEXI
> > > > >  
> > > > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > > > +    "-paused [state=]postconf|preconf\n"
> > > > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > > > +    QEMU_ARCH_ALL)
> > > > 
> > > > I would like to allow pausing before machine-type is selected, so
> > > > management could run query-machines before choosing a
> > > > machine-type.  Would that need a third "-pause" mode, or will we
> > > > be able to change "preconf" to pause before select_machine() is
> > > > called?
> > > > 
> > > > The same probably applies to other things initialized before
> > > > machine_run_board_init() that could be configurable using QMP,
> > > > including but not limited to:
> > > > * Accelerator configuration
> > > > * Registering global properties
> > > > * RAM size
> > > > * SMP/CPU configuration
> > > 
> > > Yeah.. having a bunch of different possible pause stages to select
> > > doesn't sound great.
> > 
> > I agree.  The number of externally visible pause states should be
> > as small as possible.
> > 
> > 
> > >                       Could we avoid this by instead changing -S to
> > > pause at the earliest possible spot, but having any monitor commands
> > > that require a later stage automatically "fast forwarding" to the
> > > right phase?
> > 
> > That would hide the internal details from the outside.  Sounds
> > nice, but adding new machine/device configuration QMP commands
> > while hiding the QEMU state from the outside sounds impossible.
> > 
> > For example, if we use -S today, this works:
> > 
> >   $ qemu-system-x86_64 -S -qmp stdio
> >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
> >   -> {"execute":"qmp_capabilities"}
> >   <- {"return": {}}
> >   -> {"execute":"query-cpus"}
> >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
> > 
> > This means "query-cpus" needs to fast-forward to the CPU creation
> > stage if we want to keep compatibility.
> > 
> > Now, assume we add a set-numa-node command like the one in this
> > series.  e.g.:
> > 
> >   $ qemu-system-x86_64 -S -qmp stdio
> >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
> >   -> {"execute":"qmp_capabilities"}
> >   <- {"return": {}}
> >   -> {"execute":"set-numa-node" ... }
> >   <- {"return": ...}
> > 
> > The command will work only if machine initialization didn't run
> > yet.
> > 
> > But now an innocent-looking query command would change QEMU state
> > in an unexpected way:
> > 
> >   $ qemu-system-x86_64 -S -qmp stdio
> >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
> >   -> {"execute":"qmp_capabilities"}
> >   <- {"return": {}}
> >   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]
> >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
> >   -> {"execute":"set-numa-node" ... }
> >   <- {"error": ...}  [the command will fail because the machine was already created]
> > 
> > This means we do have an externally visible "too late to use
> > set-numa-node" QEMU state, and query-cpus will have an externally
> > visible side effect.  Every QMP command would need to document
> > how it affects QEMU state in an externally visible way.
> > 
> > If QEMU pause state is still going to be externally visible this
> > way, I would prefer to let the client explicitly tell what
> > state they want QEMU to be in, instead of making QEMU change
> > state silently as a side effect of QMP commands.
> 
> Yeah, good point.  My proposal would just have changed explicitly
> exposed ugly internal state to subtly exposed ugly internal state,
> which is probably worse :(.
> 
> 
> Ok.. next possibly bad idea..
> 
> What about a "re-exec" monitor command; it would take what's
> essentially a new command line, and basically restart qemu from the
> beginning, reparsing this new command line, but without actually 
> 
> Pro:
>   * Mitigates Daniel Berrange's concern about lots of qemu
>     configuration being buried in the qmp session - if libvirt logged
>     its last "re-exec" that would have what is generally needed.
>   * Lets libvirt do assorted investigation of options, then rewind to
>     choose what it actually wants

Sounds like a superset of Paolo's "-machine none" proposal[1].
It would be a very simple interface, but I'm not sure it can be
implemented efficiently.

[1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg488618.html

> 
> Con:
>   * Would require a bunch of auditing of structures/state to make sure
>     they can be re-initialized cleanly

This sounds like a big obstacle.  QEMU still has too much global
state outside the machine/qdev tree.


>   * Would it be fast enough for libvirt to use?  Do we know if the
>     slowness which makes multiple qemu invocations by libvirt
>     unattractive is from the kernel/libc/ldso overhead, or from qemu's
>     internal start up processing?

My gut feeling is that this could be too slow, if the scope of
"re-exec" is too big.


Now, let me try to go to the opposite extreme: I think you had a
good point in your previous proposal.  Why should we need to
restart/re-execute anything at all just because some bit of
configuration is being changed by libvirt?  Why should commands like
set-numa-node require QEMU to be in a state that is not
covered by -S?  If the guest is not running yet, there should be
no reason to require clients to explicitly pause/continue/restart
anything.

(Translating this to my example above: why exactly have I assumed
above that keeping "query-cpus" working would necessarily make
set-numa-node stop working?)
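
The assumption being questioned can be written down as a toy model. This
is only a sketch in Python, not QEMU code: the Phase enum, the ToyQemu
class and the numa_deadline parameter are hypothetical names invented for
the illustration. It replays the quoted session (query-cpus first, then
set-numa-node) under two rules for when the NUMA mapping is frozen.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Hypothetical startup phases, ordered from earliest to latest."""
    PRECONFIG = 0   # before machine initialization
    MACHINE = 1     # machine and CPUs created, guest not running (like -S)
    RUNNING = 2     # guest is executing


class ToyQemu:
    """Toy stand-in for a QEMU process; names and rules are illustrative only."""

    def __init__(self, numa_deadline):
        self.phase = Phase.PRECONFIG
        self.numa_deadline = numa_deadline  # last phase that accepts set-numa-node
        self.numa = {}                      # socket-id -> node-id

    def query_cpus(self):
        # CPUs must exist to be listed, so silently fast-forward if needed.
        self.phase = max(self.phase, Phase.MACHINE)
        return sorted(self.numa.items())

    def set_numa_node(self, socket_id, node_id):
        if self.phase > self.numa_deadline:
            return {"error": "too late to use set-numa-node"}
        self.numa[socket_id] = node_id
        return {"return": {}}


def session(numa_deadline):
    """Replay the quoted session: query-cpus first, then set-numa-node."""
    vm = ToyQemu(numa_deadline)
    vm.query_cpus()
    return vm.set_numa_node(socket_id=0, node_id=0)


# Mapping frozen at machine init: the query makes the later command fail.
strict = session(Phase.PRECONFIG)
# Mapping frozen only when the guest starts: both commands keep working.
relaxed = session(Phase.MACHINE)
print(strict, relaxed)
```

In this model the failure comes entirely from where the deadline is
placed, not from query-cpus itself.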

-- 
Eduardo

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-16 16:35   ` Daniel P. Berrange
  2017-10-17  8:17     ` Igor Mammedov
@ 2017-10-20 15:38     ` Eduardo Habkost
  1 sibling, 0 replies; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-20 15:38 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Igor Mammedov, qemu-devel, peter.maydell, pkrempa, cohuck,
	armbru, pbonzini, david

On Mon, Oct 16, 2017 at 05:35:15PM +0100, Daniel P. Berrange wrote:
> On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:
> 
> This really needs to have a commit message that provides justification
> for why this option is needed when we already have -S that is used
> to allow configuration before the guest starts.
> 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  include/sysemu/sysemu.h |  1 +
> >  qemu-options.hx         | 15 ++++++++++++++
> >  qmp.c                   |  5 +++++
> >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  4 files changed, 74 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index b213696..3feb94f 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> >      QEMU_WAKEUP_REASON_OTHER,
> >  } WakeupReason;
> >  
> > +void qemu_exit_preconfig_request(void);
> >  void qemu_system_reset_request(ShutdownCause reason);
> >  void qemu_system_suspend_request(void);
> >  void qemu_register_suspend_notifier(Notifier *notifier);
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 39225ae..bd44db8 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -3498,6 +3498,21 @@ STEXI
> >  Run the emulation in single step mode.
> >  ETEXI
> >  
> > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > +    "-paused [state=]postconf|preconf\n"
> > +    "                postconf: pause QEMU after machine is initialized\n"
> > +    "                preconf: pause QEMU before machine is initialized\n",
> > +    QEMU_ARCH_ALL)
> > +STEXI
> > +@item -paused
> > +@findex -paused
> > +if set enabled interactive configuration stages before machine emulation starts.
> > +'postconf' option value mimics -S option behaviour where machine is created
> > +but emulation isn't started. 'preconf' option value pauses QEMU before machine
> > +is created, which allows to query and configure properties affecting machine
> > +initialization. Use monitor/QMP command 'cont' to go to exit paused state.
> > +ETEXI
> 
> To me it feels rather unpleasant to be exposing this kind of detailed knowledge
> about the steps QEMU goes through when constructing the machine and expecting
> the mgmt application to synchronize certain monitor actions against this.

After discussing some ideas with David in this thread, I think
you have a really good point here: I don't see a reason why
set-numa-node should require anything beyond -S, other than the
way our machine initialization code works.  In other words, why
should we generate the NUMA tables at
machine_run_board_init()-time and not at vm_start()-time?
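
A minimal sketch of what "generate the tables at vm_start()-time" could
mean, in Python rather than QEMU's C. ToyMachine and its methods are
hypothetical and do not correspond to real QEMU functions; the "table" is
a plain list of (cpu, node) pairs, not an ACPI structure.

```python
class ToyMachine:
    """Hypothetical model: NUMA tables are generated when the guest starts."""

    def __init__(self, cpu_ids):
        # "Machine init": CPUs exist, but no guest-visible table is built yet.
        self.cpu_to_node = {cpu: 0 for cpu in cpu_ids}  # default: all on node 0
        self.tables = None

    def set_numa_node(self, cpu, node):
        if self.tables is not None:
            raise RuntimeError("guest already started; mapping is frozen")
        if cpu not in self.cpu_to_node:
            raise KeyError(f"unknown cpu {cpu}")
        self.cpu_to_node[cpu] = node

    def vm_start(self):
        # Build the guest-visible table as late as possible.
        self.tables = sorted(self.cpu_to_node.items())
        return self.tables


machine = ToyMachine(cpu_ids=[0, 1])  # machine init has already happened
machine.set_numa_node(1, 1)           # still allowed: guest is not running
tables = machine.vm_start()
print(tables)
```

With this ordering the mapping only needs to be final when the guest
starts executing, which is the state -S already provides.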

-- 
Eduardo

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-20  1:21                 ` David Gibson
@ 2017-10-20 19:53                   ` Eduardo Habkost
  2017-10-23  8:17                     ` Igor Mammedov
  2017-10-23  8:45                     ` Igor Mammedov
  0 siblings, 2 replies; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-20 19:53 UTC (permalink / raw)
  To: David Gibson
  Cc: Paolo Bonzini, Daniel P. Berrange, Igor Mammedov, peter.maydell,
	pkrempa, cohuck, qemu-devel, armbru

On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:
> > On 19/10/2017 13:49, David Gibson wrote:
> > > Note that describing socket/core/thread tuples as arch independent (or
> > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > that most platforms can be fit to that scheme without too much
> > > straining.  But, there's no arch independent way of defining what each
> > > level means in terms of its properties.
> > > 
> > > So, for example, on spapr - being paravirt - there's no real
> > > distinction between cores and sockets, how you divide them up is
> > > completely arbitrary.
> > 
> > Same on x86, actually.
> > 
> > It's _common_ that cores on the same socket share L3 cache and that a
> > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > that way.
> > 
> > QEMU currently enforces the former (if it tells the guest at all that
> > there is an L3 cache), but not the latter.
> 
> Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> architecture in terms of this thread/core/socket hierarchy?  That's
> not true for PAPR, where the NUMA topology is described in an
> independent set of (potentially arbitrarily nested) nodes.

On PC, ACPI NUMA information only refers to CPU APIC IDs, which
identify individual CPU threads; it doesn't care about CPU
socket/core/thread topology.  If I'm not mistaken, the
socket/core/thread topology is not represented in ACPI at all.
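
To illustrate the point, here is a sketch in Python of a per-thread
mapping keyed only by APIC ID. It assumes the usual x86 convention of
packing socket/core/thread into bit fields whose widths are the counts
rounded up to a power of two; the function names are hypothetical and the
result is a plain list of (apic_id, node) pairs, not a real SRAT.

```python
def field_width(count):
    """Bits needed to encode values 0..count-1 (0 bits when count is 1)."""
    return (count - 1).bit_length()


def apic_id(socket, core, thread, cores, threads):
    """Pack a socket/core/thread triple into an APIC-ID-like integer."""
    thread_bits = field_width(threads)
    core_bits = field_width(cores)
    return (socket << (core_bits + thread_bits)) | (core << thread_bits) | thread


def srat_like_entries(sockets, cores, threads, node_of):
    """Return (apic_id, node) pairs; node_of is any function of the triple."""
    return [
        (apic_id(s, c, t, cores, threads), node_of(s, c, t))
        for s in range(sockets)
        for c in range(cores)
        for t in range(threads)
    ]


# 2 sockets x 3 cores x 1 thread: the core field is 2 bits wide, so IDs are sparse.
per_socket = srat_like_entries(2, 3, 1, lambda s, c, t: s)
# Nothing in the table format prevents a mapping that ignores sockets entirely.
odd = srat_like_entries(2, 3, 1, lambda s, c, t: c % 2)
print(per_socket)
print(odd)
```

Both lists have the same shape; only the guest's expectations distinguish
the sensible mapping from the odd one.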

Some guest OSes, however, may get very confused if they see an
unexpected NUMA/CPU topology.  IIRC, it was possible to make old
Linux kernel versions panic by generating a weird topology.

-- 
Eduardo

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-20  9:07                 ` Daniel P. Berrange
@ 2017-10-20 20:07                   ` Eduardo Habkost
  2017-10-23  8:53                     ` Igor Mammedov
  2017-10-23 10:04                   ` Igor Mammedov
  1 sibling, 1 reply; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-20 20:07 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Igor Mammedov, peter maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david, Laine Stump, libvir-list

On Fri, Oct 20, 2017 at 10:07:27AM +0100, Daniel P. Berrange wrote:
> On Thu, Oct 19, 2017 at 05:56:49PM -0200, Eduardo Habkost wrote:
> > On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:
> > > On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:
> > > > ----- Original Message -----
> > > > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > > > 
> > > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:
> > > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > 
> > > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:
> > > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > > >   
> > > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:
> > > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > > > option
> > > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjunction with
> > > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > > NUMA mapping for cpus.
> > > > > > > > > 
> > > > > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > > > > currently
> > > > > > > > > do for NUMA configuration ?
> > > > > > > > From RHBZ1382425
> > > > > > > > "
> > > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > > > values for -numa cpus=... QEMU CLI option.
> > > > > > > 
> > > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > > > > "id" string, which is can then use to identify the thing later. The
> > > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > > strings for each CPU - QEMU generates a psuedo-id internally which
> > > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > > 
> > > > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > > or -device. That way libvirt can assign names and does not have to
> > > > > > > care about CPU index values, and it all works just the same way as
> > > > > > > any other devices / object we create
> > > > > > > 
> > > > > > > ie instead of:
> > > > > > > 
> > > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > > 
> > > > > > > we could do:
> > > > > > > 
> > > > > > >   -object numa-node,id=numa0
> > > > > > >   -object numa-node,id=numa1
> > > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0
> > > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > > come from, currently these options are the function of
> > > > > > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > > > > > runtime after qemu parses -M and -smp options.
> > > > > 
> > > > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > > > > illustrating here for socket=3,core=1,thread=0 are *not* ID values, they
> > > > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > > > level
> > > > Even though fixed properties/values simplicity is tempting and it might even
> > > > work for what we have implemented in qemu currently (well, SPAPR will need
> > > > refactoring (if possible) to meet requirements + compat stuff for current
> > > > machines with sparse IDs).
> > > > But I have to disagree here and try to oppose it.
> > > > 
> > > > QEMU models concrete platforms/hw with certain non abstract properties
> > > > and it's libvirt's domain to translate platform specific devices into
> > > > 'spherical' devices with abstract properties.
> > > > 
> > > > Now back to cpus and suggestion to fix the set of 'address' properties
> > > > and their values into continuous enumeration range [0..N). That would
> > > >   1. put a burden of hiding platform/device details on QEMU
> > > >       (which is already bad as QEMU's job is to emulate it)
> > > >   2. with abstract 'address' properties and values, user won't have
> > > >      a clue as to where device is being attached (as qemu would magically
> > > >      remap that to fit specific machine needs)
> > > >   2.1. if abstract 'address' properties and values we can do away with
> > > >      socket/core/thread/whatnot since they won't mean the same when considered
> > > >      from platform point of view, so we can just drop all these nonsense
> > > >      and go back to cpu-index that has all the properties you've suggested
> > > >      /abstract, [0..N]/.
> > > >   3. we currently stopped with socket|core|thread-id properties as they are
> > > >      applicable to machines that support -device cpu, but it's up to machine
> > > >      to pick witch of these to use (x86: uses all, spar: uses core-id only),
> > > >      but current property set is open for extension if need arises without
> > > >      need to redefine interface. So fixed list of properties [even ignoring
> > > >      values impact] doesn't scale.
> > > 
> > > Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> > > guest XML, we just provide an overall count of sockets/cores/threads which is
> > > > portable. The only arch specific thing we would have to do is express constraints
> > > > about ratios of these - eg indicate in some way that ppc doesn't allow multiple
> > > threads per core for example.
> > > 
> > > > We even have cpu-add command which takes cpu-index as argument and
> > > > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > > > where and if it makes any sense from platform point of view.
> > > > 
> > > > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > > > > with new query-hotpluggable-cpus QMP command, which is currently used by libvirt
> > > > for hot-plug:
> > > > 
> > > > Approach allows 
> > > >    1: machine to publish properties/values that make sense from emulated
> > > >       platform point of view but still understandable by user of given hw.
> > > >    2: user may use them as opaque mandatory properties to create cpu device if
> > > >       he/she doesn't care about where it's plugged.
> > > >    3: if user cares about which cpu goes where, properties defined by machine
> > > >       provide that info from emulated hw point of view including platform specific
> > > >       details.
> > > >    4: it's easy to extend set of properties/values if need arises without
> > > >       breaking users (provided user will put them all in -device/device_add
> > > >       options as it's supposed to)
> > > > 
> > > > > But current approach has a drawback: to call query-hotpluggable-cpus, machine has to
> > > > be started first, which is fine for hot plug but not for specifying CLI options.
> > > > 
> > > > Currently that could be solved by starting qemu twice when 'defining domain',
> > > > where on the first run mgmt queries board layout and caches it for all the next
> > > > times the defined machine is started (change in machine/version/-smp/-cpu will
> > > > > invalidate the cache).
> > > > 
> > > > This series allows to avoid this 1st time restart, when creating domain for
> > > > the first time, mgmt can query layout and then specify numa mapping without
> > > > restarting, it can cache defined mapping as commands exactly match corresponding
> > > > CLI options and reuse cached options on the next domain starts.
> > > > 
> > > > This approach could be extended further with "device_add cpu" command
> > > > so it would be possible to start qemu with -smp 0,... and allow mgmt to
> > > > create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> > > > these commands and reuse them on CLI next time machine is started
> > > > 
> > > > > I think Eduardo's work on query-slots is superset of query-hotpluggable-cpus,
> > > > > but working to the same goal to allow mgmt discover which hw is provided by
> > > > > specific machine and where/which hw could be plugged (like which slot supports
> > > > > which kind of device and which 'address' should be used to attach device
> > > > > (socket|core... - for cpus, bus/function - for pci, ...))
> > > 
> > > As mentioned elsewhere in the thread, the approach of defining the VM config
> > > incrementally via the monitor has significant downsides, by making the config
> > > invisible in any logs of the ARGV, and has likely performance impact when
> > > starting up QEMU, particularly if it is used for more things going forward. To
> > > me these downsides are enough to make the suggested approach for CPUs impractical
> > > for libvirt to use.
> > 
> > > Those downsides do exist, but we should weigh them against the
> > downsides of not allowing any information at all to flow from
> > QEMU to libvirt when starting a VM.
> > 
> > I believe the code in libvirt/src/qemu/qemu_domain_address.c is
> > a good illustration of those downsides.
> 
> Right, but for this NUMA / CPU scenario I don't think we're going to end up
> with complexity like this. I still believe we are able to come up with a
> way to represent it at the CLI without so much architecture specific
> knowledge.

In the case of NUMA/CPU, I'm inclined to agree.


> 
> Even if that is not possible though, from libvirt POV the extra complexity
> is worth it, if that is what we need to preserve fast startup time. The
> time to start a guest is very important to apps like libguestfs and libvirt
> sandbox, so going down a direction which is likely to add 100's or even 1000's
> of milliseconds to the startup time is not desirable, even if it makes libvirt
> simpler

I don't believe this is likely to add 100's or 1000's of
milliseconds to startup time, but I agree we need to keep an eye
on startup time while introducing new interfaces.

-- 
Eduardo

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-20 19:53                   ` Eduardo Habkost
@ 2017-10-23  8:17                     ` Igor Mammedov
  2017-10-23  8:45                     ` Igor Mammedov
  1 sibling, 0 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-23  8:17 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: David Gibson, Paolo Bonzini, Daniel P. Berrange, peter.maydell,
	pkrempa, cohuck, qemu-devel, armbru

On Fri, 20 Oct 2017 17:53:09 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> > On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:  
> > > On 19/10/2017 13:49, David Gibson wrote:  
> > > > Note that describing socket/core/thread tuples as arch independent (or
> > > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > > that most platforms can be fit to that scheme without too much
> > > > straining.  But, there's no arch independent way of defining what each
> > > > level means in terms of its properties.
> > > > 
> > > > So, for example, on spapr - being paravirt - there's no real
> > > > distinction between cores and sockets, how you divide them up is
> > > > completely arbitrary.  
> > > 
> > > Same on x86, actually.
> > > 
> > > It's _common_ that cores on the same socket share L3 cache and that a
> > > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > > that way.
> > > 
> > > QEMU currently enforces the former (if it tells the guest at all that
> > > there is an L3 cache), but not the latter.  
> > 
> > Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> > > architecture in terms of this thread/core/socket hierarchy?  That's
> > not true for PAPR, where the NUMA topology is described in an
> > independent set of (potentially arbitrarily nested) nodes.  
> 
> On PC, ACPI NUMA information only refers to CPU APIC IDs, which
> identify individual CPU threads; it doesn't care about CPU
> socket/core/thread topology.  If I'm not mistaken, the
> socket/core/thread topology is not represented in ACPI at all.
> 
> Some guest OSes, however, may get very confused if they see an
> unexpected NUMA/CPU topology.  IIRC, it was possible to make old
> Linux kernel versions panic by generating a weird topology.
That doesn't mean an arbitrary mapping is the right thing to do.
Even if it no longer crashes a Linux guest, it might have
performance implications for the running guest. I'd assume that
outweighs the cost of one extra restart at domain XML creation time;
on every later start the VM can reuse the cached options.

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-20 19:53                   ` Eduardo Habkost
  2017-10-23  8:17                     ` Igor Mammedov
@ 2017-10-23  8:45                     ` Igor Mammedov
  2017-10-25  6:57                       ` Eduardo Habkost
  1 sibling, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-23  8:45 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: David Gibson, Paolo Bonzini, Daniel P. Berrange, peter.maydell,
	pkrempa, cohuck, qemu-devel, armbru

On Fri, 20 Oct 2017 17:53:09 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> > On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:  
> > > On 19/10/2017 13:49, David Gibson wrote:  
> > > > Note that describing socket/core/thread tuples as arch independent (or
> > > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > > that most platforms can be fit to that scheme without too much
> > > > straining.  But, there's no arch independent way of defining what each
> > > > level means in terms of its properties.
> > > > 
> > > > So, for example, on spapr - being paravirt - there's no real
> > > > distinction between cores and sockets, how you divide them up is
> > > > completely arbitrary.  
> > > 
> > > Same on x86, actually.
> > > 
> > > It's _common_ that cores on the same socket share L3 cache and that a
> > > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > > that way.
> > > 
> > > QEMU currently enforces the former (if it tells the guest at all that
> > > there is an L3 cache), but not the latter.  
> > 
> > Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> > architecture in terms of this thread/core/socket hierarchy?  That's
> > not true for PAPR, where the NUMA topology is described in an
> > independent set of (potentially arbitrarily nested) nodes.  
> 
> On PC, ACPI NUMA information only refers to CPU APIC IDs, which
> identify individual CPU threads; it doesn't care about CPU
> socket/core/thread topology.  If I'm not mistaken, the
> socket/core/thread topology is not represented in ACPI at all.
ACPI maps each logical CPU (thread) to a node in the SRAT table,
so in principle we are able to describe insane configurations.
That, however, doesn't mean that we should go beyond what real
hardware does and confuse a guest which may have certain
expectations.

Currently on x86 the expectation is that CPUs are mapped to NUMA
nodes either by whole cores or by whole sockets (AMD and Intel CPUs
respectively). That might change in the future.
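
To make that expectation concrete, here is a minimal Python sketch (not QEMU
or libvirt code) of a check a management layer could run before committing a
mapping. It assumes each CPU is described by the socket-id/core-id/thread-id
properties that query-hotpluggable-cpus reports, plus the node-id chosen for
it. The function name and the "granularity" parameter are hypothetical.

```python
from collections import defaultdict


def split_groups(cpus, granularity="socket"):
    """Return the topology groups whose CPUs land on more than one NUMA node.

    cpus: list of dicts with "socket-id", "core-id", "thread-id", "node-id".
    granularity: "socket" (whole sockets per node) or "core" (whole cores).
    """
    if granularity == "socket":
        keys = ("socket-id",)
    elif granularity == "core":
        keys = ("socket-id", "core-id")
    else:
        raise ValueError("granularity must be 'socket' or 'core'")

    # Collect the set of nodes used by every socket (or core).
    nodes_per_group = defaultdict(set)
    for cpu in cpus:
        group = tuple(cpu[k] for k in keys)
        nodes_per_group[group].add(cpu["node-id"])

    # A group spread over several nodes violates the expectation.
    return sorted(g for g, nodes in nodes_per_group.items() if len(nodes) > 1)
```

An empty result means no socket (or core) is split across nodes; anything
else lists the groups that a guest might not expect.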


> Some guest OSes, however, may get very confused if they see an
> unexpected NUMA/CPU topology.  IIRC, it was possible to make old
> Linux kernel versions panic by generating a weird topology.

There were bugs that were fixed on the QEMU or guest kernel side
when unexpected mappings were present. While we can 'fix' guest
expectations in the Linux kernel, that might not be possible for other
OSes, which is one more reason we shouldn't allow blind assignment by mgmt.


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-20 20:07                   ` Eduardo Habkost
@ 2017-10-23  8:53                     ` Igor Mammedov
  0 siblings, 0 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-23  8:53 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Daniel P. Berrange, peter maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david, Laine Stump, libvir-list

On Fri, 20 Oct 2017 18:07:03 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Oct 20, 2017 at 10:07:27AM +0100, Daniel P. Berrange wrote:
> > On Thu, Oct 19, 2017 at 05:56:49PM -0200, Eduardo Habkost wrote:  
> > > On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:  
> > > > On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:  
> > > > > ----- Original Message -----  
> > > > > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > > > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > > > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > > > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > > > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > > > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > > > > 
> > > > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:  
> > > > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > >   
> > > > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:  
> > > > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > > > >     
> > > > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > > > > option
> > > > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjunction with
> > > > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > > > NUMA mapping for cpus.  
> > > > > > > > > > 
> > > > > > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > > > > > currently
> > > > > > > > > > do for NUMA configuration ?  
> > > > > > > > > From RHBZ1382425
> > > > > > > > > "
> > > > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > > > > values for -numa cpus=... QEMU CLI option.  
> > > > > > > > 
> > > > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > > > asks QEMU to create. For everything else libvirt is able to assign a
> > > > > > > > "id" string, which it can then use to identify the thing later. The
> > > > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > > > strings for each CPU - QEMU generates a pseudo-id internally which
> > > > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > > > 
> > > > > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > > > or -device. That way libvirt can assign names and does not have to
> > > > > > > > care about CPU index values, and it all works just the same way as
> > > > > > > > any other devices / object we create
> > > > > > > > 
> > > > > > > > ie instead of:
> > > > > > > > 
> > > > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > > > 
> > > > > > > > we could do:
> > > > > > > > 
> > > > > > > >   -object numa-node,id=numa0
> > > > > > > >   -object numa-node,id=numa1
> > > > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0  
> > > > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > > > come from, currently these options are the function of
> > > > > > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > > > > > > runtime after qemu parses -M and -smp options.  
> > > > > > 
> > > > > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > > > > illustrating here for socket=3,core=1,thread=0 are *not* ID values, they
> > > > > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > > > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > > > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > > > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > > > > level  
> > > > > Even though fixed properties/values simplicity is tempting and it might even
> > > > > work for what we have implemented in qemu currently (well, SPAPR will need
> > > > > refactoring (if possible) to meet requirements + compat stuff for current
> > > > > machines with sparse IDs).
> > > > > But I have to disagree here and try to oppose it.
> > > > > 
> > > > > QEMU models concrete platforms/hw with certain non abstract properties
> > > > > and it's libvirt's domain to translate platform specific devices into
> > > > > 'spherical' devices with abstract properties.
> > > > > 
> > > > > Now back to cpus and suggestion to fix the set of 'address' properties
> > > > > and their values into continuous enumeration range [0..N). That would
> > > > >   1. put a burden of hiding platform/device details on QEMU
> > > > >       (which is already bad as QEMU's job is to emulate it)
> > > > >   2. with abstract 'address' properties and values, user won't have
> > > > >      a clue as to where device is being attached (as qemu would magically
> > > > >      remap that to fit specific machine needs)
> > > > >   2.1. if abstract 'address' properties and values we can do away with
> > > > >      socket/core/thread/whatnot since they won't mean the same when considered
> > > > >      from platform point of view, so we can just drop all these nonsense
> > > > >      and go back to cpu-index that has all the properties you've suggested
> > > > >      /abstract, [0..N]/.
> > > > >   3. we currently stopped with socket|core|thread-id properties as they are
> > > > >      applicable to machines that support -device cpu, but it's up to machine
> > > > >      to pick which of these to use (x86: uses all, spapr: uses core-id only),
> > > > >      but current property set is open for extension if need arises without
> > > > >      need to redefine interface. So fixed list of properties [even ignoring
> > > > >      values impact] doesn't scale.  
> > > > 
> > > > Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> > > > guest XML, we just provide an overall count of sockets/cores/threads which is
> > > > portable. The only arch specific thing we would have to do is express constraints
> > > > about ratios of these - eg indicate in some way that ppc doesn't allow multiple
> > > > threads per core for example.
> > > >   
> > > > > We even have cpu-add command which takes cpu-index as argument and
> > > > > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > > > > where and if it makes any sense from platform point of view.
> > > > > 
> > > > > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > > > > with new query-hotpluggable-cpus QMP command, which is currently used by libvirt
> > > > > for hot-plug:
> > > > > 
> > > > > Approach allows 
> > > > >    1: machine to publish properties/values that make sense from emulated
> > > > >       platform point of view but still understandable by user of given hw.
> > > > >    2: user may use them as opaque mandatory properties to create cpu device if
> > > > >       he/she doesn't care about where it's plugged.
> > > > >    3: if user cares about which cpu goes where, properties defined by machine
> > > > >       provide that info from emulated hw point of view including platform specific
> > > > >       details.
> > > > >    4: it's easy to extend set of properties/values if need arises without
> > > > >       breaking users (provided user will put them all in -device/device_add
> > > > >       options as it's supposed to)
> > > > > 
> > > > > But current approach has drawback, to call query-hotpluggable-cpus, machine has to
> > > > > be started first, which is fine for hot plug but not for specifying CLI options.
> > > > > 
> > > > > Currently that could be solved by starting qemu twice when 'defining domain',
> > > > > where on the first run mgmt queries board layout and caches it for all the next
> > > > > times the defined machine is started (change in machine/version/-smp/-cpu will
> > > > > invalidate the cache).
> > > > > 
> > > > > This series allows to avoid this 1st time restart, when creating domain for
> > > > > the first time, mgmt can query layout and then specify numa mapping without
> > > > > restarting, it can cache defined mapping as commands exactly match corresponding
> > > > > CLI options and reuse cached options on the next domain starts.
> > > > > 
> > > > > This approach could be extended further with "device_add cpu" command
> > > > > so it would be possible to start qemu with -smp 0,... and allow mgmt to
> > > > > create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> > > > > these commands and reuse them on CLI next time machine is started
> > > > > 
> > > > > I think Eduardo's work on query-slots is superset of query-hotpluggable-cpus,
> > > > > but working to the same goal to allow mgmt discover which hw is provided by
> > > > > specific machine and where/which hw could be plugged (like which slot supports
> > > > > which kind of device and which 'address' should be used to attach device
> > > > > (socket|core... - for cpus, bus/function - for PCI, ...)  
> > > > 
> > > > As mentioned elsewhere in the thread, the approach of defining the VM config
> > > > incrementally via the monitor has significant downsides, by making the config
> > > > invisible in any logs of the ARGV, and has likely performance impact when
> > > > starting up QEMU, particularly if it is used for more things going forward. To
> > > > me these downsides are enough to make the suggested approach for CPUs impractical
> > > > for libvirt to use.  
> > > 
> > > Those downsides do exist, but we should weigh them against the
> > > downsides of not allowing any information at all to flow from
> > > QEMU to libvirt when starting a VM.
> > > 
> > > I believe the code in libvirt/src/qemu/qemu_domain_address.c is
> > > a good illustration of those downsides.  
> > 
> > Right, but for this NUMA / CPU scenario I don't think we're going to end up
> > with complexity like this. I still believe we are able to come up with a
> > way to represent it at the CLI without so much architecture specific
> > knowledge.  
> 
> In the case of NUMA/CPU, I'm inclined to agree.
Perhaps I don't see how it could be made non-arch-specific because
I've stared at the issue for too long, so ideas on how it could
be made arch agnostic and still fit arch specific expectations
are welcome.


> > Even if that is not possible though, from libvirt POV the extra complexity
> > is worth it, if that is what we need to preserve fast startup time. The
> > time to start a guest is very important to apps like libguestfs and libvirt
> > sandbox, so going down a direction which is likely to add 100's or even 1000's
> > of milliseconds to the startup time is not desirable, even if it makes libvirt
> > simpler  
> 
> I don't believe this is likely to add 100's or 1000's of
> milliseconds to startup time, but I agree we need to keep an eye
> on startup time while introducing new interfaces.
When configuration happens over a network the delays might be arbitrary,
but it's a one-time cost since the discovered layout can be cached in
the domain config when it's defined for the first time.
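
As a rough illustration of that caching (a sketch only, not how libvirt
stores domain config), the layout can be keyed on the options that determine
it: machine type including its version, -smp and -cpu. All names below are
hypothetical, and "query" stands in for whatever actually talks to QEMU, e.g.
a wrapper around query-hotpluggable-cpus.

```python
import hashlib
import json


def layout_cache_key(machine, smp, cpu_model):
    """Build a stable key from the options that determine the CPU layout."""
    blob = json.dumps({"machine": machine, "smp": smp, "cpu": cpu_model},
                      sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class LayoutCache:
    """Remember the discovered CPU layout per (machine, -smp, -cpu)."""

    def __init__(self):
        self._entries = {}

    def get_or_query(self, machine, smp, cpu_model, query):
        # A change in any of the three options yields a new key, which
        # effectively invalidates the old entry for this configuration.
        key = layout_cache_key(machine, smp, cpu_model)
        if key not in self._entries:
            self._entries[key] = query()  # the one-time, possibly slow step
        return self._entries[key]
```

Only the first start of a given configuration pays for the query; later
starts reuse the stored layout.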


* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-20  0:15       ` Eduardo Habkost
  2017-10-20  1:19         ` David Gibson
@ 2017-10-23  9:30         ` Alex Bennée
  1 sibling, 0 replies; 93+ messages in thread
From: Alex Bennée @ 2017-10-23  9:30 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: David Gibson, peter.maydell, pkrempa, cohuck, qemu-devel, armbru,
	pbonzini, Igor Mammedov


Eduardo Habkost <ehabkost@redhat.com> writes:

> On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:
>> On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:
>> > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:
>> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>> > > ---
>> > >  include/sysemu/sysemu.h |  1 +
>> > >  qemu-options.hx         | 15 ++++++++++++++
>> > >  qmp.c                   |  5 +++++
>> > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
>> > >  4 files changed, 74 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>> > > index b213696..3feb94f 100644
>> > > --- a/include/sysemu/sysemu.h
>> > > +++ b/include/sysemu/sysemu.h
>> > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
>> > >      QEMU_WAKEUP_REASON_OTHER,
>> > >  } WakeupReason;
>> > >
>> > > +void qemu_exit_preconfig_request(void);
>> > >  void qemu_system_reset_request(ShutdownCause reason);
>> > >  void qemu_system_suspend_request(void);
>> > >  void qemu_register_suspend_notifier(Notifier *notifier);
>> > > diff --git a/qemu-options.hx b/qemu-options.hx
>> > > index 39225ae..bd44db8 100644
>> > > --- a/qemu-options.hx
>> > > +++ b/qemu-options.hx
>> > > @@ -3498,6 +3498,21 @@ STEXI
>> > >  Run the emulation in single step mode.
>> > >  ETEXI
>> > >
>> > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
>> > > +    "-paused [state=]postconf|preconf\n"
>> > > +    "                postconf: pause QEMU after machine is initialized\n"
>> > > +    "                preconf: pause QEMU before machine is initialized\n",
>> > > +    QEMU_ARCH_ALL)
>> >
>> > I would like to allow pausing before machine-type is selected, so
>> > management could run query-machines before choosing a
>> > machine-type.  Would that need a third "-pause" mode, or will we
>> > be able to change "preconf" to pause before select_machine() is
>> > called?
>> >
>> > The same probably applies to other things initialized before
>> > machine_run_board_init() that could be configurable using QMP,
>> > including but not limited to:
>> > * Accelerator configuration
>> > * Registering global properties
>> > * RAM size
>> > * SMP/CPU configuration
>>
>> Yeah.. having a bunch of different possible pause stages to select
>> doesn't sound great.
>
> I agree.  The number of externally visible pause states should be
> as small as possible.

--pause isn't overly descriptive either. Maybe something like
--wait-for-dynamic-config which is a mouthful but makes it clearer why
you would want this over -S

>
>
>>                       Could we avoid this by instead changing -S to
>> pause at the earliest possible spot, but having any monitor commands
>> that require a later stage automatically "fast forwarding" to the
>> right phase?
>
> That would hide the internal details from the outside.  Sounds
> nice, but adding new machine/device configuration QMP commands
> while hiding the QEMU state from the outside sounds impossible.
>
> For example, if we use -S today, this works:
>
>   $ qemu-system-x86_64 -S -qmp stdio
>   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
>   -> {"execute":"qmp_capabilities"}
>   <- {"return": {}}
>   -> {"execute":"query-cpus"}
>   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
>
> This means "query-cpus" needs to fast-forward to the CPU creation
> stage if we want to keep compatibility.
>
> Now, assume we add a set-numa-node command like the one in this
> series.  e.g.:
>
>   $ qemu-system-x86_64 -S -qmp stdio
>   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
>   -> {"execute":"qmp_capabilities"}
>   <- {"return": {}}
>   -> {"execute":"set-numa-node" ... }
>   <- {"return": ...}
>
> The command will work only if machine initialization didn't run
> yet.
>
> But now an innocent-looking query command would change QEMU state
> in an unexpected way:
>
>   $ qemu-system-x86_64 -S -qmp stdio
>   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}
>   -> {"execute":"qmp_capabilities"}
>   <- {"return": {}}
>   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]
>   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
>   -> {"execute":"set-numa-node" ... }
>   <- {"error": ...}  [the command will fail because the machine was already created]
>
> This means we do have an externally visible "too late to use
> set-numa-node" QEMU state, and query-cpus will have an externally
> visible side effect.  Every QMP command would need to document
> how it affects QEMU state in an externally visible way.
>
> If QEMU pause state is still going to be externally visible this
> way, I would prefer to let the client to explicitly tell what's
> the state they want QEMU to be, instead of making QEMU change
> state silently as a side effect of QMP commands.
>
>>
> [...]


--
Alex Bennée


* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-20 14:21           ` Eduardo Habkost
@ 2017-10-23  9:49             ` Igor Mammedov
  2017-10-23  9:53               ` Daniel P. Berrange
  2017-10-25 10:35               ` Eduardo Habkost
  0 siblings, 2 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-10-23  9:49 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: David Gibson, qemu-devel, eblake, armbru, pkrempa, peter.maydell,
	pbonzini, cohuck

On Fri, 20 Oct 2017 12:21:00 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Oct 20, 2017 at 12:19:17PM +1100, David Gibson wrote:
> > On Thu, Oct 19, 2017 at 10:15:48PM -0200, Eduardo Habkost wrote:  
> > > On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:  
> > > > On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:  
> > > > > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:  
> > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > > ---
> > > > > >  include/sysemu/sysemu.h |  1 +
> > > > > >  qemu-options.hx         | 15 ++++++++++++++
> > > > > >  qmp.c                   |  5 +++++
> > > > > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > > > > index b213696..3feb94f 100644
> > > > > > --- a/include/sysemu/sysemu.h
> > > > > > +++ b/include/sysemu/sysemu.h
> > > > > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > > > > >      QEMU_WAKEUP_REASON_OTHER,
> > > > > >  } WakeupReason;
> > > > > >  
> > > > > > +void qemu_exit_preconfig_request(void);
> > > > > >  void qemu_system_reset_request(ShutdownCause reason);
> > > > > >  void qemu_system_suspend_request(void);
> > > > > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > > > index 39225ae..bd44db8 100644
> > > > > > --- a/qemu-options.hx
> > > > > > +++ b/qemu-options.hx
> > > > > > @@ -3498,6 +3498,21 @@ STEXI
> > > > > >  Run the emulation in single step mode.
> > > > > >  ETEXI
> > > > > >  
> > > > > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > > > > +    "-paused [state=]postconf|preconf\n"
> > > > > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > > > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > > > > +    QEMU_ARCH_ALL)  
> > > > > 
> > > > > I would like to allow pausing before machine-type is selected, so
> > > > > management could run query-machines before choosing a
> > > > > machine-type.  Would that need a third "-pause" mode, or will we
> > > > > be able to change "preconf" to pause before select_machine() is
> > > > > called?
> > > > > 
> > > > > The same probably applies to other things initialized before
> > > > > machine_run_board_init() that could be configurable using QMP,
> > > > > including but not limited to:
> > > > > * Accelerator configuration
> > > > > * Registering global properties
> > > > > * RAM size
> > > > > * SMP/CPU configuration  
> > > > 
> > > > Yeah.. having a bunch of different possible pause stages to select
> > > > doesn't sound great.  
> > > 
> > > I agree.  The number of externally visible pause states should be
> > > as small as possible.
> > > 
> > >   
> > > >                       Could we avoid this by instead changing -S to
> > > > pause at the earliest possible spot, but having any monitor commands
> > > > that require a later stage automatically "fast forwarding" to the
> > > > right phase?  
> > > 
> > > That would hide the internal details from the outside.  Sounds
> > > nice, but adding new machine/device configuration QMP commands
> > > while hiding the QEMU state from the outside sounds impossible.
> > > 
> > > For example, if we use -S today, this works:
> > > 
> > >   $ qemu-system-x86_64 -S -qmp stdio
> > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > >   -> {"execute":"qmp_capabilities"}  
> > >   <- {"return": {}}  
> > >   -> {"execute":"query-cpus"}  
> > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
> > > 
> > > This means "query-cpus" needs to fast-forward to the CPU creation
> > > stage if we want to keep compatibility.
> > > 
> > > Now, assume we add a set-numa-node command like the one in this
> > > series.  e.g.:
> > > 
> > >   $ qemu-system-x86_64 -S -qmp stdio
> > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > >   -> {"execute":"qmp_capabilities"}  
> > >   <- {"return": {}}  
> > >   -> {"execute":"set-numa-node" ... }  
> > >   <- {"return": ...}
> > > 
> > > The command will work only if machine initialization didn't run
> > > yet.
> > > 
> > > But now an innocent-looking query command would change QEMU state
> > > in an unexpected way:
> > > 
> > >   $ qemu-system-x86_64 -S -qmp stdio
> > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > >   -> {"execute":"qmp_capabilities"}  
> > >   <- {"return": {}}  
> > >   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]  
> > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}  
> > >   -> {"execute":"set-numa-node" ... }  
> > >   <- {"error": ...}  [the command will fail because the machine was already created]
> > > 
> > > This means we do have an externally visible "too late to use
> > > set-numa-node" QEMU state, and query-cpus will have an externally
> > > visible side effect.  Every QMP command would need to document
> > > how it affects QEMU state in an externally visible way.
> > > 
> > > If QEMU pause state is still going to be externally visible this
> > > way, I would prefer to let the client to explicitly tell what's
> > > the state they want QEMU to be, instead of making QEMU change
> > > state silently as a side effect of QMP commands.  
> > 
> > Yeah, good point.  My proposal would just have changed explicitly
> > exposed ugly internal state to subtly exposed ugly internal state,
> > which is probably worse :(.
> > 
> > 
> > Ok.. next possibly bad idea..
> > 
> > What about a "re-exec" monitor command; it would take what's
> > essentially a new command line, and basically restart qemu from the
> > beginning, reparsing this new command line, but without actually 
> > 
> > Pro:
> >   * Mitigates Daniel Berrange's concern about lots of qemu
> >     configuration being buried in the qmp session - if libvirt logged
> >     its last "re-exec" that would have what is generally needed.
> >   * Lets libvirt do assorted investigation of options, then rewind to
> >     choose what it actually wants  
> 
> Sounds like a superset of Paolo's "-machine none" proposal[1].
> It would be a very simple interface, not sure it can be easily
> implemented efficiently.
> 
> [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg488618.html
> 
> > 
> > Con:
> >   * Would require a bunch of auditing of structures/state to make sure
> >     they can be re-initialized cleanly  
> 
> This sounds like a big obstacle.  QEMU still has too much global
> state outside the machine/qdev tree.
> 
> 
> >   * Would it be fast enough for libvirt to use?  Do we know if the
> >     slowness which makes multiple qemu invocations by libvirt
> >     unattractive is from the kernel/libc/ldso overhead, or from qemu's
> >     internal start up processing?  
> 
> My gut feeling is that this could be too slow, if the scope of
> "re-exec" is too big.
> 
> 
> Now, let me try to go to the opposite extreme: I think you had a
> good point in your previous proposal.  Why should we need to
> restart/re-execute anything at all just because some bit of
> configuration is being changed by libvirt?  Why commands like
> set-numa-node should require QEMU to be in a state that is not
> covered by -S?  If the guest is not running yet, there should be
> no reason to require clients to explicitly pause/continue/restart
> anything.
It's probably doable to do NUMA config at '-S' time for x86 (arm),
since ACPI tables are regenerated on the first read (legacy fw_cfg
would be a little problematic but could probably be 'fixed' as well).

But I can't say outright whether it's doable for other targets.
The general issue here is that '-S' pauses after machine_done is run,
so all the wiring the board requires is finalized by then
and no hooks run after unpause.
If there is a general consensus to go this route, I can invest
some time in making it work (then this series could be dropped).

Even so, postponing set-numa to '-S' won't address Daniel's concern,
i.e. configuration would still take several command round trips to complete,
potentially over a slow network. But as was said, libvirt can cache
the new CLI options for further reuse.
Whether that is slower or faster than starting qemu with '-M foo -smp ...',
querying the layout and then restarting it with -numa options
would depend on network speed.
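
To give a feel for how many round trips are involved, here is a small Python
sketch that turns a chosen layout into the command sequence. It assumes the
HMP syntax from the cover letter of this RFC (set-numa-node is not an
existing QEMU command) and relies on the commands matching the corresponding
-numa CLI options, so the same strings can be cached for later starts. The
function name is hypothetical.

```python
def numa_commands(cpus):
    """Return (hmp_commands, cli_options) for a list of CPU property dicts.

    Each dict carries whichever of socket-id/core-id/thread-id the machine
    reported for that CPU slot, plus the node-id chosen for it.
    """
    topo_keys = ("socket-id", "core-id", "thread-id")
    args = []

    # One command per NUMA node, declared before any CPU refers to it.
    for node in sorted({cpu["node-id"] for cpu in cpus}):
        args.append("node,nodeid=%d" % node)

    # One command per CPU slot, using only the properties that were given.
    for cpu in cpus:
        props = ["%s=%d" % (k, cpu[k]) for k in topo_keys if k in cpu]
        props.append("node-id=%d" % cpu["node-id"])
        args.append("cpu," + ",".join(props))

    hmp_commands = ["set-numa-node " + a for a in args]
    cli_options = ["-numa " + a for a in args]
    return hmp_commands, cli_options
```

The number of round trips is simply len(hmp_commands): one per node plus one
per CPU slot, which is the one-time cost that caching cli_options avoids on
later starts.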

> 
> (Translating this to my example above: why exactly have I assumed
> above that keeping "query-cpus" working would necessarily make
> set-numa-node stop working?)
> 


* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-23  9:49             ` Igor Mammedov
@ 2017-10-23  9:53               ` Daniel P. Berrange
  2017-10-23 10:36                 ` Igor Mammedov
  2017-10-25 10:35               ` Eduardo Habkost
  1 sibling, 1 reply; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-23  9:53 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Eduardo Habkost, peter.maydell, pkrempa, cohuck, armbru,
	qemu-devel, pbonzini, David Gibson

On Mon, Oct 23, 2017 at 11:49:13AM +0200, Igor Mammedov wrote:
> On Fri, 20 Oct 2017 12:21:00 -0200
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Fri, Oct 20, 2017 at 12:19:17PM +1100, David Gibson wrote:
> > > On Thu, Oct 19, 2017 at 10:15:48PM -0200, Eduardo Habkost wrote:  
> > > > On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:  
> > > > > On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:  
> > > > > > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:  
> > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > > > ---
> > > > > > >  include/sysemu/sysemu.h |  1 +
> > > > > > >  qemu-options.hx         | 15 ++++++++++++++
> > > > > > >  qmp.c                   |  5 +++++
> > > > > > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > > > > > index b213696..3feb94f 100644
> > > > > > > --- a/include/sysemu/sysemu.h
> > > > > > > +++ b/include/sysemu/sysemu.h
> > > > > > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > > > > > >      QEMU_WAKEUP_REASON_OTHER,
> > > > > > >  } WakeupReason;
> > > > > > >  
> > > > > > > +void qemu_exit_preconfig_request(void);
> > > > > > >  void qemu_system_reset_request(ShutdownCause reason);
> > > > > > >  void qemu_system_suspend_request(void);
> > > > > > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > > > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > > > > index 39225ae..bd44db8 100644
> > > > > > > --- a/qemu-options.hx
> > > > > > > +++ b/qemu-options.hx
> > > > > > > @@ -3498,6 +3498,21 @@ STEXI
> > > > > > >  Run the emulation in single step mode.
> > > > > > >  ETEXI
> > > > > > >  
> > > > > > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > > > > > +    "-paused [state=]postconf|preconf\n"
> > > > > > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > > > > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > > > > > +    QEMU_ARCH_ALL)  
> > > > > > 
> > > > > > I would like to allow pausing before machine-type is selected, so
> > > > > > management could run query-machines before choosing a
> > > > > > machine-type.  Would that need a third "-pause" mode, or will we
> > > > > > be able to change "preconf" to pause before select_machine() is
> > > > > > called?
> > > > > > 
> > > > > > The same probably applies to other things initialized before
> > > > > > machine_run_board_init() that could be configurable using QMP,
> > > > > > including but not limited to:
> > > > > > * Accelerator configuration
> > > > > > * Registering global properties
> > > > > > * RAM size
> > > > > > * SMP/CPU configuration  
> > > > > 
> > > > > Yeah.. having a bunch of different possible pause stages to select
> > > > > doesn't sound great.  
> > > > 
> > > > I agree.  The number of externally visible pause states should be
> > > > as small as possible.
> > > > 
> > > >   
> > > > >                       Could we avoid this by instead changing -S to
> > > > > pause at the earliest possible spot, but having any monitor commands
> > > > > that require a later stage automatically "fast forwarding" to the
> > > > > right phase?  
> > > > 
> > > > That would hide the internal details from the outside.  Sounds
> > > > nice, but adding new machine/device configuration QMP commands
> > > > while hiding the QEMU state from the outside sounds impossible.
> > > > 
> > > > For example, if we use -S today, this works:
> > > > 
> > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > > >   -> {"execute":"qmp_capabilities"}  
> > > >   <- {"return": {}}  
> > > >   -> {"execute":"query-cpus"}  
> > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
> > > > 
> > > > This means "query-cpus" needs to fast-forward to the CPU creation
> > > > stage if we want to keep compatibility.
> > > > 
> > > > Now, assume we add a set-numa-node command like the one in this
> > > > series.  e.g.:
> > > > 
> > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > > >   -> {"execute":"qmp_capabilities"}  
> > > >   <- {"return": {}}  
> > > >   -> {"execute":"set-numa-node" ... }  
> > > >   <- {"return": ...}
> > > > 
> > > > The command will work only if machine initialization didn't run
> > > > yet.
> > > > 
> > > > But now an innocent-looking query command would change QEMU state
> > > > in an unexpected way:
> > > > 
> > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > > >   -> {"execute":"qmp_capabilities"}  
> > > >   <- {"return": {}}  
> > > >   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]  
> > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}  
> > > >   -> {"execute":"set-numa-node" ... }  
> > > >   <- {"error": ...}  [the command will fail because the machine was already created]
> > > > 
> > > > > This means we do have an externally visible "too late to use
> > > > > set-numa-node" QEMU state, and query-cpus will have an externally
> > > > > visible side effect.  Every QMP command would need to document
> > > > > how it affects QEMU state in an externally visible way.
> > > > 
> > > > If QEMU pause state is still going to be externally visible this
> > > > way, I would prefer to let the client to explicitly tell what's
> > > > the state they want QEMU to be, instead of making QEMU change
> > > > state silently as a side effect of QMP commands.  
> > > 
> > > Yeah, good point.  My proposal would just have changed explicitly
> > > exposed ugly internal state to subtly exposed ugly internal state,
> > > which is probably worse :(.
> > > 
> > > 
> > > Ok.. next possibly bad idea..
> > > 
> > > What about a "re-exec" monitor command; it would take what's
> > > essentially a new command line, and basically restart qemu from the
> > > beginning, reparsing this new command line, but without actually 
> > > 
> > > Pro:
> > >   * Mitigates Daniel Berrange's concern about lots of qemu
> > >     configuration being buried in the qmp session - if libvirt logged
> > >     its last "re-exec" that would have what is generally needed.
> > >   * Lets libvirt do assorted investigation of options, then rewind to
> > >     choose what it actually wants  
> > 
> > Sounds like a superset of Paolo's "-machine none" proposal[1].
> > It would be a very simple interface, not sure it can be easily
> > implemented efficiently.
> > 
> > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg488618.html
> > 
> > > 
> > > Con:
> > >   * Would require a bunch of auditing of structures/state to make sure
> > >     they can be re-initialized cleanly  
> > 
> > > This sounds like a big obstacle.  QEMU still has too much global
> > state outside the machine/qdev tree.
> > 
> > 
> > >   * Would it be fast enough for libvirt to use?  Do we know if the
> > >     slowness which makes multiple qemu invocations by libvirt
> > >     unattractive is from the kernel/libc/ldso overhead, or from qemu's
> > >     internal start up processing?  
> > 
> > My gut feeling is that this could be too slow, if the scope of
> > "re-exec" is too big.
> > 
> > 
> > Now, let me try to go to the opposite extreme: I think you had a
> > good point in your previous proposal.  Why should we need to
> > restart/re-execute anything at all just because some bit of
> > configuration is being changed by libvirt?  Why commands like
> > set-numa-node should require QEMU to be in a state that is not
> > covered by -S?  If the guest is not running yet, there should be
> > no reason to require clients to explicitly pause/continue/restart
> > anything.
> It's probably doable to do numa config at '-S' time for x86 (arm),
> since ACPI tables are regenerated on the first read (legacy fw_cfg
> would be a little problematic but probably could be 'fixed' as well)
> 
> But I can't say outright if it's doable for other targets,
> in general issue here is that '-S' pauses after machine_done is run
> and all necessary wiring board requires is finalized by then
> and no hooks run after unpause.
> If there is a general consensus to go this route, I can invest
> some time in making it work (then this series could be dropped)
> 
> Even so, postponing set-numa to '-S' won't address Daniel's concern,
> i.e. configuration would take several round trips of commands to complete,
> potentially over a slow network. But as was said, libvirt can cache
> new CLI options for further reuse.

We can cache stuff from the generic "-M none" invocation, but we won't
cache stuff from the invocation of a specific VM instance, because we can't
have confidence that such data is independent of the VM config. So we
would likely just end up hardcoding the arch-specific data in libvirt if
that was all QEMU provided.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-20  9:07                 ` Daniel P. Berrange
  2017-10-20 20:07                   ` Eduardo Habkost
@ 2017-10-23 10:04                   ` Igor Mammedov
  2017-10-23 10:19                     ` Daniel P. Berrange
  1 sibling, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-23 10:04 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Eduardo Habkost, peter maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david, Laine Stump, libvir-list

On Fri, 20 Oct 2017 10:07:27 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Thu, Oct 19, 2017 at 05:56:49PM -0200, Eduardo Habkost wrote:
> > On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:  
> > > On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:  
> > > > ----- Original Message -----  
> > > > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > > > 
> > > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:  
> > > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > >   
> > > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:  
> > > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > > >     
> > > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > > > option
> > > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjunction with
> > > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > > NUMA mapping for cpus.  
> > > > > > > > > 
> > > > > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > > > > currently
> > > > > > > > > do for NUMA configuration ?  
> > > > > > > > From RHBZ1382425
> > > > > > > > "
> > > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > > > values for -numa cpus=... QEMU CLI option.  
> > > > > > > 
> > > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > > asks QEMU to create. For everything else libvirt is able to assign an
> > > > > > > "id" string, which it can then use to identify the thing later. The
> > > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > > strings for each CPU - QEMU generates a pseudo-id internally which
> > > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > > 
> > > > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > > or -device. That way libvirt can assign names and does not have to
> > > > > > > care about CPU index values, and it all works just the same way as
> > > > > > > any other devices / object we create
> > > > > > > 
> > > > > > > ie instead of:
> > > > > > > 
> > > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > > 
> > > > > > > we could do:
> > > > > > > 
> > > > > > >   -object numa-node,id=numa0
> > > > > > >   -object numa-node,id=numa1
> > > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0  
> > > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > > come from, currently these options are the function of
> > > > > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > > > > > runtime after qemu parses -M and -smp options.  
> > > > > 
> > > > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > > > illustrating here for socket=3,core=1,thread=0 are *not* ID values, they
> > > > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > > > level  
> > > > Even though fixed properties/values simplicity is tempting and it might even
> > > > work for what we have implemented in qemu currently (well, SPAPR will need
> > > > refactoring (if possible) to meet requirements + compat stuff for current
> > > > machines with sparse IDs).
> > > > But I have to disagree here and try to oppose it.
> > > > 
> > > > QEMU models concrete platforms/hw with certain non abstract properties
> > > > and it's libvirt's domain to translate platform specific devices into
> > > > 'spherical' devices with abstract properties.
> > > > 
> > > > Now back to cpus and suggestion to fix the set of 'address' properties
> > > > and their values into continuous enumeration range [0..N). That would
> > > >   1. put a burden of hiding platform/device details on QEMU
> > > >       (which is already bad as QEMU's job is to emulate it)
> > > >   2. with abstract 'address' properties and values, user won't have
> > > >      a clue as to where device is being attached (as qemu would magically
> > > >      remap that to fit specific machine needs)
> > > >   2.1. if abstract 'address' properties and values we can do away with
> > > >      socket/core/thread/whatnot since they won't mean the same when considered
> > > >      from platform point of view, so we can just drop all these nonsense
> > > >      and go back to cpu-index that has all the properties you've suggested
> > > >      /abstract, [0..N]/.
> > > >   3. we currently stopped with socket|core|thread-id properties as they are
> > > >      applicable to machines that support -device cpu, but it's up to machine
> > > >      to pick which of these to use (x86: uses all, spapr: uses core-id only),
> > > >      but current property set is open for extension if need arises without
> > > >      need to redefine interface. So fixed list of properties [even ignoring
> > > >      values impact] doesn't scale.  
> > > 
> > > Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> > > guest XML, we just provide an overall count of sockets/cores/threads which is
> > > portable. The only arch specific thing we would have to do is express constraints
> > > about ratios of these - eg indicate in some way that ppc doesn't allow multiple
> > > threads per core for example.
> > >   
> > > > We even have cpu-add command which takes cpu-index as argument and
> > > > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > > > where and if it makes any sense from platform point of view.
> > > > 
> > > > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > > > with new query-hotpluggable-cpus QMP command, which is currently used by libvirt
> > > > for hot-plug:
> > > > 
> > > > Approach allows 
> > > >    1: machine to publish properties/values that make sense from emulated
> > > >       platform point of view but still understandable by user of given hw.
> > > >    2: user may use them as opaque mandatory properties to create cpu device if
> > > >       he/she doesn't care about where it's plugged.
> > > >    3: if user cares about which cpu goes where, properties defined by machine
> > > >       provide that info from emulated hw point of view including platform specific
> > > >       details.
> > > >    4: it's easy to extend set of properties/values if need arises without
> > > >       breaking users (provided user will put them all in -device/device_add
> > > >       options as it's supposed to)
> > > > 
> > > > But the current approach has a drawback: to call query-hotpluggable-cpus, the machine has to
> > > > be started first, which is fine for hot plug but not for specifying CLI options.
> > > > 
> > > > Currently that could be solved by starting qemu twice when 'defining domain',
> > > > where on the first run mgmt queries board layout and caches it for all the next
> > > > times the defined machine is started (change in machine/version/-smp/-cpu will
> > > > invalidate, cache).
> > > > 
> > > > This series allows to avoid this 1st time restart, when creating domain for
> > > > the first time, mgmt can query layout and then specify numa mapping without
> > > > restarting, it can cache defined mapping as commands exactly match corresponding
> > > > CLI options and reuse cached options on the next domain starts.
> > > > 
> > > > This approach could be extended further with "device_add cpu" command
> > > > so it would be possible to start qemu with -smp 0,... and allow mgmt to
> > > > create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> > > > these commands and reuse them on CLI next time machine is started
> > > > 
> > > > I think Eduardo's work on query-slots is a superset of query-hotpluggable-cpus,
> > > > but working to the same goal to allow mgmt discover which hw is provided by
> > > > specific machine and where/which hw could be plugged (like which slot supports
> > > > which kind of device and which 'address' should be used to attach device
> > > > (socket|core... - for cpus, bus/function - for pic, ...)  
> > > 
> > > As mentioned elsewhere in the thread, the approach of defining the VM config
> > > incrementally via the monitor has significant downsides, by making the config
> > > invisible in any logs of the ARGV, and has likely performance impact when
> > > starting up QEMU, particularly if it is used for more things going forward. To
> > > me these downsides are enough to make the suggested approach for CPUs impractical
> > > for libvirt to use.  
> > 
> > Those downsides do exist, but we should weigh them against the
> > downsides of not allowing any information at all to flow from
> > QEMU to libvirt when starting a VM.
> > 
> > I believe the code in libvirt/src/qemu/qemu_domain_address.c is
> > a good illustration of those downsides.  
> 
> Right, but for this NUMA / CPU scenario I don't think we're going to end up
> with complexity like this. I still believe we are able to come up with a
> way to represent it at the CLI without so much architecture specific
> knowledge.
Unfortunately, CPU-to-node mapping isn't arch agnostic; the upper layers
have to understand it when they compose a QEMU CLI that includes it.
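
As an illustration of what the upper layer has to do, here is a minimal
sketch that turns the 'props' reported by query-hotpluggable-cpus into
'-numa cpu,...' options. The helper name and the node-picking callback are
hypothetical; only the property names (socket-id, core-id, thread-id) and
the '-numa cpu,node-id=...' option form come from QEMU. The point is that
the set of properties present depends on the machine, so the caller cannot
hardcode it.

```python
def numa_cpu_options(hotpluggable_cpus, node_of):
    """Build '-numa cpu,...' arguments from query-hotpluggable-cpus entries.

    hotpluggable_cpus: list of dicts, each with a 'props' dict as returned
                       by the QMP command.
    node_of:           hypothetical callback mapping a props dict to a node id.
    """
    args = []
    for cpu in hotpluggable_cpus:
        props = cpu["props"]
        fields = ["cpu", "node-id=%d" % node_of(props)]
        # Only pass through the topology properties this machine published.
        for key in ("socket-id", "core-id", "thread-id"):
            if key in props:
                fields.append("%s=%d" % (key, props[key]))
        args.extend(["-numa", ",".join(fields)])
    return args
```

For an x86-style layout the callback can key on socket-id, while a machine
that only publishes core-id needs a different policy, which is exactly the
arch-specific knowledge in question.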

> 
> Even if that is not possible though, from libvirt POV the extra complexity
> is worth it, if that is what we need to preserve fast startup time. The
> time to start a guest is very important to apps like libguestfs and libvirt
> sandbox, so going down a direction which is likely to add 100's or even 1000's
> of milliseconds to the startup time is not desirable, even if it makes libvirt
> simpler
Neither of the above tools uses NUMA configuration, so this isn't really
applicable to them.

Can we cache the machine layout when the domain is created for the first
time and reuse the cached values the next time the guest is started?
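
A minimal sketch of the caching I have in mind, assuming the layout only
depends on machine type, QEMU version, -smp and -cpu (a change in any of
them invalidates the entry). The class and the query callback are
hypothetical, not existing libvirt code:

```python
class LayoutCache:
    """Cache of CPU layouts keyed by everything assumed to affect the layout."""

    def __init__(self):
        self._layouts = {}

    def get(self, machine, version, smp, cpu, query_layout):
        """Return the cached layout, calling query_layout() only on a miss."""
        key = (machine, version, smp, cpu)
        if key not in self._layouts:
            # Cache miss: ask QEMU once, e.g. via query-hotpluggable-cpus.
            self._layouts[key] = query_layout()
        return self._layouts[key]
```

Only the first start of a given configuration pays for the query; later
starts reuse the stored layout to compose the CLI directly.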

> 
> Regards,
> Daniel


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-23 10:04                   ` Igor Mammedov
@ 2017-10-23 10:19                     ` Daniel P. Berrange
  0 siblings, 0 replies; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-23 10:19 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Eduardo Habkost, peter maydell, pkrempa, cohuck, qemu-devel,
	armbru, pbonzini, david, Laine Stump, libvir-list

On Mon, Oct 23, 2017 at 12:04:17PM +0200, Igor Mammedov wrote:
> On Fri, 20 Oct 2017 10:07:27 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Thu, Oct 19, 2017 at 05:56:49PM -0200, Eduardo Habkost wrote:
> > > On Thu, Oct 19, 2017 at 04:28:59PM +0100, Daniel P. Berrange wrote:  
> > > > On Thu, Oct 19, 2017 at 11:21:22AM -0400, Igor Mammedov wrote:  
> > > > > ----- Original Message -----  
> > > > > > From: "Daniel P. Berrange" <berrange@redhat.com>
> > > > > > To: "Igor Mammedov" <imammedo@redhat.com>
> > > > > > Cc: "peter maydell" <peter.maydell@linaro.org>, pkrempa@redhat.com, ehabkost@redhat.com, cohuck@redhat.com,
> > > > > > qemu-devel@nongnu.org, armbru@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au
> > > > > > Sent: Wednesday, October 18, 2017 5:30:10 PM
> > > > > > Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
> > > > > > 
> > > > > > On Tue, Oct 17, 2017 at 06:06:35PM +0200, Igor Mammedov wrote:  
> > > > > > > On Tue, 17 Oct 2017 16:07:59 +0100
> > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > >   
> > > > > > > > On Tue, Oct 17, 2017 at 09:27:02AM +0200, Igor Mammedov wrote:  
> > > > > > > > > On Mon, 16 Oct 2017 17:36:36 +0100
> > > > > > > > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > > > > > > > >     
> > > > > > > > > > On Mon, Oct 16, 2017 at 06:22:50PM +0200, Igor Mammedov wrote:  
> > > > > > > > > > > Series allows to configure NUMA mapping at runtime using QMP/HMP
> > > > > > > > > > > interface. For that to happen it introduces a new '-paused' CLI
> > > > > > > > > > > option
> > > > > > > > > > > which allows to pause QEMU before machine_init() is run and
> > > > > > > > > > > adds new set-numa-node HMP/QMP commands which in conjunction with
> > > > > > > > > > > info hotpluggable-cpus/query-hotpluggable-cpus allow to configure
> > > > > > > > > > > NUMA mapping for cpus.  
> > > > > > > > > > 
> > > > > > > > > > What's the problem we're seeking solve here compared to what we
> > > > > > > > > > currently
> > > > > > > > > > do for NUMA configuration ?  
> > > > > > > > > From RHBZ1382425
> > > > > > > > > "
> > > > > > > > > Current -numa CLI interface is quite limited in terms that allow map
> > > > > > > > > CPUs to NUMA nodes as it requires to provide cpu_index values which
> > > > > > > > > are non obvious and depend on machine/arch. As result libvirt has to
> > > > > > > > > assume/re-implement cpu_index allocation logic to provide valid
> > > > > > > > > values for -numa cpus=... QEMU CLI option.  
> > > > > > > > 
> > > > > > > > In broad terms, this problem applies to every device / object libvirt
> > > > > > > > asks QEMU to create. For everything else libvirt is able to assign an
> > > > > > > > "id" string, which it can then use to identify the thing later. The
> > > > > > > > CPU stuff is different because libvirt isn't able to provide 'id'
> > > > > > > > strings for each CPU - QEMU generates a pseudo-id internally which
> > > > > > > > libvirt has to infer. The latter is the same problem we had with
> > > > > > > > devices before '-device' was introduced allowing 'id' naming.
> > > > > > > > 
> > > > > > > > IMHO we should take the same approach with CPUs and start modelling
> > > > > > > > the individual CPUs as something we can explicitly create with -object
> > > > > > > > or -device. That way libvirt can assign names and does not have to
> > > > > > > > care about CPU index values, and it all works just the same way as
> > > > > > > > any other devices / object we create
> > > > > > > > 
> > > > > > > > ie instead of:
> > > > > > > > 
> > > > > > > >   -smp 8,sockets=4,cores=2,threads=1
> > > > > > > >   -numa node,nodeid=0,cpus=0-3
> > > > > > > >   -numa node,nodeid=1,cpus=4-7
> > > > > > > > 
> > > > > > > > we could do:
> > > > > > > > 
> > > > > > > >   -object numa-node,id=numa0
> > > > > > > >   -object numa-node,id=numa1
> > > > > > > >   -object cpu,id=cpu0,node=numa0,socket=0,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu1,node=numa0,socket=0,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu2,node=numa0,socket=1,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu3,node=numa0,socket=1,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu4,node=numa1,socket=2,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu5,node=numa1,socket=2,core=1,thread=0
> > > > > > > >   -object cpu,id=cpu6,node=numa1,socket=3,core=0,thread=0
> > > > > > > >   -object cpu,id=cpu7,node=numa1,socket=3,core=1,thread=0  
> > > > > > > the follow up question would be where do "socket=3,core=1,thread=0"
> > > > > > > come from, currently these options are the function of
> > > > > > > (-M foo -smp ...) and can be queried via query-hotpluggable-cpus at
> > > > > > > runtime after qemu parses -M and -smp options.  
> > > > > > 
> > > > > > NB, I realize my example was open to mis-interpretation. The values I'm
> > > > > > illustrating here for socket=3,core=1,thread=0 are *not* ID values, they
> > > > > > are a plain enumeration of values. ie this is saying the 4th socket, the
> > > > > > 2nd core and the 1st thread.  Internally QEMU might have the 2nd core
> > > > > > with a core-id of 8, or 7038 or whatever architecture specific numbering
> > > > > > scheme makes sense, but that's not what the mgmt app gives at the CLI
> > > > > > level  
> > > > > Even though fixed properties/values simplicity is tempting and it might even
> > > > > work for what we have implemented in qemu currently (well, SPAPR will need
> > > > > refactoring (if possible) to meet requirements + compat stuff for current
> > > > > machines with sparse IDs).
> > > > > But I have to disagree here and try to oppose it.
> > > > > 
> > > > > QEMU models concrete platforms/hw with certain non abstract properties
> > > > > and it's libvirt's domain to translate platform specific devices into
> > > > > 'spherical' devices with abstract properties.
> > > > > 
> > > > > Now back to cpus and suggestion to fix the set of 'address' properties
> > > > > and their values into continuous enumeration range [0..N). That would
> > > > >   1. put a burden of hiding platform/device details on QEMU
> > > > >       (which is already bad as QEMU's job is to emulate it)
> > > > >   2. with abstract 'address' properties and values, user won't have
> > > > >      a clue as to where device is being attached (as qemu would magically
> > > > >      remap that to fit specific machine needs)
> > > > >   2.1. if abstract 'address' properties and values we can do away with
> > > > >      socket/core/thread/whatnot since they won't mean the same when considered
> > > > >      from platform point of view, so we can just drop all these nonsense
> > > > >      and go back to cpu-index that has all the properties you've suggested
> > > > >      /abstract, [0..N]/.
> > > > >   3. we currently stopped with socket|core|thread-id properties as they are
> > > > >      applicable to machines that support -device cpu, but it's up to machine
> > > > >      to pick which of these to use (x86: uses all, spapr: uses core-id only),
> > > > >      but current property set is open for extension if need arises without
> > > > >      need to redefine interface. So fixed list of properties [even ignoring
> > > > >      values impact] doesn't scale.  
> > > > 
> > > > Note from the libvirt POV, we don't expose socket-id/core-id/thread-id in our
> > > > guest XML, we just provide an overall count of sockets/cores/threads which is
> > > > portable. The only arch specific thing we would have to do is express constraints
> > > > about ratios of these - eg indicate in some way that ppc doesn't allow multiple
> > > > threads per core for example.
> > > >   
> > > > > We even have cpu-add command which takes cpu-index as argument and
> > > > > -numa node,cpus=0..X CLI option, good luck with figuring out which cpu goes
> > > > > where and if it makes any sense from platform point of view.
> > > > > 
> > > > > That's why when designing hot plug for 'device_add cpu' interface, we ended up
> > > > > with new query-hotpluggable-cpus QMP command, which is currently used by libvirt
> > > > > for hot-plug:
> > > > > 
> > > > > Approach allows 
> > > > >    1: machine to publish properties/values that make sense from emulated
> > > > >       platform point of view but still understandable by user of given hw.
> > > > >    2: user may use them as opaque mandatory properties to create cpu device if
> > > > >       he/she doesn't care about where it's plugged.
> > > > >    3: if user cares about which cpu goes where, properties defined by machine
> > > > >       provide that info from emulated hw point of view including platform specific
> > > > >       details.
> > > > >    4: it's easy to extend set of properties/values if need arises without
> > > > >       breaking users (provided user will put them all in -device/device_add
> > > > >       options as it's supposed to)
> > > > > 
> > > > > But the current approach has a drawback: to call query-hotpluggable-cpus, the machine has to
> > > > > be started first, which is fine for hot plug but not for specifying CLI options.
> > > > > 
> > > > > Currently that could be solved by starting qemu twice when 'defining domain',
> > > > > where on the first run mgmt queries board layout and caches it for all the next
> > > > > times the defined machine is started (change in machine/version/-smp/-cpu will
> > > > > invalidate the cache).
> > > > > 
> > > > > This series allows to avoid this 1st time restart, when creating domain for
> > > > > the first time, mgmt can query layout and then specify numa mapping without
> > > > > restarting, it can cache defined mapping as commands exactly match corresponding
> > > > > CLI options and reuse cached options on the next domain starts.
> > > > > 
> > > > > This approach could be extended further with "device_add cpu" command
> > > > > so it would be possible to start qemu with -smp 0,... and allow mgmt to
> > > > > create cpus with explicit IDs controlled by mgmt, and again mgmt may cache
> > > > > these commands and reuse them on CLI next time machine is started
> > > > > 
> > > > > I think Eduardo's work on query-slots is superset of query-hotpluggable-cpus,
> > > > > but working to the same goal to allow mgmt discover which hw is provided by
> > > > > specific machine and where/which hw could be plugged (like which slot supports
> > > > > which kind of device and which 'address' should be used to attach device
> > > > > (socket|core... - for cpus, bus/function - for pci, ...))
> > > > 
> > > > As mentioned elsewhere in the thread, the approach of defining the VM config
> > > > incrementally via the monitor has significant downsides, by making the config
> > > > invisible in any logs of the ARGV, and has likely performance impact when
> > > > starting up QEMU, particularly if it is used for more things going forward. To
> > > > me these downsides are enough to make the suggested approach for CPUs impractical
> > > > for libvirt to use.  
> > > 
> > > Those downsides do exist, but we should weigh them against the
> > > downsides of not allowing any information at all to flow from
> > > QEMU to libvirt when starting a VM.
> > > 
> > > I believe the code in libvirt/src/qemu/qemu_domain_address.c is
> > > a good illustration of those downsides.  
> > 
> > Right, but for this NUMA / CPU scenario I don't think we're going to end up
> > with complexity like this. I still believe we are able to come up with a
> > way to represent it at the CLI without so much architecture specific
> > knowledge.
> Unfortunately cpu to node mapping isn't arch agnostic and requires
> understanding from upper layers when they compose QEMU CLI with it.

In terms of the guest config it manages, libvirt doesn't care about the
low-level core-id, socket-id and thread-id values. It just knows from the
application that it has to request, say, 4 sockets, each with 4 cores,
each with 2 threads. Whether the cores are given core-id 0, 1, 2, 3 or
1, 2, 16, 17 does not matter to libvirt, nor to the application using
libvirt. So to avoid architecture differences at startup we just need to
be able to configure the topology without referring to the
architecture-specific integer ID values.
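
As a rough illustration of the above (a sketch only; the Topology class
and smp_arg() helper are hypothetical names, not libvirt or QEMU code),
the management layer can carry nothing but counts and derive the -smp
argument from them, without ever naming a socket-id/core-id/thread-id:

```python
from typing import NamedTuple


class Topology(NamedTuple):
    """Portable CPU topology: counts only, no arch-specific IDs."""
    sockets: int
    cores: int      # cores per socket
    threads: int    # threads per core

    @property
    def vcpus(self) -> int:
        # Total vCPU count implied by the three counts.
        return self.sockets * self.cores * self.threads


def smp_arg(topo: Topology) -> str:
    """Hypothetical helper: render the counts as a QEMU -smp argument."""
    return (f"{topo.vcpus},sockets={topo.sockets},"
            f"cores={topo.cores},threads={topo.threads}")


topo = Topology(sockets=4, cores=4, threads=2)
print("-smp", smp_arg(topo))
```

Which IDs QEMU then assigns to the individual CPUs stays an internal
detail of the machine type.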

There are also some architecture constraints on which combinations of
sockets/cores/threads are available with given CPU models. If we are to
avoid arch-specific code, these constraints need to be exposed to libvirt,
which would in turn expose them to the application, to let the application
decide how best to set up the CPU topology when it creates the guest. For
this to be useful to the application, it has to be provided separately from
guest startup, because, e.g., OpenStack decides this aspect of guest
configuration before it even decides what host to run the guest on, let
alone tries to start the guest.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-23  9:53               ` Daniel P. Berrange
@ 2017-10-23 10:36                 ` Igor Mammedov
  2017-10-23 10:49                   ` Daniel P. Berrange
  0 siblings, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-23 10:36 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Eduardo Habkost, peter.maydell, pkrempa, cohuck, armbru,
	qemu-devel, pbonzini, David Gibson

On Mon, 23 Oct 2017 10:53:16 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Mon, Oct 23, 2017 at 11:49:13AM +0200, Igor Mammedov wrote:
> > On Fri, 20 Oct 2017 12:21:00 -0200
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Fri, Oct 20, 2017 at 12:19:17PM +1100, David Gibson wrote:  
> > > > On Thu, Oct 19, 2017 at 10:15:48PM -0200, Eduardo Habkost wrote:    
> > > > > On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:    
> > > > > > On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:    
> > > > > > > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:    
> > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > > > > ---
> > > > > > > >  include/sysemu/sysemu.h |  1 +
> > > > > > > >  qemu-options.hx         | 15 ++++++++++++++
> > > > > > > >  qmp.c                   |  5 +++++
> > > > > > > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > > > > > > index b213696..3feb94f 100644
> > > > > > > > --- a/include/sysemu/sysemu.h
> > > > > > > > +++ b/include/sysemu/sysemu.h
> > > > > > > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > > > > > > >      QEMU_WAKEUP_REASON_OTHER,
> > > > > > > >  } WakeupReason;
> > > > > > > >  
> > > > > > > > +void qemu_exit_preconfig_request(void);
> > > > > > > >  void qemu_system_reset_request(ShutdownCause reason);
> > > > > > > >  void qemu_system_suspend_request(void);
> > > > > > > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > > > > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > > > > > index 39225ae..bd44db8 100644
> > > > > > > > --- a/qemu-options.hx
> > > > > > > > +++ b/qemu-options.hx
> > > > > > > > @@ -3498,6 +3498,21 @@ STEXI
> > > > > > > >  Run the emulation in single step mode.
> > > > > > > >  ETEXI
> > > > > > > >  
> > > > > > > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > > > > > > +    "-paused [state=]postconf|preconf\n"
> > > > > > > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > > > > > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > > > > > > +    QEMU_ARCH_ALL)    
> > > > > > > 
> > > > > > > I would like to allow pausing before machine-type is selected, so
> > > > > > > management could run query-machines before choosing a
> > > > > > > machine-type.  Would that need a third "-pause" mode, or will we
> > > > > > > be able to change "preconf" to pause before select_machine() is
> > > > > > > called?
> > > > > > > 
> > > > > > > The same probably applies to other things initialized before
> > > > > > > machine_run_board_init() that could be configurable using QMP,
> > > > > > > including but not limited to:
> > > > > > > * Accelerator configuration
> > > > > > > * Registering global properties
> > > > > > > * RAM size
> > > > > > > * SMP/CPU configuration    
> > > > > > 
> > > > > > Yeah.. having a bunch of different possible pause stages to select
> > > > > > doesn't sound great.    
> > > > > 
> > > > > I agree.  The number of externally visible pause states should be
> > > > > as small as possible.
> > > > > 
> > > > >     
> > > > > >                       Could we avoid this by instead changing -S to
> > > > > > pause at the earliest possible spot, but having any monitor commands
> > > > > > that require a later stage automatically "fast forwarding" to the
> > > > > > right phase?    
> > > > > 
> > > > > That would hide the internal details from the outside.  Sounds
> > > > > nice, but adding new machine/device configuration QMP commands
> > > > > while hiding the QEMU state from the outside sounds impossible.
> > > > > 
> > > > > For example, if we use -S today, this works:
> > > > > 
> > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}    
> > > > >   -> {"execute":"qmp_capabilities"}    
> > > > >   <- {"return": {}}    
> > > > >   -> {"execute":"query-cpus"}    
> > > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
> > > > > 
> > > > > This means "query-cpus" needs to fast-forward to the CPU creation
> > > > > stage if we want to keep compatibility.
> > > > > 
> > > > > Now, assume we add a set-numa-node command like the one in this
> > > > > series.  e.g.:
> > > > > 
> > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}    
> > > > >   -> {"execute":"qmp_capabilities"}    
> > > > >   <- {"return": {}}    
> > > > >   -> {"execute":"set-numa-node" ... }    
> > > > >   <- {"return": ...}
> > > > > 
> > > > > The command will work only if machine initialization didn't run
> > > > > yet.
> > > > > 
> > > > > But now an innocent-looking query command would change QEMU state
> > > > > in an unexpected way:
> > > > > 
> > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}    
> > > > >   -> {"execute":"qmp_capabilities"}    
> > > > >   <- {"return": {}}    
> > > > >   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]    
> > > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}    
> > > > >   -> {"execute":"set-numa-node" ... }    
> > > > >   <- {"error": ...}  [the command will fail because the machine was already created]
> > > > > 
> > > > > This means we do have a externally visible "too late to use
> > > > > set-numa-node" QEMU state, and query-cpus will have a externally
> > > > > visible side effect.  Every QMP command would need to document
> > > > > how it affects QEMU state in a externally visible way.
> > > > > 
> > > > > If QEMU pause state is still going to be externally visible this
> > > > > way, I would prefer to let the client to explicitly tell what's
> > > > > the state they want QEMU to be, instead of making QEMU change
> > > > > state silently as a side effect of QMP commands.    
> > > > 
> > > > Yeah, good point.  My proposal would just have changed explicitly
> > > > exposed ugly internal state to subtly exposed ugly internal state,
> > > > which is probably worse :(.
> > > > 
> > > > 
> > > > Ok.. next possibly bad idea..
> > > > 
> > > > What about a "re-exec" monitor command; it would take what's
> > > > essentially a new command line, and basically restart qemu from the
> > > > beginning, reparsing this new command line, but without actually 
> > > > 
> > > > Pro:
> > > >   * Mitigates Daniel Berrange's concern about lots of qemu
> > > >     configuration being buried in the qmp session - if libvirt logged
> > > >     its last "re-exec" that would have what is generally needed.
> > > >   * Lets libvirt do assorted investigation of options, then rewind to
> > > >     choose what it actually wants    
> > > 
> > > Sounds like a superset of Paolo's "-machine none" proposal[1].
> > > It would be a very simple interface, not sure it can be easily
> > > implemented efficiently.
> > > 
> > > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg488618.html
> > >   
> > > > 
> > > > Con:
> > > >   * Would require a bunch of auditing of structures/state to make sure
> > > >     they can be re-initialized cleanly    
> > > 
> > > This sounds like a big obstacle.  QEMU still have too much global
> > > state outside the machine/qdev tree.
> > > 
> > >   
> > > >   * Would it be fast enough for libvirt to use?  Do we know if the
> > > >     slowness which makes multiple qemu invocations by libvirt
> > > >     unattractive is from the kernel/libc/ldso overhead, or from qemu's
> > > >     internal start up processing?    
> > > 
> > > My gut feeling is that this could be too slow, if the scope of
> > > "re-exec" is too big.
> > > 
> > > 
> > > Now, let me try to go to the opposite extreme: I think you had a
> > > good point in your previous proposal.  Why should we need to
> > > restart/re-execute anything at all just because some bit of
> > > configuration is being changed by libvirt?  Why commands like
> > > set-numa-node should require QEMU to be in a state that is not
> > > covered by -S?  If the guest is not running yet, there should be
> > > no reason to require clients to explicitly pause/continue/restart
> > > anything.  
> > It's probably doable to do numa config at '-S' time for x86 (arm),
> > since ACPI tables are regenerated on the first read (legacy fw_cfg
> > would be a little problematic but probably could be 'fixed' as well)
> > 
> > But I can't say outright if it's doable for other targets,
> > in general issue here is that '-S' pauses after machine_done is run
> > and all necessary wiring board requires is finalized by then
> > and no hooks run after unpause.
> > If there is a general consensus to go this route, I can invest
> > some time in making it work (then this series could be dropped)
> > 
> > Even so, postponing set-numa to '-S' won't address Daniel's concern,
> > i.e. configuration would take several command round trips to complete,
> > potentially over a slow network. But as was said, libvirt can cache
> > new CLI options for further reuse.  
> 
> We can cache stuff from the generic "-machine none" invocation, but we won't
> cache stuff from invocation of a specific VM instance, because we can't
> have confidence that such data is independent of the VM config. So we
In the case of CPU layout we have a fixed set of options that influence it
(-M foo_vXX -smp ...), so from the QEMU side it should be possible to
promise it would stay stable.
But such caching would be useful in other use cases as well.
Is the issue invalidating the cached data when a change in option(s)
would change it?
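
To make the question concrete, here is a minimal sketch of such a cache,
keyed only on the options that influence the layout. LayoutCache and
fake_probe() are made-up names for illustration, and the probe is just a
stand-in for starting QEMU and querying the layout:

```python
from typing import Callable, Dict, List, Tuple

LayoutKey = Tuple[str, str]   # (machine type, -smp argument)
Layout = List[dict]           # stand-in for the queried CPU layout


class LayoutCache:
    """Cache CPU layouts keyed on the options that determine them."""

    def __init__(self, probe: Callable[[str, str], Layout]) -> None:
        self._probe = probe
        self._entries: Dict[LayoutKey, Layout] = {}
        self.probes = 0   # how many times the expensive probe actually ran

    def get(self, machine: str, smp: str) -> Layout:
        key = (machine, smp)
        if key not in self._entries:
            # A changed option yields a new key, so stale data is never
            # returned; the price is one more probe per distinct key.
            self.probes += 1
            self._entries[key] = self._probe(machine, smp)
        return self._entries[key]


def fake_probe(machine: str, smp: str) -> Layout:
    # Placeholder for running QEMU once and querying it; data is made up.
    return [{"machine": machine, "smp": smp}]


cache = LayoutCache(fake_probe)
cache.get("pc-i440fx-2.10", "1,maxcpus=2")
cache.get("pc-i440fx-2.10", "1,maxcpus=2")   # served from the cache
cache.get("pc-i440fx-2.10", "2,maxcpus=4")   # different -smp: new probe
print(cache.probes)
```

With this shape invalidation is implicit, since any change in the key
options simply misses the cache.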


> would likely just end up hardcoding the arch specific data in libvirt if
> that was all QEMU provided.
Another insane idea is to make the algorithm introspectable, i.e.
publish per-machine code that would be used by both mgmt and QEMU
to compute the layout, for example in Python. It's probably not an issue
for libvirt, but QEMU would have to embed Python to make the shared
algorithm work. Not sure if that's acceptable from the QEMU point of view.

> 
> Regards,
> Daniel


* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-23 10:36                 ` Igor Mammedov
@ 2017-10-23 10:49                   ` Daniel P. Berrange
  2017-10-23 11:18                     ` Igor Mammedov
  0 siblings, 1 reply; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-23 10:49 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Eduardo Habkost, peter.maydell, pkrempa, cohuck, armbru,
	qemu-devel, pbonzini, David Gibson

On Mon, Oct 23, 2017 at 12:36:20PM +0200, Igor Mammedov wrote:
> On Mon, 23 Oct 2017 10:53:16 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Mon, Oct 23, 2017 at 11:49:13AM +0200, Igor Mammedov wrote:
> > > On Fri, 20 Oct 2017 12:21:00 -0200
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Fri, Oct 20, 2017 at 12:19:17PM +1100, David Gibson wrote:  
> > > > > On Thu, Oct 19, 2017 at 10:15:48PM -0200, Eduardo Habkost wrote:    
> > > > > > On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:    
> > > > > > > On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:    
> > > > > > > > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:    
> > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > ---
> > > > > > > > >  include/sysemu/sysemu.h |  1 +
> > > > > > > > >  qemu-options.hx         | 15 ++++++++++++++
> > > > > > > > >  qmp.c                   |  5 +++++
> > > > > > > > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > > > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > > > > > > > index b213696..3feb94f 100644
> > > > > > > > > --- a/include/sysemu/sysemu.h
> > > > > > > > > +++ b/include/sysemu/sysemu.h
> > > > > > > > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > > > > > > > >      QEMU_WAKEUP_REASON_OTHER,
> > > > > > > > >  } WakeupReason;
> > > > > > > > >  
> > > > > > > > > +void qemu_exit_preconfig_request(void);
> > > > > > > > >  void qemu_system_reset_request(ShutdownCause reason);
> > > > > > > > >  void qemu_system_suspend_request(void);
> > > > > > > > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > > > > > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > > > > > > index 39225ae..bd44db8 100644
> > > > > > > > > --- a/qemu-options.hx
> > > > > > > > > +++ b/qemu-options.hx
> > > > > > > > > @@ -3498,6 +3498,21 @@ STEXI
> > > > > > > > >  Run the emulation in single step mode.
> > > > > > > > >  ETEXI
> > > > > > > > >  
> > > > > > > > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > > > > > > > +    "-paused [state=]postconf|preconf\n"
> > > > > > > > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > > > > > > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > > > > > > > +    QEMU_ARCH_ALL)    
> > > > > > > > 
> > > > > > > > I would like to allow pausing before machine-type is selected, so
> > > > > > > > management could run query-machines before choosing a
> > > > > > > > machine-type.  Would that need a third "-pause" mode, or will we
> > > > > > > > be able to change "preconf" to pause before select_machine() is
> > > > > > > > called?
> > > > > > > > 
> > > > > > > > The same probably applies to other things initialized before
> > > > > > > > machine_run_board_init() that could be configurable using QMP,
> > > > > > > > including but not limited to:
> > > > > > > > * Accelerator configuration
> > > > > > > > * Registering global properties
> > > > > > > > * RAM size
> > > > > > > > * SMP/CPU configuration    
> > > > > > > 
> > > > > > > Yeah.. having a bunch of different possible pause stages to select
> > > > > > > doesn't sound great.    
> > > > > > 
> > > > > > I agree.  The number of externally visible pause states should be
> > > > > > as small as possible.
> > > > > > 
> > > > > >     
> > > > > > >                       Could we avoid this by instead changing -S to
> > > > > > > pause at the earliest possible spot, but having any monitor commands
> > > > > > > that require a later stage automatically "fast forwarding" to the
> > > > > > > right phase?    
> > > > > > 
> > > > > > That would hide the internal details from the outside.  Sounds
> > > > > > nice, but adding new machine/device configuration QMP commands
> > > > > > while hiding the QEMU state from the outside sounds impossible.
> > > > > > 
> > > > > > For example, if we use -S today, this works:
> > > > > > 
> > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}    
> > > > > >   -> {"execute":"qmp_capabilities"}    
> > > > > >   <- {"return": {}}    
> > > > > >   -> {"execute":"query-cpus"}    
> > > > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
> > > > > > 
> > > > > > This means "query-cpus" needs to fast-forward to the CPU creation
> > > > > > stage if we want to keep compatibility.
> > > > > > 
> > > > > > Now, assume we add a set-numa-node command like the one in this
> > > > > > series.  e.g.:
> > > > > > 
> > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}    
> > > > > >   -> {"execute":"qmp_capabilities"}    
> > > > > >   <- {"return": {}}    
> > > > > >   -> {"execute":"set-numa-node" ... }    
> > > > > >   <- {"return": ...}
> > > > > > 
> > > > > > The command will work only if machine initialization didn't run
> > > > > > yet.
> > > > > > 
> > > > > > But now an innocent-looking query command would change QEMU state
> > > > > > in an unexpected way:
> > > > > > 
> > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}    
> > > > > >   -> {"execute":"qmp_capabilities"}    
> > > > > >   <- {"return": {}}    
> > > > > >   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]    
> > > > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}    
> > > > > >   -> {"execute":"set-numa-node" ... }    
> > > > > >   <- {"error": ...}  [the command will fail because the machine was already created]
> > > > > > 
> > > > > > This means we do have a externally visible "too late to use
> > > > > > set-numa-node" QEMU state, and query-cpus will have a externally
> > > > > > visible side effect.  Every QMP command would need to document
> > > > > > how it affects QEMU state in a externally visible way.
> > > > > > 
> > > > > > If QEMU pause state is still going to be externally visible this
> > > > > > way, I would prefer to let the client to explicitly tell what's
> > > > > > the state they want QEMU to be, instead of making QEMU change
> > > > > > state silently as a side effect of QMP commands.    
> > > > > 
> > > > > Yeah, good point.  My proposal would just have changed explicitly
> > > > > exposed ugly internal state to subtly exposed ugly internal state,
> > > > > which is probably worse :(.
> > > > > 
> > > > > 
> > > > > Ok.. next possibly bad idea..
> > > > > 
> > > > > What about a "re-exec" monitor command; it would take what's
> > > > > essentially a new command line, and basically restart qemu from the
> > > > > beginning, reparsing this new command line, but without actually 
> > > > > 
> > > > > Pro:
> > > > >   * Mitigates Daniel Berrange's concern about lots of qemu
> > > > >     configuration being buried in the qmp session - if libvirt logged
> > > > >     its last "re-exec" that would have what is generally needed.
> > > > >   * Lets libvirt do assorted investigation of options, then rewind to
> > > > >     choose what it actually wants    
> > > > 
> > > > Sounds like a superset of Paolo's "-machine none" proposal[1].
> > > > It would be a very simple interface, not sure it can be easily
> > > > implemented efficiently.
> > > > 
> > > > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg488618.html
> > > >   
> > > > > 
> > > > > Con:
> > > > >   * Would require a bunch of auditing of structures/state to make sure
> > > > >     they can be re-initialized cleanly    
> > > > 
> > > > This sounds like a big obstacle.  QEMU still have too much global
> > > > state outside the machine/qdev tree.
> > > > 
> > > >   
> > > > >   * Would it be fast enough for libvirt to use?  Do we know if the
> > > > >     slowness which makes multiple qemu invocations by libvirt
> > > > >     unattractive is from the kernel/libc/ldso overhead, or from qemu's
> > > > >     internal start up processing?    
> > > > 
> > > > My gut feeling is that this could be too slow, if the scope of
> > > > "re-exec" is too big.
> > > > 
> > > > 
> > > > Now, let me try to go to the opposite extreme: I think you had a
> > > > good point in your previous proposal.  Why should we need to
> > > > restart/re-execute anything at all just because some bit of
> > > > configuration is being changed by libvirt?  Why commands like
> > > > set-numa-node should require QEMU to be in a state that is not
> > > > covered by -S?  If the guest is not running yet, there should be
> > > > no reason to require clients to explicitly pause/continue/restart
> > > > anything.  
> > > It's probably doable to do numa config at '-S' time for x86 (arm),
> > > since ACPI tables are regenerated on the first read (legacy fw_cfg
> > > would be a little problematic but probably could be 'fixed' as well)
> > > 
> > > But I can't say outright if it's doable for other targets,
> > > in general issue here is that '-S' pauses after machine_done is run
> > > and all necessary wiring board requires is finalized by then
> > > and no hooks run after unpause.
> > > If there is a general consensus to go this route, I can invest
> > > some time in making it work (then this series could be dropped)
> > > 
> > > Even so, postponing set-numa to '-S' won't address Daniel's concern,
> > > i.e. configuration would take several command round trips to complete,
> > > potentially over a slow network. But as was said, libvirt can cache
> > > new CLI options for further reuse.  
> > 
> > We can cache stuff from the generic "-machine none" invocation, but we won't
> > cache stuff from invocation of a specific VM instance, because we can't
> > have confidence that such data is independent of the VM config. So we
> In the case of CPU layout we have a fixed set of options that influence it
> (-M foo_vXX -smp ...), so from the QEMU side it should be possible to
> promise it would stay stable.
> But such caching would be useful in other use cases as well.
> Is the issue invalidating the cached data when a change in option(s)
> would change it?

For the caching to be useful, we need to have a good cache hit rate.
If the cache depends on a lot of different CLI args, then you're going
to have to populate many caches, each with a low hit rate. The current
caching is keyed on the QEMU/libvirtd binaries, so we have 1 cache miss
when QEMU or libvirt is upgraded, then a 100% cache hit rate thereafter,
so the cache is very effective.
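
A toy comparison of the two keying schemes (the workload below is
invented purely for illustration, it is not measured libvirt data, and
hit_rate() is a hypothetical helper):

```python
from typing import Hashable, Iterable


def hit_rate(keys: Iterable[Hashable]) -> float:
    """Fraction of lookups that find an already-populated cache entry."""
    seen = set()
    hits = 0
    total = 0
    for key in keys:
        total += 1
        if key in seen:
            hits += 1
        else:
            seen.add(key)   # first lookup for this key is a miss
    return hits / total


# Invented workload: 100 guest starts on a single QEMU binary,
# cycling through 25 distinct -smp configurations.
starts = [("qemu-2.10", f"-smp {1 + i % 25}") for i in range(100)]

per_binary = hit_rate(binary for binary, _ in starts)   # key: binary only
per_config = hit_rate(starts)                           # key: binary + args
print(per_binary, per_config)
```

The more CLI arguments take part in the key, the more distinct entries
have to be populated before any of them starts paying off.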

> > would likely just end up hardcoding the arch specific data in libvirt if
> > that was all QEMU provided.
> Another insane idea is to make the algorithm introspectable, i.e.
> publish per-machine code that would be used by both mgmt and QEMU
> to compute the layout, for example in Python. It's probably not an issue
> for libvirt, but QEMU would have to embed Python to make the shared
> algorithm work. Not sure if that's acceptable from the QEMU point of view.

That's not going to fly - we definitely cannot assume apps want to, or can,
run Python code. Libvirt has major users in many programming languages,
including Python, C, Go, Java, Vala and more.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-23 10:49                   ` Daniel P. Berrange
@ 2017-10-23 11:18                     ` Igor Mammedov
  2017-10-25 10:52                       ` Eduardo Habkost
  0 siblings, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-23 11:18 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Eduardo Habkost, peter.maydell, pkrempa, cohuck, armbru,
	qemu-devel, pbonzini, David Gibson

On Mon, 23 Oct 2017 11:49:44 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Mon, Oct 23, 2017 at 12:36:20PM +0200, Igor Mammedov wrote:
> > On Mon, 23 Oct 2017 10:53:16 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> >   
> > > On Mon, Oct 23, 2017 at 11:49:13AM +0200, Igor Mammedov wrote:  
> > > > On Fri, 20 Oct 2017 12:21:00 -0200
> > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > >     
> > > > > On Fri, Oct 20, 2017 at 12:19:17PM +1100, David Gibson wrote:    
> > > > > > On Thu, Oct 19, 2017 at 10:15:48PM -0200, Eduardo Habkost wrote:      
> > > > > > > On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:      
> > > > > > > > On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:      
> > > > > > > > > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:      
> > > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > ---
> > > > > > > > > >  include/sysemu/sysemu.h |  1 +
> > > > > > > > > >  qemu-options.hx         | 15 ++++++++++++++
> > > > > > > > > >  qmp.c                   |  5 +++++
> > > > > > > > > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > > > > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > > > > > > > > index b213696..3feb94f 100644
> > > > > > > > > > --- a/include/sysemu/sysemu.h
> > > > > > > > > > +++ b/include/sysemu/sysemu.h
> > > > > > > > > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > > > > > > > > >      QEMU_WAKEUP_REASON_OTHER,
> > > > > > > > > >  } WakeupReason;
> > > > > > > > > >  
> > > > > > > > > > +void qemu_exit_preconfig_request(void);
> > > > > > > > > >  void qemu_system_reset_request(ShutdownCause reason);
> > > > > > > > > >  void qemu_system_suspend_request(void);
> > > > > > > > > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > > > > > > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > > > > > > > index 39225ae..bd44db8 100644
> > > > > > > > > > --- a/qemu-options.hx
> > > > > > > > > > +++ b/qemu-options.hx
> > > > > > > > > > @@ -3498,6 +3498,21 @@ STEXI
> > > > > > > > > >  Run the emulation in single step mode.
> > > > > > > > > >  ETEXI
> > > > > > > > > >  
> > > > > > > > > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > > > > > > > > +    "-paused [state=]postconf|preconf\n"
> > > > > > > > > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > > > > > > > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > > > > > > > > +    QEMU_ARCH_ALL)      
> > > > > > > > > 
> > > > > > > > > I would like to allow pausing before machine-type is selected, so
> > > > > > > > > management could run query-machines before choosing a
> > > > > > > > > machine-type.  Would that need a third "-pause" mode, or will we
> > > > > > > > > be able to change "preconf" to pause before select_machine() is
> > > > > > > > > called?
> > > > > > > > > 
> > > > > > > > > The same probably applies to other things initialized before
> > > > > > > > > machine_run_board_init() that could be configurable using QMP,
> > > > > > > > > including but not limited to:
> > > > > > > > > * Accelerator configuration
> > > > > > > > > * Registering global properties
> > > > > > > > > * RAM size
> > > > > > > > > * SMP/CPU configuration      
> > > > > > > > 
> > > > > > > > Yeah.. having a bunch of different possible pause stages to select
> > > > > > > > doesn't sound great.      
> > > > > > > 
> > > > > > > I agree.  The number of externally visible pause states should be
> > > > > > > as small as possible.
> > > > > > > 
> > > > > > >       
> > > > > > > >                       Could we avoid this by instead changing -S to
> > > > > > > > pause at the earliest possible spot, but having any monitor commands
> > > > > > > > that require a later stage automatically "fast forwarding" to the
> > > > > > > > right phase?      
> > > > > > > 
> > > > > > > That would hide the internal details from the outside.  Sounds
> > > > > > > nice, but adding new machine/device configuration QMP commands
> > > > > > > while hiding the QEMU state from the outside sounds impossible.
> > > > > > > 
> > > > > > > For example, if we use -S today, this works:
> > > > > > > 
> > > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}      
> > > > > > >   -> {"execute":"qmp_capabilities"}      
> > > > > > >   <- {"return": {}}      
> > > > > > >   -> {"execute":"query-cpus"}      
> > > > > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
> > > > > > > 
> > > > > > > This means "query-cpus" needs to fast-forward to the CPU creation
> > > > > > > stage if we want to keep compatibility.
> > > > > > > 
> > > > > > > Now, assume we add a set-numa-node command like the one in this
> > > > > > > series.  e.g.:
> > > > > > > 
> > > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}      
> > > > > > >   -> {"execute":"qmp_capabilities"}      
> > > > > > >   <- {"return": {}}      
> > > > > > >   -> {"execute":"set-numa-node" ... }      
> > > > > > >   <- {"return": ...}
> > > > > > > 
> > > > > > > The command will work only if machine initialization didn't run
> > > > > > > yet.
> > > > > > > 
> > > > > > > But now an innocent-looking query command would change QEMU state
> > > > > > > in an unexpected way:
> > > > > > > 
> > > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}      
> > > > > > >   -> {"execute":"qmp_capabilities"}      
> > > > > > >   <- {"return": {}}      
> > > > > > >   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]      
> > > > > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}      
> > > > > > >   -> {"execute":"set-numa-node" ... }      
> > > > > > >   <- {"error": ...}  [the command will fail because the machine was already created]
> > > > > > > 
> > > > > > > This means we do have a externally visible "too late to use
> > > > > > > set-numa-node" QEMU state, and query-cpus will have a externally
> > > > > > > visible side effect.  Every QMP command would need to document
> > > > > > > how it affects QEMU state in a externally visible way.
> > > > > > > 
> > > > > > > If QEMU pause state is still going to be externally visible this
> > > > > > > way, I would prefer to let the client to explicitly tell what's
> > > > > > > the state they want QEMU to be, instead of making QEMU change
> > > > > > > state silently as a side effect of QMP commands.      
> > > > > > 
> > > > > > Yeah, good point.  My proposal would just have changed explicitly
> > > > > > exposed ugly internal state to subtly exposed ugly internal state,
> > > > > > which is probably worse :(.
> > > > > > 
> > > > > > 
> > > > > > Ok.. next possibly bad idea..
> > > > > > 
> > > > > > What about a "re-exec" monitor command; it would take what's
> > > > > > essentially a new command line, and basically restart qemu from the
> > > > > > beginning, reparsing this new command line, but without actually 
> > > > > > 
> > > > > > Pro:
> > > > > >   * Mitigates Daniel Berrange's concern about lots of qemu
> > > > > >     configuration being buried in the qmp session - if libvirt logged
> > > > > >     its last "re-exec" that would have what is generally needed.
> > > > > >   * Lets libvirt do assorted investigation of options, then rewind to
> > > > > >     choose what it actually wants      
> > > > > 
> > > > > Sounds like a superset of Paolo's "-machine none" proposal[1].
> > > > > It would be a very simple interface, not sure it can be easily
> > > > > implemented efficiently.
> > > > > 
> > > > > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg488618.html
> > > > >     
> > > > > > 
> > > > > > Con:
> > > > > >   * Would require a bunch of auditing of structures/state to make sure
> > > > > >     they can be re-initialized cleanly      
> > > > > 
> > > > > This sounds like a big obstacle.  QEMU still have too much global
> > > > > state outside the machine/qdev tree.
> > > > > 
> > > > >     
> > > > > >   * Would it be fast enough for libvirt to use?  Do we know if the
> > > > > >     slowness which makes multiple qemu invocations by libvirt
> > > > > >     unattractive is from the kernel/libc/ldso overhead, or from qemu's
> > > > > >     internal start up processing?      
> > > > > 
> > > > > My gut feeling is that this could be too slow, if the scope of
> > > > > "re-exec" is too big.
> > > > > 
> > > > > 
> > > > > Now, let me try to go to the opposite extreme: I think you had a
> > > > > good point in your previous proposal.  Why should we need to
> > > > > restart/re-execute anything at all just because some bit of
> > > > > configuration is being changed by libvirt?  Why commands like
> > > > > set-numa-node should require QEMU to be in a state that is not
> > > > > covered by -S?  If the guest is not running yet, there should be
> > > > > no reason to require clients to explicitly pause/continue/restart
> > > > > anything.    
> > > > It's probably doable to do numa config at '-S' time for x86 (arm),
> > > > since ACPI tables are regenerated on the first read (legacy fw_cfg
> > > > would be a little problematic but probably could be 'fixed' as well)
> > > > 
> > > > But I can't say outright if it's doable for other targets,
> > > > in general issue here is that '-S' pauses after machine_done is run
> > > > and all necessary wiring board requires is finalized by then
> > > > and no hooks run after unpause.
> > > > If there is a general consensus to go this route, I can invest
> > > > some time in making it work (then this series could be dropped)
> > > > 
> > > > Even so, postponing set-numa to '-S' won't address Daniel's concern,
> > > > i.e. configuration would take several round trips of command to complete
> > > > potentially over a slow network. But as it was said libvirt can cache
> > > > new CLI options for further reuse.    
> > > 
> > > We can cache stuff from the generic "-m none" invocation, but we won't
> > > cache stuff from invocation of a specific VM instance, because we can't
> > > have confidence that such data is independent of the VM config. So we  
> > In the case of cpu layout we have a fixed set of options that influence it
> > (-M foo_vXX -smp ...),  so from QEMU side it should be possible to
> > promise it would stay stable.
> > But such caching would be useful in other use cases as well.
> > Is the issue in invalidating cached data in case of option(s) would
> > change cached data?  
> 
> For the caching to be useful, we need to have a good cache hit rate.
> If the cache depends on a lot of different CLI args, then you're going
> to have to populate many caches each with low hit rate. The current
> caching is done based on QEMU/libvirtd binary, so we have 1 cache miss
> when QEMU or libvirt are upgraded, then 100% cache hit thereafter, so
> the cache is very effective.
With a per-domain cache one could also get close to a 100% hit rate every
time the domain is started, as long as a changed option does not
invalidate the cache.

In the case of CPU layout, it would remove the need to run
query-hotpluggable-cpus every time a VM is started (when CPU hotplug is
enabled), which libvirt does now.
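
A minimal sketch of the per-domain cache idea, assuming the CPU layout
depends only on the QEMU binary plus the machine type and -smp value.
The names LAYOUT_OPTIONS, layout_cache_key and LayoutCache are
hypothetical and are not libvirt or QEMU code; "query" stands in for
whatever runs query-hotpluggable-cpus.

```python
import hashlib
import json

# Assumption: these are the only options that influence the CPU layout.
LAYOUT_OPTIONS = ("machine", "smp")


def layout_cache_key(qemu_binary_id, options):
    """Hash the binary identity together with the layout-relevant options only."""
    relevant = {name: options[name] for name in LAYOUT_OPTIONS if name in options}
    blob = json.dumps([qemu_binary_id, relevant], sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class LayoutCache:
    """Remember the CPU layout per key; call query() only on a cache miss."""

    def __init__(self):
        self._entries = {}
        self.misses = 0

    def get(self, qemu_binary_id, options, query):
        key = layout_cache_key(qemu_binary_id, options)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = query()
        return self._entries[key]
```

Options outside LAYOUT_OPTIONS (RAM size, for example) do not change the
key, so only a changed machine type, -smp value or QEMU binary would
trigger a fresh query.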

...
> 
> Regards,
> Daniel

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-23  8:45                     ` Igor Mammedov
@ 2017-10-25  6:57                       ` Eduardo Habkost
  2017-10-25  7:02                         ` Daniel P. Berrange
  0 siblings, 1 reply; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-25  6:57 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: David Gibson, Paolo Bonzini, Daniel P. Berrange, peter.maydell,
	pkrempa, cohuck, qemu-devel, armbru

On Mon, Oct 23, 2017 at 10:45:41AM +0200, Igor Mammedov wrote:
> On Fri, 20 Oct 2017 17:53:09 -0200
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> > > On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:  
> > > > On 19/10/2017 13:49, David Gibson wrote:  
> > > > > Note that describing socket/core/thread tuples as arch independent (or
> > > > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > > > that most platforms can be fit to that scheme without too much
> > > > > straining.  But, there's no arch independent way of defining what each
> > > > > level means in terms of its properties.
> > > > > 
> > > > > So, for example, on spapr - being paravirt - there's no real
> > > > > distinction between cores and sockets, how you divide them up is
> > > > > completely arbitrary.  
> > > > 
> > > > Same on x86, actually.
> > > > 
> > > > It's _common_ that cores on the same socket share L3 cache and that a
> > > > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > > > that way.
> > > > 
> > > > QEMU currently enforces the former (if it tells the guest at all that
> > > > there is an L3 cache), but not the latter.  
> > > 
> > > Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> > > > architecture in terms of this thread/core/socket hierarchy?  That's
> > > not true for PAPR, where the NUMA topology is described in an
> > > independent set of (potentially arbitrarily nested) nodes.  
> > 
> > On PC, ACPI NUMA information only refer to CPU APIC IDs, which
> > identify individual CPU threads; it doesn't care about CPU
> > socket/core/thread topology.  If I'm not mistaken, the
> > socket/core/thread topology is not represented in ACPI at all.
> ACPI does node mapping per logical cpu (thread) in SRAT table,
> so virtually we are able to describe insane configurations.
> That however doesn't mean that we should go outside of
> what real hw does and confuse guest which may have certain
> expectations.

Agreed.

> 
> Currently for x86 expectations are that cpus are mapped to numa
> nodes either by whole cores or whole sockets (AMD and Intel cpus
> respectively). In future it might change.
> 
> 
> > Some guest OSes, however, may get very confused if they see an
> > unexpected NUMA/CPU topology.  IIRC, it was possible to make old
> > Linux kernel versions panic by generating a weird topology.
> 
> There were bugs that were fixed on the QEMU or guest kernel side
> when unexpected mappings were present. While we can 'fix' guest
> expectations in the Linux kernel, it might not be possible for other
> OSes, one more reason we shouldn't allow blind assignment by mgmt.

One problem with blocking arbitrary assignment is the possibility
of breaking existing VM configurations.  We could enforce the new
rules only on newer machine-types, although this means an
existing VM configuration may stop being runnable after updating
the machine-type.
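
A rough sketch of what "enforce only on newer machine-types" could look
like, assuming the rule is "all threads of a core map to one node".
The machine-type names, the STRICT_NUMA_MAPPING table and both
functions are hypothetical illustrations, not existing QEMU code.

```python
# Hypothetical per-machine-type compat flag; the names are made up.
STRICT_NUMA_MAPPING = {"pc-old": False, "pc-new": True}


def split_cores(cpu_to_node):
    """Return the (socket, core) pairs whose threads span several nodes.

    cpu_to_node maps (socket_id, core_id, thread_id) -> node_id.
    """
    nodes_per_core = {}
    for (socket_id, core_id, _thread_id), node_id in cpu_to_node.items():
        nodes_per_core.setdefault((socket_id, core_id), set()).add(node_id)
    return sorted(core for core, nodes in nodes_per_core.items() if len(nodes) > 1)


def check_mapping(machine_type, cpu_to_node):
    """Reject split cores on strict machine types, only report them otherwise."""
    bad = split_cores(cpu_to_node)
    if bad and STRICT_NUMA_MAPPING.get(machine_type, False):
        raise ValueError("cores split across NUMA nodes: %r" % (bad,))
    return bad
```

With such a flag an old machine-type keeps accepting an existing
configuration, while the same mapping is refused once the VM is moved
to a machine-type that enables the check.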

-- 
Eduardo

* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-25  6:57                       ` Eduardo Habkost
@ 2017-10-25  7:02                         ` Daniel P. Berrange
  2017-10-25 13:37                           ` Eduardo Habkost
  0 siblings, 1 reply; 93+ messages in thread
From: Daniel P. Berrange @ 2017-10-25  7:02 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Igor Mammedov, David Gibson, Paolo Bonzini, peter.maydell,
	pkrempa, cohuck, qemu-devel, armbru

On Wed, Oct 25, 2017 at 08:57:43AM +0200, Eduardo Habkost wrote:
> On Mon, Oct 23, 2017 at 10:45:41AM +0200, Igor Mammedov wrote:
> > On Fri, 20 Oct 2017 17:53:09 -0200
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > 
> > > On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> > > > On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:  
> > > > > On 19/10/2017 13:49, David Gibson wrote:  
> > > > > > Note that describing socket/core/thread tuples as arch independent (or
> > > > > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > > > > that most platforms can be fit to that scheme without too much
> > > > > > straining.  But, there's no arch independent way of defining what each
> > > > > > level means in terms of its properties.
> > > > > > 
> > > > > > So, for example, on spapr - being paravirt - there's no real
> > > > > > distinction between cores and sockets, how you divide them up is
> > > > > > completely arbitrary.  
> > > > > 
> > > > > Same on x86, actually.
> > > > > 
> > > > > It's _common_ that cores on the same socket share L3 cache and that a
> > > > > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > > > > that way.
> > > > > 
> > > > > QEMU currently enforces the former (if it tells the guest at all that
> > > > > there is an L3 cache), but not the latter.  
> > > > 
> > > > Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> > > > > architecture in terms of this thread/core/socket hierarchy?  That's
> > > > not true for PAPR, where the NUMA topology is described in an
> > > > independent set of (potentially arbitrarily nested) nodes.  
> > > 
> > > On PC, ACPI NUMA information only refer to CPU APIC IDs, which
> > > identify individual CPU threads; it doesn't care about CPU
> > > socket/core/thread topology.  If I'm not mistaken, the
> > > socket/core/thread topology is not represented in ACPI at all.
> > ACPI does node mapping per logical cpu (thread) in SRAT table,
> > so virtually we are able to describe insane configurations.
> > That however doesn't mean that we should go outside of
> > what real hw does and confuse guest which may have certain
> > expectations.
> 
> Agreed.
> 
> > 
> > Currently for x86 expectations are that cpus are mapped to numa
> > nodes either by whole cores or whole sockets (AMD and Intel cpus
> > respectively). In future it might change.
> > 
> > 
> > > Some guest OSes, however, may get very confused if they see an
> > > unexpected NUMA/CPU topology.  IIRC, it was possible to make old
> > > Linux kernel versions panic by generating a weird topology.
> > 
> > > There were bugs that were fixed on the QEMU or guest kernel side
> > > when unexpected mappings were present. While we can 'fix' guest
> > > expectations in the Linux kernel, it might not be possible for other
> > > OSes, one more reason we shouldn't allow blind assignment by mgmt.
> 
> One problem with blocking arbitrary assignment is the possibility
> of breaking existing VM configurations.  We could enforce the new
> rules only on newer machine-types, although this means an
> existing VM configuration may stop being runnable after updating
> the machine-type.

We should also be wary of blocking something just because some guest OSes
are unhappy. Other guest OSes may be perfectly OK with the configuration
and shouldn't be prevented from using it if their admin wants it.

IOW, we should only consider blocking things that are disallowed
by relevant specs, or that would cause functional or security problems
in the host. If it is merely that some guest OSes are unhappy with
certain configs, that's just a docs problem (e.g. Windows won't use
more than 2 sockets in many versions, but of course we shouldn't block
the use of more than 2 sockets).


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-23  9:49             ` Igor Mammedov
  2017-10-23  9:53               ` Daniel P. Berrange
@ 2017-10-25 10:35               ` Eduardo Habkost
  1 sibling, 0 replies; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-25 10:35 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: David Gibson, qemu-devel, eblake, armbru, pkrempa, peter.maydell,
	pbonzini, cohuck

On Mon, Oct 23, 2017 at 11:49:13AM +0200, Igor Mammedov wrote:
> On Fri, 20 Oct 2017 12:21:00 -0200
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Fri, Oct 20, 2017 at 12:19:17PM +1100, David Gibson wrote:
> > > On Thu, Oct 19, 2017 at 10:15:48PM -0200, Eduardo Habkost wrote:  
> > > > On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:  
> > > > > On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:  
> > > > > > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:  
> > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > > > ---
> > > > > > >  include/sysemu/sysemu.h |  1 +
> > > > > > >  qemu-options.hx         | 15 ++++++++++++++
> > > > > > >  qmp.c                   |  5 +++++
> > > > > > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > > > > > index b213696..3feb94f 100644
> > > > > > > --- a/include/sysemu/sysemu.h
> > > > > > > +++ b/include/sysemu/sysemu.h
> > > > > > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > > > > > >      QEMU_WAKEUP_REASON_OTHER,
> > > > > > >  } WakeupReason;
> > > > > > >  
> > > > > > > +void qemu_exit_preconfig_request(void);
> > > > > > >  void qemu_system_reset_request(ShutdownCause reason);
> > > > > > >  void qemu_system_suspend_request(void);
> > > > > > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > > > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > > > > index 39225ae..bd44db8 100644
> > > > > > > --- a/qemu-options.hx
> > > > > > > +++ b/qemu-options.hx
> > > > > > > @@ -3498,6 +3498,21 @@ STEXI
> > > > > > >  Run the emulation in single step mode.
> > > > > > >  ETEXI
> > > > > > >  
> > > > > > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > > > > > +    "-paused [state=]postconf|preconf\n"
> > > > > > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > > > > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > > > > > +    QEMU_ARCH_ALL)  
> > > > > > 
> > > > > > I would like to allow pausing before machine-type is selected, so
> > > > > > management could run query-machines before choosing a
> > > > > > machine-type.  Would that need a third "-pause" mode, or will we
> > > > > > be able to change "preconf" to pause before select_machine() is
> > > > > > called?
> > > > > > 
> > > > > > The same probably applies to other things initialized before
> > > > > > machine_run_board_init() that could be configurable using QMP,
> > > > > > including but not limited to:
> > > > > > * Accelerator configuration
> > > > > > * Registering global properties
> > > > > > * RAM size
> > > > > > * SMP/CPU configuration  
> > > > > 
> > > > > Yeah.. having a bunch of different possible pause stages to select
> > > > > doesn't sound great.  
> > > > 
> > > > I agree.  The number of externally visible pause states should be
> > > > as small as possible.
> > > > 
> > > >   
> > > > >                       Could we avoid this by instead changing -S to
> > > > > pause at the earliest possible spot, but having any monitor commands
> > > > > that require a later stage automatically "fast forwarding" to the
> > > > > right phase?  
> > > > 
> > > > That would hide the internal details from the outside.  Sounds
> > > > nice, but adding new machine/device configuration QMP commands
> > > > while hiding the QEMU state from the outside sounds impossible.
> > > > 
> > > > For example, if we use -S today, this works:
> > > > 
> > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > > >   -> {"execute":"qmp_capabilities"}  
> > > >   <- {"return": {}}  
> > > >   -> {"execute":"query-cpus"}  
> > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
> > > > 
> > > > This means "query-cpus" needs to fast-forward to the CPU creation
> > > > stage if we want to keep compatibility.
> > > > 
> > > > Now, assume we add a set-numa-node command like the one in this
> > > > series.  e.g.:
> > > > 
> > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > > >   -> {"execute":"qmp_capabilities"}  
> > > >   <- {"return": {}}  
> > > >   -> {"execute":"set-numa-node" ... }  
> > > >   <- {"return": ...}
> > > > 
> > > > The command will work only if machine initialization didn't run
> > > > yet.
> > > > 
> > > > But now an innocent-looking query command would change QEMU state
> > > > in an unexpected way:
> > > > 
> > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}  
> > > >   -> {"execute":"qmp_capabilities"}  
> > > >   <- {"return": {}}  
> > > >   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]  
> > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}  
> > > >   -> {"execute":"set-numa-node" ... }  
> > > >   <- {"error": ...}  [the command will fail because the machine was already created]
> > > > 
> > > > This means we do have a externally visible "too late to use
> > > > set-numa-node" QEMU state, and query-cpus will have a externally
> > > > visible side effect.  Every QMP command would need to document
> > > > how it affects QEMU state in a externally visible way.
> > > > 
> > > > If QEMU pause state is still going to be externally visible this
> > > > way, I would prefer to let the client to explicitly tell what's
> > > > the state they want QEMU to be, instead of making QEMU change
> > > > state silently as a side effect of QMP commands.  
> > > 
> > > Yeah, good point.  My proposal would just have changed explicitly
> > > exposed ugly internal state to subtly exposed ugly internal state,
> > > which is probably worse :(.
> > > 
> > > 
> > > Ok.. next possibly bad idea..
> > > 
> > > What about a "re-exec" monitor command; it would take what's
> > > essentially a new command line, and basically restart qemu from the
> > > beginning, reparsing this new command line, but without actually 
> > > 
> > > Pro:
> > >   * Mitigates Daniel Berrange's concern about lots of qemu
> > >     configuration being buried in the qmp session - if libvirt logged
> > >     its last "re-exec" that would have what is generally needed.
> > >   * Lets libvirt do assorted investigation of options, then rewind to
> > >     choose what it actually wants  
> > 
> > Sounds like a superset of Paolo's "-machine none" proposal[1].
> > It would be a very simple interface, not sure it can be easily
> > implemented efficiently.
> > 
> > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg488618.html
> > 
> > > 
> > > Con:
> > >   * Would require a bunch of auditing of structures/state to make sure
> > >     they can be re-initialized cleanly  
> > 
> > This sounds like a big obstacle.  QEMU still have too much global
> > state outside the machine/qdev tree.
> > 
> > 
> > >   * Would it be fast enough for libvirt to use?  Do we know if the
> > >     slowness which makes multiple qemu invocations by libvirt
> > >     unattractive is from the kernel/libc/ldso overhead, or from qemu's
> > >     internal start up processing?  
> > 
> > My gut feeling is that this could be too slow, if the scope of
> > "re-exec" is too big.
> > 
> > 
> > Now, let me try to go to the opposite extreme: I think you had a
> > good point in your previous proposal.  Why should we need to
> > restart/re-execute anything at all just because some bit of
> > configuration is being changed by libvirt?  Why commands like
> > set-numa-node should require QEMU to be in a state that is not
> > covered by -S?  If the guest is not running yet, there should be
> > no reason to require clients to explicitly pause/continue/restart
> > anything.
> It's probably doable to do numa config at '-S' time for x86 (arm),
> since ACPI tables are regenerated on the first read (legacy fw_cfg
> would be a little problematic but probably could be 'fixed' as well)
> 
> But I can't say outright if it's doable for other targets,
> in general issue here is that '-S' pauses after machine_done is run
> and all necessary wiring board requires is finalized by then
> and no hooks run after unpause.
> If there is a general consensus to go this route, I can invest
> some time in making it work (then this series could be dropped)

My argument is that it must always be possible to change
configuration using -S (before issuing a 'cont' command), because
the guest is not running at all.  If current QEMU code makes that
difficult, we should address it internally in QEMU.


> 
> Even so, postponing set-numa to '-S' won't address Daniel's concern,
> i.e. configuration would take several round trips of command to complete
> potentially over a slow network. But as it was said libvirt can cache
> new CLI options for further reuse.
> Whether it is slower/faster than starting qemu with '-M foo -smp ...' +
> querying layout and then restarting it again with -numa options
> would depend on network speed.

True, my argument doesn't address that concern.  But I expect QMP
configuration commands to always be sent through a local socket,
so this is just about the added latency for local QMP round
trips.
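
To get a feeling for that latency, here is a small self-contained
sketch that times newline-delimited JSON round trips over a local
socket pair.  It does not talk to a real QEMU: fake_qmp_server is a
hypothetical stand-in that skips the QMP greeting and answers every
command with an empty return, so the result only approximates socket
and JSON overhead, not QEMU's own processing time.

```python
import json
import socket
import threading
import time


def fake_qmp_server(conn):
    """Stand-in for QEMU: reply to each newline-terminated command."""
    with conn, conn.makefile("rwb") as stream:
        for _line in stream:
            stream.write(b'{"return": {}}\n')
            stream.flush()


def measure_round_trips(commands):
    """Send commands one at a time; return the replies and elapsed seconds."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=fake_qmp_server, args=(server,))
    worker.start()
    replies = []
    with client, client.makefile("rwb") as stream:
        start = time.perf_counter()
        for command in commands:
            stream.write(json.dumps(command).encode("utf-8") + b"\n")
            stream.flush()
            # Wait for the reply before sending the next command.
            replies.append(json.loads(stream.readline()))
        elapsed = time.perf_counter() - start
    worker.join()
    return replies, elapsed


if __name__ == "__main__":
    session = [{"execute": "qmp_capabilities"},
               {"execute": "query-hotpluggable-cpus"}]
    replies, elapsed = measure_round_trips(session)
    print("%d round trips in %.6f s" % (len(replies), elapsed))
```

Each command waits for its reply before the next one is sent, which is
the serialized pattern a multi-step configuration session would follow.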

-- 
Eduardo

* Re: [Qemu-devel] [RFC 4/6] CLI: add -paused option
  2017-10-23 11:18                     ` Igor Mammedov
@ 2017-10-25 10:52                       ` Eduardo Habkost
  0 siblings, 0 replies; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-25 10:52 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Daniel P. Berrange, peter.maydell, pkrempa, cohuck, armbru,
	qemu-devel, pbonzini, David Gibson

On Mon, Oct 23, 2017 at 01:18:30PM +0200, Igor Mammedov wrote:
> On Mon, 23 Oct 2017 11:49:44 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Mon, Oct 23, 2017 at 12:36:20PM +0200, Igor Mammedov wrote:
> > > On Mon, 23 Oct 2017 10:53:16 +0100
> > > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> > >   
> > > > On Mon, Oct 23, 2017 at 11:49:13AM +0200, Igor Mammedov wrote:  
> > > > > On Fri, 20 Oct 2017 12:21:00 -0200
> > > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > > >     
> > > > > > On Fri, Oct 20, 2017 at 12:19:17PM +1100, David Gibson wrote:    
> > > > > > > On Thu, Oct 19, 2017 at 10:15:48PM -0200, Eduardo Habkost wrote:      
> > > > > > > > On Thu, Oct 19, 2017 at 09:42:18PM +1100, David Gibson wrote:      
> > > > > > > > > On Mon, Oct 16, 2017 at 02:59:16PM -0200, Eduardo Habkost wrote:      
> > > > > > > > > > On Mon, Oct 16, 2017 at 06:22:54PM +0200, Igor Mammedov wrote:      
> > > > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  include/sysemu/sysemu.h |  1 +
> > > > > > > > > > >  qemu-options.hx         | 15 ++++++++++++++
> > > > > > > > > > >  qmp.c                   |  5 +++++
> > > > > > > > > > >  vl.c                    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > > > > > > >  4 files changed, 74 insertions(+), 1 deletion(-)
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > > > > > > > > > index b213696..3feb94f 100644
> > > > > > > > > > > --- a/include/sysemu/sysemu.h
> > > > > > > > > > > +++ b/include/sysemu/sysemu.h
> > > > > > > > > > > @@ -66,6 +66,7 @@ typedef enum WakeupReason {
> > > > > > > > > > >      QEMU_WAKEUP_REASON_OTHER,
> > > > > > > > > > >  } WakeupReason;
> > > > > > > > > > >  
> > > > > > > > > > > +void qemu_exit_preconfig_request(void);
> > > > > > > > > > >  void qemu_system_reset_request(ShutdownCause reason);
> > > > > > > > > > >  void qemu_system_suspend_request(void);
> > > > > > > > > > >  void qemu_register_suspend_notifier(Notifier *notifier);
> > > > > > > > > > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > > > > > > > > > index 39225ae..bd44db8 100644
> > > > > > > > > > > --- a/qemu-options.hx
> > > > > > > > > > > +++ b/qemu-options.hx
> > > > > > > > > > > @@ -3498,6 +3498,21 @@ STEXI
> > > > > > > > > > >  Run the emulation in single step mode.
> > > > > > > > > > >  ETEXI
> > > > > > > > > > >  
> > > > > > > > > > > +DEF("paused", HAS_ARG, QEMU_OPTION_paused, \
> > > > > > > > > > > +    "-paused [state=]postconf|preconf\n"
> > > > > > > > > > > +    "                postconf: pause QEMU after machine is initialized\n"
> > > > > > > > > > > +    "                preconf: pause QEMU before machine is initialized\n",
> > > > > > > > > > > +    QEMU_ARCH_ALL)      
> > > > > > > > > > 
> > > > > > > > > > I would like to allow pausing before machine-type is selected, so
> > > > > > > > > > management could run query-machines before choosing a
> > > > > > > > > > machine-type.  Would that need a third "-pause" mode, or will we
> > > > > > > > > > be able to change "preconf" to pause before select_machine() is
> > > > > > > > > > called?
> > > > > > > > > > 
> > > > > > > > > > The same probably applies to other things initialized before
> > > > > > > > > > machine_run_board_init() that could be configurable using QMP,
> > > > > > > > > > including but not limited to:
> > > > > > > > > > * Accelerator configuration
> > > > > > > > > > * Registering global properties
> > > > > > > > > > * RAM size
> > > > > > > > > > * SMP/CPU configuration      
> > > > > > > > > 
> > > > > > > > > Yeah.. having a bunch of different possible pause stages to select
> > > > > > > > > doesn't sound great.      
> > > > > > > > 
> > > > > > > > I agree.  The number of externally visible pause states should be
> > > > > > > > as small as possible.
> > > > > > > > 
> > > > > > > >       
> > > > > > > > >                       Could we avoid this by instead changing -S to
> > > > > > > > > pause at the earliest possible spot, but having any monitor commands
> > > > > > > > > that require a later stage automatically "fast forwarding" to the
> > > > > > > > > right phase?      
> > > > > > > > 
> > > > > > > > That would hide the internal details from the outside.  Sounds
> > > > > > > > nice, but adding new machine/device configuration QMP commands
> > > > > > > > while hiding the QEMU state from the outside sounds impossible.
> > > > > > > > 
> > > > > > > > For example, if we use -S today, this works:
> > > > > > > > 
> > > > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}      
> > > > > > > >   -> {"execute":"qmp_capabilities"}      
> > > > > > > >   <- {"return": {}}      
> > > > > > > >   -> {"execute":"query-cpus"}      
> > > > > > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}
> > > > > > > > 
> > > > > > > > This means "query-cpus" needs to fast-forward to the CPU creation
> > > > > > > > stage if we want to keep compatibility.
> > > > > > > > 
> > > > > > > > Now, assume we add a set-numa-node command like the one in this
> > > > > > > > series.  e.g.:
> > > > > > > > 
> > > > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}      
> > > > > > > >   -> {"execute":"qmp_capabilities"}      
> > > > > > > >   <- {"return": {}}      
> > > > > > > >   -> {"execute":"set-numa-node" ... }      
> > > > > > > >   <- {"return": ...}
> > > > > > > > 
> > > > > > > > The command will work only if machine initialization didn't run
> > > > > > > > yet.
> > > > > > > > 
> > > > > > > > But now an innocent-looking query command would change QEMU state
> > > > > > > > in an unexpected way:
> > > > > > > > 
> > > > > > > >   $ qemu-system-x86_64 -S -qmp stdio
> > > > > > > >   <- {"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": " (v2.10.0-83-g9375da7831)"}, "capabilities": []}}      
> > > > > > > >   -> {"execute":"qmp_capabilities"}      
> > > > > > > >   <- {"return": {}}      
> > > > > > > >   -> {"execute":"query-cpus"}  [will silently fast-forward QEMU state]      
> > > > > > > >   <- {"return": [{"arch": "x86", "current": true, "props": {"core-id": 0, "thread-id": 0, "socket-id": 0}, "CPU": 0, "qom_path": "/machine/unattached/device[0]", "pc": 4294967280, "halted": false, "thread_id": 4038}]}      
> > > > > > > >   -> {"execute":"set-numa-node" ... }      
> > > > > > > >   <- {"error": ...}  [the command will fail because the machine was already created]
> > > > > > > > 
> > > > > > > > > This means we do have an externally visible "too late to use
> > > > > > > > > set-numa-node" QEMU state, and query-cpus will have an externally
> > > > > > > > > visible side effect.  Every QMP command would need to document
> > > > > > > > > how it affects QEMU state in an externally visible way.
> > > > > > > > > 
> > > > > > > > > If QEMU pause state is still going to be externally visible this
> > > > > > > > > way, I would prefer to let the client explicitly say which
> > > > > > > > > state they want QEMU to be in, instead of making QEMU change
> > > > > > > > > state silently as a side effect of QMP commands.
> > > > > > > 
> > > > > > > Yeah, good point.  My proposal would just have changed explicitly
> > > > > > > exposed ugly internal state to subtly exposed ugly internal state,
> > > > > > > which is probably worse :(.
> > > > > > > 
> > > > > > > 
> > > > > > > Ok.. next possibly bad idea..
> > > > > > > 
> > > > > > > What about a "re-exec" monitor command; it would take what's
> > > > > > > essentially a new command line, and basically restart qemu from the
> > > > > > > beginning, reparsing this new command line, but without actually 
> > > > > > > 
> > > > > > > Pro:
> > > > > > >   * Mitigates Daniel Berrange's concern about lots of qemu
> > > > > > >     configuration being buried in the qmp session - if libvirt logged
> > > > > > >     its last "re-exec" that would have what is generally needed.
> > > > > > >   * Lets libvirt do assorted investigation of options, then rewind to
> > > > > > >     choose what it actually wants      
> > > > > > 
> > > > > > Sounds like a superset of Paolo's "-machine none" proposal[1].
> > > > > > It would be a very simple interface, not sure it can be easily
> > > > > > implemented efficiently.
> > > > > > 
> > > > > > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg488618.html
> > > > > >     
> > > > > > > 
> > > > > > > Con:
> > > > > > >   * Would require a bunch of auditing of structures/state to make sure
> > > > > > >     they can be re-initialized cleanly      
> > > > > > 
> > > > > > > This sounds like a big obstacle.  QEMU still has too much global
> > > > > > > state outside the machine/qdev tree.
> > > > > > 
> > > > > >     
> > > > > > >   * Would it be fast enough for libvirt to use?  Do we know if the
> > > > > > >     slowness which makes multiple qemu invocations by libvirt
> > > > > > >     unattractive is from the kernel/libc/ldso overhead, or from qemu's
> > > > > > >     internal start up processing?      
> > > > > > 
> > > > > > My gut feeling is that this could be too slow, if the scope of
> > > > > > "re-exec" is too big.
> > > > > > 
> > > > > > 
> > > > > > Now, let me try to go to the opposite extreme: I think you had a
> > > > > > good point in your previous proposal.  Why should we need to
> > > > > > restart/re-execute anything at all just because some bit of
> > > > > > > configuration is being changed by libvirt?  Why should commands like
> > > > > > > set-numa-node require QEMU to be in a state that is not
> > > > > > covered by -S?  If the guest is not running yet, there should be
> > > > > > no reason to require clients to explicitly pause/continue/restart
> > > > > > anything.    
> > > > > It's probably doable to do numa config at '-S' time for x86 (arm),
> > > > > since ACPI tables are regenerated on the first read (legacy fw_cfg
> > > > > would be a little problematic but probably could be 'fixed' as well)
> > > > > 
> > > > > > But I can't say outright whether it's doable for other targets.
> > > > > > In general the issue here is that '-S' pauses after machine_done is run,
> > > > > > so all the wiring the board requires is finalized by then
> > > > > > and no hooks run after unpause.
> > > > > > If there is a general consensus to go this route, I can invest
> > > > > > some time in making it work (then this series could be dropped).
> > > > > > 
> > > > > > Even so, postponing set-numa to '-S' won't address Daniel's concern,
> > > > > > i.e. configuration would take several round trips of commands to complete,
> > > > > > potentially over a slow network. But as was said, libvirt can cache
> > > > > > new CLI options for further reuse.
> > > > 
> > > > > We can cache stuff from the generic "-m none" invocation, but we won't
> > > > > cache stuff from invocation of a specific VM instance, because we can't
> > > > > have confidence that such data is independent of the VM config. So we
> > > > In the case of cpu layout we have a fixed set of options that influence it
> > > > (-M foo_vXX -smp ...), so from the QEMU side it should be possible to
> > > > promise it would stay stable.
> > > > But such caching would be useful in other use cases as well.
> > > > Is the issue invalidating the cached data when some option(s)
> > > > change it?
> > 
> > For the caching to be useful, we need to have a good cache hit rate.
> > > If the cache depends on a lot of different CLI args, then you're going
> > to have to populate many caches each with low hit rate. The current
> > caching is done based on QEMU/libvirtd binary, so we have 1 cache miss
> > when QEMU or libvirt are upgraded, then 100% cache hit thereafter, so
> > the cache is very effective.
> With a per-domain cache one could also have about a 100% hit rate every time
> the domain is started, as long as a new option does not invalidate the cache.

Single-use VMs are a use case libvirt cares about, and in that
case the hit rate would be 0%.

...unless we specify more complex caching rules for
query-hotpluggable-cpus, which IMO would be more complex and
error-prone than simply allowing predictable
socket-index/core-index/thread-index values to identify CPU
slots.

(But, is the latency added by 2 or 3 QMP commands really an issue
here?)
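
To put rough numbers on the hit-rate argument, here is a minimal sketch
in Python.  It is only a toy model, not libvirt code: the tuple layout,
the hit_rate() helper and both key functions are made up for
illustration, and it assumes a cache entry is reused only when its whole
key has been seen before.

```python
# Toy model of the two caching strategies discussed in this thread.
# Everything here is hypothetical; it is not how libvirt stores its cache.

def hit_rate(launches, key_of):
    """Return the fraction of launches whose cache key was already seen."""
    seen = set()
    hits = 0
    for launch in launches:
        key = key_of(launch)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(launches)

# Each launch is (qemu_version, machine_type, smp_args, domain_name).
# Ten single-use VMs, each with its own name and -smp setting.
single_use = [
    ("2.10", "pc-i440fx-2.10", "-smp 1,maxcpus=%d" % n, "vm%d" % n)
    for n in range(2, 12)
]
# One long-lived domain started ten times with the same configuration.
long_lived = [("2.10", "pc-i440fx-2.10", "-smp 1,maxcpus=2", "vm0")] * 10

def per_binary(launch):
    """Cache keyed on the QEMU binary only (the current approach)."""
    return launch[0]

def per_domain(launch):
    """Cache keyed on everything that identifies one VM configuration."""
    return launch

print(hit_rate(single_use, per_binary))  # only the first launch misses
print(hit_rate(single_use, per_domain))  # every launch misses
print(hit_rate(long_lived, per_domain))  # only the first launch misses
```

In this model the per-binary key misses once and then always hits, the
per-domain key does as well for a long-lived domain, but for single-use
VMs the per-domain key never hits at all.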

> 
> In the case of cpu layout it will remove the need for query-hotpluggable-cpus
> every time a VM is started (when cpu hotplug is enabled), which libvirt
> does now.
> 
> ...
> > 
> > Regards,
> > Daniel
> 

-- 
Eduardo


* Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
  2017-10-25  7:02                         ` Daniel P. Berrange
@ 2017-10-25 13:37                           ` Eduardo Habkost
  0 siblings, 0 replies; 93+ messages in thread
From: Eduardo Habkost @ 2017-10-25 13:37 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Igor Mammedov, David Gibson, Paolo Bonzini, peter.maydell,
	pkrempa, cohuck, qemu-devel, armbru

On Wed, Oct 25, 2017 at 08:02:06AM +0100, Daniel P. Berrange wrote:
> On Wed, Oct 25, 2017 at 08:57:43AM +0200, Eduardo Habkost wrote:
> > On Mon, Oct 23, 2017 at 10:45:41AM +0200, Igor Mammedov wrote:
> > > On Fri, 20 Oct 2017 17:53:09 -0200
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > 
> > > > On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> > > > > On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:  
> > > > > > On 19/10/2017 13:49, David Gibson wrote:  
> > > > > > > Note that describing socket/core/thread tuples as arch independent (or
> > > > > > > even machine independent) is.. debatable.  I mean it's flexible enough
> > > > > > > that most platforms can be fit to that scheme without too much
> > > > > > > straining.  But, there's no arch independent way of defining what each
> > > > > > > level means in terms of its properties.
> > > > > > > 
> > > > > > > So, for example, on spapr - being paravirt - there's no real
> > > > > > > distinction between cores and sockets, how you divide them up is
> > > > > > > completely arbitrary.  
> > > > > > 
> > > > > > Same on x86, actually.
> > > > > > 
> > > > > > It's _common_ that cores on the same socket share L3 cache and that a
> > > > > > socket spans an integer number of NUMA nodes, but it doesn't have to be
> > > > > > that way.
> > > > > > 
> > > > > > QEMU currently enforces the former (if it tells the guest at all that
> > > > > > there is an L3 cache), but not the latter.  
> > > > > 
> > > > > Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> > > > > > architecture in terms of this thread/core/socket hierarchy?  That's
> > > > > not true for PAPR, where the NUMA topology is described in an
> > > > > independent set of (potentially arbitrarily nested) nodes.  
> > > > 
> > > > > On PC, ACPI NUMA information only refers to CPU APIC IDs, which
> > > > identify individual CPU threads; it doesn't care about CPU
> > > > socket/core/thread topology.  If I'm not mistaken, the
> > > > socket/core/thread topology is not represented in ACPI at all.
> > > ACPI does node mapping per logical cpu (thread) in SRAT table,
> > > so virtually we are able to describe insane configurations.
> > > That however doesn't mean that we should go outside of
> > > what real hw does and confuse guest which may have certain
> > > expectations.
> > 
> > Agreed.
> > 
> > > 
> > > Currently for x86 expectations are that cpus are mapped to numa
> > > nodes either by whole cores or whole sockets (AMD and Intel cpus
> > > respectively). In future it might change.
> > > 
> > > 
> > > > Some guest OSes, however, may get very confused if they see an
> > > > unexpected NUMA/CPU topology.  IIRC, it was possible to make old
> > > > Linux kernel versions panic by generating a weird topology.
> > > 
> > > > There were bugs that were fixed on the QEMU or guest kernel side
> > > > when unexpected mappings were present. While we can 'fix' guest
> > > > expectations in the Linux kernel, it might not be possible for other
> > > > OSes, which is one more reason we shouldn't allow blind assignment by mgmt.
> > 
> > One problem with blocking arbitrary assignment is the possibility
> > of breaking existing VM configurations.  We could enforce the new
> > rules only on newer machine-types, although this means an
> > existing VM configuration may stop being runnable after updating
> > the machine-type.
> 
> We should also be wary of blocking something just because some guest OSes
> are unhappy. Other guest OSes may be perfectly OK with the configuration
> and shouldn't be prevented from using it if their admin wants it.
> 
> IOW, we should only consider blocking things that are disallowed
> by relevant specs, or would impose functional or security problems
> in the host. If it is merely that some guest OSes are unhappy with
> certain configs, that's just a docs problem (eg Windows won't use
> more than 2 sockets in many versions, but we shouldn't block use
> of more than 2 sockets of course).

I agree with this for things that only some guests are unhappy
with, but I'm wary of allowing something that is not expected to
work on any guest and is known to cause issues just because it's
not forbidden by the spec.  Supporting things that are actually
useful and supported by guest OSes is already hard enough.

For new features, I'd rather be conservative and allow only
configurations that are expected to work.  We can always update
QEMU later to allow something that wasn't allowed before.

For existing features like thread-level NUMA binding, the best
solution is not always obvious because people may be relying on
them.
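
As an illustration of the kind of conservative check I have in mind, the
"whole cores or whole sockets" expectation Igor mentioned could be
verified with something like the sketch below.  This is Python for
brevity, not QEMU code: split_units() is a hypothetical helper, and the
dict keys simply mirror the property names shown by
query-hotpluggable-cpus.

```python
# Hypothetical check: find sockets or cores whose threads were assigned
# to more than one NUMA node.  Not QEMU code; names are illustrative.

def split_units(cpus, level):
    """Return the sorted topology units at `level` that span several nodes.

    cpus:  list of dicts with "socket-id", "core-id", "thread-id"
           and "node-id" keys.
    level: "socket" or "core".
    """
    nodes_by_unit = {}
    for cpu in cpus:
        if level == "socket":
            unit = (cpu["socket-id"],)
        elif level == "core":
            unit = (cpu["socket-id"], cpu["core-id"])
        else:
            raise ValueError("level must be 'socket' or 'core'")
        nodes_by_unit.setdefault(unit, set()).add(cpu["node-id"])
    return sorted(u for u, nodes in nodes_by_unit.items() if len(nodes) > 1)

def cpu(socket, core, thread, node):
    """Build one CPU slot description."""
    return {"socket-id": socket, "core-id": core,
            "thread-id": thread, "node-id": node}

# Thread-level binding: two threads of one core land on different nodes.
per_thread = [cpu(0, 0, 0, 0), cpu(0, 0, 1, 1)]
# Core-level binding: each core is whole, but the socket spans two nodes.
per_core = [cpu(0, 0, 0, 0), cpu(0, 0, 1, 0),
            cpu(0, 1, 0, 1), cpu(0, 1, 1, 1)]

print(split_units(per_thread, "core"))
print(split_units(per_core, "core"))
print(split_units(per_core, "socket"))
```

A policy for new machine-types could reject a configuration when the
list for the relevant level is non-empty, while older machine-types
keep accepting what they accept today.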

-- 
Eduardo


* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-10-19  6:31     ` David Gibson
@ 2017-10-31 14:01       ` Igor Mammedov
  2017-11-06 18:02         ` Eduardo Habkost
  0 siblings, 1 reply; 93+ messages in thread
From: Igor Mammedov @ 2017-10-31 14:01 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, peter.maydell, pkrempa, ehabkost, pbonzini, drjones, cohuck

On Thu, 19 Oct 2017 17:31:51 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:
> > For enabling early cpu to numa node configuration at runtime
> > qmp_query_hotpluggable_cpus() should provide a list of available
> > cpu slots at early stage, before machine_init() is called and
> > the 1st cpu is created, so that mgmt might be able to call it
> > and use output to set numa mapping.
> > Use MachineClass::possible_cpu_arch_ids() callback to set
> > cpu type info, along with the rest of possible cpu properties,
> > to let machine define which cpu type* will be used.
> > 
> > * for SPAPR it will be a spapr core type and for ARM/s390x/x86
> >   a respective descendant of CPUClass.
> > 
> > Move parse_numa_opts() in vl.c after cpu_model is parsed into
> > cpu_type so that possible_cpu_arch_ids() would know which
> > cpu_type to use during layout initialization.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> > ---
> >   v2:
> >      - fix NULL dereference caused by uninitialized
> >        MachineState::cpu_type at the time parse_numa_opts()
> >        was called
> > ---
> >  include/hw/boards.h        |  2 ++
> >  hw/arm/virt.c              |  3 ++-
> >  hw/core/machine.c          | 12 ++++++------
> >  hw/i386/pc.c               |  4 +++-
> >  hw/ppc/spapr.c             | 13 ++++++++-----
> >  hw/s390x/s390-virtio-ccw.c |  1 +
> >  vl.c                       |  3 +--
> >  7 files changed, 23 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > index 191a5b3..fa21758 100644
> > --- a/include/hw/boards.h
> > +++ b/include/hw/boards.h
> > @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> >   * CPUArchId:
> >   * @arch_id - architecture-dependent CPU ID of present or possible CPU  
> 
> I know this isn't really in scope for this patch, but is @arch_id here
> supposed to have meaning defined by the target, or by the machine?
> 
> If it's the machine, it could do with a rename - "arch" means target
> to most people (thanks to Linux).
> 
> If it's the target, it's kind of bogus, because it doesn't necessarily
> have a clear meaning per target - get_arch_id in CPUClass has the same
> problem, which is probably one reason it's basically only used by the
> x86 code at present.
> 
> e.g. for target/ppc, what do we use?  There's the PIR, which is in the
> CPU.. but only on some cpu models, not all.  There will generally be
> some kind of master PIC id, but there are different PIC models on
> different boards.  What goes in the devicetree?  Well only some
> machines use devicetree, and they might define the cpu reg 
> differently.
> 
> Board designs will generally try to make some if not all of those
> possible values equal for simplicity, but there's still no real way of
> defining a sensible arch_id independent of machine / board.
I'd say arch_id is machine specific so far. It was introduced when we
didn't have CpuInstanceProperties, and at that time we considered only
vcpus (threads), so it doesn't really apply to spapr cores.

In general we could do away with arch_id and use CpuInstanceProperties
instead, but arch_id also serves an auxiliary purpose: it allows the machine
to pre-calculate (cache) apic-id/mpidr values in one place, and then they
are (or could be) used by arch-independent code to build ACPI tables.
So if we drop arch_id we would need to introduce a machine hook
which would translate CpuInstanceProperties into the current arch_id.
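
A rough sketch of what such a translation hook would compute, written in
Python rather than C for brevity.  The names bits_for() and
props_to_arch_id() are hypothetical, and the bit packing is only an
assumption modelled on an x86 APIC-ID-style layout (thread bits lowest,
then core, then socket); other machines would compute something else
entirely.

```python
# Hypothetical machine hook: CpuInstanceProperties -> arch_id.
# The packing below assumes an x86 APIC-ID-like layout and is only
# an illustration, not the code QEMU uses.

def bits_for(count):
    """Number of bits needed to encode IDs 0 .. count-1."""
    return (count - 1).bit_length()

def props_to_arch_id(props, cores_per_socket, threads_per_core):
    """Pack socket/core/thread IDs into one integer, thread bits lowest."""
    smt_width = bits_for(threads_per_core)
    core_width = bits_for(cores_per_socket)
    return ((props["socket-id"] << (core_width + smt_width))
            | (props["core-id"] << smt_width)
            | props["thread-id"])

# Second socket of a '-smp 1,maxcpus=2' style layout: 1 core, 1 thread.
simple = {"socket-id": 1, "core-id": 0, "thread-id": 0}
# 3 cores per socket needs 2 core bits, so IDs are not contiguous.
sparse = {"socket-id": 1, "core-id": 2, "thread-id": 1}

print(props_to_arch_id(simple, cores_per_socket=1, threads_per_core=1))
print(props_to_arch_id(sparse, cores_per_socket=3, threads_per_core=2))
```

The second case shows why a cached per-slot value is convenient: with
non-power-of-two counts the resulting IDs have holes, so generic code
cannot derive them from a plain CPU index.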


* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-10-31 14:01       ` Igor Mammedov
@ 2017-11-06 18:02         ` Eduardo Habkost
  2017-11-07 15:04           ` Cornelia Huck
  2017-11-09  6:53           ` David Gibson
  0 siblings, 2 replies; 93+ messages in thread
From: Eduardo Habkost @ 2017-11-06 18:02 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: David Gibson, qemu-devel, peter.maydell, pkrempa, pbonzini,
	drjones, cohuck

On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:
> On Thu, 19 Oct 2017 17:31:51 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:
> > > For enabling early cpu to numa node configuration at runtime
> > > qmp_query_hotpluggable_cpus() should provide a list of available
> > > cpu slots at early stage, before machine_init() is called and
> > > the 1st cpu is created, so that mgmt might be able to call it
> > > and use output to set numa mapping.
> > > Use MachineClass::possible_cpu_arch_ids() callback to set
> > > cpu type info, along with the rest of possible cpu properties,
> > > to let machine define which cpu type* will be used.
> > > 
> > > * for SPAPR it will be a spapr core type and for ARM/s390x/x86
> > >   a respective descendant of CPUClass.
> > > 
> > > Move parse_numa_opts() in vl.c after cpu_model is parsed into
> > > cpu_type so that possible_cpu_arch_ids() would know which
> > > cpu_type to use during layout initialization.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> > 
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > 
> > > ---
> > >   v2:
> > > >      - fix NULL dereference caused by uninitialized
> > > >        MachineState::cpu_type at the time parse_numa_opts()
> > > >        was called
> > > ---
> > >  include/hw/boards.h        |  2 ++
> > >  hw/arm/virt.c              |  3 ++-
> > >  hw/core/machine.c          | 12 ++++++------
> > >  hw/i386/pc.c               |  4 +++-
> > >  hw/ppc/spapr.c             | 13 ++++++++-----
> > >  hw/s390x/s390-virtio-ccw.c |  1 +
> > >  vl.c                       |  3 +--
> > >  7 files changed, 23 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > index 191a5b3..fa21758 100644
> > > --- a/include/hw/boards.h
> > > +++ b/include/hw/boards.h
> > > @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> > >   * CPUArchId:
> > >   * @arch_id - architecture-dependent CPU ID of present or possible CPU  
> > 
> > I know this isn't really in scope for this patch, but is @arch_id here
> > supposed to have meaning defined by the target, or by the machine?
> > 
> > > If it's the machine, it could do with a rename - "arch" means target
> > to most people (thanks to Linux).
> > 
> > If it's the target, it's kind of bogus, because it doesn't necessarily
> > have a clear meaning per target - get_arch_id in CPUClass has the same
> > problem, which is probably one reason it's basically only used by the
> > x86 code at present.
> > 
> > e.g. for target/ppc, what do we use?  There's the PIR, which is in the
> > CPU.. but only on some cpu models, not all.  There will generally be
> > some kind of master PIC id, but there are different PIC models on
> > different boards.  What goes in the devicetree?  Well only some
> > machines use devicetree, and they might define the cpu reg 
> > differently.
> > 
> > Board designs will generally try to make some if not all of those
> > possible values equal for simplicity, but there's still no real way of
> > defining a sensible arch_id independent of machine / board.
> I'd say arch_id is machine specific so far. It was introduced when we
> didn't have CpuInstanceProperties, and at that time we considered only
> vcpus (threads), so it doesn't really apply to spapr cores.
> 
> In general we could do away with arch_id and use CpuInstanceProperties
> instead, but arch_id also serves an auxiliary purpose: it allows the machine
> to pre-calculate (cache) apic-id/mpidr values in one place, and then they
> are (or could be) used by arch-independent code to build ACPI tables.
> So if we drop arch_id we would need to introduce a machine hook
> which would translate CpuInstanceProperties into the current arch_id.

I think we need to do a better job of documenting where exactly
we expect arch_id to be used and how, so people know what it's
supposed to return.

If the only place where it's useful now is ACPI code (is it?),
should we rename it to something like get_acpi_id()?

-- 
Eduardo


* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-11-06 18:02         ` Eduardo Habkost
@ 2017-11-07 15:04           ` Cornelia Huck
  2017-11-09  6:58             ` David Gibson
  2017-11-09  6:53           ` David Gibson
  1 sibling, 1 reply; 93+ messages in thread
From: Cornelia Huck @ 2017-11-07 15:04 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Igor Mammedov, David Gibson, qemu-devel, peter.maydell, pkrempa,
	pbonzini, drjones

On Mon, 6 Nov 2017 16:02:16 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:
> > On Thu, 19 Oct 2017 17:31:51 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:  
> > > > For enabling early cpu to numa node configuration at runtime
> > > > qmp_query_hotpluggable_cpus() should provide a list of available
> > > > cpu slots at early stage, before machine_init() is called and
> > > > the 1st cpu is created, so that mgmt might be able to call it
> > > > and use output to set numa mapping.
> > > > Use MachineClass::possible_cpu_arch_ids() callback to set
> > > > cpu type info, along with the rest of possible cpu properties,
> > > > to let machine define which cpu type* will be used.
> > > > 
> > > > * for SPAPR it will be a spapr core type and for ARM/s390x/x86
> > > >   a respective descendant of CPUClass.
> > > > 
> > > > Move parse_numa_opts() in vl.c after cpu_model is parsed into
> > > > cpu_type so that possible_cpu_arch_ids() would know which
> > > > cpu_type to use during layout initialization.
> > > > 
> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>    
> > > 
> > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > >   
> > > > ---
> > > >   v2:
> > > > >      - fix NULL dereference caused by uninitialized
> > > > >        MachineState::cpu_type at the time parse_numa_opts()
> > > > >        was called
> > > > ---
> > > >  include/hw/boards.h        |  2 ++
> > > >  hw/arm/virt.c              |  3 ++-
> > > >  hw/core/machine.c          | 12 ++++++------
> > > >  hw/i386/pc.c               |  4 +++-
> > > >  hw/ppc/spapr.c             | 13 ++++++++-----
> > > >  hw/s390x/s390-virtio-ccw.c |  1 +
> > > >  vl.c                       |  3 +--
> > > >  7 files changed, 23 insertions(+), 15 deletions(-)
> > > > 
> > > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > > index 191a5b3..fa21758 100644
> > > > --- a/include/hw/boards.h
> > > > +++ b/include/hw/boards.h
> > > > @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> > > >   * CPUArchId:
> > > >   * @arch_id - architecture-dependent CPU ID of present or possible CPU    
> > > 
> > > I know this isn't really in scope for this patch, but is @arch_id here
> > > supposed to have meaning defined by the target, or by the machine?
> > > 
> > > > If it's the machine, it could do with a rename - "arch" means target
> > > to most people (thanks to Linux).
> > > 
> > > If it's the target, it's kind of bogus, because it doesn't necessarily
> > > have a clear meaning per target - get_arch_id in CPUClass has the same
> > > problem, which is probably one reason it's basically only used by the
> > > x86 code at present.
> > > 
> > > e.g. for target/ppc, what do we use?  There's the PIR, which is in the
> > > CPU.. but only on some cpu models, not all.  There will generally be
> > > some kind of master PIC id, but there are different PIC models on
> > > different boards.  What goes in the devicetree?  Well only some
> > > machines use devicetree, and they might define the cpu reg 
> > > differently.
> > > 
> > > Board designs will generally try to make some if not all of those
> > > possible values equal for simplicity, but there's still no real way of
> > > defining a sensible arch_id independent of machine / board.  
> > I'd say arch_id is machine specific so far. It was introduced when we
> > didn't have CpuInstanceProperties, and at that time we considered only
> > vcpus (threads), so it doesn't really apply to spapr cores.
> > 
> > In general we could do away with arch_id and use CpuInstanceProperties
> > instead, but arch_id also serves an auxiliary purpose: it allows the machine
> > to pre-calculate (cache) apic-id/mpidr values in one place, and then they
> > are (or could be) used by arch-independent code to build ACPI tables.
> > So if we drop arch_id we would need to introduce a machine hook
> > which would translate CpuInstanceProperties into the current arch_id.
> 
> I think we need to do a better job of documenting where exactly
> we expect arch_id to be used and how, so people know what it's
> supposed to return.
> 
> If the only place where it's useful now is ACPI code (is it?),
> should we rename it to something like get_acpi_id()?

It is also used in hw/s390x/sclp.c to fill out a control block, so acpi
isn't the only user.


* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-11-06 18:02         ` Eduardo Habkost
  2017-11-07 15:04           ` Cornelia Huck
@ 2017-11-09  6:53           ` David Gibson
  1 sibling, 0 replies; 93+ messages in thread
From: David Gibson @ 2017-11-09  6:53 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Igor Mammedov, qemu-devel, peter.maydell, pkrempa, pbonzini,
	drjones, cohuck

On Mon, Nov 06, 2017 at 04:02:16PM -0200, Eduardo Habkost wrote:
> On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:
> > On Thu, 19 Oct 2017 17:31:51 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> > 
> > > On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:
> > > > For enabling early cpu to numa node configuration at runtime
> > > > qmp_query_hotpluggable_cpus() should provide a list of available
> > > > cpu slots at early stage, before machine_init() is called and
> > > > the 1st cpu is created, so that mgmt might be able to call it
> > > > and use output to set numa mapping.
> > > > Use MachineClass::possible_cpu_arch_ids() callback to set
> > > > cpu type info, along with the rest of possible cpu properties,
> > > > to let machine define which cpu type* will be used.
> > > > 
> > > > * for SPAPR it will be a spapr core type and for ARM/s390x/x86
> > > >   a respective descendant of CPUClass.
> > > > 
> > > > Move parse_numa_opts() in vl.c after cpu_model is parsed into
> > > > cpu_type so that possible_cpu_arch_ids() would know which
> > > > cpu_type to use during layout initialization.
> > > > 
> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> > > 
> > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > > 
> > > > ---
> > > >   v2:
> > > > >      - fix NULL dereference caused by uninitialized
> > > > >        MachineState::cpu_type at the time parse_numa_opts()
> > > > >        was called
> > > > ---
> > > >  include/hw/boards.h        |  2 ++
> > > >  hw/arm/virt.c              |  3 ++-
> > > >  hw/core/machine.c          | 12 ++++++------
> > > >  hw/i386/pc.c               |  4 +++-
> > > >  hw/ppc/spapr.c             | 13 ++++++++-----
> > > >  hw/s390x/s390-virtio-ccw.c |  1 +
> > > >  vl.c                       |  3 +--
> > > >  7 files changed, 23 insertions(+), 15 deletions(-)
> > > > 
> > > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > > index 191a5b3..fa21758 100644
> > > > --- a/include/hw/boards.h
> > > > +++ b/include/hw/boards.h
> > > > @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> > > >   * CPUArchId:
> > > >   * @arch_id - architecture-dependent CPU ID of present or possible CPU  
> > > 
> > > I know this isn't really in scope for this patch, but is @arch_id here
> > > supposed to have meaning defined by the target, or by the machine?
> > > 
> > > > If it's the machine, it could do with a rename - "arch" means target
> > > to most people (thanks to Linux).
> > > 
> > > If it's the target, it's kind of bogus, because it doesn't necessarily
> > > have a clear meaning per target - get_arch_id in CPUClass has the same
> > > problem, which is probably one reason it's basically only used by the
> > > x86 code at present.
> > > 
> > > e.g. for target/ppc, what do we use?  There's the PIR, which is in the
> > > CPU.. but only on some cpu models, not all.  There will generally be
> > > some kind of master PIC id, but there are different PIC models on
> > > different boards.  What goes in the devicetree?  Well only some
> > > machines use devicetree, and they might define the cpu reg 
> > > differently.
> > > 
> > > Board designs will generally try to make some if not all of those
> > > possible values equal for simplicity, but there's still no real way of
> > > defining a sensible arch_id independent of machine / board.
> > I'd say arch_id is machine specific so far. It was introduced when we
> > didn't have CpuInstanceProperties, and at that time we considered only
> > vcpus (threads), so it doesn't really apply to spapr cores.
> > 
> > In general we could do away with arch_id and use CpuInstanceProperties
> > instead, but arch_id also serves an auxiliary purpose: it allows the machine
> > to pre-calculate (cache) apic-id/mpidr values in one place, and then they
> > are (or could be) used by arch-independent code to build ACPI tables.
> > So if we drop arch_id we would need to introduce a machine hook
> > which would translate CpuInstanceProperties into the current arch_id.
> 
> I think we need to do a better job of documenting where exactly
> we expect arch_id to be used and how, so people know what it's
> supposed to return.

The trouble with this is I think it's impossible - it doesn't have a
well defined meaning.

> If the only place where it's useful now is ACPI code (is it?),
> should we rename it to something like get_acpi_id()?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-11-07 15:04           ` Cornelia Huck
@ 2017-11-09  6:58             ` David Gibson
  2017-11-09 20:02               ` Eduardo Habkost
  0 siblings, 1 reply; 93+ messages in thread
From: David Gibson @ 2017-11-09  6:58 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Eduardo Habkost, Igor Mammedov, qemu-devel, peter.maydell,
	pkrempa, pbonzini, drjones

On Tue, Nov 07, 2017 at 04:04:04PM +0100, Cornelia Huck wrote:
> On Mon, 6 Nov 2017 16:02:16 -0200
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:
> > > On Thu, 19 Oct 2017 17:31:51 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:  
> > > > > For enabling early cpu to numa node configuration at runtime
> > > > > qmp_query_hotpluggable_cpus() should provide a list of available
> > > > > cpu slots at early stage, before machine_init() is called and
> > > > > the 1st cpu is created, so that mgmt might be able to call it
> > > > > and use output to set numa mapping.
> > > > > Use MachineClass::possible_cpu_arch_ids() callback to set
> > > > > cpu type info, along with the rest of possible cpu properties,
> > > > > to let machine define which cpu type* will be used.
> > > > > 
> > > > > * for SPAPR it will be a spapr core type and for ARM/s390x/x86
> > > > >   a respective descendant of CPUClass.
> > > > > 
> > > > > Move parse_numa_opts() in vl.c after cpu_model is parsed into
> > > > > cpu_type so that possible_cpu_arch_ids() would know which
> > > > > cpu_type to use during layout initialization.
> > > > > 
> > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>    
> > > > 
> > > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > > >   
> > > > > ---
> > > > >   v2:
> > > > >      - fix NULL dereference caused by not initialized
> > > > >        MachineState::cpu_type at the time parse_numa_opts()
> > > > > >        was called
> > > > > ---
> > > > >  include/hw/boards.h        |  2 ++
> > > > >  hw/arm/virt.c              |  3 ++-
> > > > >  hw/core/machine.c          | 12 ++++++------
> > > > >  hw/i386/pc.c               |  4 +++-
> > > > >  hw/ppc/spapr.c             | 13 ++++++++-----
> > > > >  hw/s390x/s390-virtio-ccw.c |  1 +
> > > > >  vl.c                       |  3 +--
> > > > >  7 files changed, 23 insertions(+), 15 deletions(-)
> > > > > 
> > > > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > > > index 191a5b3..fa21758 100644
> > > > > --- a/include/hw/boards.h
> > > > > +++ b/include/hw/boards.h
> > > > > @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> > > > >   * CPUArchId:
> > > > >   * @arch_id - architecture-dependent CPU ID of present or possible CPU    
> > > > 
> > > > I know this isn't really in scope for this patch, but is @arch_id here
> > > > supposed to have meaning defined by the target, or by the machine?
> > > > 
> > > > If it's the machine, it could do with a rename - "arch" means target
> > > > to most people (thanks to Linux).
> > > > 
> > > > If it's the target, it's kind of bogus, because it doesn't necessarily
> > > > have a clear meaning per target - get_arch_id in CPUClass has the same
> > > > problem, which is probably one reason it's basically only used by the
> > > > x86 code at present.
> > > > 
> > > > e.g. for target/ppc, what do we use?  There's the PIR, which is in the
> > > > CPU.. but only on some cpu models, not all.  There will generally be
> > > > some kind of master PIC id, but there are different PIC models on
> > > > different boards.  What goes in the devicetree?  Well only some
> > > > machines use devicetree, and they might define the cpu reg 
> > > > differently.
> > > > 
> > > > Board designs will generally try to make some if not all of those
> > > > possible values equal for simplicity, but there's still no real way of
> > > > defining a sensible arch_id independent of machine / board.  
> > > I'd say arch_id is machine specific so far, it was introduced when we
> > > didn't have CpuInstanceProperties and at that time we considered only
> > > vcpus (threads) and doesn't really apply to spapr cores.
> > > 
> > > In general we could do away with arch_id and use CpuInstanceProperties
> > > instead, but arch_id also serves aux purpose, it allows machine to
> > > pre-calculate(cache) apic-id/mpidr values in one place and then they
> > > are/(could be) used by arch-independent code to build acpi tables.
> > > So if we drop arch_id we would need to introduce a machine hook,
> > > which would translate CpuInstanceProperties into current arch_id.  
> > 
> > I think we need to do a better job documenting where exactly
> > we expect arch_id to be used and how, so people know what it's
> > supposed to return.
> > 
> > If the only place where it's useful now is ACPI code (is it?),
> > should we rename it to something like get_acpi_id()?
> 
> It is also used in hw/s390x/sclp.c to fill out a control block, so acpi
> isn't the only user.

Yeah.. this is kind of bogus.  The s390 use is in machine specific
code, so it's basically just re-using the field for an unrelated usage
to the x86/arm one (ACPI).

If we can't assign a universal meaning to the field (even if the
actual values are per-machine) - and I don't think we can - then I
really don't think it belongs in CPUState.  A machine hook which
translates an ArchId to an acpi_id is the correct solution I believe.
Or even an ACPIMachine interface (to be implemented by machines which
do ACPI) which has a method to do this.

Since both the assignment and use are in machine type specific code
for s390, it can have its own field in the s390 specific cpu subclass.
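
To make the hook idea a bit more concrete, here is a rough standalone
sketch, not actual QEMU code.  Every name in it (CpuProps, PcTopology,
AcpiMachineOps, pc_cpu_acpi_id, acpi_id_for) is made up for
illustration, and the linear packing of socket/core/thread is a
deliberate simplification, not the real APIC ID or MPIDR encoding.  The
point is only that generic ACPI code would ask the machine for the ID
instead of reading a cached arch_id field:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the topology part of CpuInstanceProperties. */
typedef struct CpuProps {
    int64_t socket_id;
    int64_t core_id;
    int64_t thread_id;
} CpuProps;

/* Hypothetical topology description owned by one machine type. */
typedef struct PcTopology {
    unsigned cores_per_socket;
    unsigned threads_per_core;
} PcTopology;

/* Hypothetical interface implemented only by machines that do ACPI. */
typedef struct AcpiMachineOps {
    uint64_t (*cpu_acpi_id)(const CpuProps *props, const void *opaque);
} AcpiMachineOps;

/* One machine's translation: a plain linear packing (illustrative only). */
static uint64_t pc_cpu_acpi_id(const CpuProps *props, const void *opaque)
{
    const PcTopology *topo = opaque;

    assert(props != NULL && topo != NULL);
    uint64_t core = (uint64_t)props->socket_id * topo->cores_per_socket
                    + (uint64_t)props->core_id;
    return core * topo->threads_per_core + (uint64_t)props->thread_id;
}

/* Generic table-building code calls the hook and never sees arch_id. */
static uint64_t acpi_id_for(const AcpiMachineOps *ops, const CpuProps *props,
                            const void *opaque)
{
    assert(ops != NULL && ops->cpu_acpi_id != NULL);
    return ops->cpu_acpi_id(props, opaque);
}
```

A machine that does no ACPI simply would not provide the ops, and s390
would keep whatever ID it needs in its own CPU subclass.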

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-11-09  6:58             ` David Gibson
@ 2017-11-09 20:02               ` Eduardo Habkost
  2017-11-10 10:14                 ` Cornelia Huck
  2017-11-21 14:02                 ` Igor Mammedov
  0 siblings, 2 replies; 93+ messages in thread
From: Eduardo Habkost @ 2017-11-09 20:02 UTC (permalink / raw)
  To: David Gibson
  Cc: Cornelia Huck, Igor Mammedov, qemu-devel, peter.maydell, pkrempa,
	pbonzini, drjones

On Thu, Nov 09, 2017 at 05:58:03PM +1100, David Gibson wrote:
> On Tue, Nov 07, 2017 at 04:04:04PM +0100, Cornelia Huck wrote:
> > On Mon, 6 Nov 2017 16:02:16 -0200
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > 
> > > On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:
> > > > On Thu, 19 Oct 2017 17:31:51 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >   
> > > > > On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:  
> > > > > > For enabling early cpu to numa node configuration at runtime
> > > > > > qmp_query_hotpluggable_cpus() should provide a list of available
> > > > > > cpu slots at early stage, before machine_init() is called and
> > > > > > the 1st cpu is created, so that mgmt might be able to call it
> > > > > > and use output to set numa mapping.
> > > > > > Use MachineClass::possible_cpu_arch_ids() callback to set
> > > > > > cpu type info, along with the rest of possible cpu properties,
> > > > > > to let machine define which cpu type* will be used.
> > > > > > 
> > > > > > * for SPAPR it will be a spapr core type and for ARM/s390x/x86
> > > > > >   a respective descendant of CPUClass.
> > > > > > 
> > > > > > Move parse_numa_opts() in vl.c after cpu_model is parsed into
> > > > > > cpu_type so that possible_cpu_arch_ids() would know which
> > > > > > cpu_type to use during layout initialization.
> > > > > > 
> > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>    
> > > > > 
> > > > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > > > >   
> > > > > > ---
> > > > > >   v2:
> > > > > >      - fix NULL dereference caused by not initialized
> > > > > >        MachineState::cpu_type at the time parse_numa_opts()
> > > > > > >        was called
> > > > > > ---
> > > > > >  include/hw/boards.h        |  2 ++
> > > > > >  hw/arm/virt.c              |  3 ++-
> > > > > >  hw/core/machine.c          | 12 ++++++------
> > > > > >  hw/i386/pc.c               |  4 +++-
> > > > > >  hw/ppc/spapr.c             | 13 ++++++++-----
> > > > > >  hw/s390x/s390-virtio-ccw.c |  1 +
> > > > > >  vl.c                       |  3 +--
> > > > > >  7 files changed, 23 insertions(+), 15 deletions(-)
> > > > > > 
> > > > > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > > > > index 191a5b3..fa21758 100644
> > > > > > --- a/include/hw/boards.h
> > > > > > +++ b/include/hw/boards.h
> > > > > > @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> > > > > >   * CPUArchId:
> > > > > >   * @arch_id - architecture-dependent CPU ID of present or possible CPU    
> > > > > 
> > > > > I know this isn't really in scope for this patch, but is @arch_id here
> > > > > supposed to have meaning defined by the target, or by the machine?
> > > > > 
> > > > > If it's the machine, it could do with a rename - "arch" means target
> > > > > to most people (thanks to Linux).
> > > > > 
> > > > > If it's the target, it's kind of bogus, because it doesn't necessarily
> > > > > have a clear meaning per target - get_arch_id in CPUClass has the same
> > > > > problem, which is probably one reason it's basically only used by the
> > > > > x86 code at present.
> > > > > 
> > > > > e.g. for target/ppc, what do we use?  There's the PIR, which is in the
> > > > > CPU.. but only on some cpu models, not all.  There will generally be
> > > > > some kind of master PIC id, but there are different PIC models on
> > > > > different boards.  What goes in the devicetree?  Well only some
> > > > > machines use devicetree, and they might define the cpu reg 
> > > > > differently.
> > > > > 
> > > > > Board designs will generally try to make some if not all of those
> > > > > possible values equal for simplicity, but there's still no real way of
> > > > > defining a sensible arch_id independent of machine / board.  
> > > > I'd say arch_id is machine specific so far, it was introduced when we
> > > > didn't have CpuInstanceProperties and at that time we considered only
> > > > vcpus (threads) and doesn't really apply to spapr cores.
> > > > 
> > > > In general we could do away with arch_id and use CpuInstanceProperties
> > > > instead, but arch_id also serves aux purpose, it allows machine to
> > > > pre-calculate(cache) apic-id/mpidr values in one place and then they
> > > > are/(could be) used by arch-independent code to build acpi tables.
> > > > So if we drop arch_id we would need to introduce a machine hook,
> > > > which would translate CpuInstanceProperties into current arch_id.  
> > > 
> > > I think we need to do a better job documenting where exactly
> > > we expect arch_id to be used and how, so people know what it's
> > > supposed to return.
> > > 
> > > If the only place where it's useful now is ACPI code (is it?),
> > > should we rename it to something like get_acpi_id()?
> > 
> > It is also used in hw/s390x/sclp.c to fill out a control block, so acpi
> > isn't the only user.
> 
> Yeah.. this is kind of bogus.  The s390 use is in machine specific
> code, so it's basically just re-using the field for an unrelated usage
> to the x86/arm one (ACPI).
> 
> If we can't assign a universal meaning to the field (even if the
> actual values are per-machine) - and I don't think we can - then I
> really don't think it belongs in CPUState.  A machine hook which
> translates an ArchId to an acpi_id is the correct solution I believe.
> Or even an ACPIMachine interface (to be implemented by machines which
> do ACPI) which has a method to do this.
> 
> Since both the assignment and use are in machine type specific code
> for s390, it can have its own field in the s390 specific cpu subclass.
> 

I agree.  This might require duplicating cpu_by_arch_id() and
cpu_exists() into machine-specific code, but this doesn't sound
too bad: there's only one user of cpu_by_arch_id() (that's
x86-specific code living inside monitor.c), and one user of
cpu_exists() (that's s390-specific code).

(Maybe those users could be rewritten to use
MachineState::possible_cpus, like pc_find_cpu_slot()).
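
As a rough illustration of what such a rewrite could look like, here is
a standalone sketch, not the actual QEMU code.  CpuSlot, CpuSlotList,
find_cpu_slot and cpu_slot_present are hypothetical names standing in
for the possible_cpus list and the helpers built on it, and the linear
scan is just the simplest thing that works for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the possible_cpus list. */
typedef struct CpuSlot {
    uint64_t arch_id;   /* machine-defined ID of this slot */
    void *cpu;          /* NULL while no CPU is plugged into the slot */
} CpuSlot;

/* Hypothetical stand-in for the list itself. */
typedef struct CpuSlotList {
    size_t len;
    CpuSlot *cpus;
} CpuSlotList;

/* Return the slot with the given ID, or NULL if the machine has none. */
static CpuSlot *find_cpu_slot(CpuSlotList *list, uint64_t id)
{
    assert(list != NULL);
    for (size_t i = 0; i < list->len; i++) {
        if (list->cpus[i].arch_id == id) {
            return &list->cpus[i];
        }
    }
    return NULL;
}

/* A cpu_exists()-style check expressed through the slot lookup. */
static int cpu_slot_present(CpuSlotList *list, uint64_t id)
{
    CpuSlot *slot = find_cpu_slot(list, id);

    return slot != NULL && slot->cpu != NULL;
}
```

With something like this living in machine code, the generic helpers
would have no remaining callers.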

-- 
Eduardo


* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-11-09 20:02               ` Eduardo Habkost
@ 2017-11-10 10:14                 ` Cornelia Huck
  2017-11-10 12:34                   ` David Hildenbrand
  2017-11-21 14:02                 ` Igor Mammedov
  1 sibling, 1 reply; 93+ messages in thread
From: Cornelia Huck @ 2017-11-10 10:14 UTC (permalink / raw)
  To: Eduardo Habkost, David Hildenbrand
  Cc: David Gibson, Igor Mammedov, qemu-devel, peter.maydell, pkrempa,
	pbonzini, drjones

On Thu, 9 Nov 2017 18:02:35 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Thu, Nov 09, 2017 at 05:58:03PM +1100, David Gibson wrote:
> > On Tue, Nov 07, 2017 at 04:04:04PM +0100, Cornelia Huck wrote:  
> > > On Mon, 6 Nov 2017 16:02:16 -0200
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:  
> > > > > On Thu, 19 Oct 2017 17:31:51 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >     
> > > > > > On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:    
> > > > > > > For enabling early cpu to numa node configuration at runtime
> > > > > > > qmp_query_hotpluggable_cpus() should provide a list of available
> > > > > > > cpu slots at early stage, before machine_init() is called and
> > > > > > > the 1st cpu is created, so that mgmt might be able to call it
> > > > > > > and use output to set numa mapping.
> > > > > > > Use MachineClass::possible_cpu_arch_ids() callback to set
> > > > > > > cpu type info, along with the rest of possible cpu properties,
> > > > > > > to let machine define which cpu type* will be used.
> > > > > > > 
> > > > > > > * for SPAPR it will be a spapr core type and for ARM/s390x/x86
> > > > > > >   a respective descendant of CPUClass.
> > > > > > > 
> > > > > > > Move parse_numa_opts() in vl.c after cpu_model is parsed into
> > > > > > > cpu_type so that possible_cpu_arch_ids() would know which
> > > > > > > cpu_type to use during layout initialization.
> > > > > > > 
> > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>      
> > > > > > 
> > > > > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > > > > >     
> > > > > > > ---
> > > > > > >   v2:
> > > > > > >      - fix NULL dereference caused by not initialized
> > > > > > >        MachineState::cpu_type at the time parse_numa_opts()
> > > > > > > >        was called
> > > > > > > ---
> > > > > > >  include/hw/boards.h        |  2 ++
> > > > > > >  hw/arm/virt.c              |  3 ++-
> > > > > > >  hw/core/machine.c          | 12 ++++++------
> > > > > > >  hw/i386/pc.c               |  4 +++-
> > > > > > >  hw/ppc/spapr.c             | 13 ++++++++-----
> > > > > > >  hw/s390x/s390-virtio-ccw.c |  1 +
> > > > > > >  vl.c                       |  3 +--
> > > > > > >  7 files changed, 23 insertions(+), 15 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > > > > > index 191a5b3..fa21758 100644
> > > > > > > --- a/include/hw/boards.h
> > > > > > > +++ b/include/hw/boards.h
> > > > > > > @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> > > > > > >   * CPUArchId:
> > > > > > >   * @arch_id - architecture-dependent CPU ID of present or possible CPU      
> > > > > > 
> > > > > > I know this isn't really in scope for this patch, but is @arch_id here
> > > > > > supposed to have meaning defined by the target, or by the machine?
> > > > > > 
> > > > > > If it's the machine, it could do with a rename - "arch" means target
> > > > > > to most people (thanks to Linux).
> > > > > > 
> > > > > > If it's the target, it's kind of bogus, because it doesn't necessarily
> > > > > > have a clear meaning per target - get_arch_id in CPUClass has the same
> > > > > > problem, which is probably one reason it's basically only used by the
> > > > > > x86 code at present.
> > > > > > 
> > > > > > e.g. for target/ppc, what do we use?  There's the PIR, which is in the
> > > > > > CPU.. but only on some cpu models, not all.  There will generally be
> > > > > > some kind of master PIC id, but there are different PIC models on
> > > > > > different boards.  What goes in the devicetree?  Well only some
> > > > > > machines use devicetree, and they might define the cpu reg 
> > > > > > differently.
> > > > > > 
> > > > > > Board designs will generally try to make some if not all of those
> > > > > > possible values equal for simplicity, but there's still no real way of
> > > > > > defining a sensible arch_id independent of machine / board.    
> > > > > I'd say arch_id is machine specific so far, it was introduced when we
> > > > > didn't have CpuInstanceProperties and at that time we considered only
> > > > > vcpus (threads) and doesn't really apply to spapr cores.
> > > > > 
> > > > > In general we could do away with arch_id and use CpuInstanceProperties
> > > > > instead, but arch_id also serves aux purpose, it allows machine to
> > > > > pre-calculate(cache) apic-id/mpidr values in one place and then they
> > > > > are/(could be) used by arch-independent code to build acpi tables.
> > > > > So if we drop arch_id we would need to introduce a machine hook,
> > > > > which would translate CpuInstanceProperties into current arch_id.    
> > > > 
> > > > I think we need to do a better job documenting where exactly
> > > > we expect arch_id to be used and how, so people know what it's
> > > > supposed to return.
> > > > 
> > > > If the only place where it's useful now is ACPI code (is it?),
> > > > should we rename it to something like get_acpi_id()?  
> > > 
> > > It is also used in hw/s390x/sclp.c to fill out a control block, so acpi
> > > isn't the only user.  
> > 
> > Yeah.. this is kind of bogus.  The s390 use is in machine specific
> > code, so it's basically just re-using the field for an unrelated usage
> > to the x86/arm one (ACPI).
> > 
> > If we can't assign a universal meaning to the field (even if the
> > actual values are per-machine) - and I don't think we can - then I
> > really don't think it belongs in CPUState.  A machine hook which
> > translates an ArchId to an acpi_id is the correct solution I believe.
> > Or even an ACPIMachine interface (to be implemented by machines which
> > do ACPI) which has a method to do this.
> > 
> > Since both the assignment and use are in machine type specific code
> > for s390, it can have its own field in the s390 specific cpu subclass.
> >   
> 
> I agree.  This might require duplicating cpu_by_arch_id() and
> cpu_exists() into machine-specific code, but this doesn't sound
> too bad: there's only one user of cpu_by_arch_id() (that's
> x86-specific code living inside monitor.c), and one user of
> cpu_exists() (that's s390-specific code).
> 
> (Maybe those users could be rewritten to use
> MachineState::possible_cpus, like pc_find_cpu_slot()).

David (H), does that sound workable to you?


* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-11-10 10:14                 ` Cornelia Huck
@ 2017-11-10 12:34                   ` David Hildenbrand
  2017-11-10 12:58                     ` Eduardo Habkost
  0 siblings, 1 reply; 93+ messages in thread
From: David Hildenbrand @ 2017-11-10 12:34 UTC (permalink / raw)
  To: Cornelia Huck, Eduardo Habkost
  Cc: David Gibson, Igor Mammedov, qemu-devel, peter.maydell, pkrempa,
	pbonzini, drjones

On 10.11.2017 11:14, Cornelia Huck wrote:
> On Thu, 9 Nov 2017 18:02:35 -0200
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
>> On Thu, Nov 09, 2017 at 05:58:03PM +1100, David Gibson wrote:
>>> On Tue, Nov 07, 2017 at 04:04:04PM +0100, Cornelia Huck wrote:  
>>>> On Mon, 6 Nov 2017 16:02:16 -0200
>>>> Eduardo Habkost <ehabkost@redhat.com> wrote:
>>>>   
>>>>> On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:  
>>>>>> On Thu, 19 Oct 2017 17:31:51 +1100
>>>>>> David Gibson <david@gibson.dropbear.id.au> wrote:
>>>>>>     
>>>>>>> On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:    
>>>>>>>> For enabling early cpu to numa node configuration at runtime
>>>>>>>> qmp_query_hotpluggable_cpus() should provide a list of available
>>>>>>>> cpu slots at early stage, before machine_init() is called and
>>>>>>>> the 1st cpu is created, so that mgmt might be able to call it
>>>>>>>> and use output to set numa mapping.
>>>>>>>> Use MachineClass::possible_cpu_arch_ids() callback to set
>>>>>>>> cpu type info, along with the rest of possible cpu properties,
>>>>>>>> to let machine define which cpu type* will be used.
>>>>>>>>
>>>>>>>> * for SPAPR it will be a spapr core type and for ARM/s390x/x86
>>>>>>>>   a respective descendant of CPUClass.
>>>>>>>>
>>>>>>>> Move parse_numa_opts() in vl.c after cpu_model is parsed into
>>>>>>>> cpu_type so that possible_cpu_arch_ids() would know which
>>>>>>>> cpu_type to use during layout initialization.
>>>>>>>>
>>>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>      
>>>>>>>
>>>>>>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>>>>>>>     
>>>>>>>> ---
>>>>>>>>   v2:
>>>>>>>>      - fix NULL dereference caused by not initialized
>>>>>>>>        MachineState::cpu_type at the time parse_numa_opts()
>>>>>>>>        was called
>>>>>>>> ---
>>>>>>>>  include/hw/boards.h        |  2 ++
>>>>>>>>  hw/arm/virt.c              |  3 ++-
>>>>>>>>  hw/core/machine.c          | 12 ++++++------
>>>>>>>>  hw/i386/pc.c               |  4 +++-
>>>>>>>>  hw/ppc/spapr.c             | 13 ++++++++-----
>>>>>>>>  hw/s390x/s390-virtio-ccw.c |  1 +
>>>>>>>>  vl.c                       |  3 +--
>>>>>>>>  7 files changed, 23 insertions(+), 15 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/hw/boards.h b/include/hw/boards.h
>>>>>>>> index 191a5b3..fa21758 100644
>>>>>>>> --- a/include/hw/boards.h
>>>>>>>> +++ b/include/hw/boards.h
>>>>>>>> @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>>>>>>>>   * CPUArchId:
>>>>>>>>   * @arch_id - architecture-dependent CPU ID of present or possible CPU      
>>>>>>>
>>>>>>> I know this isn't really in scope for this patch, but is @arch_id here
>>>>>>> supposed to have meaning defined by the target, or by the machine?
>>>>>>>
>>>>>>> If it's the machine, it could do with a rename - "arch" means target
>>>>>>> to most people (thanks to Linux).
>>>>>>>
>>>>>>> If it's the target, it's kind of bogus, because it doesn't necessarily
>>>>>>> have a clear meaning per target - get_arch_id in CPUClass has the same
>>>>>>> problem, which is probably one reason it's basically only used by the
>>>>>>> x86 code at present.
>>>>>>>
>>>>>>> e.g. for target/ppc, what do we use?  There's the PIR, which is in the
>>>>>>> CPU.. but only on some cpu models, not all.  There will generally be
>>>>>>> some kind of master PIC id, but there are different PIC models on
>>>>>>> different boards.  What goes in the devicetree?  Well only some
>>>>>>> machines use devicetree, and they might define the cpu reg 
>>>>>>> differently.
>>>>>>>
>>>>>>> Board designs will generally try to make some if not all of those
>>>>>>> possible values equal for simplicity, but there's still no real way of
>>>>>>> defining a sensible arch_id independent of machine / board.    
>>>>>> I'd say arch_id is machine specific so far, it was introduced when we
>>>>>> didn't have CpuInstanceProperties and at that time we considered only
>>>>>> vcpus (threads) and doesn't really apply to spapr cores.
>>>>>>
>>>>>> In general we could do away with arch_id and use CpuInstanceProperties
>>>>>> instead, but arch_id also serves aux purpose, it allows machine to
>>>>>> pre-calculate(cache) apic-id/mpidr values in one place and then they
>>>>>> are/(could be) used by arch-independent code to build acpi tables.
>>>>>> So if we drop arch_id we would need to introduce a machine hook,
>>>>>> which would translate CpuInstanceProperties into current arch_id.    
>>>>>
>>>>> I think we need to do a better job documenting where exactly
>>>>> we expect arch_id to be used and how, so people know what it's
>>>>> supposed to return.
>>>>>
>>>>> If the only place where it's useful now is ACPI code (is it?),
>>>>> should we rename it to something like get_acpi_id()?  
>>>>
>>>> It is also used in hw/s390x/sclp.c to fill out a control block, so acpi
>>>> isn't the only user.  
>>>
>>> Yeah.. this is kind of bogus.  The s390 use is in machine specific
>>> code, so it's basically just re-using the field for an unrelated usage
>>> to the x86/arm one (ACPI).

as index == arch_id on s390x, that code could easily be changed to
something like:

@@ -45,7 +45,7 @@ static void prepare_cpu_entries(SCLPDevice *sclp, CPUEntry *entry, int *count)
         if (!ms->possible_cpus->cpus[i].cpu) {
             continue;
         }
-        entry[*count].address = ms->possible_cpus->cpus[i].arch_id;
+        entry[*count].address = i;
         entry[*count].type = 0;
         memcpy(entry[*count].features, features, sizeof(features));
         (*count)++;

arch_id just looked like the right thing to use (documentation issue
mentioned above)
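
For clarity, the loop after that change would behave roughly like the
standalone sketch below.  This is not the sclp.c code: CpuEntrySketch
and fill_cpu_entries are made-up names, the slot array is a plain
pointer array standing in for possible_cpus, and it relies on the
assumption stated above that index == arch_id on s390x.  Note that the
address is the slot index, not the running count of present CPUs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the SCLP CPU entry. */
typedef struct CpuEntrySketch {
    uint16_t address;
    uint8_t type;
} CpuEntrySketch;

/*
 * Fill one entry per present CPU and return how many were written.
 * slots[i] == NULL means slot i is empty and is skipped.
 */
static int fill_cpu_entries(void *const *slots, int nslots,
                            CpuEntrySketch *entry)
{
    int count = 0;

    assert(slots != NULL && entry != NULL);
    for (int i = 0; i < nslots; i++) {
        if (!slots[i]) {
            continue;
        }
        entry[count].address = (uint16_t)i;  /* slot index, not count */
        entry[count].type = 0;
        count++;
    }
    return count;
}
```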


>>>
>>> If we can't assign a universal meaning to the field (even if the
>>> actual values are per-machine) - and I don't think we can - then I
>>> really don't think it belongs in CPUState.  A machine hook which
>>> translates an ArchId to an acpi_id is the correct solution I believe.
>>> Or even an ACPIMachine interface (to be implemented by machines which
>>> do ACPI) which has a method to do this.
>>>
>>> Since both the assignment and use are in machine type specific code
>>> for s390, it can have its own field in the s390 specific cpu subclass.

s390x doesn't need arch_id at all.

cs->cpu_index can be used.

>>>   
>>
>> I agree.  This might require duplicating cpu_by_arch_id() and
>> cpu_exists() into machine-specific code, but this doesn't sound
>> too bad: there's only one user of cpu_by_arch_id() (that's
>> x86-specific code living inside monitor.c), and one user of
>> cpu_exists() (that's s390-specific code).
>>
>> (Maybe those users could be rewritten to use
>> MachineState::possible_cpus, like pc_find_cpu_slot()).


-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-11-10 12:34                   ` David Hildenbrand
@ 2017-11-10 12:58                     ` Eduardo Habkost
  2017-11-10 13:07                       ` David Hildenbrand
  0 siblings, 1 reply; 93+ messages in thread
From: Eduardo Habkost @ 2017-11-10 12:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Cornelia Huck, David Gibson, Igor Mammedov, qemu-devel,
	peter.maydell, pkrempa, pbonzini, drjones

On Fri, Nov 10, 2017 at 01:34:42PM +0100, David Hildenbrand wrote:
> On 10.11.2017 11:14, Cornelia Huck wrote:
> > On Thu, 9 Nov 2017 18:02:35 -0200
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > 
> >> On Thu, Nov 09, 2017 at 05:58:03PM +1100, David Gibson wrote:
> >>> On Tue, Nov 07, 2017 at 04:04:04PM +0100, Cornelia Huck wrote:  
> >>>> On Mon, 6 Nov 2017 16:02:16 -0200
> >>>> Eduardo Habkost <ehabkost@redhat.com> wrote:
> >>>>   
> >>>>> On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:  
> >>>>>> On Thu, 19 Oct 2017 17:31:51 +1100
> >>>>>> David Gibson <david@gibson.dropbear.id.au> wrote:
> >>>>>>     
> >>>>>>> On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:    
> >>>>>>>> For enabling early cpu to numa node configuration at runtime
> >>>>>>>> qmp_query_hotpluggable_cpus() should provide a list of available
> >>>>>>>> cpu slots at early stage, before machine_init() is called and
> >>>>>>>> the 1st cpu is created, so that mgmt might be able to call it
> >>>>>>>> and use output to set numa mapping.
> >>>>>>>> Use MachineClass::possible_cpu_arch_ids() callback to set
> >>>>>>>> cpu type info, along with the rest of possible cpu properties,
> >>>>>>>> to let machine define which cpu type* will be used.
> >>>>>>>>
> >>>>>>>> * for SPAPR it will be a spapr core type and for ARM/s390x/x86
> >>>>>>>>   a respective descendant of CPUClass.
> >>>>>>>>
> >>>>>>>> Move parse_numa_opts() in vl.c after cpu_model is parsed into
> >>>>>>>> cpu_type so that possible_cpu_arch_ids() would know which
> >>>>>>>> cpu_type to use during layout initialization.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>      
> >>>>>>>
> >>>>>>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> >>>>>>>     
> >>>>>>>> ---
> >>>>>>>>   v2:
> >>>>>>>>      - fix NULL dereference caused by not initialized
> >>>>>>>>        MachineState::cpu_type at the time parse_numa_opts()
> >>>>>>>>        was called
> >>>>>>>> ---
> >>>>>>>>  include/hw/boards.h        |  2 ++
> >>>>>>>>  hw/arm/virt.c              |  3 ++-
> >>>>>>>>  hw/core/machine.c          | 12 ++++++------
> >>>>>>>>  hw/i386/pc.c               |  4 +++-
> >>>>>>>>  hw/ppc/spapr.c             | 13 ++++++++-----
> >>>>>>>>  hw/s390x/s390-virtio-ccw.c |  1 +
> >>>>>>>>  vl.c                       |  3 +--
> >>>>>>>>  7 files changed, 23 insertions(+), 15 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/include/hw/boards.h b/include/hw/boards.h
> >>>>>>>> index 191a5b3..fa21758 100644
> >>>>>>>> --- a/include/hw/boards.h
> >>>>>>>> +++ b/include/hw/boards.h
> >>>>>>>> @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> >>>>>>>>   * CPUArchId:
> >>>>>>>>   * @arch_id - architecture-dependent CPU ID of present or possible CPU      
> >>>>>>>
> >>>>>>> I know this isn't really in scope for this patch, but is @arch_id here
> >>>>>>> supposed to have meaning defined by the target, or by the machine?
> >>>>>>>
> >>>>>>> If it's the machine, it could do with a rename - "arch" means target
> >>>>>>> to most people (thanks to Linux).
> >>>>>>>
> >>>>>>> If it's the target, it's kind of bogus, because it doesn't necessarily
> >>>>>>> have a clear meaning per target - get_arch_id in CPUClass has the same
> >>>>>>> problem, which is probably one reason it's basically only used by the
> >>>>>>> x86 code at present.
> >>>>>>>
> >>>>>>> e.g. for target/ppc, what do we use?  There's the PIR, which is in the
> >>>>>>> CPU.. but only on some cpu models, not all.  There will generally be
> >>>>>>> some kind of master PIC id, but there are different PIC models on
> >>>>>>> different boards.  What goes in the devicetree?  Well only some
> >>>>>>> machines use devicetree, and they might define the cpu reg 
> >>>>>>> differently.
> >>>>>>>
> >>>>>>> Board designs will generally try to make some if not all of those
> >>>>>>> possible values equal for simplicity, but there's still no real way of
> >>>>>>> defining a sensible arch_id independent of machine / board.    
> >>>>>> I'd say arch_id is machine specific so far, it was introduced when we
> >>>>>> didn't have CpuInstanceProperties and at that time we considered only
> >>>>>> vcpus (threads) and doesn't really apply to spapr cores.
> >>>>>>
> >>>>>> In general we could do away with arch_id and use CpuInstanceProperties
> >>>>>> instead, but arch_id also serves aux purpose, it allows machine to
> >>>>>> pre-calculate(cache) apic-id/mpidr values in one place and then they
> >>>>>> are/(could be) used by arch-independent code to build acpi tables.
> >>>>>> So if we drop arch_id we would need to introduce a machine hook,
> >>>>>> which would translate CpuInstanceProperties into current arch_id.    
> >>>>>
> >>>>> I think we need to do a better job documenting where exactly
> >>>>> we expect arch_id to be used and how, so people know what it's
> >>>>> supposed to return.
> >>>>>
> >>>>> If the only place where it's useful now is ACPI code (is it?),
> >>>>> should we rename it to something like get_acpi_id()?  
> >>>>
> >>>> It is also used in hw/s390x/sclp.c to fill out a control block, so acpi
> >>>> isn't the only user.  
> >>>
> >>> Yeah.. this is kind of bogus.  The s390 use is in machine specific
> >>> code, so it's basically just re-using the field for an unrelated usage
> >>> to the x86/arm one (ACPI).
> 
> as index == arch_id on s390x, that code could easily be changed to
> something like:
> 
> @@ -45,7 +45,7 @@ static void prepare_cpu_entries(SCLPDevice *sclp,
> CPUEntry *entry, int *count)
>          if (!ms->possible_cpus->cpus[i].cpu) {
>              continue;
>          }
> -        entry[*count].address = ms->possible_cpus->cpus[i].arch_id;
> +        entry[*count].address = i;

What about decoupling it from the array index, by using:
    entry[*count].address = ms->possible_cpus->cpus[i].props.core_id;
or:
    entry[*count].address = S390_CPU(ms->possible_cpus->cpus[i].cpu)->core_id;
?


>          entry[*count].type = 0;
>          memcpy(entry[*count].features, features, sizeof(features));
>          (*count)++;
> 
> arch_id just looked like the right thing to use (documentation issue
> mentioned above)
> 
> 
> >>>
> >>> If we can't assign a universal meaning to the field (even if the
> >>> actual values are per-machine) - and I don't think we can - then I
> >>> really don't think it belongs in CPUState.  A machine hook which
> >>> translates an ArchId to an acpi_id is the correct solution I believe.
> >>> Or even an ACPIMachine interface (to be implemented by machines which
> >>> do ACPI) which has a method to do this.
> >>>
> >>> Since both the assignment and use are in machine type specific code
> >>> for s390, it can have its own field in the s390 specific cpu subclass.
> 
> s390x doesn't need arch_id at all.
> 
> cs->cpu_index can be used.

What about the cpu_exists() check in s390_cpu_realizefn()?  It
could be moved to a new s390_machine_device_pre_plug() method
that just checks ms->possible_cpus->cpus[cpu->env.core_id].cpu.

> 
> >>>   
> >>
> >> I agree.  This might require duplicating cpu_by_arch_id() and
> >> cpu_exists() into machine-specific code, but this doesn't sound
> >> too bad: there's only one user of cpu_by_arch_id() (that's
> >> x86-specific code living inside monitor.c), and one user of
> >> cpu_exists() (that's s390-specific code).
> >>
> >> (Maybe those users could be rewritten to use
> >> MachineState::possible_cpus, like pc_find_cpu_slot()).
> 
> 
> -- 
> 
> Thanks,
> 
> David / dhildenb

-- 
Eduardo


* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-11-10 12:58                     ` Eduardo Habkost
@ 2017-11-10 13:07                       ` David Hildenbrand
  0 siblings, 0 replies; 93+ messages in thread
From: David Hildenbrand @ 2017-11-10 13:07 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Cornelia Huck, David Gibson, Igor Mammedov, qemu-devel,
	peter.maydell, pkrempa, pbonzini, drjones

On 10.11.2017 13:58, Eduardo Habkost wrote:
> On Fri, Nov 10, 2017 at 01:34:42PM +0100, David Hildenbrand wrote:
>> On 10.11.2017 11:14, Cornelia Huck wrote:
>>> On Thu, 9 Nov 2017 18:02:35 -0200
>>> Eduardo Habkost <ehabkost@redhat.com> wrote:
>>>
>>>> On Thu, Nov 09, 2017 at 05:58:03PM +1100, David Gibson wrote:
>>>>> On Tue, Nov 07, 2017 at 04:04:04PM +0100, Cornelia Huck wrote:  
>>>>>> On Mon, 6 Nov 2017 16:02:16 -0200
>>>>>> Eduardo Habkost <ehabkost@redhat.com> wrote:
>>>>>>   
>>>>>>> On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:  
>>>>>>>> On Thu, 19 Oct 2017 17:31:51 +1100
>>>>>>>> David Gibson <david@gibson.dropbear.id.au> wrote:
>>>>>>>>     
>>>>>>>>> On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:    
>>>>>>>>>> For enabling early cpu to numa node configuration at runtime
>>>>>>>>>> qmp_query_hotpluggable_cpus() should provide a list of available
>>>>>>>>>> cpu slots at early stage, before machine_init() is called and
>>>>>>>>>> the 1st cpu is created, so that mgmt might be able to call it
>>>>>>>>>> and use output to set numa mapping.
>>>>>>>>>> Use MachineClass::possible_cpu_arch_ids() callback to set
>>>>>>>>>> cpu type info, along with the rest of possible cpu properties,
>>>>>>>>>> to let machine define which cpu type* will be used.
>>>>>>>>>>
>>>>>>>>>> * for SPAPR it will be a spapr core type and for ARM/s390x/x86
>>>>>>>>>>   a respective descendant of CPUClass.
>>>>>>>>>>
>>>>>>>>>> Move parse_numa_opts() in vl.c after cpu_model is parsed into
>>>>>>>>>> cpu_type so that possible_cpu_arch_ids() would know which
>>>>>>>>>> cpu_type to use during layout initialization.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>      
>>>>>>>>>
>>>>>>>>> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>>>>>>>>>     
>>>>>>>>>> ---
>>>>>>>>>>   v2:
>>>>>>>>>>      - fix NULL dereference caused by not initialized
>>>>>>>>>>        MachineState::cpu_type at the time parse_numa_opts()
>>>>>>>>>>        were called
>>>>>>>>>> ---
>>>>>>>>>>  include/hw/boards.h        |  2 ++
>>>>>>>>>>  hw/arm/virt.c              |  3 ++-
>>>>>>>>>>  hw/core/machine.c          | 12 ++++++------
>>>>>>>>>>  hw/i386/pc.c               |  4 +++-
>>>>>>>>>>  hw/ppc/spapr.c             | 13 ++++++++-----
>>>>>>>>>>  hw/s390x/s390-virtio-ccw.c |  1 +
>>>>>>>>>>  vl.c                       |  3 +--
>>>>>>>>>>  7 files changed, 23 insertions(+), 15 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/hw/boards.h b/include/hw/boards.h
>>>>>>>>>> index 191a5b3..fa21758 100644
>>>>>>>>>> --- a/include/hw/boards.h
>>>>>>>>>> +++ b/include/hw/boards.h
>>>>>>>>>> @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>>>>>>>>>>   * CPUArchId:
>>>>>>>>>>   * @arch_id - architecture-dependent CPU ID of present or possible CPU      
>>>>>>>>>
>>>>>>>>> I know this isn't really in scope for this patch, but is @arch_id here
>>>>>>>>> supposed to have meaning defined by the target, or by the machine?
>>>>>>>>>
>>>>>>>>> If it's the machine, it could do with a rename - "arch" means target
>>>>>>>>> to most people (thanks to Linux).
>>>>>>>>>
>>>>>>>>> If it's the target, it's kind of bogus, because it doesn't necessarily
>>>>>>>>> have a clear meaning per target - get_arch_id in CPUClass has the same
>>>>>>>>> problem, which is probably one reason it's basically only used by the
>>>>>>>>> x86 code at present.
>>>>>>>>>
>>>>>>>>> e.g. for target/ppc, what do we use?  There's the PIR, which is in the
>>>>>>>>> CPU.. but only on some cpu models, not all.  There will generally be
>>>>>>>>> some kind of master PIC id, but there are different PIC models on
>>>>>>>>> different boards.  What goes in the devicetree?  Well only some
>>>>>>>>> machines use devicetree, and they might define the cpu reg 
>>>>>>>>> differently.
>>>>>>>>>
>>>>>>>>> Board designs will generally try to make some if not all of those
>>>>>>>>> possible values equal for simplicity, but there's still no real way of
>>>>>>>>> defining a sensible arch_id independent of machine / board.    
>>>>>>>> I'd say arch_id is machine specific so far, it was introduced when we
>>>>>>>> didn't have CpuInstanceProperties and at that time we considered only
>>>>>>>> vcpus (threads) and doesn't really apply to spapr cores.
>>>>>>>>
>>>>>>>> In general we could do away with arch_id and use CpuInstanceProperties
>>>>>>>> instead, but arch_id also serves aux purpose, it allows machine to
>>>>>>>> pre-calculate(cache) apic-id/mpidr values in one place and then they
>>>>>>>> are/(could be) used by arch-independent code to build acpi tables.
>>>>>>>> So if we drop arch_id we would need to introduce a machine hook,
>>>>>>>> which would translate CpuInstanceProperties into current arch_id.    
>>>>>>>
>>>>>>> I think we need to do a better job documenting where exactly
>>>>>>> we expect arch_id to be used and how, so people know what it's
>>>>>>> supposed to return.
>>>>>>>
>>>>>>> If the only place where it's useful now is ACPI code (is it?),
>>>>>>> should we rename it to something like get_acpi_id()?  
>>>>>>
>>>>>> It is also used in hw/s390x/sclp.c to fill out a control block, so acpi
>>>>>> isn't the only user.  
>>>>>
>>>>> Yeah.. this is kind of bogus.  The s390 use is in machine specific
>>>>> code, so it's basically just re-using the field for an unrelated usage
>>>>> to the x86/arm one (ACPI).
>>
>> as index == arch_id on s390x, that code could easily be changed to
>> something like:
>>
>> @@ -45,7 +45,7 @@ static void prepare_cpu_entries(SCLPDevice *sclp,
>> CPUEntry *entry, int *count)
>>          if (!ms->possible_cpus->cpus[i].cpu) {
>>              continue;
>>          }
>> -        entry[*count].address = ms->possible_cpus->cpus[i].arch_id;
>> +        entry[*count].address = i;
> 
> What about decoupling it from the array index, by using:
>     entry[*count].address = ms->possible_cpus->cpus[i].props.core_id;
> or:
>     entry[*count].address = S390_CPU(ms->possible_cpus->cpus[i].cpu)->core_id;
> ?

Yes, we could do that, but it doesn't really matter for now. I would ACK
either :)
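
For illustration, the core_id variant could look roughly like the sketch
below. The Sketch* types are hypothetical, simplified stand-ins for
CPUArchId, CpuInstanceProperties and the SCLP CPUEntry (not the real QEMU
definitions), and sketch_prepare_cpu_entries() is an invented name; only
the idea of taking the address from props.core_id instead of the array
index comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the QEMU types named in the thread. */
typedef struct {
    int64_t core_id;          /* models CpuInstanceProperties::core_id */
} SketchProps;

typedef struct {
    void *cpu;                /* non-NULL once a CPU occupies the slot */
    SketchProps props;
} SketchArchId;

typedef struct {
    uint8_t address;
    uint8_t type;
} SketchCPUEntry;

/*
 * Fill one entry per present CPU, taking the address from the slot's
 * core_id property rather than from the array index.
 * Returns the number of entries written.
 */
static int sketch_prepare_cpu_entries(const SketchArchId *slots, size_t n_slots,
                                      SketchCPUEntry *entry)
{
    int count = 0;

    assert(slots && entry);
    for (size_t i = 0; i < n_slots; i++) {
        if (!slots[i].cpu) {
            continue;         /* empty slot: no CPU plugged here */
        }
        entry[count].address = (uint8_t)slots[i].props.core_id;
        entry[count].type = 0;
        count++;
    }
    return count;
}
```

As long as index == core_id holds on s390x, this produces the same
entries as the index-based version; it just stops relying on that
equality.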

> 
> 
>>          entry[*count].type = 0;
>>          memcpy(entry[*count].features, features, sizeof(features));
>>          (*count)++;
>>
>> arch_id just looked like the right thing to use (documentation issue
>> mentioned above)
>>
>>
>>>>>
>>>>> If we can't assign a universal meaning to the field (even if the
>>>>> actual values are per-machine) - and I don't think we can - then I
>>>>> really don't think it belongs in CPUState.  A machine hook which
>>>>> translates an ArchId to an acpi_id is the correct solution I believe.
>>>>> Or even an ACPIMachine interface (to be implemented by machines which
>>>>> do ACPI) which has a method to do this.
>>>>>
>>>>> Since both the assignment and use are in machine type specific code
>>>>> for s390, it can have its own field in the s390 specific cpu subclass.
>>
>> s390x doesn't need arch_id at all.
>>
>> cs->cpu_index can be used.
> 
> What about the cpu_exists() check in s390_cpu_realizefn()?  It
> could be moved to a new s390_machine_device_pre_plug() method
> that just checks ms->possible_cpus->cpus[cpu->env.core_id].cpu.
> 

I always hated that part (cpu_exists()). We can completely drop
cpu_exists() on s390x and simply add that check for pre plug as you
said, fine with me!
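
A minimal sketch of what such a pre-plug check might do, assuming a
simplified model of MachineState::possible_cpus. SketchSlot,
SketchPossibleCpus and sketch_cpu_pre_plug() are hypothetical names for
illustration, and the range check is an added assumption; the only part
taken from the thread is testing whether cpus[core_id].cpu is already
set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real QEMU definitions. */
typedef struct {
    void *cpu;                /* non-NULL once a CPU occupies the slot */
} SketchSlot;

typedef struct {
    size_t len;               /* number of possible CPU slots */
    SketchSlot *cpus;
} SketchPossibleCpus;

/*
 * Pre-plug check: accept core_id only if it names an existing slot that
 * is still empty. On failure, *reason (if non-NULL) gets a static message.
 */
static bool sketch_cpu_pre_plug(const SketchPossibleCpus *pc, long core_id,
                                const char **reason)
{
    assert(pc);
    if (core_id < 0 || (size_t)core_id >= pc->len) {
        if (reason) {
            *reason = "core-id out of range";
        }
        return false;
    }
    if (pc->cpus[core_id].cpu) {
        if (reason) {
            *reason = "CPU with this core-id already exists";
        }
        return false;
    }
    return true;
}
```

With a check like this done before plugging, the realize function no
longer needs to walk all CPUs to detect a duplicate.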
-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field
  2017-11-09 20:02               ` Eduardo Habkost
  2017-11-10 10:14                 ` Cornelia Huck
@ 2017-11-21 14:02                 ` Igor Mammedov
  1 sibling, 0 replies; 93+ messages in thread
From: Igor Mammedov @ 2017-11-21 14:02 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: David Gibson, Cornelia Huck, qemu-devel, peter.maydell, pkrempa,
	pbonzini, drjones

On Thu, 9 Nov 2017 18:02:35 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Thu, Nov 09, 2017 at 05:58:03PM +1100, David Gibson wrote:
> > On Tue, Nov 07, 2017 at 04:04:04PM +0100, Cornelia Huck wrote:  
> > > On Mon, 6 Nov 2017 16:02:16 -0200
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:  
> > > > > On Thu, 19 Oct 2017 17:31:51 +1100
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >     
> > > > > > On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:    
> > > > > > > For enabling early cpu to numa node configuration at runtime
> > > > > > > qmp_query_hotpluggable_cpus() should provide a list of available
> > > > > > > cpu slots at early stage, before machine_init() is called and
> > > > > > > the 1st cpu is created, so that mgmt might be able to call it
> > > > > > > and use output to set numa mapping.
> > > > > > > Use MachineClass::possible_cpu_arch_ids() callback to set
> > > > > > > cpu type info, along with the rest of possible cpu properties,
> > > > > > > to let machine define which cpu type* will be used.
> > > > > > > 
> > > > > > > * for SPAPR it will be a spapr core type and for ARM/s390x/x86
> > > > > > >   a respective descendant of CPUClass.
> > > > > > > 
> > > > > > > Move parse_numa_opts() in vl.c after cpu_model is parsed into
> > > > > > > cpu_type so that possible_cpu_arch_ids() would know which
> > > > > > > cpu_type to use during layout initialization.
> > > > > > > 
> > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>      
> > > > > > 
> > > > > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > > > > >     
> > > > > > > ---
> > > > > > >   v2:
> > > > > > >      - fix NULL dereference caused by not initialized
> > > > > > >        MachineState::cpu_type at the time parse_numa_opts()
> > > > > > >        were called
> > > > > > > ---
> > > > > > >  include/hw/boards.h        |  2 ++
> > > > > > >  hw/arm/virt.c              |  3 ++-
> > > > > > >  hw/core/machine.c          | 12 ++++++------
> > > > > > >  hw/i386/pc.c               |  4 +++-
> > > > > > >  hw/ppc/spapr.c             | 13 ++++++++-----
> > > > > > >  hw/s390x/s390-virtio-ccw.c |  1 +
> > > > > > >  vl.c                       |  3 +--
> > > > > > >  7 files changed, 23 insertions(+), 15 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > > > > > index 191a5b3..fa21758 100644
> > > > > > > --- a/include/hw/boards.h
> > > > > > > +++ b/include/hw/boards.h
> > > > > > > @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> > > > > > >   * CPUArchId:
> > > > > > >   * @arch_id - architecture-dependent CPU ID of present or possible CPU      
> > > > > > 
> > > > > > I know this isn't really in scope for this patch, but is @arch_id here
> > > > > > supposed to have meaning defined by the target, or by the machine?
> > > > > > 
> > > > > > > If it's the machine, it could do with a rename - "arch" means target
> > > > > > to most people (thanks to Linux).
> > > > > > 
> > > > > > If it's the target, it's kind of bogus, because it doesn't necessarily
> > > > > > have a clear meaning per target - get_arch_id in CPUClass has the same
> > > > > > problem, which is probably one reason it's basically only used by the
> > > > > > x86 code at present.
> > > > > > 
> > > > > > e.g. for target/ppc, what do we use?  There's the PIR, which is in the
> > > > > > CPU.. but only on some cpu models, not all.  There will generally be
> > > > > > some kind of master PIC id, but there are different PIC models on
> > > > > > different boards.  What goes in the devicetree?  Well only some
> > > > > > machines use devicetree, and they might define the cpu reg 
> > > > > > differently.
> > > > > > 
> > > > > > Board designs will generally try to make some if not all of those
> > > > > > possible values equal for simplicity, but there's still no real way of
> > > > > > defining a sensible arch_id independent of machine / board.    
> > > > > I'd say arch_id is machine specific so far, it was introduced when we
> > > > > didn't have CpuInstanceProperties and at that time we considered only
> > > > > vcpus (threads) and doesn't really apply to spapr cores.
> > > > > 
> > > > > In general we could do away with arch_id and use CpuInstanceProperties
> > > > > instead, but arch_id also serves aux purpose, it allows machine to
> > > > > pre-calculate(cache) apic-id/mpidr values in one place and then they
> > > > > > are/(could be) used by arch-independent code to build acpi tables.
> > > > > So if we drop arch_id we would need to introduce a machine hook,
> > > > > which would translate CpuInstanceProperties into current arch_id.    
> > > > 
> > > > > I think we need to do a better job documenting where exactly
> > > > we expect arch_id to be used and how, so people know what it's
> > > > supposed to return.
> > > > 
> > > > If the only place where it's useful now is ACPI code (is it?),
> > > > should we rename it to something like get_acpi_id()?  
> > > 
> > > It is also used in hw/s390x/sclp.c to fill out a control block, so acpi
> > > isn't the only user.  
> > 
> > Yeah.. this is kind of bogus.  The s390 use is in machine specific
> > code, so it's basically just re-using the field for an unrelated usage
> > to the x86/arm one (ACPI).
> > 
> > If we can't assign a universal meaning to the field (even if the
> > actual values are per-machine) - and I don't think we can - then I
> > really don't think it belongs in CPUState.  A machine hook which
> > translates an ArchId to an acpi_id is the correct solution I believe.
> > Or even an ACPIMachine interface (to be implemented by machines which
> > do ACPI) which has a method to do this.
> > 
> > Since both the assignment and use are in machine type specific code
> > for s390, it can have its own field in the s390 specific cpu subclass.
> >   
> 
> I agree.  This might require duplicating cpu_by_arch_id() and
> cpu_exists() into machine-specific code, but this doesn't sound
> too bad: there's only one user of cpu_by_arch_id() (that's
> x86-specific code living inside monitor.c), and one user of
> cpu_exists() (that's s390-specific code).
> 
> (Maybe those users could be rewritten to use
> MachineState::possible_cpus, like pc_find_cpu_slot()).
I've added getting rid of arch_id from the generic structure to my
TODO list.
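
As a rough sketch of the machine-side lookup suggested above (users
going through possible_cpus, in the spirit of pc_find_cpu_slot()), the
following assumes the slots are kept sorted by arch_id so a binary
search applies. SketchArchId and sketch_find_cpu_slot() are hypothetical
names, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a possible-CPU slot. */
typedef struct {
    uint64_t arch_id;         /* machine-defined ID, e.g. an APIC ID on x86 */
    void *cpu;                /* non-NULL once a CPU occupies the slot */
} SketchArchId;

/* Three-way comparison on arch_id, usable with bsearch(). */
static int sketch_cmp_arch_id(const void *a, const void *b)
{
    const SketchArchId *x = a;
    const SketchArchId *y = b;

    return (x->arch_id > y->arch_id) - (x->arch_id < y->arch_id);
}

/*
 * Return the slot whose arch_id matches, or NULL if there is none.
 * Assumes slots[] is sorted by ascending arch_id.
 */
static SketchArchId *sketch_find_cpu_slot(SketchArchId *slots, size_t n,
                                          uint64_t arch_id)
{
    SketchArchId key = { .arch_id = arch_id };

    assert(slots || n == 0);
    return bsearch(&key, slots, n, sizeof(*slots), sketch_cmp_arch_id);
}
```

A caller that only needs "does this CPU exist?" can test the returned
slot's cpu pointer, which would cover the cpu_exists() style of query
without a field in the generic CPU structure.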


end of thread, other threads:[~2017-11-21 14:02 UTC | newest]

Thread overview: 93+ messages
2017-10-16 16:22 [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Igor Mammedov
2017-10-16 16:22 ` [Qemu-devel] [RFC 1/6] numa: postpone options post-processing till machine_run_board_init() Igor Mammedov
2017-10-17  5:49   ` David Gibson
2017-10-16 16:22 ` [Qemu-devel] [RFC 2/6] numa: split out NumaOptions parsing into parse_NumaOptions() Igor Mammedov
2017-10-18  3:27   ` David Gibson
2017-10-18 14:53     ` Eric Blake
2017-10-16 16:22 ` [Qemu-devel] [RFC 3/6] possible_cpus: add CPUArchId::type field Igor Mammedov
2017-10-18 11:12   ` [Qemu-devel] [RFC v2 " Igor Mammedov
2017-10-19  6:31     ` David Gibson
2017-10-31 14:01       ` Igor Mammedov
2017-11-06 18:02         ` Eduardo Habkost
2017-11-07 15:04           ` Cornelia Huck
2017-11-09  6:58             ` David Gibson
2017-11-09 20:02               ` Eduardo Habkost
2017-11-10 10:14                 ` Cornelia Huck
2017-11-10 12:34                   ` David Hildenbrand
2017-11-10 12:58                     ` Eduardo Habkost
2017-11-10 13:07                       ` David Hildenbrand
2017-11-21 14:02                 ` Igor Mammedov
2017-11-09  6:53           ` David Gibson
2017-10-16 16:22 ` [Qemu-devel] [RFC 4/6] CLI: add -paused option Igor Mammedov
2017-10-16 16:35   ` Daniel P. Berrange
2017-10-17  8:17     ` Igor Mammedov
2017-10-17 10:56       ` Laszlo Ersek
2017-10-17 11:11         ` Peter Krempa
2017-10-20 15:38     ` Eduardo Habkost
2017-10-16 16:59   ` Eduardo Habkost
2017-10-16 17:01     ` Paolo Bonzini
2017-10-16 17:17       ` Eduardo Habkost
2017-10-17  8:47         ` Paolo Bonzini
2017-10-17  9:25           ` Igor Mammedov
2017-10-17 14:48       ` Daniel P. Berrange
2017-10-17 15:21         ` Laszlo Ersek
2017-10-17 15:35           ` Daniel P. Berrange
2017-10-17 15:42             ` Laszlo Ersek
2017-10-17 15:47               ` Daniel P. Berrange
2017-10-17 15:47             ` Igor Mammedov
2017-10-17 15:52               ` Daniel P. Berrange
2017-10-17  9:10     ` Igor Mammedov
2017-10-19 10:42     ` David Gibson
2017-10-20  0:15       ` Eduardo Habkost
2017-10-20  1:19         ` David Gibson
2017-10-20 14:21           ` Eduardo Habkost
2017-10-23  9:49             ` Igor Mammedov
2017-10-23  9:53               ` Daniel P. Berrange
2017-10-23 10:36                 ` Igor Mammedov
2017-10-23 10:49                   ` Daniel P. Berrange
2017-10-23 11:18                     ` Igor Mammedov
2017-10-25 10:52                       ` Eduardo Habkost
2017-10-25 10:35               ` Eduardo Habkost
2017-10-23  9:30         ` Alex Bennée
2017-10-16 16:22 ` [Qemu-devel] [RFC 5/6] HMP: add set-numa-node command Igor Mammedov
2017-10-16 16:22 ` [Qemu-devel] [RFC 6/6] QMP: " Igor Mammedov
2017-10-16 16:36 ` [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP Daniel P. Berrange
2017-10-16 17:05   ` Eduardo Habkost
2017-10-17  7:27   ` Igor Mammedov
2017-10-17 15:07     ` Daniel P. Berrange
2017-10-17 15:24       ` Laszlo Ersek
2017-10-17 16:06       ` Igor Mammedov
2017-10-17 16:09         ` Daniel P. Berrange
2017-10-17 16:18           ` Igor Mammedov
2017-10-18 12:59             ` Eduardo Habkost
2017-10-18 14:44               ` Igor Mammedov
2017-10-18 14:49                 ` Daniel P. Berrange
2017-10-18 15:24                   ` Igor Mammedov
2017-10-18 15:27                     ` Daniel P. Berrange
2017-10-18 20:11                       ` Eduardo Habkost
2017-10-18 15:30         ` Daniel P. Berrange
2017-10-18 20:22           ` Eduardo Habkost
2017-10-19 11:49             ` David Gibson
2017-10-19 12:23               ` Paolo Bonzini
2017-10-20  1:21                 ` David Gibson
2017-10-20 19:53                   ` Eduardo Habkost
2017-10-23  8:17                     ` Igor Mammedov
2017-10-23  8:45                     ` Igor Mammedov
2017-10-25  6:57                       ` Eduardo Habkost
2017-10-25  7:02                         ` Daniel P. Berrange
2017-10-25 13:37                           ` Eduardo Habkost
2017-10-19 15:21           ` Igor Mammedov
2017-10-19 15:28             ` Daniel P. Berrange
2017-10-19 19:56               ` Eduardo Habkost
2017-10-20  9:07                 ` Daniel P. Berrange
2017-10-20 20:07                   ` Eduardo Habkost
2017-10-23  8:53                     ` Igor Mammedov
2017-10-23 10:04                   ` Igor Mammedov
2017-10-23 10:19                     ` Daniel P. Berrange
2017-10-18 12:19       ` Paolo Bonzini
2017-10-18 12:27         ` Daniel P. Berrange
2017-10-18 12:33           ` Paolo Bonzini
2017-10-18 14:26             ` Igor Mammedov
2017-10-18 14:29               ` Paolo Bonzini
2017-10-18 14:54                 ` Igor Mammedov
2017-10-18 14:21           ` Igor Mammedov
