* [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option
@ 2017-03-22 13:32 Igor Mammedov
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 01/23] tests: add CPUs to numa node mapping test Igor Mammedov
                   ` (23 more replies)
  0 siblings, 24 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Changes since RFC:
    * convert all targets that support numa (Eduardo)
    * add numa CLI tests
    * support wildcard matching with "-numa cpu,..." (Paolo)

This series introduces a new CLI option that allows mapping CPUs to NUMA
nodes using the public [socket|core|thread]-id properties instead of the
internal cpu_index, and moves internal handling of the cpu<->node
mapping from cpu_index-based global bitmaps to MachineState.
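
As an illustration of property-based mapping with the wildcard matching
mentioned above, here is a minimal C sketch. It assumes a simplified,
hypothetical CpuProps struct (not QEMU's generated CpuInstanceProperties)
and a hypothetical map_cpus_to_node() helper: any property left
unspecified in the selector matches every value, so a selector that
names only socket-id maps all threads of that socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for CpuInstanceProperties:
 * a has_* flag that is false means "not specified". */
typedef struct {
    bool has_node_id;   int node_id;
    bool has_socket_id; int socket_id;
    bool has_core_id;   int core_id;
    bool has_thread_id; int thread_id;
} CpuProps;

/* An unspecified selector field is a wildcard; a specified one must
 * match a property that the CPU actually has. */
static bool field_matches(bool sel_has, int sel, bool val_has, int val)
{
    return !sel_has || (val_has && sel == val);
}

/* Assign @node to every possible CPU whose topology ids match @sel.
 * Returns how many CPUs were mapped. */
static size_t map_cpus_to_node(CpuProps *cpus, size_t n,
                               const CpuProps *sel, int node)
{
    size_t mapped = 0;

    for (size_t i = 0; i < n; i++) {
        if (field_matches(sel->has_socket_id, sel->socket_id,
                          cpus[i].has_socket_id, cpus[i].socket_id) &&
            field_matches(sel->has_core_id, sel->core_id,
                          cpus[i].has_core_id, cpus[i].core_id) &&
            field_matches(sel->has_thread_id, sel->thread_id,
                          cpus[i].has_thread_id, cpus[i].thread_id)) {
            cpus[i].has_node_id = true;
            cpus[i].node_id = node;
            mapped++;
        }
    }
    return mapped;
}
```

With 2 sockets x 2 cores x 2 threads, a selector carrying only
socket-id=1 maps the 4 threads of socket 1. In this sketch a selector
naming a property the board doesn't provide simply matches nothing;
whether that should be an error instead is left open here.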

The new '-numa cpu' option is supported only on the PC and SPAPR
machines, which implement the hotpluggable-cpus query.
On ARM the user-facing interface stays cpu_index based due to the
lack of hotpluggable-cpus support, but internally the cpu<->node
mapping will use the approach common to PC/SPAPR/ARM
(i.e. mapping info is stored in MachineState::possible_cpus).

It only provides a CLI interface for the mapping; there is no QMP
one, as I haven't found a suitable place/way to update/set the mapping
after machine_done for QEMU started with -S (stopped mode), so that
mgmt could query hotpluggable-cpus first, then map them to NUMA nodes
at runtime before actually allowing the guest to run.

Another alternative I've been considering is to add a CLI option
similar to -S that would pause initialization before the machine_init()
callback is run, so that the user can get the CPU layout with hotpluggable-cpus,
then map CPUs to NUMA nodes and unpause to let machine_init() initialize
the machine using the predefined NUMA mapping.
Such an option might also be useful for other use cases.


git repo for testing:
   https://github.com/imammedo/qemu.git cphp_numa_cfg_v1
reference to RFC:
   https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg03693.html

CC: Eduardo Habkost <ehabkost@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Andrew Jones <drjones@redhat.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Eric Blake <eblake@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Shannon Zhao <zhaoshenglong@huawei.com>
CC: qemu-arm@nongnu.org
CC: qemu-ppc@nongnu.org

Igor Mammedov (23):
  tests: add CPUs to numa node mapping test
  hw/arm/virt: extract mp-affinity calculation in separate function
  hw/arm/virt: use machine->possible_cpus for storing possible topology
    info
  hw/arm/virt: explicitly allocate cpu_index for cpus
  numa: move source of default CPUs to NUMA node mapping into boards
  spapr: add node-id property to sPAPR core
  pc: add node-id property to CPU
  virt-arm: add node-id property to CPU
  numa: add check that board supports cpu_index to node mapping
  numa: mirror cpu to node mapping in MachineState::possible_cpus
  numa: do default mapping based on possible_cpus instead of node_cpu
    bitmaps
  pc: get numa node mapping from possible_cpus instead of
    numa_get_node_for_cpu()
  spapr: get numa node mapping from possible_cpus instead of
    numa_get_node_for_cpu()
  virt-arm: get numa node mapping from possible_cpus instead of
    numa_get_node_for_cpu()
  QMP: include CpuInstanceProperties into query_cpus output
  tests: numa: add case for QMP command query-cpus
  numa: remove no longer used numa_get_node_for_cpu()
  numa: remove no longer needed numa_post_machine_init()
  machine: call machine init from wrapper
  numa: use possible_cpus for not mapped CPUs check
  numa: remove node_cpu bitmaps as they are no longer used
  numa: add '-numa cpu,...' option for property based node mapping
  tests: check -numa node,cpu=props_list usecase

 include/hw/boards.h             |  11 +-
 include/hw/ppc/spapr_cpu_core.h |   1 +
 include/qom/cpu.h               |   2 +
 include/sysemu/numa.h           |   8 +-
 cpus.c                          |   9 ++
 hw/acpi/cpu.c                   |   7 +-
 hw/arm/virt-acpi-build.c        |  19 +--
 hw/arm/virt.c                   | 137 +++++++++++++++---
 hw/core/machine.c               | 132 ++++++++++++++++++
 hw/i386/acpi-build.c            |  11 +-
 hw/i386/pc.c                    |  53 +++++--
 hw/ppc/spapr.c                  |  44 +++++-
 hw/ppc/spapr_cpu_core.c         |  21 ++-
 numa.c                          | 145 +++++++------------
 qapi-schema.json                |  13 +-
 qemu-options.hx                 |  23 ++-
 target/arm/cpu.c                |   1 +
 target/i386/cpu.c               |   1 +
 tests/Makefile.include          |   5 +
 tests/numa-test.c               | 301 ++++++++++++++++++++++++++++++++++++++++
 vl.c                            |   6 +-
 21 files changed, 761 insertions(+), 189 deletions(-)
 create mode 100644 tests/numa-test.c

--
2.7.4

* [Qemu-devel] [PATCH for-2.10 01/23] tests: add CPUs to numa node mapping test
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-27  0:31   ` David Gibson
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 02/23] hw/arm/virt: extract mp-affinity calculation in separate function Igor Mammedov
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 tests/Makefile.include |   5 +++
 tests/numa-test.c      | 106 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+)
 create mode 100644 tests/numa-test.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 402e71c..4547b01 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -260,6 +260,7 @@ check-qtest-i386-y += tests/test-filter-mirror$(EXESUF)
 check-qtest-i386-y += tests/test-filter-redirector$(EXESUF)
 check-qtest-i386-y += tests/postcopy-test$(EXESUF)
 check-qtest-i386-y += tests/test-x86-cpuid-compat$(EXESUF)
+check-qtest-i386-y += tests/numa-test$(EXESUF)
 check-qtest-x86_64-y += $(check-qtest-i386-y)
 gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
 gcov-files-x86_64-y = $(subst i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
@@ -300,6 +301,7 @@ check-qtest-ppc64-y += tests/test-netfilter$(EXESUF)
 check-qtest-ppc64-y += tests/test-filter-mirror$(EXESUF)
 check-qtest-ppc64-y += tests/test-filter-redirector$(EXESUF)
 check-qtest-ppc64-y += tests/display-vga-test$(EXESUF)
+check-qtest-ppc64-y += tests/numa-test$(EXESUF)
 check-qtest-ppc64-$(CONFIG_EVENTFD) += tests/ivshmem-test$(EXESUF)
 
 check-qtest-sh4-y = tests/endianness-test$(EXESUF)
@@ -324,6 +326,8 @@ gcov-files-arm-y += arm-softmmu/hw/block/virtio-blk.c
 check-qtest-arm-y += tests/test-arm-mptimer$(EXESUF)
 gcov-files-arm-y += hw/timer/arm_mptimer.c
 
+check-qtest-aarch64-y = tests/numa-test$(EXESUF)
+
 check-qtest-microblazeel-y = $(check-qtest-microblaze-y)
 
 check-qtest-xtensaeb-y = $(check-qtest-xtensa-y)
@@ -747,6 +751,7 @@ tests/vhost-user-bridge$(EXESUF): tests/vhost-user-bridge.o contrib/libvhost-use
 tests/test-uuid$(EXESUF): tests/test-uuid.o $(test-util-obj-y)
 tests/test-arm-mptimer$(EXESUF): tests/test-arm-mptimer.o
 tests/test-qapi-util$(EXESUF): tests/test-qapi-util.o $(test-util-obj-y)
+tests/numa-test$(EXESUF): tests/numa-test.o
 
 tests/migration/stress$(EXESUF): tests/migration/stress.o
 	$(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< ,"LINK","$(TARGET_DIR)$@")
diff --git a/tests/numa-test.c b/tests/numa-test.c
new file mode 100644
index 0000000..f5da0c8
--- /dev/null
+++ b/tests/numa-test.c
@@ -0,0 +1,106 @@
+/*
+ * NUMA configuration test cases
+ *
+ * Copyright (c) 2017 Red Hat Inc.
+ * Authors:
+ *  Igor Mammedov <imammedo@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+
+static char *make_cli(const char *generic_cli, const char *test_cli)
+{
+    return g_strdup_printf("%s %s", generic_cli ? generic_cli : "", test_cli);
+}
+
+static char *hmp_info_numa(void)
+{
+    QDict *resp;
+    char *s;
+
+    resp = qmp("{ 'execute': 'human-monitor-command', 'arguments': "
+                      "{ 'command-line': 'info numa '} }");
+    g_assert(resp);
+    g_assert(qdict_haskey(resp, "return"));
+    s = g_strdup(qdict_get_str(resp, "return"));
+    g_assert(s);
+    QDECREF(resp);
+    return s;
+}
+
+static void test_mon_explicit(const void *data)
+{
+    char *s;
+    char *cli;
+
+    cli = make_cli(data, "-smp 8 "
+                   "-numa node,nodeid=0,cpus=0-3 "
+                   "-numa node,nodeid=1,cpus=4-7 ");
+    qtest_start(cli);
+
+    s = hmp_info_numa();
+    g_assert(strstr(s, "node 0 cpus: 0 1 2 3"));
+    g_assert(strstr(s, "node 1 cpus: 4 5 6 7"));
+    g_free(s);
+
+    qtest_end();
+    g_free(cli);
+}
+
+static void test_mon_default(const void *data)
+{
+    char *s;
+    char *cli;
+
+    cli = make_cli(data, "-smp 8 -numa node -numa node");
+    qtest_start(cli);
+
+    s = hmp_info_numa();
+    g_assert(strstr(s, "node 0 cpus: 0 2 4 6"));
+    g_assert(strstr(s, "node 1 cpus: 1 3 5 7"));
+    g_free(s);
+
+    qtest_end();
+    g_free(cli);
+}
+
+static void test_mon_partial(const void *data)
+{
+    char *s;
+    char *cli;
+
+    cli = make_cli(data, "-smp 8 "
+                   "-numa node,nodeid=0,cpus=0-1 "
+                   "-numa node,nodeid=1,cpus=4-5 ");
+    qtest_start(cli);
+
+    s = hmp_info_numa();
+    g_assert(strstr(s, "node 0 cpus: 0 1 2 3 6 7"));
+    g_assert(strstr(s, "node 1 cpus: 4 5"));
+    g_free(s);
+
+    qtest_end();
+    g_free(cli);
+}
+
+int main(int argc, char **argv)
+{
+    const char *args = NULL;
+    const char *arch = qtest_get_arch();
+
+    if (strcmp(arch, "aarch64") == 0) {
+        args = "-machine virt";
+    }
+
+    g_test_init(&argc, &argv, NULL);
+
+    qtest_add_data_func("/numa/mon/default", args, test_mon_default);
+    qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit);
+    qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial);
+
+    return g_test_run();
+}
-- 
2.7.4

* [Qemu-devel] [PATCH for-2.10 02/23] hw/arm/virt: extract mp-affinity calculation in separate function
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 01/23] tests: add CPUs to numa node mapping test Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-04-25 14:09   ` Andrew Jones
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 03/23] hw/arm/virt: use machine->possible_cpus for storing possible topology info Igor Mammedov
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
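The arithmetic in virt_idx2mp_affinity() can be checked in isolation with
the small C sketch below. The constant values are assumed to match QEMU's
definitions at the time and are copied locally only to keep the example
self-contained; idx2mp_affinity() is a hypothetical stand-alone variant
that takes the GIC version and the disallow_affinity_adjustment flag as
plain parameters instead of reading them from the machine state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, copied here only so the sketch compiles on its own. */
#define ARM_AFF1_SHIFT               8
#define GIC_TARGETLIST_BITS          8
#define GICV3_TARGETLIST_BITS        16
#define ARM_DEFAULT_CPUS_PER_CLUSTER 8

/* Same cluster arithmetic as virt_idx2mp_affinity(). */
static uint64_t idx2mp_affinity(int idx, int gic_version,
                                bool disallow_affinity_adjustment)
{
    unsigned clustersz;
    unsigned uidx;

    assert(idx >= 0);
    uidx = (unsigned)idx;

    if (disallow_affinity_adjustment) {
        clustersz = ARM_DEFAULT_CPUS_PER_CLUSTER;
    } else if (gic_version == 3) {
        clustersz = GICV3_TARGETLIST_BITS;
    } else {
        clustersz = GIC_TARGETLIST_BITS;
    }
    /* Aff1 = cluster number, Aff0 = position inside the cluster */
    return ((uint64_t)(uidx / clustersz) << ARM_AFF1_SHIFT) |
           (uidx % clustersz);
}
```

With GICv3 (16 CPUs per cluster) index 17 lands in Aff1=1, Aff0=1
(0x101); with GICv2 (8 per cluster) the same index gives 0x201.
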
 hw/arm/virt.c | 59 ++++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 42 insertions(+), 17 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 5f62a03..484754e 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1194,6 +1194,45 @@ void virt_machine_done(Notifier *notifier, void *data)
     virt_build_smbios(vms);
 }
 
+static uint64_t virt_idx2mp_affinity(VirtMachineState *vms, int idx)
+{
+    uint64_t mp_affinity;
+    uint8_t clustersz;
+    VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+
+    if (!vmc->disallow_affinity_adjustment) {
+        uint8_t aff0, aff1;
+
+        if (vms->gic_version == 3) {
+            clustersz = GICV3_TARGETLIST_BITS;
+        } else {
+            clustersz = GIC_TARGETLIST_BITS;
+        }
+
+        /* Adjust MPIDR like 64-bit KVM hosts, which incorporate the
+         * GIC's target-list limitations. 32-bit KVM hosts currently
+         * always create clusters of 4 CPUs, but that is expected to
+         * change when they gain support for gicv3. When KVM is enabled
+         * it will override the changes we make here, therefore our
+         * purposes are to make TCG consistent (with 64-bit KVM hosts)
+         * and to improve SGI efficiency.
+         */
+        aff1 = idx / clustersz;
+        aff0 = idx % clustersz;
+        mp_affinity = (aff1 << ARM_AFF1_SHIFT) | aff0;
+    } else {
+        /* This cpu-id-to-MPIDR affinity is used only for TCG;
+         * KVM will override it. We don't support setting cluster ID
+         * ([16..23]) (known as Aff2 in later ARM ARM versions), or any of
+         * the higher affinity level fields, so these bits always RAZ.
+         */
+        uint32_t Aff1 = idx / ARM_DEFAULT_CPUS_PER_CLUSTER;
+        uint32_t Aff0 = idx % ARM_DEFAULT_CPUS_PER_CLUSTER;
+        mp_affinity = (Aff1 << ARM_AFF1_SHIFT) | Aff0;
+    }
+    return mp_affinity;
+}
+
 static void machvirt_init(MachineState *machine)
 {
     VirtMachineState *vms = VIRT_MACHINE(machine);
@@ -1210,7 +1249,6 @@ static void machvirt_init(MachineState *machine)
     CPUClass *cc;
     Error *err = NULL;
     bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
-    uint8_t clustersz;
 
     if (!cpu_model) {
         cpu_model = "cortex-a15";
@@ -1263,10 +1301,8 @@ static void machvirt_init(MachineState *machine)
      */
     if (vms->gic_version == 3) {
         virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / 0x20000;
-        clustersz = GICV3_TARGETLIST_BITS;
     } else {
         virt_max_cpus = GIC_NCPU;
-        clustersz = GIC_TARGETLIST_BITS;
     }
 
     if (max_cpus > virt_max_cpus) {
@@ -1326,20 +1362,9 @@ static void machvirt_init(MachineState *machine)
 
     for (n = 0; n < smp_cpus; n++) {
         Object *cpuobj = object_new(typename);
-        if (!vmc->disallow_affinity_adjustment) {
-            /* Adjust MPIDR like 64-bit KVM hosts, which incorporate the
-             * GIC's target-list limitations. 32-bit KVM hosts currently
-             * always create clusters of 4 CPUs, but that is expected to
-             * change when they gain support for gicv3. When KVM is enabled
-             * it will override the changes we make here, therefore our
-             * purposes are to make TCG consistent (with 64-bit KVM hosts)
-             * and to improve SGI efficiency.
-             */
-            uint8_t aff1 = n / clustersz;
-            uint8_t aff0 = n % clustersz;
-            object_property_set_int(cpuobj, (aff1 << ARM_AFF1_SHIFT) | aff0,
-                                    "mp-affinity", NULL);
-        }
+
+        object_property_set_int(cpuobj, virt_idx2mp_affinity(vms, n),
+                                "mp-affinity", NULL);
 
         if (!vms->secure) {
             object_property_set_bool(cpuobj, false, "has_el3", NULL);
-- 
2.7.4

* [Qemu-devel] [PATCH for-2.10 03/23] hw/arm/virt: use machine->possible_cpus for storing possible topology info
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 01/23] tests: add CPUs to numa node mapping test Igor Mammedov
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 02/23] hw/arm/virt: extract mp-affinity calculation in separate function Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-04-25 14:28   ` Andrew Jones
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 04/23] hw/arm/virt: explicitly allocate cpu_index for cpus Igor Mammedov
                   ` (20 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

For now, precalculate and store mp_affinity in possible_cpus,
as ARM CPUs don't have socket/core/thread-id properties yet.
In follow-up patches possible_cpus will be used for storing
and setting the NUMA node mapping, replacing the legacy bitmap-based
numa_info[node_id].node_cpu/numa_get_node_for_cpu().

For lack of a better idea, this patch cannibalizes
possible_cpus.cpus[x].props.thread_id so that the
*_cpu_index_to_props() callback can return props that address
a CPU, which machine_set_cpu_numa_node() will use in follow-up
patches to assign a CPU to a node. Cannibalizing is fine for now,
as that thread_id isn't exposed to users (ARM has no
hotpluggable_cpus callback support yet) and it will be used only
internally until 'device_add cpu' is supported, at which point we
can decide which properties to use.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
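A rough C model of the list that virt_possible_cpu_arch_ids() builds:
one entry per possible CPU, computed once and then cached, with arch_id
holding the MPIDR affinity and thread_id reused as a plain slot index.
ArchId, ArchIdList and get_possible_cpus() are hypothetical trimmed-down
stand-ins for CPUArchId/CPUArchIdList, and calloc() replaces g_malloc0().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-ins for CPUArchId/CPUArchIdList. */
typedef struct {
    uint64_t arch_id;      /* on virt-arm: the precomputed MPIDR affinity */
    bool has_thread_id;
    int64_t thread_id;     /* reused as a plain slot index */
} ArchId;

typedef struct {
    int len;
    ArchId cpus[];         /* flexible array member, one entry per slot */
} ArchIdList;

/* Build the list on first use; later calls return the cached copy. */
static const ArchIdList *get_possible_cpus(ArchIdList **cache, int max_cpus,
                                           uint64_t (*to_arch_id)(int))
{
    if (*cache) {
        assert((*cache)->len == max_cpus);
        return *cache;
    }
    assert(max_cpus > 0);
    *cache = calloc(1, sizeof(ArchIdList) + sizeof(ArchId) * (size_t)max_cpus);
    if (!*cache) {
        abort();
    }
    (*cache)->len = max_cpus;
    for (int n = 0; n < max_cpus; n++) {
        (*cache)->cpus[n].arch_id = to_arch_id(n);
        (*cache)->cpus[n].has_thread_id = true;
        (*cache)->cpus[n].thread_id = n;
    }
    return *cache;
}
```

machvirt_init() then walks this list and creates CPU objects only for
the first smp_cpus entries.
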
 hw/arm/virt.c | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 484754e..4de46b1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1237,6 +1237,7 @@ static void machvirt_init(MachineState *machine)
 {
     VirtMachineState *vms = VIRT_MACHINE(machine);
     VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
     qemu_irq pic[NUM_IRQS];
     MemoryRegion *sysmem = get_system_memory();
     MemoryRegion *secure_sysmem = NULL;
@@ -1360,10 +1361,16 @@ static void machvirt_init(MachineState *machine)
         exit(1);
     }
 
-    for (n = 0; n < smp_cpus; n++) {
-        Object *cpuobj = object_new(typename);
+    mc->possible_cpu_arch_ids(machine);
+    for (n = 0; n < machine->possible_cpus->len; n++) {
+        Object *cpuobj;
 
-        object_property_set_int(cpuobj, virt_idx2mp_affinity(vms, n),
+        if (n >= smp_cpus) {
+            break;
+        }
+
+        cpuobj = object_new(typename);
+        object_property_set_int(cpuobj, machine->possible_cpus->cpus[n].arch_id,
                                 "mp-affinity", NULL);
 
         if (!vms->secure) {
@@ -1543,6 +1550,31 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
     }
 }
 
+static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
+{
+    int n;
+    VirtMachineState *vms = VIRT_MACHINE(ms);
+
+    if (ms->possible_cpus) {
+        assert(ms->possible_cpus->len == max_cpus);
+        return ms->possible_cpus;
+    }
+
+    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
+                                  sizeof(CPUArchId) * max_cpus);
+    ms->possible_cpus->len = max_cpus;
+    for (n = 0; n < ms->possible_cpus->len; n++) {
+        ms->possible_cpus->cpus[n].arch_id =
+            virt_idx2mp_affinity(vms, n);
+        ms->possible_cpus->cpus[n].props.has_thread_id = true;
+        ms->possible_cpus->cpus[n].props.thread_id = n;
+
+        /* TODO: add 'has_node/node' here to describe
+           to which node core belongs */
+    }
+    return ms->possible_cpus;
+}
+
 static void virt_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -1559,6 +1591,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     mc->pci_allow_0_address = true;
     /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
     mc->minimum_page_bits = 12;
+    mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
 }
 
 static const TypeInfo virt_machine_info = {
-- 
2.7.4

* [Qemu-devel] [PATCH for-2.10 04/23] hw/arm/virt: explicitly allocate cpu_index for cpus
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (2 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 03/23] hw/arm/virt: use machine->possible_cpus for storing possible topology info Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-04-25 14:33   ` Andrew Jones
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards Igor Mammedov
                   ` (19 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Currently cpu_index is implicitly auto-assigned at
cpu.realize() time by cpu_exec_realizefn()->cpu_list_add().

It happens to match the index in possible_cpus, so take
control over it and make the board explicitly initialize
cpu_index to the possible_cpus index. At the very least this
documents that the board is in control of it, and when
'-device cpu' support comes it will keep cpu_index stable
regardless of the order in which CPUs are created, so it
won't break migration.
Within this series it will be used for the internal
conversion from cpu_index-based NUMA node bitmaps
to property-based mapping with possible_cpus,
and it will allow mapping a cpu_index to a CPU entry in
the possible_cpus array.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
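To illustrate why an explicitly assigned cpu_index stays stable, the C
sketch below contrasts a hypothetical auto-assigning counter (roughly
what implicit assignment at realize time amounts to) with the board
passing in the possible_cpus slot index. Cpu, realize_implicit() and
realize_explicit() are illustrative names, not QEMU APIs.

```c
#include <assert.h>

/* Hypothetical model: a CPU object that only records its cpu_index. */
typedef struct {
    int cpu_index;
} Cpu;

/* Models implicit assignment: each new CPU takes the next number,
 * so the result depends on creation order. */
static int next_cpu_index;

static void realize_implicit(Cpu *cpu)
{
    cpu->cpu_index = next_cpu_index++;
}

/* Models explicit assignment: the board passes the possible_cpus slot,
 * so the result depends only on which slot the CPU occupies. */
static void realize_explicit(Cpu *cpu, int slot, int nr_slots)
{
    assert(slot >= 0 && slot < nr_slots);
    cpu->cpu_index = slot;
}
```

If CPUs for slots 3 and 1 are created in that order, the counter hands
out 0 and 1, while the explicit scheme yields 3 and 1 regardless of the
creation order.
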
 hw/arm/virt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4de46b1..0cbcbc1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1364,6 +1364,7 @@ static void machvirt_init(MachineState *machine)
     mc->possible_cpu_arch_ids(machine);
     for (n = 0; n < machine->possible_cpus->len; n++) {
         Object *cpuobj;
+        CPUState *cs;
 
         if (n >= smp_cpus) {
             break;
@@ -1373,6 +1374,9 @@ static void machvirt_init(MachineState *machine)
         object_property_set_int(cpuobj, machine->possible_cpus->cpus[n].arch_id,
                                 "mp-affinity", NULL);
 
+        cs = CPU(cpuobj);
+        cs->cpu_index = n;
+
         if (!vms->secure) {
             object_property_set_bool(cpuobj, false, "has_el3", NULL);
         }
-- 
2.7.4

* [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (3 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 04/23] hw/arm/virt: explicitly allocate cpu_index for cpus Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-23  6:10   ` Bharata B Rao
                     ` (2 more replies)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 06/23] spapr: add node-id property to sPAPR core Igor Mammedov
                   ` (18 subsequent siblings)
  23 siblings, 3 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Originally, CPU threads were assigned to nodes in round-robin
fashion by default. However, that caused issues in the guest,
since CPU threads from the same socket/core could be placed
on different NUMA nodes.
Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
fixed it by grouping threads within a socket on the same node,
introducing the cpu_index_to_socket_id() callback, and commit
20bb648d (spapr: Fix default NUMA node allocation for threads)
reused the callback to fix similar issues for the SPAPR machine,
even though sockets don't make much sense there.

As a result, QEMU ended up with 3 default distribution rules
used by 3 targets (virt-arm, spapr, pc).

In an effort to move NUMA mapping for CPUs into possible_cpus,
generalize the default mapping in numa.c by letting boards decide
on the default mapping and explicitly tell the generic NUMA code
which node a CPU thread belongs to. This is done by replacing
cpu_index_to_socket_id() with @cpu_index_to_instance_props(),
which provides the default node_id assigned by the board to the
specified cpu_index.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
This patch only moves the source of the default mapping to possible_cpus[]
and leaves the rest of NUMA handling to the numa_info[node_id].node_cpu
bitmaps. It's up to follow-up patches to replace the bitmaps
with possible_cpus[] internally.
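
The three default rules that this patch moves into the boards can be
written side by side as plain functions of cpu_index. This is a sketch:
smp_cores, smp_threads and nb_numa_nodes are passed as parameters rather
than read from QEMU globals, and the function names are hypothetical.
pc_default_node() assumes the pkg_id = cpu_index / (cores * threads)
decomposition used by x86_topo_ids_from_idx().

```c
#include <assert.h>

/* virt-arm: plain round-robin over cpu_index */
static int virt_default_node(int cpu_index, int nb_numa_nodes)
{
    assert(nb_numa_nodes > 0);
    return cpu_index % nb_numa_nodes;
}

/* pc: all threads of one package (socket) land on the same node */
static int pc_default_node(int cpu_index, int smp_cores, int smp_threads,
                           int nb_numa_nodes)
{
    int pkg_id = cpu_index / (smp_cores * smp_threads);

    assert(nb_numa_nodes > 0);
    return pkg_id % nb_numa_nodes;
}

/* spapr: possible_cpus holds one entry per core, keyed by
 * core_id = slot * smp_threads; the node is derived from that core_id */
static int spapr_default_node(int cpu_index, int smp_cores, int smp_threads,
                              int nb_numa_nodes)
{
    int core_id = cpu_index / smp_threads * smp_threads;

    assert(nb_numa_nodes > 0);
    return core_id / smp_threads / smp_cores % nb_numa_nodes;
}
```

For -smp 8 with 2 cores x 2 threads and 2 nodes, pc and spapr both put
CPUs 0-3 on node 0 and 4-7 on node 1, while virt-arm alternates. With
1 core x 1 thread per socket the pc rule degenerates to round-robin,
which matches what test_mon_default() expects (node 0 cpus: 0 2 4 6).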
---
 include/hw/boards.h   |  8 ++++++--
 include/sysemu/numa.h |  2 +-
 hw/arm/virt.c         | 19 +++++++++++++++++--
 hw/i386/pc.c          | 22 ++++++++++++++++------
 hw/ppc/spapr.c        | 27 ++++++++++++++++++++-------
 numa.c                | 15 +++++++++------
 vl.c                  |  2 +-
 7 files changed, 70 insertions(+), 25 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 269d0ba..1dd0fde 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -74,7 +74,10 @@ typedef struct {
  *    of HotplugHandler object, which handles hotplug operation
  *    for a given @dev. It may return NULL if @dev doesn't require
  *    any actions to be performed by hotplug handler.
- * @cpu_index_to_socket_id:
+ * @cpu_index_to_instance_props:
+ *    used to provide @cpu_index to socket/core/thread number mapping, allowing
+ *    legacy code to perform mapping from cpu_index to topology properties
+ *    Returns: tuple of socket/core/thread ids given cpu_index belongs to.
  *    used to provide @cpu_index to socket number mapping, allowing
  *    a machine to group CPU threads belonging to the same socket/package
  *    Returns: socket number given cpu_index belongs to.
@@ -138,7 +141,8 @@ struct MachineClass {
 
     HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
                                            DeviceState *dev);
-    unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
+    CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
+                                                         unsigned cpu_index);
     const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
 };
 
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 8f09dcf..46ea6c7 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -24,7 +24,7 @@ typedef struct node_info {
 } NodeInfo;
 
 extern NodeInfo numa_info[MAX_NODES];
-void parse_numa_opts(MachineClass *mc);
+void parse_numa_opts(MachineState *ms);
 void numa_post_machine_init(void);
 void query_numa_node_mem(uint64_t node_mem[]);
 extern QemuOptsList qemu_numa_opts;
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0cbcbc1..8748d25 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1554,6 +1554,16 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
     }
 }
 
+static CpuInstanceProperties
+virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+    assert(cpu_index < possible_cpus->len);
+    return possible_cpus->cpus[cpu_index].props;
+}
+
 static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
 {
     int n;
@@ -1573,8 +1583,12 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
         ms->possible_cpus->cpus[n].props.has_thread_id = true;
         ms->possible_cpus->cpus[n].props.thread_id = n;
 
-        /* TODO: add 'has_node/node' here to describe
-           to which node core belongs */
+        /* default distribution of CPUs over NUMA nodes */
+        if (nb_numa_nodes) {
+            /* preset values but do not enable them i.e. 'has_node_id = false',
+             * board will enable them if manual mapping wasn't present on CLI */
+            ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;
+        }
     }
     return ms->possible_cpus;
 }
@@ -1596,6 +1610,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
     mc->minimum_page_bits = 12;
     mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
+    mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
 }
 
 static const TypeInfo virt_machine_info = {
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index d24388e..7031100 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2245,12 +2245,14 @@ static void pc_machine_reset(void)
     }
 }
 
-static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
+static CpuInstanceProperties
+pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 {
-    X86CPUTopoInfo topo;
-    x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
-                          &topo);
-    return topo.pkg_id;
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+    assert(cpu_index < possible_cpus->len);
+    return possible_cpus->cpus[cpu_index].props;
 }
 
 static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
@@ -2282,6 +2284,14 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
         ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
         ms->possible_cpus->cpus[i].props.has_thread_id = true;
         ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
+
+        /* default distribution of CPUs over NUMA nodes */
+        if (nb_numa_nodes) {
+            /* preset values but do not enable them i.e. 'has_node_id = false',
+             * board will enable them if manual mapping wasn't present on CLI */
+            ms->possible_cpus->cpus[i].props.node_id =
+                topo.pkg_id % nb_numa_nodes;
+        }
     }
     return ms->possible_cpus;
 }
@@ -2324,7 +2334,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     pcmc->acpi_data_size = 0x20000 + 0x8000;
     pcmc->save_tsc_khz = true;
     mc->get_hotplug_handler = pc_get_hotpug_handler;
-    mc->cpu_index_to_socket_id = pc_cpu_index_to_socket_id;
+    mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
     mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
     mc->has_hotpluggable_cpus = true;
     mc->default_boot_order = "cad";
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6ee566d..9dcbbcc 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2921,11 +2921,18 @@ static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
     return NULL;
 }
 
-static unsigned spapr_cpu_index_to_socket_id(unsigned cpu_index)
+static CpuInstanceProperties
+spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
 {
-    /* Allocate to NUMA nodes on a "socket" basis (not that concept of
-     * socket means much for the paravirtualized PAPR platform) */
-    return cpu_index / smp_threads / smp_cores;
+    CPUArchId *core_slot;
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+    int core_id = cpu_index / smp_threads * smp_threads;
+
+    /* make sure possible_cpus are initialized */
+    mc->possible_cpu_arch_ids(machine);
+    core_slot = spapr_find_cpu_slot(machine, core_id, NULL);
+    assert(core_slot);
+    return core_slot->props;
 }
 
 static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
@@ -2952,8 +2959,14 @@ static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
         machine->possible_cpus->cpus[i].arch_id = core_id;
         machine->possible_cpus->cpus[i].props.has_core_id = true;
         machine->possible_cpus->cpus[i].props.core_id = core_id;
-        /* TODO: add 'has_node/node' here to describe
-           to which node core belongs */
+
+        /* default distribution of CPUs over NUMA nodes */
+        if (nb_numa_nodes) {
+            /* preset values but do not enable them i.e. 'has_node_id = false',
+             * board will enable them if manual mapping wasn't present on CLI */
+            machine->possible_cpus->cpus[i].props.node_id =
+                core_id / smp_threads / smp_cores % nb_numa_nodes;
+        }
     }
     return machine->possible_cpus;
 }
@@ -3076,7 +3089,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     hc->pre_plug = spapr_machine_device_pre_plug;
     hc->plug = spapr_machine_device_plug;
     hc->unplug = spapr_machine_device_unplug;
-    mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
+    mc->cpu_index_to_instance_props = spapr_cpu_index_to_props;
     mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
     hc->unplug_request = spapr_machine_device_unplug_request;
 
diff --git a/numa.c b/numa.c
index e01cb54..b6e71bc 100644
--- a/numa.c
+++ b/numa.c
@@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
     g_free(seen_cpus);
 }
 
-void parse_numa_opts(MachineClass *mc)
+void parse_numa_opts(MachineState *ms)
 {
     int i;
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
 
     for (i = 0; i < MAX_NODES; i++) {
         numa_info[i].node_cpu = bitmap_new(max_cpus);
@@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
          * rule grouping VCPUs by socket so that VCPUs from the same socket
          * would be on the same node.
          */
+        if (!mc->cpu_index_to_instance_props) {
+            error_report("default CPUs to NUMA node mapping isn't supported");
+            exit(1);
+        }
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
-                unsigned node_id = i % nb_numa_nodes;
-                if (mc->cpu_index_to_socket_id) {
-                    node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
-                }
+                CpuInstanceProperties props;
+                props = mc->cpu_index_to_instance_props(ms, i);
 
-                set_bit(i, numa_info[node_id].node_cpu);
+                set_bit(i, numa_info[props.node_id].node_cpu);
             }
         }
 
diff --git a/vl.c b/vl.c
index 0b4ed52..5ffb9c3 100644
--- a/vl.c
+++ b/vl.c
@@ -4498,7 +4498,7 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
     default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
-    parse_numa_opts(machine_class);
+    parse_numa_opts(current_machine);
 
     if (qemu_opts_foreach(qemu_find_opts("mon"),
                           mon_init_func, NULL, NULL)) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 77+ messages in thread
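The default sPAPR distribution introduced above can be checked by hand. Below is a minimal standalone sketch that mirrors the arithmetic in spapr_cpu_index_to_props() and spapr_possible_cpu_arch_ids(). The helper names are hypothetical, not QEMU functions, and smp_threads, smp_cores and nb_numa_nodes are passed in as parameters rather than read from QEMU globals.

```c
/* Hypothetical helpers mirroring the arithmetic in spapr_cpu_index_to_props()
 * and spapr_possible_cpu_arch_ids(); smp_threads, smp_cores and
 * nb_numa_nodes are parameters here instead of QEMU globals. */

/* core_id of the core containing cpu_index (index of its first thread) */
static int spapr_core_id_for_index(int cpu_index, int smp_threads)
{
    return cpu_index / smp_threads * smp_threads;
}

/* Default node of a core: all cores of a socket go to one node and
 * sockets are distributed round-robin over the NUMA nodes. */
static int spapr_default_node(int core_id, int smp_threads, int smp_cores,
                              int nb_numa_nodes)
{
    return core_id / smp_threads / smp_cores % nb_numa_nodes;
}
```

With smp_threads=2, smp_cores=2 and two nodes, cpu-index 0-3 (socket 0) land on node 0, 4-7 (socket 1) on node 1, and 8-11 wrap back to node 0. Whole sockets therefore never straddle nodes.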

* [Qemu-devel] [PATCH for-2.10 06/23] spapr: add node-id property to sPAPR core
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (4 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-28  4:23   ` David Gibson
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU Igor Mammedov
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

This will allow switching from cpu_index-based to core-based NUMA
mapping in follow-up patches.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/hw/ppc/spapr_cpu_core.h |  1 +
 include/qom/cpu.h               |  2 ++
 hw/ppc/spapr.c                  | 17 +++++++++++++++++
 hw/ppc/spapr_cpu_core.c         | 11 ++++++++---
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
index 3c35665..93051e9 100644
--- a/include/hw/ppc/spapr_cpu_core.h
+++ b/include/hw/ppc/spapr_cpu_core.h
@@ -27,6 +27,7 @@ typedef struct sPAPRCPUCore {
 
     /*< public >*/
     void *threads;
+    int node_id;
 } sPAPRCPUCore;
 
 typedef struct sPAPRCPUCoreClass {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index c3292ef..7f27d56 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -258,6 +258,8 @@ typedef void (*run_on_cpu_func)(CPUState *cpu, run_on_cpu_data data);
 
 struct qemu_work_item;
 
+#define CPU_UNSET_NUMA_NODE_ID -1
+
 /**
  * CPUState:
  * @cpu_index: CPU index (informative).
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9dcbbcc..9c61721 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2770,9 +2770,11 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
     Error *local_err = NULL;
     CPUCore *cc = CPU_CORE(dev);
+    sPAPRCPUCore *sc = SPAPR_CPU_CORE(dev);
     char *base_core_type = spapr_get_cpu_core_type(machine->cpu_model);
     const char *type = object_get_typename(OBJECT(dev));
     CPUArchId *core_slot;
+    int node_id;
     int index;
 
     if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
@@ -2801,6 +2803,21 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         goto out;
     }
 
+    node_id = numa_get_node_for_cpu(cc->core_id);
+    if (node_id == nb_numa_nodes) {
+        /* by default CPUState::numa_node was 0 if it's not set via CLI
+         * keep it this way for now but in future we probably should
+         * refuse to start up with incomplete numa mapping */
+        node_id = 0;
+    }
+    if (sc->node_id == CPU_UNSET_NUMA_NODE_ID) {
+        sc->node_id = node_id;
+    } else if (sc->node_id != node_id) {
+        error_setg(&local_err, "node-id %d must match numa node specified "
+            "with -numa option for core-id %d", sc->node_id, cc->core_id);
+        goto out;
+    }
+
 out:
     g_free(base_core_type);
     error_propagate(errp, local_err);
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 6883f09..25988f8 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -163,7 +163,6 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
     const char *typename = object_class_get_name(scc->cpu_class);
     size_t size = object_type_get_instance_size(typename);
     Error *local_err = NULL;
-    int core_node_id = numa_get_node_for_cpu(cc->core_id);;
     void *obj;
     int i, j;
 
@@ -181,10 +180,10 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
 
         /* Set NUMA node for the added CPUs  */
         node_id = numa_get_node_for_cpu(cs->cpu_index);
-        if (node_id != core_node_id) {
+        if (node_id != sc->node_id) {
             error_setg(&local_err, "Invalid node-id=%d of thread[cpu-index: %d]"
                 " on CPU[core-id: %d, node-id: %d], node-id must be the same",
-                 node_id, cs->cpu_index, cc->core_id, core_node_id);
+                 node_id, cs->cpu_index, cc->core_id, sc->node_id);
             goto err;
         }
         if (node_id < nb_numa_nodes) {
@@ -250,6 +249,11 @@ static const char *spapr_core_models[] = {
     "POWER9_v1.0",
 };
 
+static Property spapr_cpu_core_properties[] = {
+    DEFINE_PROP_INT32("node-id", sPAPRCPUCore, node_id, CPU_UNSET_NUMA_NODE_ID),
+    DEFINE_PROP_END_OF_LIST()
+};
+
 void spapr_cpu_core_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
@@ -257,6 +261,7 @@ void spapr_cpu_core_class_init(ObjectClass *oc, void *data)
 
     dc->realize = spapr_cpu_core_realize;
     dc->unrealize = spapr_cpu_core_unrealizefn;
+    dc->props = spapr_cpu_core_properties;
     scc->cpu_class = cpu_class_by_name(TYPE_POWERPC_CPU, data);
     g_assert(scc->cpu_class);
 }
-- 
2.7.4

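The node-id resolution added to spapr_core_pre_plug() follows three rules. An unmapped CPU falls back to node 0. An unset property adopts the -numa mapping. An explicitly set property must agree with that mapping. The PC variant in the next patch applies the same rules to cs->numa_node. Below is a minimal sketch of that decision as a pure function. The name resolve_core_node_id() and its return convention are hypothetical. Here 'mapped' stands for the value numa_get_node_for_cpu() returned, where nb_numa_nodes means "no mapping".

```c
#include <stdio.h>

#define CPU_UNSET_NUMA_NODE_ID -1

/* Hypothetical sketch of the node-id check in spapr_core_pre_plug().
 * Returns 0 and fills *prop_node_id on success, -1 on a mismatch. */
static int resolve_core_node_id(int *prop_node_id, int mapped,
                                int nb_numa_nodes)
{
    if (mapped == nb_numa_nodes) {
        mapped = 0; /* legacy default: unmapped CPUs end up on node 0 */
    }
    if (*prop_node_id == CPU_UNSET_NUMA_NODE_ID) {
        *prop_node_id = mapped; /* node-id not set by user: adopt mapping */
        return 0;
    }
    if (*prop_node_id != mapped) {
        fprintf(stderr, "node-id %d must match numa node specified "
                "with -numa option\n", *prop_node_id);
        return -1;
    }
    return 0;
}
```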
* [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (5 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 06/23] spapr: add node-id property to sPAPR core Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-04-12 21:02   ` Eduardo Habkost
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 08/23] virt-arm: " Igor Mammedov
                   ` (16 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

This will allow switching from cpu_index-based to property-based
NUMA mapping in follow-up patches.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/i386/pc.c      | 17 +++++++++++++++++
 target/i386/cpu.c |  1 +
 2 files changed, 18 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7031100..873bbfa 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1895,6 +1895,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
                             DeviceState *dev, Error **errp)
 {
     int idx;
+    int node_id;
     CPUState *cs;
     CPUArchId *cpu_slot;
     X86CPUTopoInfo topo;
@@ -1984,6 +1985,22 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
     cs = CPU(cpu);
     cs->cpu_index = idx;
+
+    node_id = numa_get_node_for_cpu(cs->cpu_index);
+    if (node_id == nb_numa_nodes) {
+        /* by default CPUState::numa_node was 0 if it's not set via CLI
+         * keep it this way for now but in future we probably should
+         * refuse to start up with incomplete numa mapping */
+        node_id = 0;
+    }
+    if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
+        cs->numa_node = node_id;
+    } else if (cs->numa_node != node_id) {
+        error_setg(errp, "node-id %d must match numa node specified "
+                   "with -numa option for cpu-index %d",
+                   cs->numa_node, cs->cpu_index);
+        return;
+    }
 }
 
 static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7aa7622..d690244 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3974,6 +3974,7 @@ static Property x86_cpu_properties[] = {
     DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
     DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
 #endif
+    DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
     DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
     { .name  = "hv-spinlocks", .info  = &qdev_prop_spinlocks },
     DEFINE_PROP_BOOL("hv-relaxed", X86CPU, hyperv_relaxed_timing, false),
-- 
2.7.4

* [Qemu-devel] [PATCH for-2.10 08/23] virt-arm: add node-id property to CPU
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (6 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-04-25 17:16   ` Andrew Jones
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 09/23] numa: add check that board supports cpu_index to node mapping Igor Mammedov
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

This will allow switching from cpu_index-based to property-based
NUMA mapping in follow-up patches.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/arm/virt.c    | 15 +++++++++++++++
 target/arm/cpu.c |  1 +
 2 files changed, 16 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 8748d25..68d44f3 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1365,6 +1365,7 @@ static void machvirt_init(MachineState *machine)
     for (n = 0; n < machine->possible_cpus->len; n++) {
         Object *cpuobj;
         CPUState *cs;
+        int node_id;
 
         if (n >= smp_cpus) {
             break;
@@ -1377,6 +1378,20 @@ static void machvirt_init(MachineState *machine)
         cs = CPU(cpuobj);
         cs->cpu_index = n;
 
+        node_id = numa_get_node_for_cpu(cs->cpu_index);
+        if (node_id == nb_numa_nodes) {
+            /* by default CPUState::numa_node was 0 if it's not set via CLI
+             * keep it this way for now but in future we probably should
+             * refuse to start up with incomplete numa mapping */
+            node_id = 0;
+        }
+        if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
+            cs->numa_node = node_id;
+        } else {
+            /* CPU isn't device_add compatible yet, this shouldn't happen */
+            error_setg(&error_abort, "user set node-id not implemented");
+        }
+
         if (!vms->secure) {
             object_property_set_bool(cpuobj, false, "has_el3", NULL);
         }
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 04b062c..a635048 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1606,6 +1606,7 @@ static Property arm_cpu_properties[] = {
     DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
     DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
                         mp_affinity, ARM64_AFFINITY_INVALID),
+    DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
     DEFINE_PROP_END_OF_LIST()
 };
 
-- 
2.7.4

* [Qemu-devel] [PATCH for-2.10 09/23] numa: add check that board supports cpu_index to node mapping
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (7 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 08/23] virt-arm: " Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus Igor Mammedov
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Default node mapping initialization already checks that the board
supports cpu_index to node mapping and refuses to start if
it's not supported. Do the same for an explicitly provided
mapping ("-numa node,cpus=...").

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 numa.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/numa.c b/numa.c
index b6e71bc..24c596d 100644
--- a/numa.c
+++ b/numa.c
@@ -140,10 +140,12 @@ uint32_t numa_get_node(ram_addr_t addr, Error **errp)
     return -1;
 }
 
-static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
+static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
+                            QemuOpts *opts, Error **errp)
 {
     uint16_t nodenr;
     uint16List *cpus = NULL;
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
 
     if (node->has_nodeid) {
         nodenr = node->nodeid;
@@ -162,6 +164,10 @@ static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
         return;
     }
 
+    if (!mc->cpu_index_to_instance_props) {
+        error_report("CPUs to NUMA node mapping isn't supported");
+        exit(1);
+    }
     for (cpus = node->cpus; cpus; cpus = cpus->next) {
         if (cpus->value >= max_cpus) {
             error_setg(errp,
@@ -215,6 +221,7 @@ static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
 static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
 {
     NumaOptions *object = NULL;
+    MachineState *ms = opaque;
     Error *err = NULL;
 
     {
@@ -229,7 +236,7 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
 
     switch (object->type) {
     case NUMA_OPTIONS_TYPE_NODE:
-        numa_node_parse(&object->u.node, opts, &err);
+        numa_node_parse(ms, &object->u.node, opts, &err);
         if (err) {
             goto end;
         }
@@ -303,7 +310,7 @@ void parse_numa_opts(MachineState *ms)
         numa_info[i].node_cpu = bitmap_new(max_cpus);
     }
 
-    if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, NULL, NULL)) {
+    if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
         exit(1);
     }
 
-- 
2.7.4

* [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (8 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 09/23] numa: add check that board supports cpu_index to node mapping Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-28  4:44   ` David Gibson
                     ` (2 more replies)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 11/23] numa: do default mapping based on possible_cpus instead of node_cpu bitmaps Igor Mammedov
                   ` (13 subsequent siblings)
  23 siblings, 3 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Introduce a machine_set_cpu_numa_node() helper that stores the
node mapping for a CPU in MachineState::possible_cpus.
The CPU and the node it belongs to are specified by the 'props' argument.

The patch doesn't remove the old way of storing the mapping in
numa_info[X].node_cpu, as removing it at the same time would
make the patch rather big. Instead it just mirrors the mapping
in possible_cpus; follow-up per-target patches will switch
to possible_cpus, and numa_info[X].node_cpu will be removed
once there aren't any users left.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/hw/boards.h |  2 ++
 hw/core/machine.c   | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 numa.c              |  8 +++++++
 3 files changed, 78 insertions(+)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 1dd0fde..40f30f1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -42,6 +42,8 @@ bool machine_dump_guest_core(MachineState *machine);
 bool machine_mem_merge(MachineState *machine);
 void machine_register_compat_props(MachineState *machine);
 HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine);
+void machine_set_cpu_numa_node(MachineState *machine,
+                               CpuInstanceProperties *props, Error **errp);
 
 /**
  * CPUArchId:
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 0d92672..6ff0b45 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -388,6 +388,74 @@ HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine)
     return head;
 }
 
+void machine_set_cpu_numa_node(MachineState *machine,
+                               CpuInstanceProperties *props, Error **errp)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+    bool match = false;
+    int i;
+
+    if (!mc->possible_cpu_arch_ids) {
+        error_setg(errp, "mapping of CPUs to NUMA node is not supported");
+        return;
+    }
+
+    /* force board to initialize possible_cpus if it hasn't been done yet */
+    mc->possible_cpu_arch_ids(machine);
+
+    for (i = 0; i < machine->possible_cpus->len; i++) {
+        CPUArchId *slot = &machine->possible_cpus->cpus[i];
+
+        /* reject properties that the board doesn't support */
+        if (props->has_thread_id && !slot->props.has_thread_id) {
+            error_setg(errp, "thread-id is not supported");
+            return;
+        }
+
+        if (props->has_core_id && !slot->props.has_core_id) {
+            error_setg(errp, "core-id is not supported");
+            return;
+        }
+
+        if (props->has_socket_id && !slot->props.has_socket_id) {
+            error_setg(errp, "socket-id is not supported");
+            return;
+        }
+
+        /* skip slots with explicit mismatch */
+        if (props->has_thread_id && props->thread_id != slot->props.thread_id) {
+                continue;
+        }
+
+        if (props->has_core_id && props->core_id != slot->props.core_id) {
+                continue;
+        }
+
+        if (props->has_socket_id && props->socket_id != slot->props.socket_id) {
+                continue;
+        }
+
+        /* reject assignment if slot is already assigned to another node; for
+         * compatibility of legacy cpu_index mapping with SPAPR core-based
+         * mapping, don't fail if a thread and its core share the same node-id */
+        if (slot->props.has_node_id &&
+            slot->props.node_id != props->node_id) {
+            error_setg(errp, "CPU is already assigned to node-id: %" PRId64,
+                       slot->props.node_id);
+            return;
+        }
+
+        /* slot matched the '-numa cpu' key, assign it to the node */
+        match = true;
+        slot->props.node_id = props->node_id;
+        slot->props.has_node_id = props->has_node_id;
+    }
+
+    if (!match) {
+        error_setg(errp, "no match found");
+    }
+}
+
 static void machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
diff --git a/numa.c b/numa.c
index 24c596d..44057f1 100644
--- a/numa.c
+++ b/numa.c
@@ -169,6 +169,7 @@ static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
         exit(1);
     }
     for (cpus = node->cpus; cpus; cpus = cpus->next) {
+        CpuInstanceProperties props;
         if (cpus->value >= max_cpus) {
             error_setg(errp,
                        "CPU index (%" PRIu16 ")"
@@ -177,6 +178,10 @@ static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
             return;
         }
         bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
+        props = mc->cpu_index_to_instance_props(ms, cpus->value);
+        props.node_id = nodenr;
+        props.has_node_id = true;
+        machine_set_cpu_numa_node(ms, &props, &error_fatal);
     }
 
     if (node->has_mem && node->has_memdev) {
@@ -393,9 +398,12 @@ void parse_numa_opts(MachineState *ms)
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
                 CpuInstanceProperties props;
+                /* fetch default mapping from board and enable it */
                 props = mc->cpu_index_to_instance_props(ms, i);
+                props.has_node_id = true;
 
                 set_bit(i, numa_info[props.node_id].node_cpu);
+                machine_set_cpu_numa_node(ms, &props, &error_fatal);
             }
         }
 
-- 
2.7.4

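The matching rules of machine_set_cpu_numa_node() are what make wildcard "-numa cpu" keys work. A property absent from the request matches any slot. A present property must equal the slot's value. A slot already bound to a different node is a hard error. Below is a minimal standalone sketch of those rules. Props, SetResult and set_cpu_numa_node() are hypothetical simplified stand-ins for CpuInstanceProperties, Error reporting and the real helper. Like the patch, the sketch can leave earlier slots assigned when a later slot triggers an error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for CpuInstanceProperties. */
typedef struct {
    bool has_socket_id, has_core_id, has_thread_id, has_node_id;
    int64_t socket_id, core_id, thread_id, node_id;
} Props;

typedef enum { SET_OK, SET_UNSUPPORTED, SET_CONFLICT, SET_NO_MATCH } SetResult;

/* Sketch of the matching in machine_set_cpu_numa_node(): properties absent
 * from 'req' act as wildcards, present ones must equal the slot's values. */
static SetResult set_cpu_numa_node(Props *slots, size_t n, const Props *req)
{
    bool match = false;

    for (size_t i = 0; i < n; i++) {
        Props *s = &slots[i];

        /* the board must expose every property the request uses */
        if ((req->has_socket_id && !s->has_socket_id) ||
            (req->has_core_id && !s->has_core_id) ||
            (req->has_thread_id && !s->has_thread_id)) {
            return SET_UNSUPPORTED;
        }
        /* skip slots with an explicit mismatch */
        if ((req->has_socket_id && req->socket_id != s->socket_id) ||
            (req->has_core_id && req->core_id != s->core_id) ||
            (req->has_thread_id && req->thread_id != s->thread_id)) {
            continue;
        }
        /* re-assigning to the same node is tolerated, another node is not */
        if (s->has_node_id && s->node_id != req->node_id) {
            return SET_CONFLICT;
        }
        match = true;
        s->node_id = req->node_id;
        s->has_node_id = req->has_node_id;
    }
    return match ? SET_OK : SET_NO_MATCH;
}
```

A request carrying only socket_id=1 binds every core of socket 1 in one call. This is the wildcard behavior listed in the cover letter.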
* [Qemu-devel] [PATCH for-2.10 11/23] numa: do default mapping based on possible_cpus instead of node_cpu bitmaps
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (9 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-28  4:46   ` David Gibson
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 12/23] pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() Igor Mammedov
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 numa.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/numa.c b/numa.c
index 44057f1..ab41776 100644
--- a/numa.c
+++ b/numa.c
@@ -309,6 +309,7 @@ static void validate_numa_cpus(void)
 void parse_numa_opts(MachineState *ms)
 {
     int i;
+    const CPUArchIdList *possible_cpus;
     MachineClass *mc = MACHINE_GET_CLASS(ms);
 
     for (i = 0; i < MAX_NODES; i++) {
@@ -379,11 +380,6 @@ void parse_numa_opts(MachineState *ms)
 
         numa_set_mem_ranges();
 
-        for (i = 0; i < nb_numa_nodes; i++) {
-            if (!bitmap_empty(numa_info[i].node_cpu, max_cpus)) {
-                break;
-            }
-        }
         /* Historically VCPUs were assigned in round-robin order to NUMA
          * nodes. However it causes issues with guest not handling it nice
          * in case where cores/threads from a multicore CPU appear on
@@ -391,11 +387,20 @@ void parse_numa_opts(MachineState *ms)
          * rule grouping VCPUs by socket so that VCPUs from the same socket
          * would be on the same node.
          */
-        if (!mc->cpu_index_to_instance_props) {
+        if (!mc->cpu_index_to_instance_props || !mc->possible_cpu_arch_ids) {
             error_report("default CPUs to NUMA node mapping isn't supported");
             exit(1);
         }
-        if (i == nb_numa_nodes) {
+
+        possible_cpus = mc->possible_cpu_arch_ids(ms);
+        for (i = 0; i < possible_cpus->len; i++) {
+            if (possible_cpus->cpus[i].props.has_node_id) {
+                break;
+            }
+        }
+
+        /* no CPUs are assigned to NUMA nodes */
+        if (i == possible_cpus->len) {
             for (i = 0; i < max_cpus; i++) {
                 CpuInstanceProperties props;
                 /* fetch default mapping from board and enable it */
-- 
2.7.4

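After this patch, the decision to apply the board's default mapping no longer scans the node_cpu bitmaps. It checks whether any possible_cpus slot already has has_node_id set. Below is a minimal sketch of that all-or-nothing rule. Slot, any_cpu_mapped() and apply_default_mapping() are hypothetical. The default_node callback stands in for the board's cpu_index_to_instance_props(). The sketch assumes one slot per cpu_index, as on PC (sPAPR slots are per core).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a possible_cpus entry. */
typedef struct {
    bool has_node_id;
    int node_id;
} Slot;

/* true if -numa options already assigned at least one CPU to a node */
static bool any_cpu_mapped(const Slot *slots, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (slots[i].has_node_id) {
            return true;
        }
    }
    return false;
}

/* Apply the board default only when the user gave no mapping at all;
 * default_node() stands in for cpu_index_to_instance_props(). */
static void apply_default_mapping(Slot *slots, size_t len,
                                  int (*default_node)(size_t idx))
{
    if (any_cpu_mapped(slots, len)) {
        return; /* keep the user's (possibly partial) mapping as-is */
    }
    for (size_t i = 0; i < len; i++) {
        slots[i].node_id = default_node(i);
        slots[i].has_node_id = true;
    }
}
```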
* [Qemu-devel] [PATCH for-2.10 12/23] pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (10 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 11/23] numa: do default mapping based on possible_cpus instead of node_cpu bitmaps Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 13/23] spapr: " Igor Mammedov
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/cpu.c        |  7 +++----
 hw/i386/acpi-build.c | 11 ++++-------
 hw/i386/pc.c         | 18 ++++++++++--------
 3 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 8c719d3..90fe24d 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -503,7 +503,6 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
 
         /* build Processor object for each processor */
         for (i = 0; i < arch_ids->len; i++) {
-            int j;
             Aml *dev;
             Aml *uid = aml_int(i);
             GArray *madt_buf = g_array_new(0, 1, 1);
@@ -557,9 +556,9 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
              * as a result _PXM is required for all CPUs which might
              * be hot-plugged. For simplicity, add it for all CPUs.
              */
-            j = numa_get_node_for_cpu(i);
-            if (j < nb_numa_nodes) {
-                aml_append(dev, aml_name_decl("_PXM", aml_int(j)));
+            if (arch_ids->cpus[i].props.has_node_id) {
+                int node_id = arch_ids->cpus[i].props.node_id;
+                aml_append(dev, aml_name_decl("_PXM", aml_int(node_id)));
             }
 
             aml_append(cpus_dev, dev);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2073108..a2be70b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2306,7 +2306,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
     srat->reserved1 = cpu_to_le32(1);
 
     for (i = 0; i < apic_ids->len; i++) {
-        int j = numa_get_node_for_cpu(i);
+        int node_id = apic_ids->cpus[i].props.has_node_id ?
+            apic_ids->cpus[i].props.node_id : 0;
         uint32_t apic_id = apic_ids->cpus[i].arch_id;
 
         if (apic_id < 255) {
@@ -2316,9 +2317,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
             core->type = ACPI_SRAT_PROCESSOR_APIC;
             core->length = sizeof(*core);
             core->local_apic_id = apic_id;
-            if (j < nb_numa_nodes) {
-                core->proximity_lo = j;
-            }
+            core->proximity_lo = node_id;
             memset(core->proximity_hi, 0, 3);
             core->local_sapic_eid = 0;
             core->flags = cpu_to_le32(1);
@@ -2329,9 +2328,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
             core->type = ACPI_SRAT_PROCESSOR_x2APIC;
             core->length = sizeof(*core);
             core->x2apic_id = cpu_to_le32(apic_id);
-            if (j < nb_numa_nodes) {
-                core->proximity_domain = cpu_to_le32(j);
-            }
+            core->proximity_domain = cpu_to_le32(node_id);
             core->flags = cpu_to_le32(1);
         }
     }
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 873bbfa..6fdec59 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -747,7 +747,9 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
 {
     FWCfgState *fw_cfg;
     uint64_t *numa_fw_cfg;
-    int i, j;
+    int i;
+    const CPUArchIdList *cpus;
+    MachineClass *mc = MACHINE_GET_CLASS(pcms);
 
     fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
     fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
@@ -782,12 +784,12 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
      */
     numa_fw_cfg = g_new0(uint64_t, 1 + pcms->apic_id_limit + nb_numa_nodes);
     numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
-    for (i = 0; i < max_cpus; i++) {
-        unsigned int apic_id = x86_cpu_apic_id_from_index(i);
+    cpus = mc->possible_cpu_arch_ids(MACHINE(pcms));
+    for (i = 0; i < cpus->len; i++) {
+        unsigned int apic_id = cpus->cpus[i].arch_id;
         assert(apic_id < pcms->apic_id_limit);
-        j = numa_get_node_for_cpu(i);
-        if (j < nb_numa_nodes) {
-            numa_fw_cfg[apic_id + 1] = cpu_to_le64(j);
+        if (cpus->cpus[i].props.has_node_id) {
+            numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
         }
     }
     for (i = 0; i < nb_numa_nodes; i++) {
@@ -1986,8 +1988,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
     cs = CPU(cpu);
     cs->cpu_index = idx;
 
-    node_id = numa_get_node_for_cpu(cs->cpu_index);
-    if (node_id == nb_numa_nodes) {
+    node_id = cpu_slot->props.node_id;
+    if (!cpu_slot->props.has_node_id) {
         /* by default CPUState::numa_node was 0 if it's not set via CLI
          * keep it this way for now but in future we probably should
          * refuse to start up with incomplete numa mapping */
-- 
2.7.4

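In bochs_bios_init(), the table is now built by walking possible_cpus and indexing it by APIC ID (arch_id) rather than by cpu_index. Below is a minimal sketch of the CPU part of that layout. Entry [0] holds nb_numa_nodes. Entry [apic_id + 1] holds the node of a mapped CPU, and an unmapped CPU stays 0 because the buffer is zero-allocated. ArchId and build_numa_fw_cfg() are hypothetical. The trailing nb_numa_nodes entries are filled by the per-node loop in the real code; that loop is not shown in this hunk, so the sketch leaves those entries zero. Values are kept in host byte order, while QEMU wraps them in cpu_to_le64().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one CPUArchId entry on PC. */
typedef struct {
    uint32_t arch_id;   /* APIC ID */
    int has_node_id;
    int64_t node_id;
} ArchId;

/* Sketch of the CPU part of the fw_cfg NUMA table; caller frees it.
 * Returns NULL on allocation failure or an out-of-range APIC ID. */
static uint64_t *build_numa_fw_cfg(const ArchId *cpus, size_t ncpus,
                                   unsigned apic_id_limit, int nb_numa_nodes)
{
    uint64_t *tbl = calloc(1 + (size_t)apic_id_limit + nb_numa_nodes,
                           sizeof(*tbl));
    if (!tbl) {
        return NULL;
    }
    tbl[0] = (uint64_t)nb_numa_nodes;
    for (size_t i = 0; i < ncpus; i++) {
        if (cpus[i].arch_id >= apic_id_limit) {
            free(tbl);
            return NULL; /* the real code asserts instead */
        }
        if (cpus[i].has_node_id) {
            tbl[cpus[i].arch_id + 1] = (uint64_t)cpus[i].node_id;
        }
    }
    return tbl;
}
```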
* [Qemu-devel] [PATCH for-2.10 13/23] spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (11 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 12/23] pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 14/23] virt-arm: " Igor Mammedov
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

It's safe to remove the error branch for thread node_id != core node_id,
as machine_set_cpu_numa_node() already performs a mismatch check and
is called before any CPU is created.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/ppc/spapr.c          |  4 ++--
 hw/ppc/spapr_cpu_core.c | 14 ++------------
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9c61721..42cef3d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2803,8 +2803,8 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         goto out;
     }
 
-    node_id = numa_get_node_for_cpu(cc->core_id);
-    if (node_id == nb_numa_nodes) {
+    node_id = core_slot->props.node_id;
+    if (!core_slot->props.has_node_id) {
         /* by default CPUState::numa_node was 0 if it's not set via CLI
          * keep it this way for now but in future we probably should
          * refuse to start up with incomplete numa mapping */
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 25988f8..8d48468 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -168,7 +168,6 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
 
     sc->threads = g_malloc0(size * cc->nr_threads);
     for (i = 0; i < cc->nr_threads; i++) {
-        int node_id;
         char id[32];
         CPUState *cs;
 
@@ -178,17 +177,8 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
         cs = CPU(obj);
         cs->cpu_index = cc->core_id + i;
 
-        /* Set NUMA node for the added CPUs  */
-        node_id = numa_get_node_for_cpu(cs->cpu_index);
-        if (node_id != sc->node_id) {
-            error_setg(&local_err, "Invalid node-id=%d of thread[cpu-index: %d]"
-                " on CPU[core-id: %d, node-id: %d], node-id must be the same",
-                 node_id, cs->cpu_index, cc->core_id, sc->node_id);
-            goto err;
-        }
-        if (node_id < nb_numa_nodes) {
-            cs->numa_node = node_id;
-        }
+        /* Set NUMA node for the threads belonging to the core */
+        cs->numa_node = sc->node_id;
 
         snprintf(id, sizeof(id), "thread[%d]", i);
         object_property_add_child(OBJECT(sc), id, obj, &local_err);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Qemu-devel] [PATCH for-2.10 14/23] virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (12 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 13/23] spapr: " Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-04-25 17:06   ` Andrew Jones
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 15/23] QMP: include CpuInstanceProperties into query_cpus output output Igor Mammedov
                   ` (9 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/arm/virt-acpi-build.c | 19 +++++++------------
 hw/arm/virt.c            | 13 +++++++------
 2 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 0835e59..ce7499c 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -486,30 +486,25 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     AcpiSystemResourceAffinityTable *srat;
     AcpiSratProcessorGiccAffinity *core;
     AcpiSratMemoryAffinity *numamem;
-    int i, j, srat_start;
+    int i, srat_start;
     uint64_t mem_base;
-    uint32_t *cpu_node = g_malloc0(vms->smp_cpus * sizeof(uint32_t));
-
-    for (i = 0; i < vms->smp_cpus; i++) {
-        j = numa_get_node_for_cpu(i);
-        if (j < nb_numa_nodes) {
-                cpu_node[i] = j;
-        }
-    }
+    MachineClass *mc = MACHINE_GET_CLASS(vms);
+    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
 
     srat_start = table_data->len;
     srat = acpi_data_push(table_data, sizeof(*srat));
     srat->reserved1 = cpu_to_le32(1);
 
-    for (i = 0; i < vms->smp_cpus; ++i) {
+    for (i = 0; i < cpu_list->len; ++i) {
+        int node_id = cpu_list->cpus[i].props.has_node_id ?
+            cpu_list->cpus[i].props.node_id : 0;
         core = acpi_data_push(table_data, sizeof(*core));
         core->type = ACPI_SRAT_PROCESSOR_GICC;
         core->length = sizeof(*core);
-        core->proximity = cpu_to_le32(cpu_node[i]);
+        core->proximity = cpu_to_le32(node_id);
         core->acpi_processor_uid = cpu_to_le32(i);
         core->flags = cpu_to_le32(1);
     }
-    g_free(cpu_node);
 
     mem_base = vms->memmap[VIRT_MEM].base;
     for (i = 0; i < nb_numa_nodes; ++i) {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 68d44f3..0a75df5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -338,7 +338,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 {
     int cpu;
     int addr_cells = 1;
-    unsigned int i;
+    const MachineState *ms = MACHINE(vms);
 
     /*
      * From Documentation/devicetree/bindings/arm/cpus.txt
@@ -369,6 +369,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
     for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
         char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
         ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
+        CPUState *cs = CPU(armcpu);
 
         qemu_fdt_add_subnode(vms->fdt, nodename);
         qemu_fdt_setprop_string(vms->fdt, nodename, "device_type", "cpu");
@@ -389,9 +390,9 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
                                   armcpu->mp_affinity);
         }
 
-        i = numa_get_node_for_cpu(cpu);
-        if (i < nb_numa_nodes) {
-            qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id", i);
+        if (ms->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
+            qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id",
+                ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
         }
 
         g_free(nodename);
@@ -1378,8 +1379,8 @@ static void machvirt_init(MachineState *machine)
         cs = CPU(cpuobj);
         cs->cpu_index = n;
 
-        node_id = numa_get_node_for_cpu(cs->cpu_index);
-        if (node_id == nb_numa_nodes) {
+        node_id = machine->possible_cpus->cpus[cs->cpu_index].props.node_id;
+        if (!machine->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
             /* by default CPUState::numa_node was 0 if it's not set via CLI
              * keep it this way for now but in future we probably should
              * refuse to start up with incomplete numa mapping */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Qemu-devel] [PATCH for-2.10 15/23] QMP: include CpuInstanceProperties into query_cpus output output
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (13 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 14/23] virt-arm: " Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-23 13:19   ` Eric Blake
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 16/23] tests: numa: add case for QMP command query-cpus Igor Mammedov
                   ` (8 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

If the board supports CpuInstanceProperties, report them for
each listed CPU thread. The main motivation is to expose these
properties via QMP for introspection, so that test cases can
verify the NUMA node to CPU mapping. This covers not only boards
that support CPU hotplug and already report this info in
query-hotpluggable-cpus (pc/spapr), but also boards that don't
support hotpluggable-cpus but do support NUMA mapping (virt-arm).
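
The '*props' member follows the QAPI optional-member convention: a
has_props flag says whether props is valid, and the patch sets it only
when the board implements cpu_index_to_instance_props. A minimal
standalone sketch of that pattern, assuming simplified hypothetical
types (Props, Info, IndexToPropsFn, fill_info and demo_hook are
illustrative names, not QEMU's generated code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for QAPI-generated types (not QEMU's). */
typedef struct {
    bool has_node_id;
    int64_t node_id;
} Props;

typedef struct {
    int64_t cpu;       /* cpu_index */
    bool has_props;    /* QAPI-style flag: props is valid only if true */
    Props *props;
} Info;

/* Optional board hook; NULL when the board cannot describe its CPUs. */
typedef Props (*IndexToPropsFn)(int cpu_index);

/* Fill the optional member only when the board provides the hook. */
static void fill_info(Info *info, int cpu_index, IndexToPropsFn hook)
{
    assert(cpu_index >= 0);
    info->cpu = cpu_index;
    info->has_props = hook != NULL;
    info->props = NULL;
    if (info->has_props) {
        info->props = malloc(sizeof(*info->props));
        if (!info->props) {
            abort();   /* like GLib's g_malloc0(), fail hard on OOM */
        }
        *info->props = hook(cpu_index);
    }
}

/* Example hook for 8 CPUs on 2 nodes: CPUs 0-3 -> 0, CPUs 4-7 -> 1. */
static Props demo_hook(int cpu_index)
{
    Props p = { .has_node_id = true, .node_id = cpu_index / 4 };
    return p;
}
```

A QMP client (or the numa-test case) then checks has_props before
reading props, exactly as it would for any other optional member.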

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 cpus.c           | 9 +++++++++
 qapi-schema.json | 6 +++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 167d961..03aa12c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -50,6 +50,7 @@
 #include "qapi-event.h"
 #include "hw/nmi.h"
 #include "sysemu/replay.h"
+#include "hw/boards.h"
 
 #ifdef CONFIG_LINUX
 
@@ -1810,6 +1811,8 @@ void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
 
 CpuInfoList *qmp_query_cpus(Error **errp)
 {
+    MachineState *ms = MACHINE(qdev_get_machine());
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
     CpuInfoList *head = NULL, *cur_item = NULL;
     CPUState *cpu;
 
@@ -1860,6 +1863,12 @@ CpuInfoList *qmp_query_cpus(Error **errp)
 #else
         info->value->arch = CPU_INFO_ARCH_OTHER;
 #endif
+        if ((info->value->has_props = !!mc->cpu_index_to_instance_props)) {
+            CpuInstanceProperties *props;
+            props = g_malloc0(sizeof(*props));
+            *props = mc->cpu_index_to_instance_props(ms, cpu->cpu_index);
+            info->value->props = props;
+        }
 
         /* XXX: waiting for the qapi to support GSList */
         if (!cur_item) {
diff --git a/qapi-schema.json b/qapi-schema.json
index 68a4327..a6b5955 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1322,6 +1322,9 @@
 #
 # @thread_id: ID of the underlying host thread
 #
+# @props: properties describing the node/socket/core/thread to which
+#         the virtual CPU belongs, if supported by the board (since 2.10)
+#
 # @arch: architecture of the cpu, which determines which additional fields
 #        will be listed (since 2.6)
 #
@@ -1332,7 +1335,8 @@
 ##
 { 'union': 'CpuInfo',
   'base': {'CPU': 'int', 'current': 'bool', 'halted': 'bool',
-           'qom_path': 'str', 'thread_id': 'int', 'arch': 'CpuInfoArch' },
+           'qom_path': 'str', 'thread_id': 'int',
+           '*props': 'CpuInstanceProperties', 'arch': 'CpuInfoArch' },
   'discriminator': 'arch',
   'data': { 'x86': 'CpuInfoX86',
             'sparc': 'CpuInfoSPARC',
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Qemu-devel] [PATCH for-2.10 16/23] tests: numa: add case for QMP command query-cpus
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (14 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 15/23] QMP: include CpuInstanceProperties into query_cpus output output Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 17/23] numa: remove no longer used numa_get_node_for_cpu() Igor Mammedov
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 tests/numa-test.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tests/numa-test.c b/tests/numa-test.c
index f5da0c8..8326321 100644
--- a/tests/numa-test.c
+++ b/tests/numa-test.c
@@ -87,6 +87,49 @@ static void test_mon_partial(const void *data)
     g_free(cli);
 }
 
+static QList *get_cpus(QDict **resp) {
+    *resp = qmp("{ 'execute': 'query-cpus' }");
+    g_assert(*resp);
+    g_assert(qdict_haskey(*resp, "return"));
+    return qdict_get_qlist(*resp, "return");
+}
+
+static void test_query_cpus(const void *data)
+{
+    char *cli;
+    QDict *resp;
+    QList *cpus;
+    const QObject *e;
+
+    cli = make_cli(data, "-smp 8 -numa node,cpus=0-3 -numa node,cpus=4-7");
+    qtest_start(cli);
+    cpus = get_cpus(&resp);
+    g_assert(cpus);
+
+    while ((e = qlist_pop(cpus))) {
+        QDict *cpu, *props;
+        int64_t cpu_idx, node;
+
+        cpu = qobject_to_qdict(e);
+        g_assert(qdict_haskey(cpu, "CPU"));
+        g_assert(qdict_haskey(cpu, "props"));
+
+        cpu_idx = qdict_get_int(cpu, "CPU");
+        props = qdict_get_qdict(cpu, "props");
+        g_assert(qdict_haskey(props, "node-id"));
+        node = qdict_get_int(props, "node-id");
+        if (cpu_idx >= 0 && cpu_idx < 4) {
+            g_assert_cmpint(node, ==, 0);
+        } else {
+            g_assert_cmpint(node, ==, 1);
+        }
+    }
+
+    QDECREF(resp);
+    qtest_end();
+    g_free(cli);
+}
+
 int main(int argc, char **argv)
 {
     const char *args = NULL;
@@ -101,6 +144,7 @@ int main(int argc, char **argv)
     qtest_add_data_func("/numa/mon/default", args, test_mon_default);
     qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit);
     qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial);
+    qtest_add_data_func("/numa/qmp/cpus/query-cpus", args, test_query_cpus);
 
     return g_test_run();
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Qemu-devel] [PATCH for-2.10 17/23] numa: remove no longer used numa_get_node_for_cpu()
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (15 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 16/23] tests: numa: add case for QMP command query-cpus Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-28  4:54   ` David Gibson
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 18/23] numa: remove no longer need numa_post_machine_init() Igor Mammedov
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

It has been replaced by fetching the mapping info from possible_cpus.
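
The difference between the two lookups can be sketched in isolation.
This is a simplified model, not QEMU code: node_cpu_bits, slot_props,
legacy_node_for_cpu(), slot_node_for_cpu() and map_cpu() are
hypothetical names. The legacy path scans one bitmap per node and
returns the node count for an unmapped CPU (the removed function's
convention), while the new path reads an optional node_id stored in
the CPU's own slot:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 8
#define NB_NODES 2

/* Old scheme (simplified): one CPU bitmap per node; bit i set => CPU i
 * belongs to that node. Returns NB_NODES when the CPU is unmapped. */
static uint32_t node_cpu_bits[NB_NODES];

static int legacy_node_for_cpu(int idx)
{
    int i;

    assert(idx >= 0 && idx < MAX_CPUS);
    for (i = 0; i < NB_NODES; i++) {
        if (node_cpu_bits[i] & (UINT32_C(1) << idx)) {
            break;
        }
    }
    return i;
}

/* New scheme (simplified): each possible-CPU slot stores its own
 * optional node, so a lookup is a single read of the slot. */
typedef struct {
    bool has_node_id;
    int64_t node_id;
} SlotProps;

static SlotProps slot_props[MAX_CPUS];

/* Returns the slot's node, or -1 if it has no mapping. */
static int64_t slot_node_for_cpu(int idx)
{
    assert(idx >= 0 && idx < MAX_CPUS);
    return slot_props[idx].has_node_id ? slot_props[idx].node_id : -1;
}

/* Record the same mapping in both representations. */
static void map_cpu(int idx, int node)
{
    assert(idx >= 0 && idx < MAX_CPUS && node >= 0 && node < NB_NODES);
    node_cpu_bits[node] |= UINT32_C(1) << idx;
    slot_props[idx].has_node_id = true;
    slot_props[idx].node_id = node;
}
```

The explicit has_node_id flag replaces the "returns nb_numa_nodes on
failure" sentinel, which is why callers now test has_node_id instead
of comparing against nb_numa_nodes.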

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/sysemu/numa.h |  4 ----
 numa.c                | 14 --------------
 2 files changed, 18 deletions(-)

diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 46ea6c7..c67763a 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -31,8 +31,4 @@ extern QemuOptsList qemu_numa_opts;
 void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
 void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
 uint32_t numa_get_node(ram_addr_t addr, Error **errp);
-
-/* on success returns node index in numa_info,
- * on failure returns nb_numa_nodes */
-int numa_get_node_for_cpu(int idx);
 #endif
diff --git a/numa.c b/numa.c
index ab41776..187c93f 100644
--- a/numa.c
+++ b/numa.c
@@ -583,20 +583,6 @@ MemdevList *qmp_query_memdev(Error **errp)
     return list;
 }
 
-int numa_get_node_for_cpu(int idx)
-{
-    int i;
-
-    assert(idx < max_cpus);
-
-    for (i = 0; i < nb_numa_nodes; i++) {
-        if (test_bit(idx, numa_info[i].node_cpu)) {
-            break;
-        }
-    }
-    return i;
-}
-
 void ram_block_notifier_add(RAMBlockNotifier *n)
 {
     QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Qemu-devel] [PATCH for-2.10 18/23] numa: remove no longer need numa_post_machine_init()
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (16 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 17/23] numa: remove no longer used numa_get_node_for_cpu() Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-28  4:55   ` David Gibson
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 19/23] machine: call machine init from wrapper Igor Mammedov
                   ` (5 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

CPUState::numa_node is still in use, but it's now set by the
board when it creates CPU objects, so there is no need to set
it again after all CPUs are created.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/sysemu/numa.h |  1 -
 numa.c                | 15 ---------------
 vl.c                  |  2 --
 3 files changed, 18 deletions(-)

diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index c67763a..345bb94 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -25,7 +25,6 @@ typedef struct node_info {
 
 extern NodeInfo numa_info[MAX_NODES];
 void parse_numa_opts(MachineState *ms);
-void numa_post_machine_init(void);
 void query_numa_node_mem(uint64_t node_mem[]);
 extern QemuOptsList qemu_numa_opts;
 void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
diff --git a/numa.c b/numa.c
index 187c93f..8461c96 100644
--- a/numa.c
+++ b/numa.c
@@ -418,21 +418,6 @@ void parse_numa_opts(MachineState *ms)
     }
 }
 
-void numa_post_machine_init(void)
-{
-    CPUState *cpu;
-    int i;
-
-    CPU_FOREACH(cpu) {
-        for (i = 0; i < nb_numa_nodes; i++) {
-            assert(cpu->cpu_index < max_cpus);
-            if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
-                cpu->numa_node = i;
-            }
-        }
-    }
-}
-
 static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
                                            const char *name,
                                            uint64_t ram_size)
diff --git a/vl.c b/vl.c
index 5ffb9c3..e5c1620 100644
--- a/vl.c
+++ b/vl.c
@@ -4587,8 +4587,6 @@ int main(int argc, char **argv, char **envp)
 
     cpu_synchronize_all_post_init();
 
-    numa_post_machine_init();
-
     rom_reset_order_override();
 
     /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Qemu-devel] [PATCH for-2.10 19/23] machine: call machine init from wrapper
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (17 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 18/23] numa: remove no longer need numa_post_machine_init() Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 20/23] numa: use possible_cpus for not mapped CPUs check Igor Mammedov
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Add a machine_run_board_init() wrapper that, for now, just calls
the machine init callback; follow-up patches will use it to run
generic machine code that must run before machine init.
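
The point of the wrapper is ordering: generic code runs first, then the
board callback. A minimal sketch under simplified assumptions (Machine,
MachineClass, generic_pre_board_init(), run_board_init() and
demo_board_init() are hypothetical names; in this series the generic
step later becomes the NUMA validation):

```c
#include <assert.h>
#include <stdbool.h>

typedef struct Machine Machine;

/* Simplified class: only the board-specific init callback. */
typedef struct {
    void (*init)(Machine *m);
} MachineClass;

struct Machine {
    const MachineClass *mc;
    int nb_numa_nodes;
    bool numa_validated;      /* set by the generic step */
    bool board_saw_validated; /* recorded by the board callback */
};

/* Hypothetical generic code that must run before any board code. */
static void generic_pre_board_init(Machine *m)
{
    if (m->nb_numa_nodes) {
        m->numa_validated = true;
    }
}

/* Single entry point used instead of calling mc->init() directly. */
static void run_board_init(Machine *m)
{
    assert(m->mc && m->mc->init);
    generic_pre_board_init(m);
    m->mc->init(m);
}

/* Example board callback: records whether validation already ran. */
static void demo_board_init(Machine *m)
{
    m->board_saw_validated = m->numa_validated;
}
```

Routing every caller through one function means new pre-init steps
need no changes in individual boards.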

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/hw/boards.h | 1 +
 hw/core/machine.c   | 6 ++++++
 vl.c                | 2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 40f30f1..42742d0 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -32,6 +32,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 MachineClass *find_default_machine(void);
 extern MachineState *current_machine;
 
+void machine_run_board_init(MachineState *machine);
 bool machine_usb(MachineState *machine);
 bool machine_kernel_irqchip_allowed(MachineState *machine);
 bool machine_kernel_irqchip_required(MachineState *machine);
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 6ff0b45..d284a63 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -643,6 +643,12 @@ bool machine_mem_merge(MachineState *machine)
     return machine->mem_merge;
 }
 
+void machine_run_board_init(MachineState *machine)
+{
+    MachineClass *machine_class = MACHINE_GET_CLASS(machine);
+    machine_class->init(machine);
+}
+
 static void machine_class_finalize(ObjectClass *klass, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(klass);
diff --git a/vl.c b/vl.c
index e5c1620..2b0aed3 100644
--- a/vl.c
+++ b/vl.c
@@ -4554,7 +4554,7 @@ int main(int argc, char **argv, char **envp)
     current_machine->boot_order = boot_order;
     current_machine->cpu_model = cpu_model;
 
-    machine_class->init(current_machine);
+    machine_run_board_init(current_machine);
 
     realtime_init();
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Qemu-devel] [PATCH for-2.10 20/23] numa: use possible_cpus for not mapped CPUs check
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (18 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 19/23] machine: call machine init from wrapper Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-28  5:13   ` David Gibson
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 21/23] numa: remove node_cpu bitmaps as they are no longer used Igor Mammedov
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

and remove the corresponding part in numa.c that uses
node_cpu bitmaps.
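
The new check walks possible_cpus and lists every slot without a
node_id. A plain-C sketch of the same idea, assuming a hypothetical Slot
type and a caller-supplied buffer instead of GLib's GString; unlike
cpu_slot_to_string(), it always prints all three IDs rather than only
those the board reports:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    bool has_node_id;
    int node_id;
    int socket_id, core_id, thread_id;
} Slot;

/* Append "CPU i [socket-id: s, core-id: c, thread-id: t]" to buf for
 * every slot without a node mapping; returns how many were unmapped.
 * Output is truncated (but still NUL-terminated) if buf is too small. */
static int list_unmapped(const Slot *slots, int n, char *buf, size_t len)
{
    int i, count = 0;
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        int w;

        if (slots[i].has_node_id) {
            continue;
        }
        w = snprintf(buf + used, len - used,
                     "%sCPU %d [socket-id: %d, core-id: %d, thread-id: %d]",
                     count ? ", " : "", i, slots[i].socket_id,
                     slots[i].core_id, slots[i].thread_id);
        if (w < 0 || (size_t)w >= len - used) {
            used = len - 1;   /* truncated: buffer is now full */
        } else {
            used += (size_t)w;
        }
        count++;
    }
    return count;
}
```

A non-zero return corresponds to the case where the patch prints the
"CPU(s) not present in any NUMA nodes" warning.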

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
It's one fewer user of node_cpu bitmaps; the following
commit will remove the last user along with node_cpu itself.
---
 hw/core/machine.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 numa.c            | 10 ----------
 2 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index d284a63..ab51d2c 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -19,6 +19,7 @@
 #include "sysemu/sysemu.h"
 #include "qemu/error-report.h"
 #include "qemu/cutils.h"
+#include "sysemu/numa.h"
 
 static char *machine_get_accel(Object *obj, Error **errp)
 {
@@ -643,9 +644,66 @@ bool machine_mem_merge(MachineState *machine)
     return machine->mem_merge;
 }
 
+static char *cpu_slot_to_string(const CPUArchId *cpu)
+{
+    GString *s = g_string_new(NULL);
+    if (cpu->props.has_socket_id) {
+        g_string_append_printf(s, "socket-id: %"PRId64, cpu->props.socket_id);
+    }
+    if (cpu->props.has_core_id) {
+        if (s->len) {
+            g_string_append_printf(s, ", ");
+        }
+        g_string_append_printf(s, "core-id: %"PRId64, cpu->props.core_id);
+    }
+    if (cpu->props.has_thread_id) {
+        if (s->len) {
+            g_string_append_printf(s, ", ");
+        }
+        g_string_append_printf(s, "thread-id: %"PRId64, cpu->props.thread_id);
+    }
+    return g_string_free(s, false);
+}
+
+static void machine_numa_validate(MachineState *machine)
+{
+    int i;
+    GString *s = g_string_new(NULL);
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
+
+    assert(nb_numa_nodes);
+    for (i = 0; i < possible_cpus->len; i++) {
+        const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
+
+         /* at this point numa mappings are initialized by CLI options
+         * or with default mappings so it's sufficient to list
+         * all not yet mapped CPUs here */
+        /* TODO: make it hard error in future */
+        if (!cpu_slot->props.has_node_id) {
+            char *cpu_str = cpu_slot_to_string(cpu_slot);
+            g_string_append_printf(s, "%sCPU %d [%s]", s->len ? ", " : "", i,
+                                   cpu_str);
+            g_free(cpu_str);
+        }
+    }
+    if (s->len) {
+        error_report("warning: CPU(s) not present in any NUMA nodes: %s",
+                     s->str);
+        error_report("warning: All CPU(s) up to maxcpus should be described "
+                     "in NUMA config; the ability to start up with partial "
+                     "NUMA mappings is deprecated and will be removed");
+    }
+    g_string_free(s, true);
+}
+
 void machine_run_board_init(MachineState *machine)
 {
     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
+
+    if (nb_numa_nodes) {
+        machine_numa_validate(machine);
+    }
     machine_class->init(machine);
 }
 
diff --git a/numa.c b/numa.c
index 8461c96..523558f 100644
--- a/numa.c
+++ b/numa.c
@@ -293,16 +293,6 @@ static void validate_numa_cpus(void)
         bitmap_or(seen_cpus, seen_cpus,
                   numa_info[i].node_cpu, max_cpus);
     }
-
-    if (!bitmap_full(seen_cpus, max_cpus)) {
-        char *msg;
-        bitmap_complement(seen_cpus, seen_cpus, max_cpus);
-        msg = enumerate_cpus(seen_cpus, max_cpus);
-        error_report("warning: CPU(s) not present in any NUMA nodes: %s", msg);
-        error_report("warning: All CPU(s) up to maxcpus should be described "
-                     "in NUMA config");
-        g_free(msg);
-    }
     g_free(seen_cpus);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Qemu-devel] [PATCH for-2.10 21/23] numa: remove node_cpu bitmaps as they are no longer used
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (19 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 20/23] numa: use possible_cpus for not mapped CPUs check Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-28  5:13   ` David Gibson
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping Igor Mammedov
                   ` (2 subsequent siblings)
  23 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

The post-factum "CPU(s) present in multiple NUMA nodes" check
was the last user of node_cpu bitmaps, but it's not needed,
as machine_set_cpu_numa_node() performs a similar check at
the time the mapping is set for CPUs (i.e. when -numa cpus=
is parsed) and ensures that a CPU can be mapped to only
one node.

Remove the duplicate check based on node_cpu bitmaps and,
since its last user is gone, remove node_cpu as well. This
completes the internal transition from legacy bitmap-based
mapping storage to possible_cpus storage.
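
Checking at assignment time is what makes the post-factum pass
redundant: a slot that already carries a different node_id is rejected
immediately. A minimal sketch, assuming a hypothetical Slot type and
set_slot_node() helper; the error text only approximates what
machine_set_cpu_numa_node() reports:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool has_node_id;
    int node_id;
} Slot;

/* Assign a node to a CPU slot at mapping time. Re-mapping to the same
 * node is accepted; mapping to a different node is rejected, so a CPU
 * can never end up in two nodes. Returns 0 on success, -1 on conflict. */
static int set_slot_node(Slot *slot, int node_id)
{
    assert(slot != NULL);
    if (slot->has_node_id && slot->node_id != node_id) {
        fprintf(stderr, "CPU is already assigned to node-id: %d\n",
                slot->node_id);
        return -1;
    }
    slot->has_node_id = true;
    slot->node_id = node_id;
    return 0;
}
```

Accepting an identical re-assignment keeps legacy cpu_index mapping
compatible with core-based boards, where several threads of one core
map the same core slot to the same node.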

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/sysemu/numa.h |  1 -
 numa.c                | 42 ------------------------------------------
 2 files changed, 43 deletions(-)

diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 345bb94..796ee94 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -17,7 +17,6 @@ struct numa_addr_range {
 
 typedef struct node_info {
     uint64_t node_mem;
-    unsigned long *node_cpu;
     struct HostMemoryBackend *node_memdev;
     bool present;
     QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */
diff --git a/numa.c b/numa.c
index 523558f..088fae3 100644
--- a/numa.c
+++ b/numa.c
@@ -177,7 +177,6 @@ static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
                        cpus->value, max_cpus);
             return;
         }
-        bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
         props = mc->cpu_index_to_instance_props(ms, cpus->value);
         props.node_id = nodenr;
         props.has_node_id = true;
@@ -261,51 +260,12 @@ end:
     return 0;
 }
 
-static char *enumerate_cpus(unsigned long *cpus, int max_cpus)
-{
-    int cpu;
-    bool first = true;
-    GString *s = g_string_new(NULL);
-
-    for (cpu = find_first_bit(cpus, max_cpus);
-        cpu < max_cpus;
-        cpu = find_next_bit(cpus, max_cpus, cpu + 1)) {
-        g_string_append_printf(s, "%s%d", first ? "" : " ", cpu);
-        first = false;
-    }
-    return g_string_free(s, FALSE);
-}
-
-static void validate_numa_cpus(void)
-{
-    int i;
-    unsigned long *seen_cpus = bitmap_new(max_cpus);
-
-    for (i = 0; i < nb_numa_nodes; i++) {
-        if (bitmap_intersects(seen_cpus, numa_info[i].node_cpu, max_cpus)) {
-            bitmap_and(seen_cpus, seen_cpus,
-                       numa_info[i].node_cpu, max_cpus);
-            error_report("CPU(s) present in multiple NUMA nodes: %s",
-                         enumerate_cpus(seen_cpus, max_cpus));
-            g_free(seen_cpus);
-            exit(EXIT_FAILURE);
-        }
-        bitmap_or(seen_cpus, seen_cpus,
-                  numa_info[i].node_cpu, max_cpus);
-    }
-    g_free(seen_cpus);
-}
-
 void parse_numa_opts(MachineState *ms)
 {
     int i;
     const CPUArchIdList *possible_cpus;
     MachineClass *mc = MACHINE_GET_CLASS(ms);
 
-    for (i = 0; i < MAX_NODES; i++) {
-        numa_info[i].node_cpu = bitmap_new(max_cpus);
-    }
-
     if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
         exit(1);
     }
@@ -397,12 +357,10 @@ void parse_numa_opts(MachineState *ms)
                 props = mc->cpu_index_to_instance_props(ms, i);
                 props.has_node_id = true;
 
-                set_bit(i, numa_info[props.node_id].node_cpu);
                 machine_set_cpu_numa_node(ms, &props, &error_fatal);
             }
         }
 
-        validate_numa_cpus();
     } else {
         numa_set_mem_node_id(0, ram_size, 0);
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (20 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 21/23] numa: remove node_cpu bitmaps as they are no longer used Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-03-23 13:23   ` Eric Blake
  2017-03-28  5:16   ` David Gibson
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 23/23] tests: check -numa node, cpu=props_list usecase Igor Mammedov
  2017-04-12 20:18 ` [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Eduardo Habkost
  23 siblings, 2 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

legacy cpu to node mapping is using cpu index values to map
VCPU to node with help of '-numa node,nodeid=node,cpus=x[-y]'
option. However cpu index is internal concept and QEMU users
have to guess /reimplement qemu's logic/ to map it to
a concrete cpu socket/core/thread to make sane CPUs
placement across numa nodes.

This patch allows to map cpu objects to numa nodes using
the same properties as used for cpus with -device/device_add
(socket-id/core-id/thread-id/node-id).

At present, valid properties/values to address CPUs can be
fetched with the query-hotpluggable-cpus QMP command (or
'info hotpluggable-cpus' in the monitor). This requires the user
to start QEMU twice when creating a domain: first to fetch the
possible CPUs for a machine type/-smp layout, and then a second
time with an explicit NUMA mapping for actual use. The results of
the first step can be saved and reused to set/change the mapping
later, as long as the machine type/-smp layout stays the same.

The proposed implementation supports exact and wildcard matching to
simplify the CLI and to allow setting the mapping for a specific CPU
or for a group of CPU objects selected by the matched properties.

For example:

   # exact mapping x86
   -numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n

   # exact mapping SPAPR
   -numa cpu,node-id=x,core-id=y

   # wildcard mapping, all cpu objects that match socket-id=y
   # are mapped to node-id=x
   -numa cpu,node-id=x,socket-id=y

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 numa.c           | 13 +++++++++++++
 qapi-schema.json |  7 +++++--
 qemu-options.hx  | 23 ++++++++++++++++++++++-
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/numa.c b/numa.c
index 088fae3..588586b 100644
--- a/numa.c
+++ b/numa.c
@@ -246,6 +246,19 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
         }
         nb_numa_nodes++;
         break;
+    case NUMA_OPTIONS_TYPE_CPU:
+        if (!object->u.cpu.has_node_id) {
+            error_setg(&err, "Missing mandatory node-id property");
+            goto end;
+        }
+        if (!numa_info[object->u.cpu.node_id].present) {
+            error_setg(&err, "Invalid node-id=%" PRId64 ", NUMA node must be "
+                "defined with -numa node,nodeid=ID before it's used with "
+                "-numa cpu,node-id=ID", object->u.cpu.node_id);
+            goto end;
+        }
+        machine_set_cpu_numa_node(ms, &object->u.cpu, &err);
+        break;
     default:
         abort();
     }
diff --git a/qapi-schema.json b/qapi-schema.json
index a6b5955..a9a1d5e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -5673,10 +5673,12 @@
 ##
 # @NumaOptionsType:
 #
+# @cpu: property based CPU(s) to node mapping (Since: 2.10)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node' ] }
+  'data': [ 'node', 'cpu' ] }
 
 ##
 # @NumaOptions:
@@ -5689,7 +5691,8 @@
   'base': { 'type': 'NumaOptionsType' },
   'discriminator': 'type',
   'data': {
-    'node': 'NumaNodeOptions' }}
+    'node': 'NumaNodeOptions',
+    'cpu': 'CpuInstanceProperties' }}
 
 ##
 # @NumaNodeOptions:
diff --git a/qemu-options.hx b/qemu-options.hx
index 99af8ed..2185c34 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -139,13 +139,16 @@ ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
-    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
+    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
+    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n", QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
 @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
+@itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
 
+Legacy VCPU assignment uses @samp{cpus} option where
 @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
 @samp{cpus} option represent a contiguous range of CPU indexes
 (or a single VCPU if @var{lastcpu} is omitted). A non-contiguous
@@ -159,6 +162,24 @@ a NUMA node:
 -numa node,cpus=0-2,cpus=5
 @end example
 
+@samp{cpu} option is new alternative to @samp{cpus} option
+uses @samp{socket-id|core-id|thread-id} properties to assign
+CPU objects to a @var{node} using topology layout properties of CPU.
+Set of properties is machine specific, and depends on used machine
+type/@samp{smp} options. It could be queried with @samp{hotpluggable-cpus}
+monitor command.
+@samp{node-id} property specifies @var{node} to which CPU object
+will be assigned, it's required for @var{node} to be declared
+with @samp{node} option before it's used with @samp{cpu} option.
+
+For example:
+@example
+-M pc \
+-smp 1,sockets=2,maxcpus=2 \
+-numa node,nodeid=0 -numa node,nodeid=1 \
+-numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1
+@end example
+
 @samp{mem} assigns a given RAM amount to a node. @samp{memdev}
 assigns RAM from a given memory backend device to a node. If
 @samp{mem} and @samp{memdev} are omitted in all nodes, RAM is
-- 
2.7.4
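
The exact and wildcard matching described in the commit message above, together with the node-id checks this patch adds to parse_numa(), can be illustrated with the minimal C sketch below. It is not QEMU's machine_set_cpu_numa_node(): the CpuPropsSketch type and the id_matches()/props_match()/apply_numa_cpu() helpers are hypothetical, assuming that an id omitted from '-numa cpu' acts as a wildcard and that an id given in the option must also be present, with the same value, on the possible-CPU slot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CpuInstanceProperties: every id comes with a
 * has_* flag telling whether it was set (in the option) or is used by the
 * board (in a possible-CPU slot). */
typedef struct {
    bool has_node_id;   int64_t node_id;
    bool has_socket_id; int64_t socket_id;
    bool has_core_id;   int64_t core_id;
    bool has_thread_id; int64_t thread_id;
} CpuPropsSketch;

/* An id the option leaves out matches anything; an id the option sets must
 * exist on the slot and carry the same value. */
static bool id_matches(bool opt_has, int64_t opt_val,
                       bool slot_has, int64_t slot_val)
{
    return !opt_has || (slot_has && opt_val == slot_val);
}

static bool props_match(const CpuPropsSketch *opt, const CpuPropsSketch *slot)
{
    return id_matches(opt->has_socket_id, opt->socket_id,
                      slot->has_socket_id, slot->socket_id) &&
           id_matches(opt->has_core_id, opt->core_id,
                      slot->has_core_id, slot->core_id) &&
           id_matches(opt->has_thread_id, opt->thread_id,
                      slot->has_thread_id, slot->thread_id);
}

/* Apply one '-numa cpu,...' option to every possible CPU slot.
 * Returns the number of slots assigned to the node, or -1 when node-id is
 * missing or names a node that was not declared with '-numa node'. */
static int apply_numa_cpu(const CpuPropsSketch *opt,
                          const bool *node_present, int64_t max_nodes,
                          CpuPropsSketch *slots, size_t nb_slots)
{
    size_t i;
    int matched = 0;

    if (!opt->has_node_id) {
        return -1;                       /* mandatory node-id missing */
    }
    if (opt->node_id < 0 || opt->node_id >= max_nodes ||
        !node_present[opt->node_id]) {
        return -1;                       /* node must be declared first */
    }
    for (i = 0; i < nb_slots; i++) {
        if (props_match(opt, &slots[i])) {
            slots[i].has_node_id = true;
            slots[i].node_id = opt->node_id;
            matched++;
        }
    }
    return matched;
}
```

With -smp 8,sockets=2,cores=2,threads=2 the wildcard option '-numa cpu,node-id=1,socket-id=0' touches all four slots of socket 0, while a fully qualified option touches exactly one. The sketch simply reports the match count and does not model what QEMU does when an option matches no slot.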

* [Qemu-devel] [PATCH for-2.10 23/23] tests: check -numa node, cpu=props_list usecase
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (21 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping Igor Mammedov
@ 2017-03-22 13:32 ` Igor Mammedov
  2017-04-12 20:18 ` [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Eduardo Habkost
  23 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-22 13:32 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 tests/numa-test.c | 151 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/tests/numa-test.c b/tests/numa-test.c
index 8326321..d371923 100644
--- a/tests/numa-test.c
+++ b/tests/numa-test.c
@@ -130,6 +130,144 @@ static void test_query_cpus(const void *data)
     g_free(cli);
 }
 
+static void pc_numa_cpu(const void *data)
+{
+    char *cli;
+    QDict *resp;
+    QList *cpus;
+    const QObject *e;
+
+    cli = make_cli(data, "-cpu pentium -smp 8,sockets=2,cores=2,threads=2 "
+        "-numa node,nodeid=0 -numa node,nodeid=1 "
+        "-numa cpu,node-id=1,socket-id=0 "
+        "-numa cpu,node-id=0,socket-id=1,core-id=0 "
+        "-numa cpu,node-id=0,socket-id=1,core-id=1,thread-id=0 "
+        "-numa cpu,node-id=1,socket-id=1,core-id=1,thread-id=1");
+    qtest_start(cli);
+    cpus = get_cpus(&resp);
+    g_assert(cpus);
+
+    while ((e = qlist_pop(cpus))) {
+        QDict *cpu, *props;
+        int64_t socket, core, thread, node;
+
+        cpu = qobject_to_qdict(e);
+        g_assert(qdict_haskey(cpu, "props"));
+        props = qdict_get_qdict(cpu, "props");
+
+        g_assert(qdict_haskey(props, "node-id"));
+        node = qdict_get_int(props, "node-id");
+        g_assert(qdict_haskey(props, "socket-id"));
+        socket = qdict_get_int(props, "socket-id");
+        g_assert(qdict_haskey(props, "core-id"));
+        core = qdict_get_int(props, "core-id");
+        g_assert(qdict_haskey(props, "thread-id"));
+        thread = qdict_get_int(props, "thread-id");
+
+        if (socket == 0) {
+            g_assert_cmpint(node, ==, 1);
+        } else if (socket == 1 && core == 0) {
+            g_assert_cmpint(node, ==, 0);
+        } else if (socket == 1 && core == 1 && thread == 0) {
+            g_assert_cmpint(node, ==, 0);
+        } else if (socket == 1 && core == 1 && thread == 1) {
+            g_assert_cmpint(node, ==, 1);
+        } else {
+            g_assert(false);
+        }
+    }
+
+    QDECREF(resp);
+    qtest_end();
+    g_free(cli);
+}
+
+static void spapr_numa_cpu(const void *data)
+{
+    char *cli;
+    QDict *resp;
+    QList *cpus;
+    const QObject *e;
+
+    cli = make_cli(data, "-smp 4,cores=4 "
+        "-numa node,nodeid=0 -numa node,nodeid=1 "
+        "-numa cpu,node-id=0,core-id=0 "
+        "-numa cpu,node-id=0,core-id=1 "
+        "-numa cpu,node-id=0,core-id=2 "
+        "-numa cpu,node-id=1,core-id=3");
+    qtest_start(cli);
+    cpus = get_cpus(&resp);
+    g_assert(cpus);
+
+    while ((e = qlist_pop(cpus))) {
+        QDict *cpu, *props;
+        int64_t core, node;
+
+        cpu = qobject_to_qdict(e);
+        g_assert(qdict_haskey(cpu, "props"));
+        props = qdict_get_qdict(cpu, "props");
+
+        g_assert(qdict_haskey(props, "node-id"));
+        node = qdict_get_int(props, "node-id");
+        g_assert(qdict_haskey(props, "core-id"));
+        core = qdict_get_int(props, "core-id");
+
+        if (core >= 0 && core < 3) {
+            g_assert_cmpint(node, ==, 0);
+        } else if (core == 3) {
+            g_assert_cmpint(node, ==, 1);
+        } else {
+            g_assert(false);
+        }
+    }
+
+    QDECREF(resp);
+    qtest_end();
+    g_free(cli);
+}
+
+static void aarch64_numa_cpu(const void *data)
+{
+    char *cli;
+    QDict *resp;
+    QList *cpus;
+    const QObject *e;
+
+    cli = make_cli(data, "-smp 2 "
+        "-numa node,nodeid=0 -numa node,nodeid=1 "
+        "-numa cpu,node-id=1,thread-id=0 "
+        "-numa cpu,node-id=0,thread-id=1");
+    qtest_start(cli);
+    cpus = get_cpus(&resp);
+    g_assert(cpus);
+
+    while ((e = qlist_pop(cpus))) {
+        QDict *cpu, *props;
+        int64_t thread, node;
+
+        cpu = qobject_to_qdict(e);
+        g_assert(qdict_haskey(cpu, "props"));
+        props = qdict_get_qdict(cpu, "props");
+
+        g_assert(qdict_haskey(props, "node-id"));
+        node = qdict_get_int(props, "node-id");
+        g_assert(qdict_haskey(props, "thread-id"));
+        thread = qdict_get_int(props, "thread-id");
+
+        if (thread == 0) {
+            g_assert_cmpint(node, ==, 1);
+        } else if (thread == 1) {
+            g_assert_cmpint(node, ==, 0);
+        } else {
+            g_assert(false);
+        }
+    }
+
+    QDECREF(resp);
+    qtest_end();
+    g_free(cli);
+}
+
 int main(int argc, char **argv)
 {
     const char *args = NULL;
@@ -146,5 +284,18 @@ int main(int argc, char **argv)
     qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial);
     qtest_add_data_func("/numa/qmp/cpus/query-cpus", args, test_query_cpus);
 
+    if (!strcmp(arch, "i386") || !strcmp(arch, "x86_64")) {
+        qtest_add_data_func("/numa/pc/cpu/explicit", args, pc_numa_cpu);
+    }
+
+    if (!strcmp(arch, "ppc64")) {
+        qtest_add_data_func("/numa/spapr/cpu/explicit", args, spapr_numa_cpu);
+    }
+
+    if (!strcmp(arch, "aarch64")) {
+        qtest_add_data_func("/numa/aarch64/cpu/explicit", args,
+                            aarch64_numa_cpu);
+    }
+
     return g_test_run();
 }
-- 
2.7.4
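
The guessing that patch 22's commit message mentions, turning a legacy cpu_index into socket/core/thread ids, can be sketched for the pc_numa_cpu() topology as follows. The decomposition assumes the x86 index-to-topology rule (thread = index % threads, core = index / threads % cores, socket = index / (threads * cores)), which is our understanding of x86_topo_ids_from_idx() in include/hw/i386/topology.h; TopoSketch, topo_from_index() and pc_test_expected_node() are hypothetical helpers that only reproduce the test's expectations.

```c
/* Hypothetical topology triple for one CPU thread. */
typedef struct {
    unsigned socket, core, thread;
} TopoSketch;

/* Assumed x86 rule: threads are numbered fastest, then cores, then sockets. */
static TopoSketch topo_from_index(unsigned cpu_index,
                                  unsigned cores, unsigned threads)
{
    TopoSketch t;

    t.thread = cpu_index % threads;
    t.core = cpu_index / threads % cores;
    t.socket = cpu_index / (threads * cores);
    return t;
}

/* Node expected by pc_numa_cpu() for a CPU, mirroring its '-numa cpu' rules;
 * assumes the test's two sockets, so socket is either 0 or 1. */
static int pc_test_expected_node(TopoSketch t)
{
    if (t.socket == 0) {
        return 1;                      /* node-id=1,socket-id=0 */
    }
    if (t.core == 0) {
        return 0;                      /* node-id=0,socket-id=1,core-id=0 */
    }
    return t.thread == 0 ? 0 : 1;      /* socket 1, core 1: per-thread rules */
}
```

With -smp 8,sockets=2,cores=2,threads=2 this places legacy indexes 0-3 on node 1, 4-6 on node 0 and 7 on node 1, i.e. the layout the legacy option would spell as '-numa node,nodeid=0,cpus=4-6 -numa node,nodeid=1,cpus=0-3,cpus=7'.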

* Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards Igor Mammedov
@ 2017-03-23  6:10   ` Bharata B Rao
  2017-03-23  8:48     ` Igor Mammedov
  2017-03-28  4:19   ` David Gibson
  2017-04-25 14:48   ` Andrew Jones
  2 siblings, 1 reply; 77+ messages in thread
From: Bharata B Rao @ 2017-03-23  6:10 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Andrew Jones, Eduardo Habkost,
	qemu-arm, qemu-ppc, Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Mar 22, 2017 at 7:02 PM, Igor Mammedov <imammedo@redhat.com> wrote:

> diff --git a/numa.c b/numa.c
> index e01cb54..b6e71bc 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
>      g_free(seen_cpus);
>  }
>
> -void parse_numa_opts(MachineClass *mc)
> +void parse_numa_opts(MachineState *ms)
>  {
>      int i;
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
>
>      for (i = 0; i < MAX_NODES; i++) {
>          numa_info[i].node_cpu = bitmap_new(max_cpus);
> @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
>           * rule grouping VCPUs by socket so that VCPUs from the same socket
>           * would be on the same node.
>           */
> +        if (!mc->cpu_index_to_instance_props) {
> +            error_report("default CPUs to NUMA node mapping isn't supported");
> +            exit(1);
> +        }
>

Just trying to understand the impact of the above enforcement. So targets
and machine types that don't define ->cpu_index_to_instance_props() are
expected not to boot ? Shouldn't they have a default to fall back upon ?

Regards,
Bharata.

* Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
  2017-03-23  6:10   ` Bharata B Rao
@ 2017-03-23  8:48     ` Igor Mammedov
  0 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-23  8:48 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: qemu-devel, Peter Maydell, Andrew Jones, Eduardo Habkost,
	qemu-arm, qemu-ppc, Shannon Zhao, Paolo Bonzini, David Gibson

On Thu, 23 Mar 2017 11:40:29 +0530
Bharata B Rao <bharata.rao@gmail.com> wrote:

> On Wed, Mar 22, 2017 at 7:02 PM, Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > diff --git a/numa.c b/numa.c
> > index e01cb54..b6e71bc 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
> >      g_free(seen_cpus);
> >  }
> >
> > -void parse_numa_opts(MachineClass *mc)
> > +void parse_numa_opts(MachineState *ms)
> >  {
> >      int i;
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> >
> >      for (i = 0; i < MAX_NODES; i++) {
> >          numa_info[i].node_cpu = bitmap_new(max_cpus);
> > @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
> >           * rule grouping VCPUs by socket so that VCPUs from the same socket
> >           * would be on the same node.
> >           */
> > +        if (!mc->cpu_index_to_instance_props) {
> > +            error_report("default CPUs to NUMA node mapping isn't supported");
> > +            exit(1);
> > +        }
> >  
> 
> Just trying to understand the impact of the above enforcement. So targets
> and machine types that don't define ->cpu_index_to_instance_props() are
> expected not to boot ? Shouldn't they have a default to fall back upon ?
Currently there are 3 boards that support numa, and with this series
they all implement the cpu_index_to_instance_props callback,
so boards that have supported numa shouldn't be affected.

But if someone used '-numa' with a board that doesn't support numa,
it would fail to boot with an error message instead of silently parsing
an unsupported option or falling back to bogus defaults (which aren't
used anyway).


> Regards,
> Bharata.

* Re: [Qemu-devel] [PATCH for-2.10 15/23] QMP: include CpuInstanceProperties into query_cpus output output
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 15/23] QMP: include CpuInstanceProperties into query_cpus output output Igor Mammedov
@ 2017-03-23 13:19   ` Eric Blake
  2017-03-24 12:20     ` Igor Mammedov
  0 siblings, 1 reply; 77+ messages in thread
From: Eric Blake @ 2017-03-23 13:19 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On 03/22/2017 08:32 AM, Igor Mammedov wrote:
> if board supports CpuInstanceProperties, report them for
> each CPU thread listed. Main motivation for this is to
> provide these properties introspection via QMP interface
> for using in test cases to verify numa node to cpu mapping,
> which includes not only boards that support cpu hotplug
> and have this info in query-hotpluggable-cpus (pc/spapr)
> but also for boards that don't support hotpluggable-cpus
> but support numa mapping (virt-arm).
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---

> @@ -1860,6 +1863,12 @@ CpuInfoList *qmp_query_cpus(Error **errp)
>  #else
>          info->value->arch = CPU_INFO_ARCH_OTHER;
>  #endif
> +        if ((info->value->has_props = !!mc->cpu_index_to_instance_props)) {

checkpatch.pl doesn't flag that? We generally try to avoid side-effects
inside conditionals.

> +            CpuInstanceProperties *props;
> +            props = g_malloc0(sizeof(*props));
> +            *props = mc->cpu_index_to_instance_props(ms, cpu->cpu_index);
> +            info->value->props =  props;

Why two spaces after =?

With those cleaned up,
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping Igor Mammedov
@ 2017-03-23 13:23   ` Eric Blake
  2017-03-24 13:29     ` Igor Mammedov
  2017-03-28  5:16   ` David Gibson
  1 sibling, 1 reply; 77+ messages in thread
From: Eric Blake @ 2017-03-23 13:23 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel
  Cc: Eduardo Habkost, Peter Maydell, Andrew Jones, David Gibson,
	Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On 03/22/2017 08:32 AM, Igor Mammedov wrote:
> legacy cpu to node mapping is using cpu index values to map
> VCPU to node with help of '-numa node,nodeid=node,cpus=x[-y]'
> option. However cpu index is internal concept and QEMU users
> have to guess /reimplement qemu's logic/ to map it to
> a concrete cpu socket/core/thread to make sane CPUs
> placement across numa nodes.
> 
> This patch allows to map cpu objects to numa nodes using
> the same properties as used for cpus with -device/device_add
> (socket-id/core-id/thread-id/node-id).
> 
> At present valid properties/values to address CPUs could be
> fetched using hotpluggable-cpus monitor/qmp command, it will
> require user to start qemu twice when creating domain to fetch
> possible CPUs for a machine type/-smp layout first and
> then the second time with numa explicit mapping for actual
> usage. The first step results could be saved and reused to
> set/change mapping later as far as machine type/-smp stays
> the same.
> 
> Proposed impl. supports exact and wildcard matching to
> simplify CLI and allow to set mapping for a specific cpu
> or group of cpu objects specified by matched properties.
> 
> For example:
> 
>    # exact mapping x86
>    -numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n
> 
>    # exact mapping SPAPR
>    -numa cpu,node-id=x,core-id=y
> 
>    # wildcard mapping, all cpu objects that match socket-id=y
>    # are mapped to node-id=x
>    -numa cpu,node-id=x,socket-id=y
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  numa.c           | 13 +++++++++++++
>  qapi-schema.json |  7 +++++--
>  qemu-options.hx  | 23 ++++++++++++++++++++++-
>  3 files changed, 40 insertions(+), 3 deletions(-)
> 

>  
> +@samp{cpu} option is new alternative to @samp{cpus} option

s/is/is a/

> +uses @samp{socket-id|core-id|thread-id} properties to assign

s/uses/which uses/

> +CPU objects to a @var{node} using topology layout properties of CPU.
> +Set of properties is machine specific, and depends on used machine

s/Set/The set/

> +type/@samp{smp} options. It could be queried with @samp{hotpluggable-cpus}
> +monitor command.
> +@samp{node-id} property specifies @var{node} to which CPU object
> +will be assigned, it's required for @var{node} to be declared
> +with @samp{node} option before it's used with @samp{cpu} option.
> +
> +For example:
> +@example
> +-M pc \
> +-smp 1,sockets=2,maxcpus=2 \
> +-numa node,nodeid=0 -numa node,nodeid=1 \
> +-numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1
> +@end example
> +
>  @samp{mem} assigns a given RAM amount to a node. @samp{memdev}
>  assigns RAM from a given memory backend device to a node. If
>  @samp{mem} and @samp{memdev} are omitted in all nodes, RAM is
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH for-2.10 15/23] QMP: include CpuInstanceProperties into query_cpus output output
  2017-03-23 13:19   ` Eric Blake
@ 2017-03-24 12:20     ` Igor Mammedov
  0 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-24 12:20 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Peter Maydell, Andrew Jones, Eduardo Habkost,
	qemu-arm, qemu-ppc, Shannon Zhao, Paolo Bonzini, David Gibson

On Thu, 23 Mar 2017 08:19:24 -0500
Eric Blake <eblake@redhat.com> wrote:

> On 03/22/2017 08:32 AM, Igor Mammedov wrote:
> > if board supports CpuInstanceProperties, report them for
> > each CPU thread listed. Main motivation for this is to
> > provide these properties introspection via QMP interface
> > for using in test cases to verify numa node to cpu mapping,
> > which includes not only boards that support cpu hotplug
> > and have this info in query-hotpluggable-cpus (pc/spapr)
> > but also for boards that don't support hotpluggable-cpus
> > but support numa mapping (virt-arm).
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---  
> 
> > @@ -1860,6 +1863,12 @@ CpuInfoList *qmp_query_cpus(Error **errp)
> >  #else
> >          info->value->arch = CPU_INFO_ARCH_OTHER;
> >  #endif
> > +        if ((info->value->has_props = !!mc->cpu_index_to_instance_props)) {  
> 
> checkpatch.pl doesn't flag that? We generally try to avoid side-effects
> inside conditionals.
It does; fixed in the v2 branch.
(Lazy me skipped checkpatch since the QMP test case had been added;
I've also fixed another checkpatch error in the next patch.)
 
> > +            CpuInstanceProperties *props;
> > +            props = g_malloc0(sizeof(*props));
> > +            *props = mc->cpu_index_to_instance_props(ms, cpu->cpu_index);
> > +            info->value->props =  props;  
> 
> Why two spaces after =?
fixed

> 
> With those cleaned up,
> Reviewed-by: Eric Blake <eblake@redhat.com>
Thanks! 
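
The checkpatch-friendly shape Eric asks for (no assignment inside the if condition, a single space after '=') might look like this sketch. CpuInfoSketch, PropsSketch and fill_props() are hypothetical stand-ins for CpuInfo, CpuInstanceProperties and the qmp_query_cpus() hunk, and calloc()/abort() stand in for g_malloc0(), which aborts on allocation failure.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    int socket_id;
} PropsSketch;

typedef struct {
    bool has_props;
    PropsSketch *props;
} CpuInfoSketch;

typedef PropsSketch (*IndexToPropsFn)(unsigned cpu_index);

static void fill_props(CpuInfoSketch *info, IndexToPropsFn to_props,
                       unsigned cpu_index)
{
    /* Compute the flag first, then test it: no side effect in the condition. */
    info->has_props = to_props != NULL;
    if (info->has_props) {
        PropsSketch *props = calloc(1, sizeof(*props));

        if (!props) {
            abort();                   /* g_malloc0() would abort as well */
        }
        *props = to_props(cpu_index);
        info->props = props;
    }
}
```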

* Re: [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping
  2017-03-23 13:23   ` Eric Blake
@ 2017-03-24 13:29     ` Igor Mammedov
  0 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-24 13:29 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Peter Maydell, Andrew Jones, Eduardo Habkost,
	qemu-arm, qemu-ppc, Shannon Zhao, Paolo Bonzini, David Gibson

On Thu, 23 Mar 2017 08:23:32 -0500
Eric Blake <eblake@redhat.com> wrote:

...
> >  
> > +@samp{cpu} option is new alternative to @samp{cpus} option  
> 
> s/is/is a/
> 
> > +uses @samp{socket-id|core-id|thread-id} properties to assign  
> 
> s/uses/which uses/
> 
> > +CPU objects to a @var{node} using topology layout properties of CPU.
> > +Set of properties is machine specific, and depends on used machine  
> 
> s/Set/The set/
> 
Fixed in v2 branch

* Re: [Qemu-devel] [PATCH for-2.10 01/23] tests: add CPUs to numa node mapping test
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 01/23] tests: add CPUs to numa node mapping test Igor Mammedov
@ 2017-03-27  0:31   ` David Gibson
  0 siblings, 0 replies; 77+ messages in thread
From: David Gibson @ 2017-03-27  0:31 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Wed, Mar 22, 2017 at 02:32:26PM +0100, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  tests/Makefile.include |   5 +++
>  tests/numa-test.c      | 106 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 111 insertions(+)
>  create mode 100644 tests/numa-test.c
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 402e71c..4547b01 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -260,6 +260,7 @@ check-qtest-i386-y += tests/test-filter-mirror$(EXESUF)
>  check-qtest-i386-y += tests/test-filter-redirector$(EXESUF)
>  check-qtest-i386-y += tests/postcopy-test$(EXESUF)
>  check-qtest-i386-y += tests/test-x86-cpuid-compat$(EXESUF)
> +check-qtest-i386-y += tests/numa-test$(EXESUF)
>  check-qtest-x86_64-y += $(check-qtest-i386-y)
>  gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
>  gcov-files-x86_64-y = $(subst i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
> @@ -300,6 +301,7 @@ check-qtest-ppc64-y += tests/test-netfilter$(EXESUF)
>  check-qtest-ppc64-y += tests/test-filter-mirror$(EXESUF)
>  check-qtest-ppc64-y += tests/test-filter-redirector$(EXESUF)
>  check-qtest-ppc64-y += tests/display-vga-test$(EXESUF)
> +check-qtest-ppc64-y += tests/numa-test$(EXESUF)
>  check-qtest-ppc64-$(CONFIG_EVENTFD) += tests/ivshmem-test$(EXESUF)
>  
>  check-qtest-sh4-y = tests/endianness-test$(EXESUF)
> @@ -324,6 +326,8 @@ gcov-files-arm-y += arm-softmmu/hw/block/virtio-blk.c
>  check-qtest-arm-y += tests/test-arm-mptimer$(EXESUF)
>  gcov-files-arm-y += hw/timer/arm_mptimer.c
>  
> +check-qtest-aarch64-y = tests/numa-test$(EXESUF)
> +
>  check-qtest-microblazeel-y = $(check-qtest-microblaze-y)
>  
>  check-qtest-xtensaeb-y = $(check-qtest-xtensa-y)
> @@ -747,6 +751,7 @@ tests/vhost-user-bridge$(EXESUF): tests/vhost-user-bridge.o contrib/libvhost-use
>  tests/test-uuid$(EXESUF): tests/test-uuid.o $(test-util-obj-y)
>  tests/test-arm-mptimer$(EXESUF): tests/test-arm-mptimer.o
>  tests/test-qapi-util$(EXESUF): tests/test-qapi-util.o $(test-util-obj-y)
> +tests/numa-test$(EXESUF): tests/numa-test.o
>  
>  tests/migration/stress$(EXESUF): tests/migration/stress.o
>  	$(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< ,"LINK","$(TARGET_DIR)$@")
> diff --git a/tests/numa-test.c b/tests/numa-test.c
> new file mode 100644
> index 0000000..f5da0c8
> --- /dev/null
> +++ b/tests/numa-test.c
> @@ -0,0 +1,106 @@
> +/*
> + * NUMA configuration test cases
> + *
> + * Copyright (c) 2017 Red Hat Inc.
> + * Authors:
> + *  Igor Mammedov <imammedo@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "libqtest.h"
> +
> +static char *make_cli(const char *generic_cli, const char *test_cli)
> +{
> +    return g_strdup_printf("%s %s", generic_cli ? generic_cli : "", test_cli);
> +}
> +
> +static char *hmp_info_numa(void)
> +{
> +    QDict *resp;
> +    char *s;
> +
> +    resp = qmp("{ 'execute': 'human-monitor-command', 'arguments': "
> +                      "{ 'command-line': 'info numa '} }");
> +    g_assert(resp);
> +    g_assert(qdict_haskey(resp, "return"));
> +    s = g_strdup(qdict_get_str(resp, "return"));
> +    g_assert(s);
> +    QDECREF(resp);
> +    return s;
> +}
> +
> +static void test_mon_explicit(const void *data)
> +{
> +    char *s;
> +    char *cli;
> +
> +    cli = make_cli(data, "-smp 8 "
> +                   "-numa node,nodeid=0,cpus=0-3 "
> +                   "-numa node,nodeid=1,cpus=4-7 ");
> +    qtest_start(cli);
> +
> +    s = hmp_info_numa();
> +    g_assert(strstr(s, "node 0 cpus: 0 1 2 3"));
> +    g_assert(strstr(s, "node 1 cpus: 4 5 6 7"));
> +    g_free(s);
> +
> +    qtest_end();
> +    g_free(cli);
> +}
> +
> +static void test_mon_default(const void *data)
> +{
> +    char *s;
> +    char *cli;
> +
> +    cli = make_cli(data, "-smp 8 -numa node -numa node");
> +    qtest_start(cli);
> +
> +    s = hmp_info_numa();
> +    g_assert(strstr(s, "node 0 cpus: 0 2 4 6"));
> +    g_assert(strstr(s, "node 1 cpus: 1 3 5 7"));
> +    g_free(s);
> +
> +    qtest_end();
> +    g_free(cli);
> +}
> +
> +static void test_mon_partial(const void *data)
> +{
> +    char *s;
> +    char *cli;
> +
> +    cli = make_cli(data, "-smp 8 "
> +                   "-numa node,nodeid=0,cpus=0-1 "
> +                   "-numa node,nodeid=1,cpus=4-5 ");
> +    qtest_start(cli);
> +
> +    s = hmp_info_numa();
> +    g_assert(strstr(s, "node 0 cpus: 0 1 2 3 6 7"));
> +    g_assert(strstr(s, "node 1 cpus: 4 5"));
> +    g_free(s);
> +
> +    qtest_end();
> +    g_free(cli);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +    const char *args = NULL;
> +    const char *arch = qtest_get_arch();
> +
> +    if (strcmp(arch, "aarch64") == 0) {
> +        args = "-machine virt";
> +    }
> +
> +    g_test_init(&argc, &argv, NULL);
> +
> +    qtest_add_data_func("/numa/mon/default", args, test_mon_default);
> +    qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit);
> +    qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial);
> +
> +    return g_test_run();
> +}
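
The expectations asserted by test_mon_default() and test_mon_partial() above can be reproduced with the sketch below. It mirrors only what the tests assert, not QEMU's implementation, and assumes that plain '-smp 8' yields one CPU per socket (so socket grouping degenerates to round-robin over CPU indexes) and that CPUs left out of every cpus= range end up on node 0; resolve_nodes() and format_node_line() are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* node_of[i] holds the node given by '-numa node,cpus=...', or -1 if CPU i
 * was not listed anywhere. */
static void resolve_nodes(int *node_of, int ncpus, int nb_nodes)
{
    int i, any_explicit = 0;

    for (i = 0; i < ncpus; i++) {
        if (node_of[i] >= 0) {
            any_explicit = 1;
        }
    }
    for (i = 0; i < ncpus; i++) {
        if (node_of[i] >= 0) {
            continue;
        }
        /* No explicit mapping at all: default round-robin (one CPU per
         * socket). Partial mapping: leftovers land on node 0. */
        node_of[i] = any_explicit ? 0 : i % nb_nodes;
    }
}

/* Build one "node N cpus: ..." line, the substring the tests search for. */
static void format_node_line(char *buf, size_t len, const int *node_of,
                             int ncpus, int node)
{
    int i;
    size_t used = (size_t)snprintf(buf, len, "node %d cpus:", node);

    for (i = 0; i < ncpus && used < len; i++) {
        if (node_of[i] == node) {
            used += (size_t)snprintf(buf + used, len - used, " %d", i);
        }
    }
}
```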

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards Igor Mammedov
  2017-03-23  6:10   ` Bharata B Rao
@ 2017-03-28  4:19   ` David Gibson
  2017-03-28 10:53     ` Igor Mammedov
  2017-04-20 14:29     ` Igor Mammedov
  2017-04-25 14:48   ` Andrew Jones
  2 siblings, 2 replies; 77+ messages in thread
From: David Gibson @ 2017-03-28  4:19 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Wed, Mar 22, 2017 at 02:32:30PM +0100, Igor Mammedov wrote:
> Originally CPU threads were by default assigned in
> round-robin fashion. However it was causing issues in
> guest since CPU threads from the same socket/core could
> be placed on different NUMA nodes.
> Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> fixed it by grouping threads within a socket on the same node
> introducing cpu_index_to_socket_id() callback and commit
> 20bb648d (spapr: Fix default NUMA node allocation for threads)
> reused callback to fix similar issues for SPAPR machine
> even though socket doesn't make much sense there.
> 
> As a result QEMU ended up having 3 default distribution rules
> used by 3 targets /virt-arm, spapr, pc/.
> 
> In an effort to move NUMA mapping for CPUs into possible_cpus,
> generalize default mapping in numa.c by making boards decide
> on default mapping and let them explicitly tell generic
> numa code to which node a CPU thread belongs to by replacing
> cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> which provides default node_id assigned by board to specified
> cpu_index.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> Patch only moves source of default mapping to possible_cpus[]
> and leaves the rest of NUMA handling to numa_info[node_id].node_cpu
> bitmaps. It's up to follow up patches to replace bitmaps
> with possible_cpus[] internally.
> ---
>  include/hw/boards.h   |  8 ++++++--
>  include/sysemu/numa.h |  2 +-
>  hw/arm/virt.c         | 19 +++++++++++++++++--
>  hw/i386/pc.c          | 22 ++++++++++++++++------
>  hw/ppc/spapr.c        | 27 ++++++++++++++++++++-------
>  numa.c                | 15 +++++++++------
>  vl.c                  |  2 +-
>  7 files changed, 70 insertions(+), 25 deletions(-)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 269d0ba..1dd0fde 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -74,7 +74,10 @@ typedef struct {
>   *    of HotplugHandler object, which handles hotplug operation
>   *    for a given @dev. It may return NULL if @dev doesn't require
>   *    any actions to be performed by hotplug handler.
> - * @cpu_index_to_socket_id:
> + * @cpu_index_to_instance_props:
> + *    used to provide @cpu_index to socket/core/thread number mapping, allowing
> + *    legacy code to perform mapping from cpu_index to topology properties
> + *    Returns: tuple of socket/core/thread ids given cpu_index belongs to.
>   *    used to provide @cpu_index to socket number mapping, allowing
>   *    a machine to group CPU threads belonging to the same socket/package
>   *    Returns: socket number given cpu_index belongs to.
> @@ -138,7 +141,8 @@ struct MachineClass {
>  
>      HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
>                                             DeviceState *dev);
> -    unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
> +    CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
> +                                                         unsigned cpu_index);
>      const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
>  };
>  
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 8f09dcf..46ea6c7 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -24,7 +24,7 @@ typedef struct node_info {
>  } NodeInfo;
>  
>  extern NodeInfo numa_info[MAX_NODES];
> -void parse_numa_opts(MachineClass *mc);
> +void parse_numa_opts(MachineState *ms);
>  void numa_post_machine_init(void);
>  void query_numa_node_mem(uint64_t node_mem[]);
>  extern QemuOptsList qemu_numa_opts;
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 0cbcbc1..8748d25 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1554,6 +1554,16 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
>      }
>  }
>  
> +static CpuInstanceProperties
> +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;;
> +}
> +

It seems a bit weird to have a machine specific hook to pull the
property information when one way or another it's coming from the
possible_cpus table, which is already constructed by a machine
specific hook.  Could we add a range or list of cpu_index values to
each possible_cpus entry instead, and have a generic lookup of the
right entry based on that?

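A minimal sketch of the generic lookup proposed above, assuming each possible_cpus entry carried a hypothetical `[first_cpu_index, first_cpu_index + nr_cpu_indexes)` range (no such fields exist in this patch). On PC and ARM every entry would cover a single cpu_index; on SPAPR, with one entry per core, each would cover smp_threads of them. `SketchSlot` and `SketchProps` are simplified stand-ins, not QEMU's `CPUArchId`/`CpuInstanceProperties`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for CpuInstanceProperties. */
typedef struct {
    bool has_node_id;
    int64_t node_id;
    int64_t core_id;
} SketchProps;

/* Hypothetical possible_cpus entry carrying a cpu_index range. */
typedef struct {
    unsigned first_cpu_index;  /* first cpu_index covered by this slot */
    unsigned nr_cpu_indexes;   /* how many consecutive cpu_index values */
    SketchProps props;
} SketchSlot;

/* Board-independent lookup: return the slot whose cpu_index range
 * contains @cpu_index, or NULL if no slot covers it. */
static const SketchSlot *sketch_find_slot(const SketchSlot *slots, size_t len,
                                          unsigned cpu_index)
{
    for (size_t i = 0; i < len; i++) {
        const SketchSlot *s = &slots[i];

        if (cpu_index >= s->first_cpu_index &&
            cpu_index - s->first_cpu_index < s->nr_cpu_indexes) {
            return s;
        }
    }
    return NULL;
}
```

With such a range recorded once by possible_cpu_arch_ids(), a scan like this could stand in for all the per-board cpu_index_to_instance_props() hooks.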

>  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>  {
>      int n;
> @@ -1573,8 +1583,12 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>          ms->possible_cpus->cpus[n].props.has_thread_id = true;
>          ms->possible_cpus->cpus[n].props.thread_id = n;
>  
> -        /* TODO: add 'has_node/node' here to describe
> -           to which node core belongs */
> +        /* default distribution of CPUs over NUMA nodes */
> +        if (nb_numa_nodes) {
> +            /* preset values but do not enable them i.e. 'has_node_id = false',
> +             * board will enable them if manual mapping wasn't present on CLI */

I'm a little confused by this comment, since I don't see any board
code altering has_node_id.

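As far as the series quoted in this thread goes, the enabling step lives in generic code rather than in the board: parse_numa_opts() in numa.c sets `has_node_id = true` on the props returned by the board (patch 10/23), and does so only when no possible_cpus entry already has a node assigned (patch 11/23). A simplified sketch of that two-phase flow, using a hypothetical `SketchProps` stand-in and the ARM `n % nb_numa_nodes` rule:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for CpuInstanceProperties. */
typedef struct {
    bool has_node_id;
    int64_t node_id;
} SketchProps;

/* Board step: preset a default node but leave has_node_id untouched
 * (false unless the user already mapped this CPU). */
static void sketch_board_preset(SketchProps *cpus, unsigned n,
                                unsigned nb_numa_nodes)
{
    for (unsigned i = 0; i < n; i++) {
        cpus[i].node_id = i % nb_numa_nodes;
    }
}

/* Generic step: enable the presets only if no CPU was mapped manually.
 * Returns true if the defaults were applied. */
static bool sketch_enable_defaults(SketchProps *cpus, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        if (cpus[i].has_node_id) {
            return false;   /* manual -numa mapping present: keep it */
        }
    }
    for (unsigned i = 0; i < n; i++) {
        cpus[i].has_node_id = true;
    }
    return true;
}
```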
> +            ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;;
> +        }
>      }
>      return ms->possible_cpus;
>  }
> @@ -1596,6 +1610,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>      /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
>      mc->minimum_page_bits = 12;
>      mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
> +    mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
>  }
>  
>  static const TypeInfo virt_machine_info = {
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index d24388e..7031100 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2245,12 +2245,14 @@ static void pc_machine_reset(void)
>      }
>  }
>  
> -static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
> +static CpuInstanceProperties
> +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>  {
> -    X86CPUTopoInfo topo;
> -    x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
> -                          &topo);
> -    return topo.pkg_id;
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;;

Since the pc and arm version of this are basically identical, I wonder
if that should actually be the default implementation.  If we need it
at all.

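One way the shared body could become the default, sketched with simplified hypothetical types: the generic caller falls back to a plain `possible_cpus[cpu_index]` lookup whenever a board leaves the hook NULL, so only SPAPR, whose entries are per core, would need an override. In QEMU itself the more idiomatic route would probably be to assign the default in the base machine class_init and let boards overwrite it; that is an assumption, not something this series does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for CpuInstanceProperties/MachineState. */
typedef struct {
    int64_t node_id;
    int64_t thread_id;
} SketchProps;

typedef struct {
    const SketchProps *possible_cpus;  /* one entry per cpu_index (PC/ARM) */
    size_t len;
} SketchMachine;

typedef SketchProps (*SketchIndexToProps)(const SketchMachine *ms,
                                          unsigned cpu_index);

/* The body the PC and ARM hooks have in common. */
static SketchProps sketch_default_index_to_props(const SketchMachine *ms,
                                                 unsigned cpu_index)
{
    assert(cpu_index < ms->len);
    return ms->possible_cpus[cpu_index];
}

/* Use the board's hook if it set one, otherwise the shared default. */
static SketchProps sketch_index_to_props(SketchIndexToProps board_hook,
                                         const SketchMachine *ms,
                                         unsigned cpu_index)
{
    SketchIndexToProps fn = board_hook ? board_hook
                                       : sketch_default_index_to_props;
    return fn(ms, cpu_index);
}
```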
>  }
>  
>  static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
> @@ -2282,6 +2284,14 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
>          ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
>          ms->possible_cpus->cpus[i].props.has_thread_id = true;
>          ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
> +
> +        /* default distribution of CPUs over NUMA nodes */
> +        if (nb_numa_nodes) {
> +            /* preset values but do not enable them i.e. 'has_node_id = false',
> +             * board will enable them if manual mapping wasn't present on CLI */
> +            ms->possible_cpus->cpus[i].props.node_id =
> +                topo.pkg_id % nb_numa_nodes;
> +        }
>      }
>      return ms->possible_cpus;
>  }
> @@ -2324,7 +2334,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>      pcmc->acpi_data_size = 0x20000 + 0x8000;
>      pcmc->save_tsc_khz = true;
>      mc->get_hotplug_handler = pc_get_hotpug_handler;
> -    mc->cpu_index_to_socket_id = pc_cpu_index_to_socket_id;
> +    mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
>      mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
>      mc->has_hotpluggable_cpus = true;
>      mc->default_boot_order = "cad";
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6ee566d..9dcbbcc 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2921,11 +2921,18 @@ static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
>      return NULL;
>  }
>  
> -static unsigned spapr_cpu_index_to_socket_id(unsigned cpu_index)
> +static CpuInstanceProperties
> +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
>  {
> -    /* Allocate to NUMA nodes on a "socket" basis (not that concept of
> -     * socket means much for the paravirtualized PAPR platform) */
> -    return cpu_index / smp_threads / smp_cores;
> +    CPUArchId *core_slot;
> +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    int core_id = cpu_index / smp_threads * smp_threads;

I don't think you need this.  AIUI the purpose of
spapr_find_cpu_slot() is that it already finds the right CPU slot from
a cpu_index, so you can just pass the cpu_index directly.

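Assuming, as the comment above does, that spapr_find_cpu_slot() derives its slot index as `id / smp_threads` (one possible_cpus entry per core), rounding cpu_index down to a core_id first cannot change the result: in integer arithmetic `(c / t * t) / t == c / t` for any `t > 0`. The helper names below are hypothetical:

```c
#include <assert.h>

/* Slot index as the review assumes spapr_find_cpu_slot() computes it. */
static unsigned sketch_slot_index(unsigned id, unsigned smp_threads)
{
    return id / smp_threads;
}

/* The extra rounding the patch performs before the lookup. */
static unsigned sketch_round_to_core_id(unsigned cpu_index, unsigned smp_threads)
{
    return cpu_index / smp_threads * smp_threads;
}
```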
> +
> +    /* make sure possible_cpus are initialized */
> +    mc->possible_cpu_arch_ids(machine);
> +    core_slot = spapr_find_cpu_slot(machine, core_id, NULL);
> +    assert(core_slot);
> +    return core_slot->props;
>  }
>  
>  static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
> @@ -2952,8 +2959,14 @@ static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
>          machine->possible_cpus->cpus[i].arch_id = core_id;
>          machine->possible_cpus->cpus[i].props.has_core_id = true;
>          machine->possible_cpus->cpus[i].props.core_id = core_id;
> -        /* TODO: add 'has_node/node' here to describe
> -           to which node core belongs */
> +
> +        /* default distribution of CPUs over NUMA nodes */
> +        if (nb_numa_nodes) {
> +            /* preset values but do not enable them i.e. 'has_node_id = false',
> +             * board will enable them if manual mapping wasn't present on CLI */
> +            machine->possible_cpus->cpus[i].props.node_id =
> +                core_id / smp_threads / smp_cores % nb_numa_nodes;
> +        }
>      }
>      return machine->possible_cpus;
>  }
> @@ -3076,7 +3089,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      hc->pre_plug = spapr_machine_device_pre_plug;
>      hc->plug = spapr_machine_device_plug;
>      hc->unplug = spapr_machine_device_unplug;
> -    mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
> +    mc->cpu_index_to_instance_props = spapr_cpu_index_to_props;
>      mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
>      hc->unplug_request = spapr_machine_device_unplug_request;
>  
> diff --git a/numa.c b/numa.c
> index e01cb54..b6e71bc 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
>      g_free(seen_cpus);
>  }
>  
> -void parse_numa_opts(MachineClass *mc)
> +void parse_numa_opts(MachineState *ms)
>  {
>      int i;
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
>  
>      for (i = 0; i < MAX_NODES; i++) {
>          numa_info[i].node_cpu = bitmap_new(max_cpus);
> @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
>           * rule grouping VCPUs by socket so that VCPUs from the same socket
>           * would be on the same node.
>           */
> +        if (!mc->cpu_index_to_instance_props) {
> +            error_report("default CPUs to NUMA node mapping isn't supported");
> +            exit(1);
> +        }
>          if (i == nb_numa_nodes) {
>              for (i = 0; i < max_cpus; i++) {
> -                unsigned node_id = i % nb_numa_nodes;
> -                if (mc->cpu_index_to_socket_id) {
> -                    node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
> -                }
> +                CpuInstanceProperties props;
> +                props = mc->cpu_index_to_instance_props(ms, i);
>  
> -                set_bit(i, numa_info[node_id].node_cpu);
> +                set_bit(i, numa_info[props.node_id].node_cpu);
>              }
>          }
>  
> diff --git a/vl.c b/vl.c
> index 0b4ed52..5ffb9c3 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4498,7 +4498,7 @@ int main(int argc, char **argv, char **envp)
>      default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
>      default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
>  
> -    parse_numa_opts(machine_class);
> +    parse_numa_opts(current_machine);
>  
>      if (qemu_opts_foreach(qemu_find_opts("mon"),
>                            mon_init_func, NULL, NULL)) {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH for-2.10 06/23] spapr: add node-id property to sPAPR core
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 06/23] spapr: add node-id property to sPAPR core Igor Mammedov
@ 2017-03-28  4:23   ` David Gibson
  0 siblings, 0 replies; 77+ messages in thread
From: David Gibson @ 2017-03-28  4:23 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Wed, Mar 22, 2017 at 02:32:31PM +0100, Igor Mammedov wrote:
> it will allow switching from cpu_index to core based numa
> mapping in follow up patches.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/hw/ppc/spapr_cpu_core.h |  1 +
>  include/qom/cpu.h               |  2 ++
>  hw/ppc/spapr.c                  | 17 +++++++++++++++++
>  hw/ppc/spapr_cpu_core.c         | 11 ++++++++---
>  4 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
> index 3c35665..93051e9 100644
> --- a/include/hw/ppc/spapr_cpu_core.h
> +++ b/include/hw/ppc/spapr_cpu_core.h
> @@ -27,6 +27,7 @@ typedef struct sPAPRCPUCore {
>  
>      /*< public >*/
>      void *threads;
> +    int node_id;
>  } sPAPRCPUCore;
>  
>  typedef struct sPAPRCPUCoreClass {
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index c3292ef..7f27d56 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -258,6 +258,8 @@ typedef void (*run_on_cpu_func)(CPUState *cpu, run_on_cpu_data data);
>  
>  struct qemu_work_item;
>  
> +#define CPU_UNSET_NUMA_NODE_ID -1
> +
>  /**
>   * CPUState:
>   * @cpu_index: CPU index (informative).
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 9dcbbcc..9c61721 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2770,9 +2770,11 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
>      Error *local_err = NULL;
>      CPUCore *cc = CPU_CORE(dev);
> +    sPAPRCPUCore *sc = SPAPR_CPU_CORE(dev);
>      char *base_core_type = spapr_get_cpu_core_type(machine->cpu_model);
>      const char *type = object_get_typename(OBJECT(dev));
>      CPUArchId *core_slot;
> +    int node_id;
>      int index;
>  
>      if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
> @@ -2801,6 +2803,21 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          goto out;
>      }
>  
> +    node_id = numa_get_node_for_cpu(cc->core_id);
> +    if (node_id == nb_numa_nodes) {
> +        /* by default CPUState::numa_node was 0 if it's not set via CLI
> +         * keep it this way for now but in future we probably should
> +         * refuse to start up with incomplete numa mapping */
> +        node_id = 0;
> +    }
> +    if (sc->node_id == CPU_UNSET_NUMA_NODE_ID) {
> +        sc->node_id = node_id;
> +    } else if (sc->node_id != node_id) {
> +        error_setg(&local_err, "node-id %d must match numa node specified "
> +            "with -numa option for cpu-index %d", sc->node_id, cc->core_id);
> +        goto out;
> +    }
> +
>  out:
>      g_free(base_core_type);
>      error_propagate(errp, local_err);
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 6883f09..25988f8 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -163,7 +163,6 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
>      const char *typename = object_class_get_name(scc->cpu_class);
>      size_t size = object_type_get_instance_size(typename);
>      Error *local_err = NULL;
> -    int core_node_id = numa_get_node_for_cpu(cc->core_id);;
>      void *obj;
>      int i, j;
>  
> @@ -181,10 +180,10 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
>  
>          /* Set NUMA node for the added CPUs  */
>          node_id = numa_get_node_for_cpu(cs->cpu_index);
> -        if (node_id != core_node_id) {
> +        if (node_id != sc->node_id) {
>              error_setg(&local_err, "Invalid node-id=%d of thread[cpu-index: %d]"
>                  " on CPU[core-id: %d, node-id: %d], node-id must be the same",
> -                 node_id, cs->cpu_index, cc->core_id, core_node_id);
> +                 node_id, cs->cpu_index, cc->core_id, sc->node_id);
>              goto err;
>          }
>          if (node_id < nb_numa_nodes) {
> @@ -250,6 +249,11 @@ static const char *spapr_core_models[] = {
>      "POWER9_v1.0",
>  };
>  
> +static Property spapr_cpu_core_properties[] = {
> +    DEFINE_PROP_INT32("node-id", sPAPRCPUCore, node_id, CPU_UNSET_NUMA_NODE_ID),
> +    DEFINE_PROP_END_OF_LIST()
> +};
> +
>  void spapr_cpu_core_class_init(ObjectClass *oc, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(oc);
> @@ -257,6 +261,7 @@ void spapr_cpu_core_class_init(ObjectClass *oc, void *data)
>  
>      dc->realize = spapr_cpu_core_realize;
>      dc->unrealize = spapr_cpu_core_unrealizefn;
> +    dc->props = spapr_cpu_core_properties;
>      scc->cpu_class = cpu_class_by_name(TYPE_POWERPC_CPU, data);
>      g_assert(scc->cpu_class);
>  }

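The node-id resolution added to spapr_core_pre_plug() reduces to a small pure function. This is a simplified sketch with hypothetical names: `mapped_node_id` stands for the value numa_get_node_for_cpu() returned, and `nb_numa_nodes` doubles as its "not found" value, as in the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_UNSET_NODE_ID (-1)  /* mirrors CPU_UNSET_NUMA_NODE_ID */

/* Resolve the core's node-id property against the -numa mapping.
 * Returns false when an explicit node-id conflicts with the mapping. */
static bool sketch_resolve_core_node(int *core_node_id, int mapped_node_id,
                                     int nb_numa_nodes)
{
    if (mapped_node_id == nb_numa_nodes) {
        mapped_node_id = 0;  /* no mapping for this CPU: legacy default 0 */
    }
    if (*core_node_id == SKETCH_UNSET_NODE_ID) {
        *core_node_id = mapped_node_id;  /* inherit the mapping */
        return true;
    }
    return *core_node_id == mapped_node_id;
}
```

So an explicit node-id on device_add succeeds only when it agrees with the -numa mapping, while an unset one simply inherits it (or node 0 when there is none).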

* Re: [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus Igor Mammedov
@ 2017-03-28  4:44   ` David Gibson
  2017-04-12 21:15   ` Eduardo Habkost
  2017-04-13 13:58   ` Eduardo Habkost
  2 siblings, 0 replies; 77+ messages in thread
From: David Gibson @ 2017-03-28  4:44 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Wed, Mar 22, 2017 at 02:32:35PM +0100, Igor Mammedov wrote:
> Introduce machine_set_cpu_numa_node() helper that stores
> node mapping for CPU in MachineState::possible_cpus.
> The CPU and the node it belongs to are specified by the 'props' argument.
> 
> Patch doesn't remove old way of storing mapping in
> numa_info[X].node_cpu as removing it at the same time
> makes patch rather big. Instead it just mirrors mapping
> in possible_cpus and follow up per target patches will
> switch to possible_cpus and numa_info[X].node_cpu will
> be removed once there aren't any users left.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/hw/boards.h |  2 ++
>  hw/core/machine.c   | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  numa.c              |  8 +++++++
>  3 files changed, 78 insertions(+)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 1dd0fde..40f30f1 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -42,6 +42,8 @@ bool machine_dump_guest_core(MachineState *machine);
>  bool machine_mem_merge(MachineState *machine);
>  void machine_register_compat_props(MachineState *machine);
>  HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine);
> +void machine_set_cpu_numa_node(MachineState *machine,
> +                               CpuInstanceProperties *props, Error **errp);
>  
>  /**
>   * CPUArchId:
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 0d92672..6ff0b45 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -388,6 +388,74 @@ HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine)
>      return head;
>  }
>  
> +void machine_set_cpu_numa_node(MachineState *machine,
> +                               CpuInstanceProperties *props, Error **errp)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    bool match = false;
> +    int i;
> +
> +    if (!mc->possible_cpu_arch_ids) {
> +        error_setg(errp, "mapping of CPUs to NUMA node is not supported");
> +        return;
> +    }
> +
> +    /* force board to initialize possible_cpus if it hasn't been done yet */
> +    mc->possible_cpu_arch_ids(machine);
> +
> +    for (i = 0; i < machine->possible_cpus->len; i++) {
> +        CPUArchId *slot = &machine->possible_cpus->cpus[i];
> +
> +        /* reject unsupported by board properties */
> +        if (props->has_thread_id && !slot->props.has_thread_id) {
> +            error_setg(errp, "thread-id is not supported");
> +            return;
> +        }
> +
> +        if (props->has_core_id && !slot->props.has_core_id) {
> +            error_setg(errp, "core-id is not supported");
> +            return;
> +        }
> +
> +        if (props->has_socket_id && !slot->props.has_socket_id) {
> +            error_setg(errp, "socket-id is not supported");
> +            return;
> +        }
> +
> +        /* skip slots with explicit mismatch */
> +        if (props->has_thread_id && props->thread_id != slot->props.thread_id) {
> +                continue;
> +        }
> +
> +        if (props->has_core_id && props->core_id != slot->props.core_id) {
> +                continue;
> +        }
> +
> +        if (props->has_socket_id && props->socket_id != slot->props.socket_id) {
> +                continue;
> +        }
> +
> +        /* reject assignment if slot is already assigned, for compatibility
> +         * of legacy cpu_index mapping with SPAPR core based mapping do not
> +         * error out if cpu thread and matched core have the same node-id */
> +        if (slot->props.has_node_id &&
> +            slot->props.node_id != props->node_id) {
> +            error_setg(errp, "CPU is already assigned to node-id: %" PRId64,
> +                       slot->props.node_id);
> +            return;
> +        }
> +
> +        /* assign slot to node as it's matched '-numa cpu' key */
> +        match = true;
> +        slot->props.node_id = props->node_id;
> +        slot->props.has_node_id = props->has_node_id;
> +    }
> +
> +    if (!match) {
> +        error_setg(errp, "no match found");
> +    }
> +}
> +
>  static void machine_class_init(ObjectClass *oc, void *data)
>  {
>      MachineClass *mc = MACHINE_CLASS(oc);
> diff --git a/numa.c b/numa.c
> index 24c596d..44057f1 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -169,6 +169,7 @@ static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
>          exit(1);
>      }
>      for (cpus = node->cpus; cpus; cpus = cpus->next) {
> +        CpuInstanceProperties props;
>          if (cpus->value >= max_cpus) {
>              error_setg(errp,
>                         "CPU index (%" PRIu16 ")"
> @@ -177,6 +178,10 @@ static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
>              return;
>          }
>          bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
> +        props = mc->cpu_index_to_instance_props(ms, cpus->value);
> +        props.node_id = nodenr;
> +        props.has_node_id = true;
> +        machine_set_cpu_numa_node(ms, &props, &error_fatal);
>      }
>  
>      if (node->has_mem && node->has_memdev) {
> @@ -393,9 +398,12 @@ void parse_numa_opts(MachineState *ms)
>          if (i == nb_numa_nodes) {
>              for (i = 0; i < max_cpus; i++) {
>                  CpuInstanceProperties props;
> +                /* fetch default mapping from board and enable it */
>                  props = mc->cpu_index_to_instance_props(ms, i);
> +                props.has_node_id = true;
>  
>                  set_bit(i, numa_info[props.node_id].node_cpu);
> +                machine_set_cpu_numa_node(ms, &props, &error_fatal);
>              }
>          }
>  

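The matching rule in machine_set_cpu_numa_node() treats every socket/core/thread id left unset in the key as a wildcard. Below is a reduced sketch with a hypothetical `SketchProps` stand-in. Unlike the patch, it omits the check that rejects ids the board does not support, and it returns a match count instead of setting an Error: 0 corresponds to the patch's "no match found" error and -1 to its "already assigned" error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for CpuInstanceProperties. */
typedef struct {
    bool has_socket_id, has_core_id, has_thread_id, has_node_id;
    int64_t socket_id, core_id, thread_id, node_id;
} SketchProps;

/* Assign key->node_id to every slot matching the ids set in @key.
 * Returns the number of slots matched, or -1 on a conflicting node. */
static int sketch_set_node(SketchProps *slots, size_t len,
                           const SketchProps *key)
{
    int matched = 0;

    for (size_t i = 0; i < len; i++) {
        SketchProps *s = &slots[i];

        /* unset ids in the key match anything */
        if (key->has_socket_id && key->socket_id != s->socket_id) {
            continue;
        }
        if (key->has_core_id && key->core_id != s->core_id) {
            continue;
        }
        if (key->has_thread_id && key->thread_id != s->thread_id) {
            continue;
        }
        /* re-assigning the same node is tolerated, a different one is not */
        if (s->has_node_id && s->node_id != key->node_id) {
            return -1;
        }
        s->node_id = key->node_id;
        s->has_node_id = key->has_node_id;
        matched++;
    }
    return matched;
}
```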

* Re: [Qemu-devel] [PATCH for-2.10 11/23] numa: do default mapping based on possible_cpus instead of node_cpu bitmaps
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 11/23] numa: do default mapping based on possible_cpus instead of node_cpu bitmaps Igor Mammedov
@ 2017-03-28  4:46   ` David Gibson
  0 siblings, 0 replies; 77+ messages in thread
From: David Gibson @ 2017-03-28  4:46 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Wed, Mar 22, 2017 at 02:32:36PM +0100, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  numa.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/numa.c b/numa.c
> index 44057f1..ab41776 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -309,6 +309,7 @@ static void validate_numa_cpus(void)
>  void parse_numa_opts(MachineState *ms)
>  {
>      int i;
> +    const CPUArchIdList *possible_cpus;
>      MachineClass *mc = MACHINE_GET_CLASS(ms);
>  
>      for (i = 0; i < MAX_NODES; i++) {
> @@ -379,11 +380,6 @@ void parse_numa_opts(MachineState *ms)
>  
>          numa_set_mem_ranges();
>  
> -        for (i = 0; i < nb_numa_nodes; i++) {
> -            if (!bitmap_empty(numa_info[i].node_cpu, max_cpus)) {
> -                break;
> -            }
> -        }
>          /* Historically VCPUs were assigned in round-robin order to NUMA
>           * nodes. However it causes issues with guest not handling it nice
>           * in case where cores/threads from a multicore CPU appear on
> @@ -391,11 +387,20 @@ void parse_numa_opts(MachineState *ms)
>           * rule grouping VCPUs by socket so that VCPUs from the same socket
>           * would be on the same node.
>           */
> -        if (!mc->cpu_index_to_instance_props) {
> +        if (!mc->cpu_index_to_instance_props || !mc->possible_cpu_arch_ids) {
>              error_report("default CPUs to NUMA node mapping isn't supported");
>              exit(1);
>          }
> -        if (i == nb_numa_nodes) {
> +
> +        possible_cpus = mc->possible_cpu_arch_ids(ms);
> +        for (i = 0; i < possible_cpus->len; i++) {
> +            if (possible_cpus->cpus[i].props.has_node_id) {
> +                break;
> +            }
> +        }
> +
> +        /* no CPUs are assigned to NUMA nodes */
> +        if (i == possible_cpus->len) {
>              for (i = 0; i < max_cpus; i++) {
>                  CpuInstanceProperties props;
>                  /* fetch default mapping from board and enable it */


* Re: [Qemu-devel] [PATCH for-2.10 17/23] numa: remove no longer used numa_get_node_for_cpu()
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 17/23] numa: remove no longer used numa_get_node_for_cpu() Igor Mammedov
@ 2017-03-28  4:54   ` David Gibson
  0 siblings, 0 replies; 77+ messages in thread
From: David Gibson @ 2017-03-28  4:54 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Wed, Mar 22, 2017 at 02:32:42PM +0100, Igor Mammedov wrote:
> it's been replaced by fetching mapping info from possible_cpus
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/sysemu/numa.h |  4 ----
>  numa.c                | 14 --------------
>  2 files changed, 18 deletions(-)
> 
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 46ea6c7..c67763a 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -31,8 +31,4 @@ extern QemuOptsList qemu_numa_opts;
>  void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
>  void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
>  uint32_t numa_get_node(ram_addr_t addr, Error **errp);
> -
> -/* on success returns node index in numa_info,
> - * on failure returns nb_numa_nodes */
> -int numa_get_node_for_cpu(int idx);
>  #endif
> diff --git a/numa.c b/numa.c
> index ab41776..187c93f 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -583,20 +583,6 @@ MemdevList *qmp_query_memdev(Error **errp)
>      return list;
>  }
>  
> -int numa_get_node_for_cpu(int idx)
> -{
> -    int i;
> -
> -    assert(idx < max_cpus);
> -
> -    for (i = 0; i < nb_numa_nodes; i++) {
> -        if (test_bit(idx, numa_info[i].node_cpu)) {
> -            break;
> -        }
> -    }
> -    return i;
> -}
> -
>  void ram_block_notifier_add(RAMBlockNotifier *n)
>  {
>      QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next);


* Re: [Qemu-devel] [PATCH for-2.10 18/23] numa: remove no longer need numa_post_machine_init()
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 18/23] numa: remove no longer need numa_post_machine_init() Igor Mammedov
@ 2017-03-28  4:55   ` David Gibson
  0 siblings, 0 replies; 77+ messages in thread
From: David Gibson @ 2017-03-28  4:55 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Wed, Mar 22, 2017 at 02:32:43PM +0100, Igor Mammedov wrote:
> CPUState::numa_node is still in use but now it's set by
> board when it creates CPU objects. So there isn't any
> need to set it again after all CPUs are created,
> since it's been already set.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/sysemu/numa.h |  1 -
>  numa.c                | 15 ---------------
>  vl.c                  |  2 --
>  3 files changed, 18 deletions(-)
> 
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index c67763a..345bb94 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -25,7 +25,6 @@ typedef struct node_info {
>  
>  extern NodeInfo numa_info[MAX_NODES];
>  void parse_numa_opts(MachineState *ms);
> -void numa_post_machine_init(void);
>  void query_numa_node_mem(uint64_t node_mem[]);
>  extern QemuOptsList qemu_numa_opts;
>  void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
> diff --git a/numa.c b/numa.c
> index 187c93f..8461c96 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -418,21 +418,6 @@ void parse_numa_opts(MachineState *ms)
>      }
>  }
>  
> -void numa_post_machine_init(void)
> -{
> -    CPUState *cpu;
> -    int i;
> -
> -    CPU_FOREACH(cpu) {
> -        for (i = 0; i < nb_numa_nodes; i++) {
> -            assert(cpu->cpu_index < max_cpus);
> -            if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
> -                cpu->numa_node = i;
> -            }
> -        }
> -    }
> -}
> -
>  static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>                                             const char *name,
>                                             uint64_t ram_size)
> diff --git a/vl.c b/vl.c
> index 5ffb9c3..e5c1620 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4587,8 +4587,6 @@ int main(int argc, char **argv, char **envp)
>  
>      cpu_synchronize_all_post_init();
>  
> -    numa_post_machine_init();
> -
>      rom_reset_order_override();
>  
>      /*


* Re: [Qemu-devel] [PATCH for-2.10 20/23] numa: use possible_cpus for not mapped CPUs check
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 20/23] numa: use possible_cpus for not mapped CPUs check Igor Mammedov
@ 2017-03-28  5:13   ` David Gibson
  0 siblings, 0 replies; 77+ messages in thread
From: David Gibson @ 2017-03-28  5:13 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 4370 bytes --]

On Wed, Mar 22, 2017 at 02:32:45PM +0100, Igor Mammedov wrote:
> and remove corresponding part in numa.c that uses
> node_cpu bitmaps.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> It's one more less user of node_cpu bitmpas, following
> commit will remove the last user along with
> node_cpu itself.
> ---
>  hw/core/machine.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  numa.c            | 10 ----------
>  2 files changed, 58 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index d284a63..ab51d2c 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -19,6 +19,7 @@
>  #include "sysemu/sysemu.h"
>  #include "qemu/error-report.h"
>  #include "qemu/cutils.h"
> +#include "sysemu/numa.h"
>  
>  static char *machine_get_accel(Object *obj, Error **errp)
>  {
> @@ -643,9 +644,66 @@ bool machine_mem_merge(MachineState *machine)
>      return machine->mem_merge;
>  }
>  
> +static char *cpu_slot_to_string(const CPUArchId *cpu)
> +{
> +    GString *s = g_string_new(NULL);
> +    if (cpu->props.has_socket_id) {
> +        g_string_append_printf(s, "socket-id: %"PRId64, cpu->props.socket_id);
> +    }
> +    if (cpu->props.has_core_id) {
> +        if (s->len) {
> +            g_string_append_printf(s, ", ");
> +        }
> +        g_string_append_printf(s, "core-id: %"PRId64, cpu->props.core_id);
> +    }
> +    if (cpu->props.has_thread_id) {
> +        if (s->len) {
> +            g_string_append_printf(s, ", ");
> +        }
> +        g_string_append_printf(s, "thread-id: %"PRId64, cpu->props.thread_id);
> +    }
> +    return g_string_free(s, false);
> +}
> +
> +static void machine_numa_validate(MachineState *machine)
> +{
> +    int i;
> +    GString *s = g_string_new(NULL);
> +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
> +
> +    assert(nb_numa_nodes);
> +    for (i = 0; i < possible_cpus->len; i++) {
> +        const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
> +
> +        /* at this point numa mappings are initilized by CLI options
> +         * or with default mappings so it's sufficient to list
> +         * all not yet mapped CPUs here */
> +        /* TODO: make it hard error in future */
> +        if (!cpu_slot->props.has_node_id) {
> +            char *cpu_str = cpu_slot_to_string(cpu_slot);
> +            g_string_append_printf(s, "%sCPU %d [%s]", s->len ? ", " : "", i,
> +                                   cpu_str);
> +            g_free(cpu_str);
> +        }
> +    }
> +    if (s->len) {
> +        error_report("warning: CPU(s) not present in any NUMA nodes: %s",
> +                     s->str);
> +        error_report("warning: All CPU(s) up to maxcpus should be described "
> +                     "in NUMA config, ability to start up with partial NUMA "
> +                     "mappings is obsoleted and will be removed in future");
> +    }
> +    g_string_free(s, true);
> +}
> +
>  void machine_run_board_init(MachineState *machine)
>  {
>      MachineClass *machine_class = MACHINE_GET_CLASS(machine);
> +
> +    if (nb_numa_nodes) {
> +        machine_numa_validate(machine);
> +    }
>      machine_class->init(machine);
>  }
>  
> diff --git a/numa.c b/numa.c
> index 8461c96..523558f 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -293,16 +293,6 @@ static void validate_numa_cpus(void)
>          bitmap_or(seen_cpus, seen_cpus,
>                    numa_info[i].node_cpu, max_cpus);
>      }
> -
> -    if (!bitmap_full(seen_cpus, max_cpus)) {
> -        char *msg;
> -        bitmap_complement(seen_cpus, seen_cpus, max_cpus);
> -        msg = enumerate_cpus(seen_cpus, max_cpus);
> -        error_report("warning: CPU(s) not present in any NUMA nodes: %s", msg);
> -        error_report("warning: All CPU(s) up to maxcpus should be described "
> -                     "in NUMA config");
> -        g_free(msg);
> -    }
>      g_free(seen_cpus);
>  }
>  
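
The check moved into machine_numa_validate() walks every possible CPU slot, not only the CPUs present at boot, and lists the slots that still have no node-id. Below is a minimal standalone sketch of that logic. It assumes a hypothetical, simplified `Slot` struct in place of QEMU's `CPUArchId`/`CpuInstanceProperties`, and uses `snprintf` instead of GLib's `GString`. It is an illustration, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one possible_cpus[] entry. */
typedef struct {
    bool has_socket_id;
    int socket_id;
    bool has_node_id;
    int node_id;
} Slot;

/*
 * Write "CPU <i> [socket-id: <n>]" into buf for every slot without a
 * node-id, separated by ", ", and return how many such slots were found.
 * Output is truncated (but still NUL-terminated) if buf is too small.
 */
static int list_unmapped(const Slot *slots, int n, char *buf, size_t len)
{
    int unmapped = 0;
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        const char *sep = unmapped ? ", " : "";
        int w;

        if (slots[i].has_node_id) {
            continue; /* mapped on the CLI or by the board default */
        }
        if (slots[i].has_socket_id) {
            w = snprintf(buf + used, len - used, "%sCPU %d [socket-id: %d]",
                         sep, i, slots[i].socket_id);
        } else {
            w = snprintf(buf + used, len - used, "%sCPU %d", sep, i);
        }
        /* Advance while the text fits; otherwise stay on the terminator. */
        if (w > 0 && (size_t)w < len - used) {
            used += (size_t)w;
        } else {
            used = len - 1;
        }
        unmapped++;
    }
    return unmapped;
}
```

Because the walk covers possible slots, CPUs that are only hotpluggable later are reported as well. In the patch this is still a warning (see the TODO about making it a hard error).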

* Re: [Qemu-devel] [PATCH for-2.10 21/23] numa: remove node_cpu bitmaps as they are no longer used
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 21/23] numa: remove node_cpu bitmaps as they are no longer used Igor Mammedov
@ 2017-03-28  5:13   ` David Gibson
  0 siblings, 0 replies; 77+ messages in thread
From: David Gibson @ 2017-03-28  5:13 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Wed, Mar 22, 2017 at 02:32:46PM +0100, Igor Mammedov wrote:
> Postfactum "CPU(s) present in multiple NUMA nodes" check
> was the last user of node_cpu bitmaps, but it's not need
> as machine_set_cpu_numa_node() does the similar check at
> the time mapping is set for cpus (i.e. when -numa cpus=
> is parsed) and ensures that cpu can be mapped only to
> one node.
> 
> Remove duplicate check based on node_cpu bitmaps and
> since the last user is gone remove node_cpu as well,
> which completes internal transition from legacy bitmap
> based mapping storage to possible_cpus storage.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/sysemu/numa.h |  1 -
>  numa.c                | 42 ------------------------------------------
>  2 files changed, 43 deletions(-)
> 
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 345bb94..796ee94 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -17,7 +17,6 @@ struct numa_addr_range {
>  
>  typedef struct node_info {
>      uint64_t node_mem;
> -    unsigned long *node_cpu;
>      struct HostMemoryBackend *node_memdev;
>      bool present;
>      QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */
> diff --git a/numa.c b/numa.c
> index 523558f..088fae3 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -177,7 +177,6 @@ static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
>                         cpus->value, max_cpus);
>              return;
>          }
> -        bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
>          props = mc->cpu_index_to_instance_props(ms, cpus->value);
>          props.node_id = nodenr;
>          props.has_node_id = true;
> @@ -261,51 +260,12 @@ end:
>      return 0;
>  }
>  
> -static char *enumerate_cpus(unsigned long *cpus, int max_cpus)
> -{
> -    int cpu;
> -    bool first = true;
> -    GString *s = g_string_new(NULL);
> -
> -    for (cpu = find_first_bit(cpus, max_cpus);
> -        cpu < max_cpus;
> -        cpu = find_next_bit(cpus, max_cpus, cpu + 1)) {
> -        g_string_append_printf(s, "%s%d", first ? "" : " ", cpu);
> -        first = false;
> -    }
> -    return g_string_free(s, FALSE);
> -}
> -
> -static void validate_numa_cpus(void)
> -{
> -    int i;
> -    unsigned long *seen_cpus = bitmap_new(max_cpus);
> -
> -    for (i = 0; i < nb_numa_nodes; i++) {
> -        if (bitmap_intersects(seen_cpus, numa_info[i].node_cpu, max_cpus)) {
> -            bitmap_and(seen_cpus, seen_cpus,
> -                       numa_info[i].node_cpu, max_cpus);
> -            error_report("CPU(s) present in multiple NUMA nodes: %s",
> -                         enumerate_cpus(seen_cpus, max_cpus));
> -            g_free(seen_cpus);
> -            exit(EXIT_FAILURE);
> -        }
> -        bitmap_or(seen_cpus, seen_cpus,
> -                  numa_info[i].node_cpu, max_cpus);
> -    }
> -    g_free(seen_cpus);
> -}
> -
>  void parse_numa_opts(MachineState *ms)
>  {
>      int i;
>      const CPUArchIdList *possible_cpus;
>      MachineClass *mc = MACHINE_GET_CLASS(ms);
>  
> -    for (i = 0; i < MAX_NODES; i++) {
> -        numa_info[i].node_cpu = bitmap_new(max_cpus);
> -    }
> -
>      if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
>          exit(1);
>      }
> @@ -397,12 +357,10 @@ void parse_numa_opts(MachineState *ms)
>                  props = mc->cpu_index_to_instance_props(ms, i);
>                  props.has_node_id = true;
>  
> -                set_bit(i, numa_info[props.node_id].node_cpu);
>                  machine_set_cpu_numa_node(ms, &props, &error_fatal);
>              }
>          }
>  
> -        validate_numa_cpus();
>      } else {
>          numa_set_mem_node_id(0, ram_size, 0);
>      }
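
The commit message relies on rejecting a conflicting assignment when the mapping is set, which makes the after-the-fact bitmap intersection check in validate_numa_cpus() redundant. The following is a minimal sketch of such an assignment-time check. The `Slot` type and the `set_cpu_node()` helper are hypothetical simplifications and do not reproduce the body of machine_set_cpu_numa_node().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for one possible_cpus[] entry. */
typedef struct {
    bool has_node_id;
    int node_id;
} Slot;

/*
 * Hypothetical helper: map one CPU slot to a NUMA node. A second mapping
 * to a different node is rejected immediately, so a CPU can never end up
 * in two nodes. Repeating the same mapping is accepted.
 */
static bool set_cpu_node(Slot *slot, int cpu, int node)
{
    assert(slot != NULL);
    if (slot->has_node_id && slot->node_id != node) {
        fprintf(stderr, "CPU %d is already assigned to node %d\n",
                cpu, slot->node_id);
        return false;
    }
    slot->has_node_id = true;
    slot->node_id = node;
    return true;
}
```

Since every mapping passes through one choke point, the invariant "one CPU, one node" holds at all times instead of being verified once after parsing.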

* Re: [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping Igor Mammedov
  2017-03-23 13:23   ` Eric Blake
@ 2017-03-28  5:16   ` David Gibson
  2017-03-28 11:09     ` Igor Mammedov
  1 sibling, 1 reply; 77+ messages in thread
From: David Gibson @ 2017-03-28  5:16 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Wed, Mar 22, 2017 at 02:32:47PM +0100, Igor Mammedov wrote:
> legacy cpu to node mapping is using cpu index values to map
> VCPU to node with help of '-numa node,nodeid=node,cpus=x[-y]'
> option. However cpu index is internal concept and QEMU users
> have to guess /reimplement qemu's logic/ to map it to
> a concrete cpu socket/core/thread to make sane CPUs
> placement across numa nodes.
> 
> This patch allows to map cpu objects to numa nodes using
> the same properties as used for cpus with -device/device_add
> (socket-id/core-id/thread-id/node-id).
> 
> At present valid properties/values to address CPUs could be
> fetched using hotpluggable-cpus monitor/qmp command, it will
> require user to start qemu twice when creating domain to fetch
> possible CPUs for a machine type/-smp layout first and
> then the second time with numa explicit mapping for actual
> usage. The first step results could be saved and reused to
> set/change mapping later as far as machine type/-smp stays
> the same.
> 
> Proposed impl. supports exact and wildcard matching to
> simplify CLI and allow to set mapping for a specific cpu
> or group of cpu objects specified by matched properties.
> 
> For example:
> 
>    # exact mapping x86
>    -numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n
> 
>    # exact mapping SPAPR
>    -numa cpu,node-id=x,core-id=y
> 
>    # wildcard mapping, all cpu objects that match socket-id=y
>    # are mapped to node-id=x
>    -numa cpu,node-id=x,socket-id=y
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

What's the rationale for adding a new CLI, rather than adding node-id
properties to the appropriate objects with -device, -global or -set as
appropriate?

> ---
>  numa.c           | 13 +++++++++++++
>  qapi-schema.json |  7 +++++--
>  qemu-options.hx  | 23 ++++++++++++++++++++++-
>  3 files changed, 40 insertions(+), 3 deletions(-)
> 
> diff --git a/numa.c b/numa.c
> index 088fae3..588586b 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -246,6 +246,19 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
>          }
>          nb_numa_nodes++;
>          break;
> +    case NUMA_OPTIONS_TYPE_CPU:
> +        if (!object->u.cpu.has_node_id) {
> +            error_setg(&err, "Missing mandatory node-id property");
> +            goto end;
> +        }
> +        if (!numa_info[object->u.cpu.node_id].present) {
> +            error_setg(&err, "Invalid node-id=%" PRId64 ", NUMA node must be "
> +                "defined with -numa node,nodeid=ID before it's used with "
> +                "-numa cpu,node-id=ID", object->u.cpu.node_id);
> +            goto end;
> +        }
> +        machine_set_cpu_numa_node(ms, &object->u.cpu, &err);
> +        break;
>      default:
>          abort();
>      }
> diff --git a/qapi-schema.json b/qapi-schema.json
> index a6b5955..a9a1d5e 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -5673,10 +5673,12 @@
>  ##
>  # @NumaOptionsType:
>  #
> +# @cpu: property based CPU(s) to node mapping (Since: 2.10)
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'NumaOptionsType',
> -  'data': [ 'node' ] }
> +  'data': [ 'node', 'cpu' ] }
>  
>  ##
>  # @NumaOptions:
> @@ -5689,7 +5691,8 @@
>    'base': { 'type': 'NumaOptionsType' },
>    'discriminator': 'type',
>    'data': {
> -    'node': 'NumaNodeOptions' }}
> +    'node': 'NumaNodeOptions',
> +    'cpu': 'CpuInstanceProperties' }}
>  
>  ##
>  # @NumaNodeOptions:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 99af8ed..2185c34 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -139,13 +139,16 @@ ETEXI
>  
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> +    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n", QEMU_ARCH_ALL)
>  STEXI
>  @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
>  @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> +@itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>  @findex -numa
>  Define a NUMA node and assign RAM and VCPUs to it.
>  
> +Legacy VCPU assignment uses @samp{cpus} option where
>  @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
>  @samp{cpus} option represent a contiguous range of CPU indexes
>  (or a single VCPU if @var{lastcpu} is omitted). A non-contiguous
> @@ -159,6 +162,24 @@ a NUMA node:
>  -numa node,cpus=0-2,cpus=5
>  @end example
>  
> +@samp{cpu} option is new alternative to @samp{cpus} option
> +uses @samp{socket-id|core-id|thread-id} properties to assign
> +CPU objects to a @var{node} using topology layout properties of CPU.
> +Set of properties is machine specific, and depends on used machine
> +type/@samp{smp} options. It could be queried with @samp{hotpluggable-cpus}
> +monitor command.
> +@samp{node-id} property specifies @var{node} to which CPU object
> +will be assigned, it's required for @var{node} to be declared
> +with @samp{node} option before it's used with @samp{cpu} option.
> +
> +For example:
> +@example
> +-M pc \
> +-smp 1,sockets=2,maxcpus=2 \
> +-numa node,nodeid=0 -numa node,nodeid=1 \
> +-numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1
> +@end example
> +
>  @samp{mem} assigns a given RAM amount to a node. @samp{memdev}
>  assigns RAM from a given memory backend device to a node. If
>  @samp{mem} and @samp{memdev} are omitted in all nodes, RAM is
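
The exact and wildcard matching described in the commit message can be illustrated with the sketch below: a property left unset in the `-numa cpu` pattern matches any value, and node-id is applied to every slot that matches all the set properties. The `Props` type and the helper names are hypothetical simplifications of `CpuInstanceProperties`. Also, unlike this sketch, QEMU may report an error for a property the machine does not support instead of silently not matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for CpuInstanceProperties. */
typedef struct {
    bool has_node_id;   int64_t node_id;
    bool has_socket_id; int64_t socket_id;
    bool has_core_id;   int64_t core_id;
    bool has_thread_id; int64_t thread_id;
} Props;

/* One property: unset in the pattern means "match anything". */
static bool prop_matches(bool pat_has, int64_t pat, bool slot_has, int64_t val)
{
    if (!pat_has) {
        return true;                  /* wildcard */
    }
    return slot_has && pat == val;    /* slot must have it, with equal value */
}

static bool props_match(const Props *pat, const Props *slot)
{
    return prop_matches(pat->has_socket_id, pat->socket_id,
                        slot->has_socket_id, slot->socket_id) &&
           prop_matches(pat->has_core_id, pat->core_id,
                        slot->has_core_id, slot->core_id) &&
           prop_matches(pat->has_thread_id, pat->thread_id,
                        slot->has_thread_id, slot->thread_id);
}

/* Assign pat->node_id to every matching slot; return how many matched. */
static int map_matching(Props *slots, int n, const Props *pat)
{
    int mapped = 0;

    assert(pat->has_node_id);         /* node-id is mandatory for -numa cpu */
    for (int i = 0; i < n; i++) {
        if (props_match(pat, &slots[i])) {
            slots[i].has_node_id = true;
            slots[i].node_id = pat->node_id;
            mapped++;
        }
    }
    return mapped;
}
```

For example, with two sockets of two cores each, a pattern of node-id=1,socket-id=1 maps both cores of socket 1, while node-id=0,socket-id=0,core-id=1 maps exactly one slot.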

* Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
  2017-03-28  4:19   ` David Gibson
@ 2017-03-28 10:53     ` Igor Mammedov
  2017-03-29  2:24       ` David Gibson
  2017-04-20 14:29     ` Igor Mammedov
  1 sibling, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-28 10:53 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Tue, 28 Mar 2017 15:19:20 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Mar 22, 2017 at 02:32:30PM +0100, Igor Mammedov wrote:
> > Originally CPU threads were by default assigned in
> > round-robin fashion. However it was causing issues in
> > guest since CPU threads from the same socket/core could
> > be placed on different NUMA nodes.
> > Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> > fixed it by grouping threads within a socket on the same node
> > introducing cpu_index_to_socket_id() callback and commit
> > 20bb648d (spapr: Fix default NUMA node allocation for threads)
> > reused callback to fix similar issues for SPAPR machine
> > even though socket doesn't make much sense there.
> > 
> > As result QEMU ended up having 3 default distribution rules
> > used by 3 targets /virt-arm, spapr, pc/.
> > 
> > In effort of moving NUMA mapping for CPUs into possible_cpus,
> > generalize default mapping in numa.c by making boards decide
> > on default mapping and let them explicitly tell generic
> > numa code to which node a CPU thread belongs to by replacing
> > cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> > which provides default node_id assigned by board to specified
> > cpu_index.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > Patch only moves source of default mapping to possible_cpus[]
> > and leaves the rest of NUMA handling to numa_info[node_id].node_cpu
> > bitmaps. It's up to follow up patches to replace bitmaps
> > with possible_cpus[] internally.
> > ---
> >  include/hw/boards.h   |  8 ++++++--
> >  include/sysemu/numa.h |  2 +-
> >  hw/arm/virt.c         | 19 +++++++++++++++++--
> >  hw/i386/pc.c          | 22 ++++++++++++++++------
> >  hw/ppc/spapr.c        | 27 ++++++++++++++++++++-------
> >  numa.c                | 15 +++++++++------
> >  vl.c                  |  2 +-
> >  7 files changed, 70 insertions(+), 25 deletions(-)
> > 
> > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > index 269d0ba..1dd0fde 100644
> > --- a/include/hw/boards.h
> > +++ b/include/hw/boards.h
> > @@ -74,7 +74,10 @@ typedef struct {
> >   *    of HotplugHandler object, which handles hotplug operation
> >   *    for a given @dev. It may return NULL if @dev doesn't require
> >   *    any actions to be performed by hotplug handler.
> > - * @cpu_index_to_socket_id:
> > + * @cpu_index_to_instance_props:
> > + *    used to provide @cpu_index to socket/core/thread number mapping, allowing
> > + *    legacy code to perform maping from cpu_index to topology properties
> > + *    Returns: tuple of socket/core/thread ids given cpu_index belongs to.
> >   *    used to provide @cpu_index to socket number mapping, allowing
> >   *    a machine to group CPU threads belonging to the same socket/package
> >   *    Returns: socket number given cpu_index belongs to.
> > @@ -138,7 +141,8 @@ struct MachineClass {
> >  
> >      HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
> >                                             DeviceState *dev);
> > -    unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
> > +    CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
> > +                                                         unsigned cpu_index);
> >      const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
> >  };
> >  
> > diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> > index 8f09dcf..46ea6c7 100644
> > --- a/include/sysemu/numa.h
> > +++ b/include/sysemu/numa.h
> > @@ -24,7 +24,7 @@ typedef struct node_info {
> >  } NodeInfo;
> >  
> >  extern NodeInfo numa_info[MAX_NODES];
> > -void parse_numa_opts(MachineClass *mc);
> > +void parse_numa_opts(MachineState *ms);
> >  void numa_post_machine_init(void);
> >  void query_numa_node_mem(uint64_t node_mem[]);
> >  extern QemuOptsList qemu_numa_opts;
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 0cbcbc1..8748d25 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -1554,6 +1554,16 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
> >      }
> >  }
> >  
> > +static CpuInstanceProperties
> > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > +{
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> > +}
> > +
> 
> It seems a bit weird to have a machine specific hook to pull the
> property information when one way or another it's coming from the
> possible_cpus table, which is already constructed by a machine
> specific hook.  Could we add a range or list of cpu_index values to
> each possible_cpus entry instead, and have a generic lookup of the
> right entry based on that?
> 
> 
> >  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> >  {
> >      int n;
> > @@ -1573,8 +1583,12 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> >          ms->possible_cpus->cpus[n].props.has_thread_id = true;
> >          ms->possible_cpus->cpus[n].props.thread_id = n;
> >  
> > -        /* TODO: add 'has_node/node' here to describe
> > -           to which node core belongs */
> > +        /* default distribution of CPUs over NUMA nodes */
> > +        if (nb_numa_nodes) {
> > +            /* preset values but do not enable them i.e. 'has_node_id = false',
> > +             * board will enable them if manual mapping wasn't present on CLI */
> 
> I'm a little confused by this comment, since I don't see any board
> code altering has_node_id.
> 
> > +            ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;;
> > +        }
> >      }
> >      return ms->possible_cpus;
> >  }
> > @@ -1596,6 +1610,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
> >      /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
> >      mc->minimum_page_bits = 12;
> >      mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
> > +    mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
> >  }
> >  
> >  static const TypeInfo virt_machine_info = {
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index d24388e..7031100 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -2245,12 +2245,14 @@ static void pc_machine_reset(void)
> >      }
> >  }
> >  
> > -static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
> > +static CpuInstanceProperties
> > +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> >  {
> > -    X86CPUTopoInfo topo;
> > -    x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
> > -                          &topo);
> > -    return topo.pkg_id;
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> 
> Since the pc and arm version of this are basically identical, I wonder
> if that should actually be the default implementation.  If we need it
> at all.
ARM is still a moving target and its props are not really defined yet,
so I'd like to keep it separate for now; once it stabilizes we can think
about generalizing it.
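
For reference, the generic lookup suggested in the review (a range of cpu_index values stored in each possible_cpus entry) could look roughly like the sketch below. The `first_cpu_index`/`nr_cpu_indexes` fields and `find_slot_by_cpu_index()` are hypothetical and not part of QEMU's `CPUArchId`. With one index per entry (as on PC and ARM virt) it reduces to the array indexing both boards already do; with one entry per core (as on SPAPR) each entry covers `smp_threads` indexes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical possible_cpus entry carrying the cpu_index range it covers. */
typedef struct {
    unsigned first_cpu_index; /* first cpu_index belonging to this slot */
    unsigned nr_cpu_indexes;  /* 1 for per-thread slots, smp_threads per core */
    int node_id;
} Slot;

/* Generic lookup: return the slot covering cpu_index, or NULL if none. */
static const Slot *find_slot_by_cpu_index(const Slot *slots, size_t n,
                                          unsigned cpu_index)
{
    for (size_t i = 0; i < n; i++) {
        if (cpu_index >= slots[i].first_cpu_index &&
            cpu_index - slots[i].first_cpu_index < slots[i].nr_cpu_indexes) {
            return &slots[i];
        }
    }
    return NULL;
}
```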

> 
> >  }
> >  
> >  static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
> > @@ -2282,6 +2284,14 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
> >          ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
> >          ms->possible_cpus->cpus[i].props.has_thread_id = true;
> >          ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
> > +
> > +        /* default distribution of CPUs over NUMA nodes */
> > +        if (nb_numa_nodes) {
> > +            /* preset values but do not enable them i.e. 'has_node_id = false',
> > +             * board will enable them if manual mapping wasn't present on CLI */
> > +            ms->possible_cpus->cpus[i].props.node_id =
> > +                topo.pkg_id % nb_numa_nodes;
> > +        }
> >      }
> >      return ms->possible_cpus;
> >  }
> > @@ -2324,7 +2334,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
> >      pcmc->acpi_data_size = 0x20000 + 0x8000;
> >      pcmc->save_tsc_khz = true;
> >      mc->get_hotplug_handler = pc_get_hotpug_handler;
> > -    mc->cpu_index_to_socket_id = pc_cpu_index_to_socket_id;
> > +    mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
> >      mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
> >      mc->has_hotpluggable_cpus = true;
> >      mc->default_boot_order = "cad";
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 6ee566d..9dcbbcc 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2921,11 +2921,18 @@ static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
> >      return NULL;
> >  }
> >  
> > -static unsigned spapr_cpu_index_to_socket_id(unsigned cpu_index)
> > +static CpuInstanceProperties
> > +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
> >  {
> > -    /* Allocate to NUMA nodes on a "socket" basis (not that concept of
> > -     * socket means much for the paravirtualized PAPR platform) */
> > -    return cpu_index / smp_threads / smp_cores;
> > +    CPUArchId *core_slot;
> > +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> > +    int core_id = cpu_index / smp_threads * smp_threads;
> 
> I don't think you need this.  AIUI the purpose of
> spapr_find_cpu_slot() is that it already finds the right CPU slot from
> a cpu_index, so you can just pass the cpu_index directly.
ok, will do in v2

> 
> > +
> > +    /* make sure possible_cpu are intialized */
> > +    mc->possible_cpu_arch_ids(machine);
> > +    core_slot = spapr_find_cpu_slot(machine, core_id, NULL);
> > +    assert(core_slot);
> > +    return core_slot->props;
> >  }
> >  
> >  static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
> > @@ -2952,8 +2959,14 @@ static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
> >          machine->possible_cpus->cpus[i].arch_id = core_id;
> >          machine->possible_cpus->cpus[i].props.has_core_id = true;
> >          machine->possible_cpus->cpus[i].props.core_id = core_id;
> > -        /* TODO: add 'has_node/node' here to describe
> > -           to which node core belongs */
> > +
> > +        /* default distribution of CPUs over NUMA nodes */
> > +        if (nb_numa_nodes) {
> > +            /* preset values but do not enable them i.e. 'has_node_id = false',
> > +             * board will enable them if manual mapping wasn't present on CLI */
> > +            machine->possible_cpus->cpus[i].props.node_id =
> > +                core_id / smp_threads / smp_cores % nb_numa_nodes;
> > +        }
> >      }
> >      return machine->possible_cpus;
> >  }
> > @@ -3076,7 +3089,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >      hc->pre_plug = spapr_machine_device_pre_plug;
> >      hc->plug = spapr_machine_device_plug;
> >      hc->unplug = spapr_machine_device_unplug;
> > -    mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
> > +    mc->cpu_index_to_instance_props = spapr_cpu_index_to_props;
> >      mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
> >      hc->unplug_request = spapr_machine_device_unplug_request;
> >  
> > diff --git a/numa.c b/numa.c
> > index e01cb54..b6e71bc 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
> >      g_free(seen_cpus);
> >  }
> >  
> > -void parse_numa_opts(MachineClass *mc)
> > +void parse_numa_opts(MachineState *ms)
> >  {
> >      int i;
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> >  
> >      for (i = 0; i < MAX_NODES; i++) {
> >          numa_info[i].node_cpu = bitmap_new(max_cpus);
> > @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
> >           * rule grouping VCPUs by socket so that VCPUs from the same socket
> >           * would be on the same node.
> >           */
> > +        if (!mc->cpu_index_to_instance_props) {
> > +            error_report("default CPUs to NUMA node mapping isn't supported");
> > +            exit(1);
> > +        }
> >          if (i == nb_numa_nodes) {
> >              for (i = 0; i < max_cpus; i++) {
> > -                unsigned node_id = i % nb_numa_nodes;
> > -                if (mc->cpu_index_to_socket_id) {
> > -                    node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
> > -                }
> > +                CpuInstanceProperties props;
> > +                props = mc->cpu_index_to_instance_props(ms, i);
> >  
> > -                set_bit(i, numa_info[node_id].node_cpu);
> > +                set_bit(i, numa_info[props.node_id].node_cpu);
> >              }
> >          }
> >  
> > diff --git a/vl.c b/vl.c
> > index 0b4ed52..5ffb9c3 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -4498,7 +4498,7 @@ int main(int argc, char **argv, char **envp)
> >      default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
> >      default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
> >  
> > -    parse_numa_opts(machine_class);
> > +    parse_numa_opts(current_machine);
> >  
> >      if (qemu_opts_foreach(qemu_find_opts("mon"),
> >                            mon_init_func, NULL, NULL)) {
> 
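
The default rules that the boards now preset in `node_id` differ as follows, per the hunks above. ARM virt maps thread `n` round-robin (`n % nb_numa_nodes`). PC (`topo.pkg_id % nb_numa_nodes`) and SPAPR (`core_id / smp_threads / smp_cores % nb_numa_nodes`) both keep all threads of a socket on one node. A minimal sketch of the two behaviours follows. It assumes cpu_index enumerates threads first, then cores, then sockets, as `x86_topo_ids_from_idx()` does, and the function names are hypothetical.

```c
#include <assert.h>

/*
 * Round-robin by thread, as in the ARM virt hunk: neighbouring threads
 * of the same core can land on different nodes.
 */
static unsigned node_round_robin(unsigned cpu_index, unsigned nb_nodes)
{
    assert(nb_nodes > 0);
    return cpu_index % nb_nodes;
}

/*
 * Group by socket, as in the PC and SPAPR hunks: derive the socket from
 * cpu_index (threads vary fastest, then cores), then spread sockets.
 */
static unsigned node_by_socket(unsigned cpu_index, unsigned cores,
                               unsigned threads, unsigned nb_nodes)
{
    assert(nb_nodes > 0 && cores > 0 && threads > 0);
    unsigned socket = cpu_index / threads / cores;
    return socket % nb_nodes;
}
```

With 2 cores of 2 threads per socket and 2 nodes, cpu_index 0-3 (socket 0) all land on node 0 and 4-7 (socket 1) on node 1, whereas round-robin puts cpu_index 0 and 1, two threads of the same core, on different nodes. That split is the guest-visible problem the commit message describes.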

* Re: [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping
  2017-03-28  5:16   ` David Gibson
@ 2017-03-28 11:09     ` Igor Mammedov
  2017-03-29  2:27       ` David Gibson
  0 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-28 11:09 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Tue, 28 Mar 2017 16:16:02 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Mar 22, 2017 at 02:32:47PM +0100, Igor Mammedov wrote:
> > The legacy CPU-to-node mapping uses cpu_index values to map
> > VCPUs to nodes via the '-numa node,nodeid=node,cpus=x[-y]'
> > option. However, cpu_index is an internal concept, and QEMU users
> > have to guess (or reimplement QEMU's logic) to map it to
> > a concrete CPU socket/core/thread for sane CPU
> > placement across NUMA nodes.
> > 
> > This patch allows mapping CPU objects to NUMA nodes using
> > the same properties that are used for CPUs with -device/device_add
> > (socket-id/core-id/thread-id/node-id).
> > 
> > At present, the valid properties/values for addressing CPUs can be
> > fetched with the hotpluggable-cpus monitor/QMP command. This
> > requires the user to start QEMU twice when creating a domain: first
> > to fetch the possible CPUs for a machine type/-smp layout, and
> > then a second time with an explicit NUMA mapping for actual
> > usage. The results of the first step can be saved and reused to
> > set/change the mapping later, as long as the machine type/-smp stays
> > the same.
> > 
> > The proposed implementation supports exact and wildcard matching to
> > simplify the CLI and to allow setting the mapping for a specific CPU
> > or for a group of CPU objects selected by the matched properties.
> > 
> > For example:
> > 
> >    # exact mapping x86
> >    -numa cpu,node-id=x,socket-id=y,core-id=z,thread-id=n
> > 
> >    # exact mapping SPAPR
> >    -numa cpu,node-id=x,core-id=y
> > 
> >    # wildcard mapping, all cpu objects that match socket-id=y
> >    # are mapped to node-id=x
> >    -numa cpu,node-id=x,socket-id=y
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> 
> What's the rationale for adding a new CLI, rather than adding node-id
> properties to the appropriate objects with -device, -global or -set as
> appropriate?
 '-global' applies the same value to all CPUs, while '-device'/'-set' apply
 only to CPUs present at boot time. So they do not work for CPUs that are
 possible but not present at boot. For ACPI-based targets, we need the NUMA
 mapping at boot time to build the ACPI SRAT table.
 I don't know whether that matters for spapr/fdt, but spapr uses the current
 predefined mapping from -numa node,cpus=x-y. The new CLI hides the internal
 cpu_index from the user and lets them use the same properties as -device cpu,...
 to define the NUMA node mapping for both present and possible CPUs.
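
To illustrate why the mapping must exist at boot for CPUs that are possible but not yet present, here is a minimal C sketch. The types `PossibleCpu` and `SratCpuEntry` and the function `build_cpu_affinity()` are hypothetical simplifications, not QEMU's `CPUArchIdList` or its SRAT builder. Every possible CPU, plugged or not, gets an affinity entry, so its node has to be known up front. For clarity the sketch treats an unmapped CPU as an error; QEMU itself falls back to a board-provided default mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one possible_cpus entry. */
typedef struct {
    uint32_t arch_id;   /* e.g. an APIC ID on x86 */
    bool present;       /* plugged at boot, or only possible (maxcpus) */
    bool has_node_id;   /* true once a -numa mapping assigned a node */
    int64_t node_id;
} PossibleCpu;

/* Hypothetical SRAT-like processor affinity record. */
typedef struct {
    uint32_t arch_id;
    int64_t node_id;
} SratCpuEntry;

/*
 * Emit one affinity entry per *possible* CPU, including CPUs that are
 * not present at boot and can only be hotplugged later. -device/-set
 * never see those CPUs, so only a -numa style mapping can cover them.
 * Returns the number of entries written, or -1 if a CPU has no node.
 */
static int build_cpu_affinity(const PossibleCpu *cpus, size_t n,
                              SratCpuEntry *out)
{
    for (size_t i = 0; i < n; i++) {
        if (!cpus[i].has_node_id) {
            return -1;
        }
        out[i].arch_id = cpus[i].arch_id;
        out[i].node_id = cpus[i].node_id;
    }
    return (int)n;
}
```

With `-smp 1,sockets=2,maxcpus=2`, the second CPU is absent at boot, yet it still needs an entry carrying its node.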

> 
> > ---
> >  numa.c           | 13 +++++++++++++
> >  qapi-schema.json |  7 +++++--
> >  qemu-options.hx  | 23 ++++++++++++++++++++++-
> >  3 files changed, 40 insertions(+), 3 deletions(-)
> > 
> > diff --git a/numa.c b/numa.c
> > index 088fae3..588586b 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -246,6 +246,19 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
> >          }
> >          nb_numa_nodes++;
> >          break;
> > +    case NUMA_OPTIONS_TYPE_CPU:
> > +        if (!object->u.cpu.has_node_id) {
> > +            error_setg(&err, "Missing mandatory node-id property");
> > +            goto end;
> > +        }
> > +        if (!numa_info[object->u.cpu.node_id].present) {
> > +            error_setg(&err, "Invalid node-id=%" PRId64 ", NUMA node must be "
> > +                "defined with -numa node,nodeid=ID before it's used with "
> > +                "-numa cpu,node-id=ID", object->u.cpu.node_id);
> > +            goto end;
> > +        }
> > +        machine_set_cpu_numa_node(ms, &object->u.cpu, &err);
> > +        break;
> >      default:
> >          abort();
> >      }
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index a6b5955..a9a1d5e 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -5673,10 +5673,12 @@
> >  ##
> >  # @NumaOptionsType:
> >  #
> > +# @cpu: property based CPU(s) to node mapping (Since: 2.10)
> > +#
> >  # Since: 2.1
> >  ##
> >  { 'enum': 'NumaOptionsType',
> > -  'data': [ 'node' ] }
> > +  'data': [ 'node', 'cpu' ] }
> >  
> >  ##
> >  # @NumaOptions:
> > @@ -5689,7 +5691,8 @@
> >    'base': { 'type': 'NumaOptionsType' },
> >    'discriminator': 'type',
> >    'data': {
> > -    'node': 'NumaNodeOptions' }}
> > +    'node': 'NumaNodeOptions',
> > +    'cpu': 'CpuInstanceProperties' }}
> >  
> >  ##
> >  # @NumaNodeOptions:
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 99af8ed..2185c34 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -139,13 +139,16 @@ ETEXI
> >  
> >  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> >      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> > -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
> > +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> > +    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n", QEMU_ARCH_ALL)
> >  STEXI
> >  @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> >  @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> > +@itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
> >  @findex -numa
> >  Define a NUMA node and assign RAM and VCPUs to it.
> >  
> > +Legacy VCPU assignment uses @samp{cpus} option where
> >  @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
> >  @samp{cpus} option represent a contiguous range of CPU indexes
> >  (or a single VCPU if @var{lastcpu} is omitted). A non-contiguous
> > @@ -159,6 +162,24 @@ a NUMA node:
> >  -numa node,cpus=0-2,cpus=5
> >  @end example
> >  
> > +The @samp{cpu} option is a new alternative to the @samp{cpus} option.
> > +It uses the @samp{socket-id|core-id|thread-id} properties to assign
> > +CPU objects to a @var{node} using the CPU's topology layout properties.
> > +The set of properties is machine specific and depends on the machine
> > +type and @samp{smp} options used. It can be queried with the @samp{hotpluggable-cpus}
> > +monitor command.
> > +The @samp{node-id} property specifies the @var{node} to which the CPU object
> > +will be assigned; the @var{node} must be declared
> > +with the @samp{node} option before it is used with the @samp{cpu} option.
> > +
> > +For example:
> > +@example
> > +-M pc \
> > +-smp 1,sockets=2,maxcpus=2 \
> > +-numa node,nodeid=0 -numa node,nodeid=1 \
> > +-numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1
> > +@end example
> > +
> >  @samp{mem} assigns a given RAM amount to a node. @samp{memdev}
> >  assigns RAM from a given memory backend device to a node. If
> >  @samp{mem} and @samp{memdev} are omitted in all nodes, RAM is
> 
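
The two checks in the numa.c hunk, plus the exact/wildcard matching described in the commit message, can be sketched as follows. This is a simplified model, not QEMU's `machine_set_cpu_numa_node()`: `CpuProps` loosely mirrors `CpuInstanceProperties`, and `SKETCH_MAX_NODES`, `props_match()` and `apply_numa_cpu()` are hypothetical names. Any id left out of a `-numa cpu` option acts as a wildcard. Unlike the hunk as quoted, the sketch also range-checks `node_id` before indexing the node table; treat that as a defensive assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 128  /* hypothetical node limit for this sketch */

/* Simplified stand-in for CpuInstanceProperties: every id is optional. */
typedef struct {
    bool has_node_id;   int64_t node_id;
    bool has_socket_id; int64_t socket_id;
    bool has_core_id;   int64_t core_id;
    bool has_thread_id; int64_t thread_id;
} CpuProps;

/* An id absent from the pattern is a wildcard; otherwise it must match. */
static bool id_matches(bool pat_has, int64_t pat_val,
                       bool cpu_has, int64_t cpu_val)
{
    if (!pat_has) {
        return true;
    }
    return cpu_has && cpu_val == pat_val;
}

static bool props_match(const CpuProps *pat, const CpuProps *cpu)
{
    return id_matches(pat->has_socket_id, pat->socket_id,
                      cpu->has_socket_id, cpu->socket_id) &&
           id_matches(pat->has_core_id, pat->core_id,
                      cpu->has_core_id, cpu->core_id) &&
           id_matches(pat->has_thread_id, pat->thread_id,
                      cpu->has_thread_id, cpu->thread_id);
}

/*
 * Apply one "-numa cpu,..." option: validate node-id, then assign it to
 * every possible CPU matching the given ids. Returns the number of CPUs
 * mapped, or -1 on a validation error.
 */
static int apply_numa_cpu(const CpuProps *opt, const bool *node_present,
                          CpuProps *cpus, size_t n)
{
    if (!opt->has_node_id) {
        return -1;  /* "Missing mandatory node-id property" */
    }
    if (opt->node_id < 0 || opt->node_id >= SKETCH_MAX_NODES ||
        !node_present[opt->node_id]) {
        return -1;  /* node must be declared with -numa node first */
    }
    int mapped = 0;
    for (size_t i = 0; i < n; i++) {
        if (props_match(opt, &cpus[i])) {
            cpus[i].has_node_id = true;
            cpus[i].node_id = opt->node_id;
            mapped++;
        }
    }
    return mapped;
}
```

On a 2-socket, 2-thread layout, `-numa cpu,node-id=1,socket-id=1` maps both threads of socket 1. Adding `core-id` and `thread-id` narrows the match to a single CPU.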


* Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
  2017-03-28 10:53     ` Igor Mammedov
@ 2017-03-29  2:24       ` David Gibson
  2017-03-29 11:48         ` Igor Mammedov
  0 siblings, 1 reply; 77+ messages in thread
From: David Gibson @ 2017-03-29  2:24 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc


On Tue, Mar 28, 2017 at 12:53:10PM +0200, Igor Mammedov wrote:
> On Tue, 28 Mar 2017 15:19:20 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Mar 22, 2017 at 02:32:30PM +0100, Igor Mammedov wrote:
> > > Originally CPU threads were by default assigned in
> > > round-robin fashion. However, that caused issues in the
> > > guest, since CPU threads from the same socket/core could
> > > be placed on different NUMA nodes.
> > > Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> > > fixed it by grouping threads within a socket on the same node,
> > > introducing the cpu_index_to_socket_id() callback, and commit
> > > 20bb648d (spapr: Fix default NUMA node allocation for threads)
> > > reused the callback to fix similar issues for the SPAPR machine,
> > > even though a socket doesn't make much sense there.
> > > 
> > > As a result, QEMU ended up with 3 default distribution rules
> > > used by 3 targets (virt-arm, spapr, pc).
> > > 
> > > In an effort to move the NUMA mapping for CPUs into possible_cpus,
> > > generalize the default mapping in numa.c by making boards decide
> > > on the default mapping and explicitly tell the generic
> > > NUMA code which node a CPU thread belongs to, replacing
> > > cpu_index_to_socket_id() with @cpu_index_to_instance_props(),
> > > which provides the default node_id assigned by the board to the
> > > specified cpu_index.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[snip]
> > > +static CpuInstanceProperties
> > > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > > +{
> > > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > > +
> > > +    assert(cpu_index < possible_cpus->len);
> > > +    return possible_cpus->cpus[cpu_index].props;
> > > +}
> > > +
> > 
> > It seems a bit weird to have a machine specific hook to pull the
> > property information when one way or another it's coming from the
> > possible_cpus table, which is already constructed by a machine
> > specific hook.  Could we add a range or list of cpu_index values to
> > each possible_cpus entry instead, and have a generic lookup of the
> > right entry based on that?

[snip]
> > > -static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
> > > +static CpuInstanceProperties
> > > +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > >  {
> > > -    X86CPUTopoInfo topo;
> > > -    x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
> > > -                          &topo);
> > > -    return topo.pkg_id;
> > > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > > +
> > > +    assert(cpu_index < possible_cpus->len);
> > > +    return possible_cpus->cpus[cpu_index].props;
> > 
> > Since the pc and arm version of this are basically identical, I wonder
> > if that should actually be the default implementation.  If we need it
> > at all.
> ARM is still moving target and props are not really defined for it yet,
> so I'd like to keep it separate for now and when it stabilizes we can think
> about generalizing it.

Fair enough.

Any thoughts on my more general query above?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
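
The difference between the original round-robin default and the socket-grouped default from commit fb43b73b can be shown with a small C sketch. It assumes cpu_index enumerates threads first, then cores, then sockets, so that socket = cpu_index / (cores * threads), which is how the removed pc_cpu_index_to_socket_id() derived pkg_id via x86_topo_ids_from_idx(). The helper names below are hypothetical.

```c
#include <assert.h>

/* Socket (package) of a CPU, assuming threads-then-cores-then-sockets order. */
static unsigned socket_of(unsigned cpu_index, unsigned cores, unsigned threads)
{
    return cpu_index / (cores * threads);
}

/* Original default: plain round-robin, may split sibling threads. */
static unsigned node_round_robin(unsigned cpu_index, unsigned nb_nodes)
{
    return cpu_index % nb_nodes;
}

/* Socket-grouped default: every thread of a socket lands on one node. */
static unsigned node_by_socket(unsigned cpu_index, unsigned cores,
                               unsigned threads, unsigned nb_nodes)
{
    return socket_of(cpu_index, cores, threads) % nb_nodes;
}
```

With 2 sockets, 1 core and 2 threads on 2 nodes, round-robin puts cpu_index 0 and 1 (siblings in socket 0) on different nodes, while the socket-grouped rule keeps both on node 0.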


* Re: [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping
  2017-03-28 11:09     ` Igor Mammedov
@ 2017-03-29  2:27       ` David Gibson
  2017-03-29 12:08         ` Igor Mammedov
  0 siblings, 1 reply; 77+ messages in thread
From: David Gibson @ 2017-03-29  2:27 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc


On Tue, Mar 28, 2017 at 01:09:11PM +0200, Igor Mammedov wrote:
> On Tue, 28 Mar 2017 16:16:02 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Mar 22, 2017 at 02:32:47PM +0100, Igor Mammedov wrote:
> > > [snip]
> > 
> > What's the rationale for adding a new CLI, rather than adding node-id
> > properties to the appropriate objects with -device, -global or -set as
> > appropriate?
>  '-global' applies to all cpus, while '-device,-set' applies to present
>  at boot time cpus only. So they do not work for the case of possible but
>  not present at boot time objects.

Ah!  Of course.

> For ACPI based targets, we need to have
>  numa mapping at boot time to build ACPI SRAT table.
>  I don't know if it's important for spapr/fdt,

Not in the same way.  For spapr the device tree fragment for the new
cpu is supplied to the guest at hotplug time rather than having to be
in the initial device tree.  So for us, the node could be supplied with
device_add.

> but it uses current predefined
>  mapping with -numa node,cpus=x-y and new CLI hides from user internal
>  cpu_index and allows to use the same properties as we use for -device cpu,...
>  to define mapping to numa nodes for present/possible cpus.
> 
> > [snip]
> 


* Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
  2017-03-29  2:24       ` David Gibson
@ 2017-03-29 11:48         ` Igor Mammedov
  0 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-03-29 11:48 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Wed, 29 Mar 2017 13:24:49 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Mar 28, 2017 at 12:53:10PM +0200, Igor Mammedov wrote:
> > On Tue, 28 Mar 2017 15:19:20 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, Mar 22, 2017 at 02:32:30PM +0100, Igor Mammedov wrote:  
> > > > [snip]
> [snip]
> > > > +static CpuInstanceProperties
> > > > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > > > +{
> > > > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > > > +
> > > > +    assert(cpu_index < possible_cpus->len);
> > > > +    return possible_cpus->cpus[cpu_index].props;
> > > > +}
> > > > +  
> > > 
> > > It seems a bit weird to have a machine specific hook to pull the
> > > property information when one way or another it's coming from the
> > > possible_cpus table, which is already constructed by a machine
> > > specific hook.  Could we add a range or list of cpu_index values to
> > > each possible_cpus entry instead, and have a generic lookup of the
> > > right entry based on that?  
> 
> [snip]
> > > > -static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
> > > > +static CpuInstanceProperties
> > > > +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > > >  {
> > > > -    X86CPUTopoInfo topo;
> > > > -    x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
> > > > -                          &topo);
> > > > -    return topo.pkg_id;
> > > > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > > > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > > > +
> > > > +    assert(cpu_index < possible_cpus->len);
> > > > +    return possible_cpus->cpus[cpu_index].props;
> > > 
> > > Since the pc and arm version of this are basically identical, I wonder
> > > if that should actually be the default implementation.  If we need it
> > > at all.  
> > ARM is still moving target and props are not really defined for it yet,
> > so I'd like to keep it separate for now and when it stabilizes we can think
> > about generalizing it.  
> 
> Fair enough.
> 
> Any thoughts on my more general query above
None so far.
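
David's suggestion (record a range of cpu_index values in each possible_cpus entry and find the entry with a generic lookup instead of a per-board hook) might look roughly like this. `CpuSlot` and `slot_for_cpu_index()` are hypothetical; `node_id` stands in for the full property set. A pc-style table would have one cpu_index per entry, while a table with one entry per core (as on spapr, for example) would cover several threads per entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical possible_cpus entry that records the cpu_index range it covers. */
typedef struct {
    unsigned first_cpu_index;
    unsigned nr_cpu_index;   /* number of consecutive cpu_index values */
    int64_t node_id;         /* stands in for the full instance props */
} CpuSlot;

/* Generic, machine-independent lookup; NULL if no entry covers cpu_index. */
static const CpuSlot *slot_for_cpu_index(const CpuSlot *slots, size_t n,
                                         unsigned cpu_index)
{
    for (size_t i = 0; i < n; i++) {
        /* Subtract only after the >= check so the comparison cannot wrap. */
        if (cpu_index >= slots[i].first_cpu_index &&
            cpu_index - slots[i].first_cpu_index < slots[i].nr_cpu_index) {
            return &slots[i];
        }
    }
    return NULL;
}
```

A linear scan keeps the sketch simple; since entries are naturally ordered by cpu_index, a binary search would also work for large tables.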


* Re: [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping
  2017-03-29  2:27       ` David Gibson
@ 2017-03-29 12:08         ` Igor Mammedov
  2017-04-03  4:40           ` David Gibson
  0 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-03-29 12:08 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc

On Wed, 29 Mar 2017 13:27:23 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Mar 28, 2017 at 01:09:11PM +0200, Igor Mammedov wrote:
> > On Tue, 28 Mar 2017 16:16:02 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Wed, Mar 22, 2017 at 02:32:47PM +0100, Igor Mammedov wrote:  
> > > > [snip]
> > > 
> > > What's the rationale for adding a new CLI, rather than adding node-id
> > > properties to the appropriate objects with -device, -global or -set as
> > > appropriate?  
> >  '-global' applies to all cpus, while '-device,-set' applies to present
> >  at boot time cpus only. So they do not work for the case of possible but
> >  not present at boot time objects.  
> 
> Ah!  Of course.
> 
> > For ACPI based targets, we need to have
> >  numa mapping at boot time to build ACPI SRAT table.
> >  I don't know if it's important for spapr/fdt,  
> 
> Not in the same way.  For spapr the device tree fragment for the new
> cpu is supplied to the guest at hotplug time rather than having to be
> in the initial device tree.  So for us, node could be supplied with
> device_add.
I've implemented the cpu.node-id check the same way for all targets.
For spapr it's patch 06/23, which forces cpu.node-id to match
whatever mapping has been provided with -numa cpu[s],
or the implied default (0) if no mapping for the CPU has been
specified explicitly with -numa.

That way it won't break legacy machines and no compat code is necessary.
I'd leave it up to you to lift or relax the restriction for spapr with a
patch on top of this, if you think that won't break anything.

That said, from libvirt's point of view I'd prefer to treat all targets
uniformly, which narrows the choice down to the '-numa' mapping approach
that libvirt uses now.

> > but it uses current predefined
> >  mapping with -numa node,cpus=x-y and new CLI hides from user internal
> >  cpu_index and allows to use the same properties as we use for -device cpu,...
> >  to define mapping to numa nodes for present/possible cpus.
...
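
The node-id check described above can be sketched as a small C function. It is a hypothetical simplification of a pre-plug check, not the code from patch 06/23: the CPU slot either has a node from -numa or falls back to the implied default node 0, and a node-id given on -device/device_add must agree with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * has_mapped/mapped: node assigned to this CPU slot via -numa (if any).
 * has_requested/requested: node-id passed with -device/device_add (if any).
 * Returns the node to use, or -1 if the request contradicts the mapping.
 */
static int64_t resolve_cpu_node(bool has_mapped, int64_t mapped,
                                bool has_requested, int64_t requested)
{
    /* Implied default node 0 when no -numa mapping covers this CPU. */
    int64_t expected = has_mapped ? mapped : 0;

    if (has_requested && requested != expected) {
        return -1;
    }
    return expected;
}
```

Relaxing this for spapr, as discussed, would amount to accepting `requested` when `has_mapped` is false instead of enforcing the default.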


* Re: [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping
  2017-03-29 12:08         ` Igor Mammedov
@ 2017-04-03  4:40           ` David Gibson
  0 siblings, 0 replies; 77+ messages in thread
From: David Gibson @ 2017-04-03  4:40 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Eduardo Habkost, Peter Maydell, Andrew Jones,
	Eric Blake, Paolo Bonzini, Shannon Zhao, qemu-arm, qemu-ppc


On Wed, Mar 29, 2017 at 02:08:58PM +0200, Igor Mammedov wrote:
> On Wed, 29 Mar 2017 13:27:23 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Mar 28, 2017 at 01:09:11PM +0200, Igor Mammedov wrote:
> > > On Tue, 28 Mar 2017 16:16:02 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > > > On Wed, Mar 22, 2017 at 02:32:47PM +0100, Igor Mammedov wrote:  
> > > > > [snip]
> > > > 
> > > > What's the rationale for adding a new CLI, rather than adding node-id
> > > > properties to the appropriate objects with -device, -global or -set as
> > > > appropriate?  
> > >  '-global' applies to all cpus, while '-device,-set' applies to present
> > >  at boot time cpus only. So they do not work for the case of possible but
> > >  not present at boot time objects.  
> > 
> > Ah!  Of course.
> > 
> > > For ACPI based targets, we need to have
> > >  numa mapping at boot time to build ACPI SRAT table.
> > >  I don't know if it's important for spapr/fdt,  
> > 
> > Not in the same way.  For spapr the device tree fragment for the new
> > cpu is supplied to the guest at hotplug time rather than having to be
> > in the initial device tree.  So for us, node could be supplied with
> > device_add.
> I've implemented cpu.node-id check in the same way for all targets
> for spapr it's patch 06/23 which forces cpu.node-id to match
> whatever mapping has been provided with -numa cpu[s]
> OR
> with implied default /0/ if mapping for cpu hasn't been specified
> with -numa explicitly.
> 
> That way it won't break legacy machines and no compat code is necessary,
> I would leave it up to you with patch on top of this to lift restriction/make
> it more relaxed for spapr if you think it won't break anything.
> 
> Although from libvirt pov, I'd prefer to treat all targets uniformly,
> which narrows choice down to '-numa' mapping approach that it uses
> now.

Yeah, that makes sense.  If we ever have a compelling reason to allow
node designation at device_add time on Power, we can relax the
restrictions then.  I doubt it will ever happen.

> 
> > > but it uses current predefined
> > >  mapping with -numa node,cpus=x-y and new CLI hides from user internal
> > >  cpu_index and allows to use the same properties as we use for -device cpu,...
> > >  to define mapping to numa nodes for present/possible cpus.
> ...
> 


* Re: [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option
  2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
                   ` (22 preceding siblings ...)
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 23/23] tests: check -numa node, cpu=props_list usecase Igor Mammedov
@ 2017-04-12 20:18 ` Eduardo Habkost
  23 siblings, 0 replies; 77+ messages in thread
From: Eduardo Habkost @ 2017-04-12 20:18 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Andrew Jones, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Mar 22, 2017 at 02:32:25PM +0100, Igor Mammedov wrote:
> Changes since RFC:
>     * convert all targets that support numa (Eduardo)
>     * add numa CLI tests
>     * support wildcard matching with "-numa cpu,..." (Paolo)
> 
> Series introduces a new CLI option to allow mapping cpus to numa
> nodes using public properties [socket|core|thread]-ids instead of
> internal cpu_index and moving internal handling of cpu<->node
> mapping from cpu_index based global bitmaps to MachineState.
> 
> New '-numa cpu' option is supported only on PC and SPAPR
> machines that implement hotpluggable-cpus query.
> ARM machine user-facing interface stays cpu_index based due to
> lack of hotpluggable-cpus support, but internally cpu<->node
> mapping will be using the common for PC/SPAPR/ARM approach
> (i.e. store mapping info in MachineState:possible_cpus)
> 
> It only provides CLI interface to do mapping, there is no QMP
> one as I haven't found a suitable place/way to update/set mapping
> after machine_done for QEMU started with -S (stopped mode) so that
> mgmt could query hotpluggable-cpus first, then map them to numa nodes
> in runtime before actually allowing guest to run.
> 
> Another alternative I've been considering is to add CLI option
> similar to -S but that would pause initialization before machine_init()
> callback is run so that user can get CPU layout with hotpluggable-cpus,
> then map CPUs to numa nodes and unpause to let machine_init() initialize
> machine using previously predefined numa mapping.
> Such option might also be useful for other usecases.

I would support this approach. This would help on other use cases
as well, and it's what I suggested at KVM Forum last year:
http://www.linux-kvm.org/images/4/46/03x06A-Eduardo_HabkostMachine-type_Introspection_and_Configuration_Where_Are_We_Going.pdf

But I would treat it as a future plan, as it might take some time
until we refactor the main-loop/QMP code to allow this to happen.

-- 
Eduardo


* Re: [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU Igor Mammedov
@ 2017-04-12 21:02   ` Eduardo Habkost
  2017-04-19 11:14     ` Igor Mammedov
  0 siblings, 1 reply; 77+ messages in thread
From: Eduardo Habkost @ 2017-04-12 21:02 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Andrew Jones, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Mar 22, 2017 at 02:32:32PM +0100, Igor Mammedov wrote:
> it will allow switching from cpu_index to property based
> numa mapping in follow up patches.

I am not sure I understand all the consequences of this, so I
will give it a try:

"node-id" is an existing field in CpuInstanceProperties.
CpuInstanceProperties is used on both query-hotpluggable-cpus
output and in MachineState::possible_cpus.

We will start using MachineState::possible_cpus to keep track of
NUMA CPU affinity, and that means query-hotpluggable-cpus will
start reporting a "node-id" property when a NUMA mapping is
configured.

To allow query-hotpluggable-cpus to report "node-id", the CPU
objects must have a "node-id" property that can be set. This
patch adds the "node-id" property to X86CPU.

Is this description accurate? Is the presence of "node-id" in
query-hotpluggable-cpus the only reason we really need this
patch, or is there something else that requires the "node-id"
property?

Why exactly do we need to change the output of
query-hotpluggable-cpus for all machines to include "node-id", to
make "-numa cpu" work?  Did you consider saving node_id inside
CPUArchId and outside CpuInstanceProperties, so
query-hotpluggable-cpus output won't be affected by "-numa cpu"?

I'm asking this because I believe we will eventually need a
mechanism that lets management check what the valid arguments are
for "-numa cpu" on a given machine, and it looks like
query-hotpluggable-cpus is already the right mechanism for that.
But we can't make query-hotpluggable-cpus output depend on "-numa
cpu" input, if the "-numa cpu" input will also depend on
query-hotpluggable-cpus output.

> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  hw/i386/pc.c      | 17 +++++++++++++++++
>  target/i386/cpu.c |  1 +
>  2 files changed, 18 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 7031100..873bbfa 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1895,6 +1895,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>                              DeviceState *dev, Error **errp)
>  {
>      int idx;
> +    int node_id;
>      CPUState *cs;
>      CPUArchId *cpu_slot;
>      X86CPUTopoInfo topo;
> @@ -1984,6 +1985,22 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>  
>      cs = CPU(cpu);
>      cs->cpu_index = idx;
> +
> +    node_id = numa_get_node_for_cpu(cs->cpu_index);
> +    if (node_id == nb_numa_nodes) {
> +        /* by default CPUState::numa_node was 0 if it's not set via CLI
> +         * keep it this way for now but in future we probably should
> +         * refuse to start up with incomplete numa mapping */
> +        node_id = 0;
> +    }
> +    if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
> +        cs->numa_node = node_id;
> +    } else if (cs->numa_node != node_id) {
> +            error_setg(errp, "node-id %d must match numa node specified "
> +                "with -numa option for cpu-index %d",
> +                cs->numa_node, cs->cpu_index);
> +            return;
> +    }
>  }
>  
>  static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 7aa7622..d690244 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -3974,6 +3974,7 @@ static Property x86_cpu_properties[] = {
>      DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
>      DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
>  #endif
> +    DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
>      DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
>      { .name  = "hv-spinlocks", .info  = &qdev_prop_spinlocks },
>      DEFINE_PROP_BOOL("hv-relaxed", X86CPU, hyperv_relaxed_timing, false),
> -- 
> 2.7.4
> 
> 

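The node-id reconciliation that pc_cpu_pre_plug() performs in the patch above can be modelled in isolation. This is a simplified, hypothetical sketch, not QEMU code: reconcile_node_id() and UNSET_NODE_ID are illustrative names, UNSET_NODE_ID stands in for CPU_UNSET_NUMA_NODE_ID (assumed to be -1 here), and a conflict is reported as a bool instead of through Error **.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for CPU_UNSET_NUMA_NODE_ID (assumed to be -1 for this sketch). */
#define UNSET_NODE_ID (-1)

/*
 * Hypothetical model of the node-id rule in pc_cpu_pre_plug().
 *   prop_node:   in/out, the CPU's "node-id" property value
 *   mapped_node: node assigned by -numa for this cpu_index; equal to
 *                nb_nodes when the CPU is not mapped at all
 * Returns false if an explicit node-id contradicts the -numa mapping.
 */
static bool reconcile_node_id(int *prop_node, int mapped_node, int nb_nodes)
{
    assert(prop_node != NULL);

    if (mapped_node == nb_nodes) {
        /* unmapped CPU: keep the historical default of node 0 */
        mapped_node = 0;
    }
    if (*prop_node == UNSET_NODE_ID) {
        /* node-id not given by the user: inherit the -numa mapping */
        *prop_node = mapped_node;
        return true;
    }
    /* node-id given explicitly: it must agree with the -numa mapping */
    return *prop_node == mapped_node;
}
```

For example, with 4 nodes, a CPU whose cpu_index is unmapped (mapped_node == 4) and has no user-set node-id ends up on node 0, while a CPU created with node-id=1 but mapped to node 2 by -numa is rejected.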
-- 
Eduardo


* Re: [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus Igor Mammedov
  2017-03-28  4:44   ` David Gibson
@ 2017-04-12 21:15   ` Eduardo Habkost
  2017-04-19  9:52     ` Igor Mammedov
  2017-04-13 13:58   ` Eduardo Habkost
  2 siblings, 1 reply; 77+ messages in thread
From: Eduardo Habkost @ 2017-04-12 21:15 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Andrew Jones, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Mar 22, 2017 at 02:32:35PM +0100, Igor Mammedov wrote:
> Introduce machine_set_cpu_numa_node() helper that stores
> node mapping for CPU in MachineState::possible_cpus.
> CPU and node it belongs to is specified by 'props' argument.
> 
> Patch doesn't remove old way of storing mapping in
> numa_info[X].node_cpu as removing it at the same time
> makes patch rather big. Instead it just mirrors mapping
> in possible_cpus and follow up per target patches will
> switch to possible_cpus and numa_info[X].node_cpu will
> be removed once there isn't any users left.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

So, this patch is the one that makes "-numa" and "-numa cpu"
affect query-hotpluggable-cpus output.

Before this patch:

  $ qemu-system-x86_64 -smp 2 -m 2G -numa node -numa node -numa node -numa node
  [run qmp-shell]
  (QEMU) query-hotpluggable-cpus
  {
      "return": [
          {
              "qom-path": "/machine/unattached/device[2]",
              "type": "qemu64-x86_64-cpu",
              "vcpus-count": 1,
              "props": {
                  "socket-id": 1,
                  "core-id": 0,
                  "thread-id": 0
              }
          },
          {
              "qom-path": "/machine/unattached/device[0]",
              "type": "qemu64-x86_64-cpu",
              "vcpus-count": 1,
              "props": {
                  "socket-id": 0,
                  "core-id": 0,
                  "thread-id": 0
              }
          }
      ]
  }


After this patch:

  $ qemu-system-x86_64 -smp 2 -m 2G -numa node -numa node -numa node -numa node
  [run qmp-shell]
  (QEMU) query-hotpluggable-cpus
  {
      "return": [
          {
              "qom-path": "/machine/unattached/device[2]",
              "type": "qemu64-x86_64-cpu",
              "vcpus-count": 1,
              "props": {
                  "socket-id": 1,
                  "node-id": 1,
                  "core-id": 0,
                  "thread-id": 0
              }
          },
          {
              "qom-path": "/machine/unattached/device[0]",
              "type": "qemu64-x86_64-cpu",
              "vcpus-count": 1,
              "props": {
                  "socket-id": 0,
                  "node-id": 0,
                  "core-id": 0,
                  "thread-id": 0
              }
          }
      ]
  }

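The extra "node-id" members come from how QAPI optional members work: the generated CpuInstanceProperties C struct pairs each optional field with a has_* flag, and the output visitor only emits members whose flag is set. So setting has_node_id in possible_cpus is enough to change the reply. Below is a simplified, self-contained model; CpuProps, append_member() and format_props() are illustrative names rather than QEMU code, and member order here is arbitrary (qmp-shell may print members in a different order).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified model of the QAPI-generated CpuInstanceProperties:
 * every optional member is paired with a has_* presence flag. */
typedef struct {
    bool has_node_id;   int64_t node_id;
    bool has_socket_id; int64_t socket_id;
    bool has_core_id;   int64_t core_id;
    bool has_thread_id; int64_t thread_id;
} CpuProps;

/* Append one "key": value member to buf, adding a separator if needed. */
static void append_member(char *buf, size_t len, const char *key,
                          int64_t val, bool *first)
{
    size_t used = strlen(buf);

    snprintf(buf + used, len - used, "%s\"%s\": %lld",
             *first ? "" : ", ", key, (long long)val);
    *first = false;
}

/* Render props the way an output visitor treats optional members:
 * a member whose has_* flag is false is simply omitted. */
static void format_props(const CpuProps *p, char *buf, size_t len)
{
    bool first = true;
    size_t used;

    assert(len >= 3);
    snprintf(buf, len, "{");
    if (p->has_node_id) {
        append_member(buf, len, "node-id", p->node_id, &first);
    }
    if (p->has_socket_id) {
        append_member(buf, len, "socket-id", p->socket_id, &first);
    }
    if (p->has_core_id) {
        append_member(buf, len, "core-id", p->core_id, &first);
    }
    if (p->has_thread_id) {
        append_member(buf, len, "thread-id", p->thread_id, &first);
    }
    used = strlen(buf);
    snprintf(buf + used, len - used, "}");
}
```

With has_node_id left false the rendered props match the "before" output; flipping it to true (as machine_set_cpu_numa_node() does) adds "node-id" without any other change.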

As noted in another message, I am not sure we really should make
"-numa" affect query-hotpluggable-cpus output unconditionally (I
believe we shouldn't). But if we do, we need to document this very
clearly.


> ---
>  include/hw/boards.h |  2 ++
>  hw/core/machine.c   | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  numa.c              |  8 +++++++
>  3 files changed, 78 insertions(+)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 1dd0fde..40f30f1 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -42,6 +42,8 @@ bool machine_dump_guest_core(MachineState *machine);
>  bool machine_mem_merge(MachineState *machine);
>  void machine_register_compat_props(MachineState *machine);
>  HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine);
> +void machine_set_cpu_numa_node(MachineState *machine,
> +                               CpuInstanceProperties *props, Error **errp);
>  
>  /**
>   * CPUArchId:
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 0d92672..6ff0b45 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -388,6 +388,74 @@ HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine)
>      return head;
>  }
>  
> +void machine_set_cpu_numa_node(MachineState *machine,
> +                               CpuInstanceProperties *props, Error **errp)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    bool match = false;
> +    int i;
> +
> +    if (!mc->possible_cpu_arch_ids) {
> +        error_setg(errp, "mapping of CPUs to NUMA node is not supported");
> +        return;
> +    }
> +
> +    /* force board to initialize possible_cpus if it hasn't been done yet */
> +    mc->possible_cpu_arch_ids(machine);
> +
> +    for (i = 0; i < machine->possible_cpus->len; i++) {
> +        CPUArchId *slot = &machine->possible_cpus->cpus[i];
> +
> +        /* reject unsupported by board properties */
> +        if (props->has_thread_id && !slot->props.has_thread_id) {
> +            error_setg(errp, "thread-id is not supported");
> +            return;
> +        }
> +
> +        if (props->has_core_id && !slot->props.has_core_id) {
> +            error_setg(errp, "core-id is not supported");
> +            return;
> +        }
> +
> +        if (props->has_socket_id && !slot->props.has_socket_id) {
> +            error_setg(errp, "socket-id is not supported");
> +            return;
> +        }
> +
> +        /* skip slots with explicit mismatch */
> +        if (props->has_thread_id && props->thread_id != slot->props.thread_id) {
> +                continue;
> +        }
> +
> +        if (props->has_core_id && props->core_id != slot->props.core_id) {
> +                continue;
> +        }
> +
> +        if (props->has_socket_id && props->socket_id != slot->props.socket_id) {
> +                continue;
> +        }
> +
> +        /* reject assignment if slot is already assigned, for compatibility
> +         * of legacy cpu_index mapping with SPAPR core based mapping do not
> +         * error out if cpu thread and matched core have the same node-id */
> +        if (slot->props.has_node_id &&
> +            slot->props.node_id != props->node_id) {
> +            error_setg(errp, "CPU is already assigned to node-id: %" PRId64,
> +                       slot->props.node_id);
> +            return;
> +        }
> +
> +        /* assign slot to node as it's matched '-numa cpu' key */
> +        match = true;
> +        slot->props.node_id = props->node_id;
> +        slot->props.has_node_id = props->has_node_id;
> +    }
> +
> +    if (!match) {
> +        error_setg(errp, "no match found");
> +    }
> +}
> +
>  static void machine_class_init(ObjectClass *oc, void *data)
>  {
>      MachineClass *mc = MACHINE_CLASS(oc);
> diff --git a/numa.c b/numa.c
> index 24c596d..44057f1 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -169,6 +169,7 @@ static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
>          exit(1);
>      }
>      for (cpus = node->cpus; cpus; cpus = cpus->next) {
> +        CpuInstanceProperties props;
>          if (cpus->value >= max_cpus) {
>              error_setg(errp,
>                         "CPU index (%" PRIu16 ")"
> @@ -177,6 +178,10 @@ static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
>              return;
>          }
>          bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
> +        props = mc->cpu_index_to_instance_props(ms, cpus->value);
> +        props.node_id = nodenr;
> +        props.has_node_id = true;
> +        machine_set_cpu_numa_node(ms, &props, &error_fatal);
>      }
>  
>      if (node->has_mem && node->has_memdev) {
> @@ -393,9 +398,12 @@ void parse_numa_opts(MachineState *ms)
>          if (i == nb_numa_nodes) {
>              for (i = 0; i < max_cpus; i++) {
>                  CpuInstanceProperties props;
> +                /* fetch default mapping from board and enable it */
>                  props = mc->cpu_index_to_instance_props(ms, i);
> +                props.has_node_id = true;
>  
>                  set_bit(i, numa_info[props.node_id].node_cpu);
> +                machine_set_cpu_numa_node(ms, &props, &error_fatal);
>              }
>          }
>  
> -- 
> 2.7.4
> 
> 

-- 
Eduardo


* Re: [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus Igor Mammedov
  2017-03-28  4:44   ` David Gibson
  2017-04-12 21:15   ` Eduardo Habkost
@ 2017-04-13 13:58   ` Eduardo Habkost
  2017-04-19  9:31     ` Igor Mammedov
  2 siblings, 1 reply; 77+ messages in thread
From: Eduardo Habkost @ 2017-04-13 13:58 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Andrew Jones, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Mar 22, 2017 at 02:32:35PM +0100, Igor Mammedov wrote:
> Introduce machine_set_cpu_numa_node() helper that stores
> node mapping for CPU in MachineState::possible_cpus.
> CPU and node it belongs to is specified by 'props' argument.
> 
> Patch doesn't remove old way of storing mapping in
> numa_info[X].node_cpu as removing it at the same time
> makes patch rather big. Instead it just mirrors mapping
> in possible_cpus and follow up per target patches will
> switch to possible_cpus and numa_info[X].node_cpu will
> be removed once there isn't any users left.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  include/hw/boards.h |  2 ++
>  hw/core/machine.c   | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  numa.c              |  8 +++++++
>  3 files changed, 78 insertions(+)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 1dd0fde..40f30f1 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -42,6 +42,8 @@ bool machine_dump_guest_core(MachineState *machine);
>  bool machine_mem_merge(MachineState *machine);
>  void machine_register_compat_props(MachineState *machine);
>  HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine);
> +void machine_set_cpu_numa_node(MachineState *machine,
> +                               CpuInstanceProperties *props, Error **errp);
>  
>  /**
>   * CPUArchId:
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 0d92672..6ff0b45 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -388,6 +388,74 @@ HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine)
>      return head;
>  }
>  
> +void machine_set_cpu_numa_node(MachineState *machine,
> +                               CpuInstanceProperties *props, Error **errp)

If you change this to:

  void cpu_slot_set_numa_node(CPUArchId *slot, uint64_t node_id,
                              Error **errp);

and move the CPU slot lookup code from machine_set_cpu_numa_node()
to a helper:

  CPUArchId *machine_get_cpu_slot(MachineState *machine,
                                  CpuInstanceProperties *props, Error **errp);

and change cpu_index_to_instance_props to return CPUArchId:

  CPUArchId *cpu_index_to_cpu_slot(MachineState *machine, int cpu_index);

We could simply have this on "-numa cpu" code:

    slot = machine_get_cpu_slot(machine, props, &error_fatal);
    cpu_slot_set_numa_node(slot, node_id, &error_fatal);

and this on the legacy "-numa node,cpus=..." code:

    slot = mc->cpu_index_to_cpu_slot(machine, i);
    cpu_slot_set_numa_node(slot, node_id, &error_fatal);

I believe we will be able to reuse machine_get_cpu_slot() to
replace pc_find_cpu_slot() and spapr_find_cpu_slot() later.

(I also suggest renaming "CPUArchId" and "possible CPUs" to
"CPUSlot" and "CPU slots" in the code and comments. This would
help people reviewing the code, but it can be done later if you
prefer.)
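A minimal sketch of the proposed split, under simplifying assumptions: CpuSlot and SlotProps are hypothetical stand-ins for CPUArchId and CpuInstanceProperties, node-id is kept directly in the slot, and errors are returned as NULL/-1 instead of via Error **. The function names follow the proposal, but the signatures are simplified. get_cpu_slot() does only an exact 1:1 lookup: every topology property the slot has must be given and must match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CpuInstanceProperties (topology part only). */
typedef struct {
    bool has_socket_id; int64_t socket_id;
    bool has_core_id;   int64_t core_id;
    bool has_thread_id; int64_t thread_id;
} SlotProps;

/* Hypothetical stand-in for CPUArchId, i.e. one "CPU slot". */
typedef struct {
    SlotProps props;
    bool has_node_id;
    int64_t node_id;
} CpuSlot;

/* Exact match of one property: present in the slot => must be given and
 * equal; absent from the slot => must not be given. */
static bool prop_exact(bool slot_has, int64_t slot_val,
                       bool key_has, int64_t key_val)
{
    return slot_has ? (key_has && key_val == slot_val) : !key_has;
}

/* 1:1 lookup; returns NULL if no slot matches the key exactly. */
static CpuSlot *get_cpu_slot(CpuSlot *slots, size_t n, const SlotProps *key)
{
    for (size_t i = 0; i < n; i++) {
        const SlotProps *s = &slots[i].props;

        if (prop_exact(s->has_socket_id, s->socket_id,
                       key->has_socket_id, key->socket_id) &&
            prop_exact(s->has_core_id, s->core_id,
                       key->has_core_id, key->core_id) &&
            prop_exact(s->has_thread_id, s->thread_id,
                       key->has_thread_id, key->thread_id)) {
            return &slots[i];
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 if the slot is already bound to another node. */
static int cpu_slot_set_numa_node(CpuSlot *slot, int64_t node_id)
{
    assert(slot != NULL);
    if (slot->has_node_id && slot->node_id != node_id) {
        return -1;
    }
    slot->has_node_id = true;
    slot->node_id = node_id;
    return 0;
}
```

Note that a partial key such as socket-id alone makes this exact lookup return NULL rather than a set of slots.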

[...]
> @@ -177,6 +178,10 @@ static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
>              return;
>          }
>          bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
> +        props = mc->cpu_index_to_instance_props(ms, cpus->value);
> +        props.node_id = nodenr;
> +        props.has_node_id = true;
> +        machine_set_cpu_numa_node(ms, &props, &error_fatal);
>      }
>  
>      if (node->has_mem && node->has_memdev) {
> @@ -393,9 +398,12 @@ void parse_numa_opts(MachineState *ms)
>          if (i == nb_numa_nodes) {
>              for (i = 0; i < max_cpus; i++) {
>                  CpuInstanceProperties props;
> +                /* fetch default mapping from board and enable it */
>                  props = mc->cpu_index_to_instance_props(ms, i);
> +                props.has_node_id = true;
>  
>                  set_bit(i, numa_info[props.node_id].node_cpu);
> +                machine_set_cpu_numa_node(ms, &props, &error_fatal);
>              }
>          }
>  
> -- 
> 2.7.4
> 
> 

-- 
Eduardo


* Re: [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus
  2017-04-13 13:58   ` Eduardo Habkost
@ 2017-04-19  9:31     ` Igor Mammedov
  2017-04-26 11:02       ` Eduardo Habkost
  0 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-04-19  9:31 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: qemu-devel, Peter Maydell, Andrew Jones, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Thu, 13 Apr 2017 10:58:05 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, Mar 22, 2017 at 02:32:35PM +0100, Igor Mammedov wrote:
> > Introduce machine_set_cpu_numa_node() helper that stores
> > node mapping for CPU in MachineState::possible_cpus.
> > CPU and node it belongs to is specified by 'props' argument.
> > 
> > Patch doesn't remove old way of storing mapping in
> > numa_info[X].node_cpu as removing it at the same time
> > makes patch rather big. Instead it just mirrors mapping
> > in possible_cpus and follow up per target patches will
> > switch to possible_cpus and numa_info[X].node_cpu will
> > be removed once there isn't any users left.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  include/hw/boards.h |  2 ++
> >  hw/core/machine.c   | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  numa.c              |  8 +++++++
> >  3 files changed, 78 insertions(+)
> > 
> > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > index 1dd0fde..40f30f1 100644
> > --- a/include/hw/boards.h
> > +++ b/include/hw/boards.h
> > @@ -42,6 +42,8 @@ bool machine_dump_guest_core(MachineState *machine);
> >  bool machine_mem_merge(MachineState *machine);
> >  void machine_register_compat_props(MachineState *machine);
> >  HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine);
> > +void machine_set_cpu_numa_node(MachineState *machine,
> > +                               CpuInstanceProperties *props, Error **errp);
> >  
> >  /**
> >   * CPUArchId:
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index 0d92672..6ff0b45 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -388,6 +388,74 @@ HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine)
> >      return head;
> >  }
> >  
> > +void machine_set_cpu_numa_node(MachineState *machine,
> > +                               CpuInstanceProperties *props, Error **errp)  
> 
> If you change this to:
> 
>   void cpu_slot_set_numa_node(CPUArchId *slot, uint64_t node_id,
>                               Error **errp);
> 
> and move the CPU slot lookup code from machine_set_cpu_numa_node()
> to a helper:
> 
>   CPUArchId *machine_get_cpu_slot(MachineState *machine,
>                                   CpuInstanceProperties *props, Error **errp);
It would work in the case of an exact 1:1 lookup, but Paolo asked for
wildcard support (i.e. -numa cpu,node-id=x,socket-id=y should set the
mapping for all CPUs in socket y).
So I'd prefer to keep machine_set_cpu_numa_node() as is; with it, the
series splits nicely into clean and bisectable patches without
breaking anything in the middle.
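The wildcard semantics can be illustrated with a simplified model of the matching loop in machine_set_cpu_numa_node(): a topology property missing from the key matches any slot, and every matching slot is assigned. Props and set_numa_node_wildcard() are hypothetical names; unlike the patch, this sketch does not reject properties the board lacks, and a return value of 0 corresponds to the patch's "no match found" error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened stand-in for a possible_cpus entry or a
 * "-numa cpu" key. */
typedef struct {
    bool has_socket_id; int64_t socket_id;
    bool has_core_id;   int64_t core_id;
    bool has_thread_id; int64_t thread_id;
    bool has_node_id;   int64_t node_id;
} Props;

/* A key property that is absent acts as a wildcard. */
static bool key_matches(bool key_has, int64_t key_val, int64_t slot_val)
{
    return !key_has || key_val == slot_val;
}

/*
 * Assign key->node_id to every slot matching the key.
 * Returns the number of slots assigned (0 means "no match found"),
 * or -1 if a matching slot is already bound to a different node.
 */
static int set_numa_node_wildcard(Props *slots, size_t n, const Props *key)
{
    int matched = 0;

    assert(key->has_node_id);
    for (size_t i = 0; i < n; i++) {
        Props *s = &slots[i];

        if (!key_matches(key->has_socket_id, key->socket_id, s->socket_id) ||
            !key_matches(key->has_core_id, key->core_id, s->core_id) ||
            !key_matches(key->has_thread_id, key->thread_id, s->thread_id)) {
            continue;   /* explicit mismatch: skip this slot */
        }
        if (s->has_node_id && s->node_id != key->node_id) {
            return -1;  /* already assigned to another node */
        }
        s->has_node_id = true;
        s->node_id = key->node_id;
        matched++;
    }
    return matched;
}
```

As in the patch, re-assigning a slot to the node it already has is not an error, which is what lets the legacy cpu_index mapping coexist with SPAPR core-based mapping.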


> and change cpu_index_to_instance_props to return CPUArchId:
> 
>   CPUArchId *cpu_index_to_cpu_slot(MachineState *machine, int cpu_index);
> 
> We could simply have this on "-numa cpu" code:
> 
>     slot = machine_get_cpu_slot(machine, props, &error_fatal);
>     cpu_slot_set_numa_node(slot, node_id, &error_fatal);
> 
> and this on the legacy "-numa node,cpus=..." code:
> 
>     slot = mc->cpu_index_to_cpu_slot(machine, i);
>     cpu_slot_set_numa_node(slot, node_id, &error_fatal);
> 
> I believe we will be able to reuse machine_get_cpu_slot() to
> replace pc_find_cpu_slot() and spapr_find_cpu_slot() later.
As I already replied to David, xxx_find_cpu_slot() might possibly
be generalized, but I'd like to postpone that until ARM topology
is materialized and merged.

Another reason I'd like to postpone generalization is that
xxx_find_cpu_slot() could be optimized in a target-specific way,
replacing the CPU lookup in an array with a computed expression.
That would make lookup an O(1) operation, and it could be used as
a better replacement for qemu_get_cpu()/cpu_exists(), but
I haven't looked into this yet.
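As an illustration of the kind of computed lookup meant here, assume a uniform topology where possible CPUs are enumerated densely in socket/core/thread order (an assumption; real targets may differ, e.g. x86 APIC IDs use power-of-two-aligned bit fields). Under that assumption the slot index and its reverse mapping are plain arithmetic. Topo, topo_slot_index() and topo_ids_from_index() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical uniform topology: cores per socket, threads per core. */
typedef struct {
    unsigned sockets;
    unsigned cores;
    unsigned threads;
} Topo;

/* O(1) socket/core/thread -> slot index; -1 if any id is out of range. */
static int64_t topo_slot_index(const Topo *t, unsigned socket,
                               unsigned core, unsigned thread)
{
    assert(t->cores > 0 && t->threads > 0);
    if (socket >= t->sockets || core >= t->cores || thread >= t->threads) {
        return -1;
    }
    return ((int64_t)socket * t->cores + core) * t->threads + thread;
}

/* Reverse mapping: slot index -> socket/core/thread ids. */
static void topo_ids_from_index(const Topo *t, unsigned idx, unsigned *socket,
                                unsigned *core, unsigned *thread)
{
    assert(t->cores > 0 && t->threads > 0);
    *thread = idx % t->threads;
    *core = (idx / t->threads) % t->cores;
    *socket = idx / (t->threads * t->cores);
}
```

A lookup built on this would index the possible CPUs array directly instead of scanning it.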

 
> (I also suggest renaming "CPUArchId" and "possible CPUs" to
> "CPUSlot" and "CPU slots" in the code and comments. This would
> help people reviewing the code, but it can be done later if you
> prefer.)
I'm fine with changing CPUArchId to CPUSlot but I'd leave
"possible CPUs" as is, since it precisely describes what it is,
"CPU slots" is too ambiguous.

> 
> [...]
> > @@ -177,6 +178,10 @@ static void numa_node_parse(MachineState *ms, NumaNodeOptions *node,
> >              return;
> >          }
> >          bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
> > +        props = mc->cpu_index_to_instance_props(ms, cpus->value);
> > +        props.node_id = nodenr;
> > +        props.has_node_id = true;
> > +        machine_set_cpu_numa_node(ms, &props, &error_fatal);
> >      }
> >  
> >      if (node->has_mem && node->has_memdev) {
> > @@ -393,9 +398,12 @@ void parse_numa_opts(MachineState *ms)
> >          if (i == nb_numa_nodes) {
> >              for (i = 0; i < max_cpus; i++) {
> >                  CpuInstanceProperties props;
> > +                /* fetch default mapping from board and enable it */
> >                  props = mc->cpu_index_to_instance_props(ms, i);
> > +                props.has_node_id = true;
> >  
> >                  set_bit(i, numa_info[props.node_id].node_cpu);
> > +                machine_set_cpu_numa_node(ms, &props, &error_fatal);
> >              }
> >          }
> >  
> > -- 
> > 2.7.4
> > 
> >   
> 


* Re: [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus
  2017-04-12 21:15   ` Eduardo Habkost
@ 2017-04-19  9:52     ` Igor Mammedov
  2017-04-26 11:04       ` Eduardo Habkost
  0 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-04-19  9:52 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Peter Maydell, Andrew Jones, qemu-devel, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, 12 Apr 2017 18:15:29 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, Mar 22, 2017 at 02:32:35PM +0100, Igor Mammedov wrote:
> > Introduce machine_set_cpu_numa_node() helper that stores
> > node mapping for CPU in MachineState::possible_cpus.
> > CPU and node it belongs to is specified by 'props' argument.
> > 
> > Patch doesn't remove old way of storing mapping in
> > numa_info[X].node_cpu as removing it at the same time
> > makes patch rather big. Instead it just mirrors mapping
> > in possible_cpus and follow up per target patches will
> > switch to possible_cpus and numa_info[X].node_cpu will
> > be removed once there isn't any users left.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> 
> So, this patch is the one that makes "-numa" and "-numa cpu"
> affect query-hotpluggable-cpus output.
That was the intent behind the series.

[...]
> As noted in another message, I am not sure we really should make
> "-numa" affect query-hotpluggable-cpus output unconditionally (I
> believe we shouldn't). But if we do, we need to document this very
> clearly.
What place would you suggest to document this at?

[...]


* Re: [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU
  2017-04-12 21:02   ` Eduardo Habkost
@ 2017-04-19 11:14     ` Igor Mammedov
  2017-04-26 12:21       ` Eduardo Habkost
  0 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-04-19 11:14 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: qemu-devel, Peter Maydell, Andrew Jones, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, 12 Apr 2017 18:02:39 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, Mar 22, 2017 at 02:32:32PM +0100, Igor Mammedov wrote:
> > it will allow switching from cpu_index to property based
> > numa mapping in follow up patches.  
> 
> I am not sure I understand all the consequences of this, so I
> will give it a try:
> 
> "node-id" is an existing field in CpuInstanceProperties.
> CpuInstanceProperties is used on both query-hotpluggable-cpus
> output and in MachineState::possible_cpus.
> 
> We will start using MachineState::possible_cpus to keep track of
> NUMA CPU affinity, and that means query-hotpluggable-cpus will
> start reporting a "node-id" property when a NUMA mapping is
> configured.
> 
> To allow query-hotpluggable-cpus to report "node-id", the CPU
> objects must have a "node-id" property that can be set. This
> patch adds the "node-id" property to X86CPU.
> 
> Is this description accurate? Is the presence of "node-id" in
> query-hotpluggable-cpus the only reason we really need this
> patch, or is there something else that requires the "node-id"
> property?
That's an accurate description: node-id is in the same 'address'
properties category as socket/core/thread-id. So if you have a
NUMA-enabled machine, you'll see the node-id property in
query-hotpluggable-cpus.


> Why exactly do we need to change the output of
> query-hotpluggable-cpus for all machines to include "node-id", to
> make "-numa cpu" work?
It's for introspection, as well as for consolidating topology data
in a single place: it complements the already reported
socket/core/thread-id address properties with the NUMA node-id.
That way one doesn't need yet another command for introspecting
the NUMA mapping of CPUs and can use the existing
query-hotpluggable-cpus for a full topology description.

>  Did you consider saving node_id inside
> CPUArchId and outside CpuInstanceProperties, so
> query-hotpluggable-cpus output won't be affected by "-numa cpu"?
Nope, the intent was to make node-id visible if NUMA is enabled, and
I think that intent was there from the very beginning, when
query-hotpluggable-cpus was introduced with CpuInstanceProperties
having a node_id field that was left unused because it was out of
scope for CPU hotplug.


> I'm asking this because I believe we will eventually need a
> mechanism that lets management check what are the valid arguments
> for "-numa cpu" for a given machine, and it looks like
> query-hotpluggable-cpus is already the right mechanism for that.
It's a problem similar to the one with -device cpu_foo,...

> But we can't make query-hotpluggable-cpus output depend on "-numa
> cpu" input, if the "-numa cpu" input will also depend on
> query-hotpluggable-cpus output.
I don't think that query-hotpluggable-cpus must be independent of
the '-numa' option.

query-hotpluggable-cpus is a function of -smp and the machine type,
and its output is dynamic and can change at runtime, so we never
promised to keep it static. I think it's OK to make it depend
on -numa as an extra input argument when present.

It bothers me as well that the '-numa cpu' and '-device cpu_foo'
options depend on query-hotpluggable-cpus. When we considered
generic '-device cpu' support, we thought that initially
query-hotpluggable-cpus could be used to get the list of CPUs
for a given -smp/machine combination, which could then be used
to compose the proper CLI. That makes mgmt start QEMU twice
when creating a configuration for the first time, but the resulting
CLI could be reused without repeating the query step, provided
topology/machine stays the same. The same applies to '-numa cpu'.

In the future, to avoid starting QEMU twice, we were thinking about
configuring QEMU from QMP at runtime; that's where a preconfig
approach could help:

  1. introduce a CLI option that pauses before machine_init to allow
     preconfiguring the machine from qmp/monitor
  2. make query-hotpluggable-cpus usable at preconfig time
  3. start qemu with the needed number of numa nodes and default mapping:
         #qemu -smp ... -numa node,nodeid=0 -numa node,nodeid=1
  4. get the possible cpus list
  5. add a qmp/monitor command variant of '-numa cpu' to set the numa mapping
  6. optionally, set a new numa mapping and get the updated
     possible cpus list with query-hotpluggable-cpus
  7. optionally, add extra cpus with device_add using the updated
     cpus list, then get the cpus list again as it has changed
  8. unpause the preconfig stage and let qemu continue to execute
     machine_init and the rest.

Since we would need to implement QMP configuration for '-device cpu'
anyway, we might as well reuse it for custom numa mapping.
 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  hw/i386/pc.c      | 17 +++++++++++++++++
> >  target/i386/cpu.c |  1 +
> >  2 files changed, 18 insertions(+)
> > 
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 7031100..873bbfa 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1895,6 +1895,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
> >                              DeviceState *dev, Error **errp)
> >  {
> >      int idx;
> > +    int node_id;
> >      CPUState *cs;
> >      CPUArchId *cpu_slot;
> >      X86CPUTopoInfo topo;
> > @@ -1984,6 +1985,22 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
> >  
> >      cs = CPU(cpu);
> >      cs->cpu_index = idx;
> > +
> > +    node_id = numa_get_node_for_cpu(cs->cpu_index);
> > +    if (node_id == nb_numa_nodes) {
> > +        /* by default CPUState::numa_node was 0 if it's not set via CLI
> > +         * keep it this way for now but in future we probably should
> > +         * refuse to start up with incomplete numa mapping */
> > +        node_id = 0;
> > +    }
> > +    if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
> > +        cs->numa_node = node_id;
> > +    } else if (cs->numa_node != node_id) {
> > +            error_setg(errp, "node-id %d must match numa node specified "
> > +                "with -numa option for cpu-index %d",
> > +                cs->numa_node, cs->cpu_index);
> > +            return;
> > +    }
> >  }
> >  
> >  static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 7aa7622..d690244 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -3974,6 +3974,7 @@ static Property x86_cpu_properties[] = {
> >      DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
> >      DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
> >  #endif
> > +    DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
> >      DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
> >      { .name  = "hv-spinlocks", .info  = &qdev_prop_spinlocks },
> >      DEFINE_PROP_BOOL("hv-relaxed", X86CPU, hyperv_relaxed_timing, false),
> > -- 
> > 2.7.4
> > 
> >   
> 
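The pre_plug check in the quoted hunk boils down to the following standalone sketch. The function name and plain-int parameters are hypothetical stand-ins for CPUState and numa_get_node_for_cpu(); judging from the check in the patch, the latter returns nb_numa_nodes when the CPU is in no node's bitmap. CPU_UNSET_NUMA_NODE_ID is assumed to be -1. Note the trailing space after "specified" in the message: without it, the adjacent string literals concatenate into "specifiedwith".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define CPU_UNSET_NUMA_NODE_ID -1 /* assumed to match QEMU's definition */

/*
 * Hypothetical standalone version of the pre_plug node-id check.
 * mapped_node is what numa_get_node_for_cpu() returned for this cpu_index.
 */
static bool check_cpu_node_id(int *cpu_numa_node, int mapped_node,
                              int nb_numa_nodes, int cpu_index,
                              char *errbuf, size_t errlen)
{
    int node_id = mapped_node;

    assert(errbuf != NULL && errlen > 0);
    if (node_id == nb_numa_nodes) {
        node_id = 0; /* not mapped: keep the historical default of node 0 */
    }
    if (*cpu_numa_node == CPU_UNSET_NUMA_NODE_ID) {
        *cpu_numa_node = node_id;          /* no node-id given: adopt mapping */
    } else if (*cpu_numa_node != node_id) {
        snprintf(errbuf, errlen, "node-id %d must match numa node specified "
                 "with -numa option for cpu-index %d",
                 *cpu_numa_node, cpu_index);
        return false;                      /* explicit node-id conflicts */
    }
    return true;
}
```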

* Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
  2017-03-28  4:19   ` David Gibson
  2017-03-28 10:53     ` Igor Mammedov
@ 2017-04-20 14:29     ` Igor Mammedov
  1 sibling, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-04-20 14:29 UTC (permalink / raw)
  To: David Gibson
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost, qemu-devel,
	qemu-arm, qemu-ppc, Shannon Zhao, Paolo Bonzini

On Tue, 28 Mar 2017 15:19:20 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Mar 22, 2017 at 02:32:30PM +0100, Igor Mammedov wrote:
[...]
Answering questions that I forgot to answer before.

> > @@ -1554,6 +1554,16 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
> >      }
> >  }
> >  
> > +static CpuInstanceProperties
> > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > +{
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> > +}
> > +  
> 
> It seems a bit weird to have a machine specific hook to pull the
> property information when one way or another it's coming from the
> possible_cpus table, which is already constructed by a machine
> specific hook.  Could we add a range or list of cpu_index values to
> each possible_cpus entry instead, and have a generic lookup of the
> right entry based on that?
Mainly I dislike the idea because it adds duplicate data to manage,
data that could be computed from the CpuInstanceProperties already
stored there.

Secondly, if it were just one number, a generic lookup would be trivial,
but with a list it becomes cumbersome to manage, and the implementation
would be larger than the three *_cpu_index_to_props() hooks combined;
it's not worth it in the foreseeable future.
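To make the trade-off concrete, here is a sketch contrasting the two lookups; the struct and both functions are hypothetical simplifications. The hook style derives the entry from cpu_index arithmetically (identity when each entry is one CPU, a division when an entry describes a whole core), while the generic style scans a cpu_index range stored in every entry, which is exactly the duplicated data that would have to be kept in sync with the properties.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified possible_cpus entry. */
typedef struct {
    int node_id;
    /* Extra, duplicated data the "range per entry" idea would need: */
    unsigned first_cpu_index;
    unsigned nr_cpu_indexes;
} PossibleCpuSketch;

/*
 * Hook style: the board knows how cpu_index maps to an entry, e.g. the
 * identity (threads_per_entry == 1), or a division by threads per entry
 * on a board whose entries describe whole cores.
 */
static const PossibleCpuSketch *
lookup_via_hook(const PossibleCpuSketch *cpus, size_t len,
                unsigned cpu_index, unsigned threads_per_entry)
{
    size_t idx = cpu_index / threads_per_entry;

    assert(idx < len);
    return &cpus[idx];
}

/* Generic style: scan per-entry ranges that must be kept in sync by hand. */
static const PossibleCpuSketch *
lookup_via_ranges(const PossibleCpuSketch *cpus, size_t len, unsigned cpu_index)
{
    for (size_t i = 0; i < len; i++) {
        if (cpu_index >= cpus[i].first_cpu_index &&
            cpu_index < cpus[i].first_cpu_index + cpus[i].nr_cpu_indexes) {
            return &cpus[i];
        }
    }
    return NULL; /* cpu_index not covered by any entry */
}
```

Both return the same entry when the ranges are filled in correctly; the difference is that the first needs no extra state.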
 
> >  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> >  {
> >      int n;
> > @@ -1573,8 +1583,12 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> >          ms->possible_cpus->cpus[n].props.has_thread_id = true;
> >          ms->possible_cpus->cpus[n].props.thread_id = n;
> >  
> > -        /* TODO: add 'has_node/node' here to describe
> > -           to which node core belongs */
> > +        /* default distribution of CPUs over NUMA nodes */
> > +        if (nb_numa_nodes) {
> > +            /* preset values but do not enable them i.e. 'has_node_id = false',
> > +             * board will enable them if manual mapping wasn't present on CLI */  
> 
> I'm a little confused by this comment, since I don't see any board
> code altering has_node_id.
It happens in the last 2 hunks of patch 10/23.
Maybe I should write it like this:

+            /* preset values but do not enable them i.e. 'has_node_id = false',
+             * numa initialization code will enable them later if manual mapping
+             * wasn't present on CLI */
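A minimal sketch of the two-step default mapping that comment describes: the board presets node_id round-robin while leaving has_node_id false, and the NUMA initialization code enables the presets only when no manual '-numa cpu' mapping was given. CpuPropsSketch and both helpers are hypothetical simplifications of CpuInstanceProperties and the real code paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CpuInstanceProperties' node fields. */
typedef struct {
    bool has_node_id;
    int node_id;
} CpuPropsSketch;

/* Board side: preset a round-robin node_id but leave it disabled. */
static void preset_default_nodes(CpuPropsSketch *cpus, int len,
                                 int nb_numa_nodes)
{
    assert(nb_numa_nodes >= 0);
    for (int n = 0; n < len; n++) {
        cpus[n].has_node_id = false;
        cpus[n].node_id = nb_numa_nodes ? n % nb_numa_nodes : 0;
    }
}

/* NUMA init side: enable the presets only if '-numa cpu' mapped nothing. */
static void enable_default_nodes(CpuPropsSketch *cpus, int len,
                                 int nb_numa_nodes, bool manual_mapping)
{
    if (nb_numa_nodes == 0 || manual_mapping) {
        return;
    }
    for (int n = 0; n < len; n++) {
        cpus[n].has_node_id = true;
    }
}
```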

> 
> > +            ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;;
> > +        }
> >      }
> >      return ms->possible_cpus;
> >  }
[...]

* Re: [Qemu-devel] [PATCH for-2.10 02/23] hw/arm/virt: extract mp-affinity calculation in separate function
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 02/23] hw/arm/virt: extract mp-affinity calculation in separate function Igor Mammedov
@ 2017-04-25 14:09   ` Andrew Jones
  2017-04-25 14:39     ` Igor Mammedov
  0 siblings, 1 reply; 77+ messages in thread
From: Andrew Jones @ 2017-04-25 14:09 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Eduardo Habkost, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Mar 22, 2017 at 02:32:27PM +0100, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  hw/arm/virt.c | 59 ++++++++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 42 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 5f62a03..484754e 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1194,6 +1194,45 @@ void virt_machine_done(Notifier *notifier, void *data)
>      virt_build_smbios(vms);
>  }
>  
> +static uint64_t virt_idx2mp_affinity(VirtMachineState *vms, int idx)

I think I'd prefer virt_cpu_mp_affinity, or anything without a '2' in it
for the name.

> +{
> +    uint64_t mp_affinity;
> +    uint8_t clustersz;
> +    VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
> +
> +    if (!vmc->disallow_affinity_adjustment) {
> +        uint8_t aff0, aff1;

The else part also needs aff0, aff1; might as well move them up.

> +
> +        if (vms->gic_version == 3) {
> +            clustersz = GICV3_TARGETLIST_BITS;
> +        } else {
> +            clustersz = GIC_TARGETLIST_BITS;
> +        }
> +
> +        /* Adjust MPIDR like 64-bit KVM hosts, which incorporate the
> +         * GIC's target-list limitations. 32-bit KVM hosts currently
> +         * always create clusters of 4 CPUs, but that is expected to
> +         * change when they gain support for gicv3. When KVM is enabled
> +         * it will override the changes we make here, therefore our
> +         * purposes are to make TCG consistent (with 64-bit KVM hosts)
> +         * and to improve SGI efficiency.
> +         */
> +        aff1 = idx / clustersz;
> +        aff0 = idx % clustersz;
> +        mp_affinity = (aff1 << ARM_AFF1_SHIFT) | aff0;
> +    } else {
> +        /* This cpu-id-to-MPIDR affinity is used only for TCG;
> +         * KVM will override it. We don't support setting cluster ID
> +         * ([16..23]) (known as Aff2 in later ARM ARM versions), or any of
> +         * the higher affinity level fields, so these bits always RAZ.
> +         */
> +        uint32_t Aff1 = idx / ARM_DEFAULT_CPUS_PER_CLUSTER;
> +        uint32_t Aff0 = idx % ARM_DEFAULT_CPUS_PER_CLUSTER;
> +        mp_affinity = (Aff1 << ARM_AFF1_SHIFT) | Aff0;
> +    }

Maybe we should create an ARM CPU function

 uint64_t arm_cpu_mp_affinity(int idx, uint8_t clustersz)
 {
   uint32_t Aff1 = idx / clustersz;
   uint32_t Aff0 = idx % clustersz;
   return (Aff1 << ARM_AFF1_SHIFT) | Aff0;
 }

which we'd use in arm_cpu_realizefn() and here

 static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
 {
   if (VIRT_MACHINE_GET_CLASS(vms)->disallow_affinity_adjustment) {
      return arm_cpu_mp_affinity(idx, ARM_DEFAULT_CPUS_PER_CLUSTER);
   }

   /* ...big comment... */
   return arm_cpu_mp_affinity(idx, vms->gic_version == 3 ?
                          GICV3_TARGETLIST_BITS : GIC_TARGETLIST_BITS);
 }
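The suggested refactoring can be checked in isolation with a self-contained version. The constant values are assumptions mirroring QEMU headers of this period (ARM_AFF1_SHIFT 8, ARM_DEFAULT_CPUS_PER_CLUSTER 8, GIC_TARGETLIST_BITS 8, GICV3_TARGETLIST_BITS 16), and VirtMachineState/VirtMachineClass are replaced by plain parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, mirroring QEMU headers of this era. */
#define ARM_AFF1_SHIFT               8
#define ARM_DEFAULT_CPUS_PER_CLUSTER 8
#define GIC_TARGETLIST_BITS          8
#define GICV3_TARGETLIST_BITS        16

/* Split a linear CPU index into Aff1 (cluster) and Aff0 (CPU in cluster). */
static uint64_t arm_cpu_mp_affinity(int idx, uint8_t clustersz)
{
    uint32_t Aff1 = idx / clustersz;
    uint32_t Aff0 = idx % clustersz;
    return (Aff1 << ARM_AFF1_SHIFT) | Aff0;
}

/* Plain parameters stand in for VirtMachineState/VirtMachineClass. */
static uint64_t virt_cpu_mp_affinity(bool disallow_affinity_adjustment,
                                     int gic_version, int idx)
{
    if (disallow_affinity_adjustment) {
        return arm_cpu_mp_affinity(idx, ARM_DEFAULT_CPUS_PER_CLUSTER);
    }
    /* Cluster size follows the GIC's target-list limitations. */
    return arm_cpu_mp_affinity(idx, gic_version == 3 ?
                               GICV3_TARGETLIST_BITS : GIC_TARGETLIST_BITS);
}
```

For example, CPU index 17 maps to 0x101 with GICv3 clustering (16 CPUs per cluster) and to 0x201 with 8 CPUs per cluster.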

> +    return mp_affinity;
> +}
> +
>  static void machvirt_init(MachineState *machine)
>  {
>      VirtMachineState *vms = VIRT_MACHINE(machine);
> @@ -1210,7 +1249,6 @@ static void machvirt_init(MachineState *machine)
>      CPUClass *cc;
>      Error *err = NULL;
>      bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
> -    uint8_t clustersz;
>  
>      if (!cpu_model) {
>          cpu_model = "cortex-a15";
> @@ -1263,10 +1301,8 @@ static void machvirt_init(MachineState *machine)
>       */
>      if (vms->gic_version == 3) {
>          virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / 0x20000;
> -        clustersz = GICV3_TARGETLIST_BITS;
>      } else {
>          virt_max_cpus = GIC_NCPU;
> -        clustersz = GIC_TARGETLIST_BITS;
>      }
>  
>      if (max_cpus > virt_max_cpus) {
> @@ -1326,20 +1362,9 @@ static void machvirt_init(MachineState *machine)
>  
>      for (n = 0; n < smp_cpus; n++) {
>          Object *cpuobj = object_new(typename);
> -        if (!vmc->disallow_affinity_adjustment) {
> -            /* Adjust MPIDR like 64-bit KVM hosts, which incorporate the
> -             * GIC's target-list limitations. 32-bit KVM hosts currently
> -             * always create clusters of 4 CPUs, but that is expected to
> -             * change when they gain support for gicv3. When KVM is enabled
> -             * it will override the changes we make here, therefore our
> -             * purposes are to make TCG consistent (with 64-bit KVM hosts)
> -             * and to improve SGI efficiency.
> -             */
> -            uint8_t aff1 = n / clustersz;
> -            uint8_t aff0 = n % clustersz;
> -            object_property_set_int(cpuobj, (aff1 << ARM_AFF1_SHIFT) | aff0,
> -                                    "mp-affinity", NULL);
> -        }
> +
> +        object_property_set_int(cpuobj, virt_idx2mp_affinity(vms, n),
> +                                "mp-affinity", NULL);
>  
>          if (!vms->secure) {
>              object_property_set_bool(cpuobj, false, "has_el3", NULL);
> -- 
> 2.7.4
> 
>

Thanks,
drew 

* Re: [Qemu-devel] [PATCH for-2.10 03/23] hw/arm/virt: use machine->possible_cpus for storing possible topology info
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 03/23] hw/arm/virt: use machine->possible_cpus for storing possible topology info Igor Mammedov
@ 2017-04-25 14:28   ` Andrew Jones
  2017-04-25 14:36     ` Igor Mammedov
  0 siblings, 1 reply; 77+ messages in thread
From: Andrew Jones @ 2017-04-25 14:28 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Eduardo Habkost, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Mar 22, 2017 at 02:32:28PM +0100, Igor Mammedov wrote:
> For now, precalculate and store mp_affinity in possible_cpus,
> as ARM CPUs don't have socket/core/thread-id properties yet.
> In follow-up patches possible_cpus will be used for storing
> and setting the NUMA node mapping, replacing the legacy bitmap
> based numa_info[node_id].node_cpu/numa_get_node_for_cpu().
> 
> For lack of a better idea, this patch cannibalizes
> possible_cpus.cpus[x].props.thread_id so that the
> *_cpu_index_to_props() callback can return a CPU addressable
> by props, which will be used by machine_set_cpu_numa_node()
> in follow-up patches to assign a CPU to a node.
> Cannibalizing is fine for now, as that thread_id isn't exposed
> to users (no hotpluggable_cpus callback support for ARM yet)
> and it will be used only internally until 'device_add cpu'
> is supported, at which point we can decide which properties to use.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  hw/arm/virt.c | 39 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 484754e..4de46b1 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1237,6 +1237,7 @@ static void machvirt_init(MachineState *machine)
>  {
>      VirtMachineState *vms = VIRT_MACHINE(machine);
>      VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
> +    MachineClass *mc = MACHINE_GET_CLASS(machine);
>      qemu_irq pic[NUM_IRQS];
>      MemoryRegion *sysmem = get_system_memory();
>      MemoryRegion *secure_sysmem = NULL;
> @@ -1360,10 +1361,16 @@ static void machvirt_init(MachineState *machine)
>          exit(1);
>      }
>  
> -    for (n = 0; n < smp_cpus; n++) {
> -        Object *cpuobj = object_new(typename);
> +    mc->possible_cpu_arch_ids(machine);

Hmm, this is a bit too subtle for my taste. Making this call, which by
name implies it returns the possible CPU Arch-IDs (which indeed it does),
but without assigning its return value to anything, looks wrong. I see
the reason to do so, though, is because on first call it will do the
allocation. I'd prefer a wrapper

 static void virt_alloc_possible_cpus(MachineState *ms)
 {
    /* First invocation does allocation and initialization. */
    (void)MACHINE_GET_CLASS(ms)->possible_cpu_arch_ids(ms);
 }

> +    for (n = 0; n < machine->possible_cpus->len; n++) {
> +        Object *cpuobj;
>  
> -        object_property_set_int(cpuobj, virt_idx2mp_affinity(vms, n),
> +        if (n >= smp_cpus) {
> +            break;
> +        }
> +
> +        cpuobj = object_new(typename);
> +        object_property_set_int(cpuobj, machine->possible_cpus->cpus[n].arch_id,
>                                  "mp-affinity", NULL);
>  
>          if (!vms->secure) {
> @@ -1543,6 +1550,31 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
>      }
>  }
>  
> +static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> +{
> +    int n;
> +    VirtMachineState *vms = VIRT_MACHINE(ms);
> +
> +    if (ms->possible_cpus) {
> +        assert(ms->possible_cpus->len == max_cpus);
> +        return ms->possible_cpus;
> +    }
> +
> +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> +                                  sizeof(CPUArchId) * max_cpus);
> +    ms->possible_cpus->len = max_cpus;
> +    for (n = 0; n < ms->possible_cpus->len; n++) {
> +        ms->possible_cpus->cpus[n].arch_id =
> +            virt_idx2mp_affinity(vms, n);
> +        ms->possible_cpus->cpus[n].props.has_thread_id = true;
> +        ms->possible_cpus->cpus[n].props.thread_id = n;
> +
> +        /* TODO: add 'has_node/node' here to describe
> +           to which node core belongs */
> +    }
> +    return ms->possible_cpus;
> +}
> +
>  static void virt_machine_class_init(ObjectClass *oc, void *data)
>  {
>      MachineClass *mc = MACHINE_CLASS(oc);
> @@ -1559,6 +1591,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>      mc->pci_allow_0_address = true;
>      /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
>      mc->minimum_page_bits = 12;
> +    mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
>  }
>  
>  static const TypeInfo virt_machine_info = {
> -- 
> 2.7.4
> 
> 

Thanks,
drew

* Re: [Qemu-devel] [PATCH for-2.10 04/23] hw/arm/virt: explicitly allocate cpu_index for cpus
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 04/23] hw/arm/virt: explicitly allocate cpu_index for cpus Igor Mammedov
@ 2017-04-25 14:33   ` Andrew Jones
  0 siblings, 0 replies; 77+ messages in thread
From: Andrew Jones @ 2017-04-25 14:33 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Eduardo Habkost, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Mar 22, 2017 at 02:32:29PM +0100, Igor Mammedov wrote:
> Currently cpu_index is implicitly auto-assigned at
> cpu.realize() time by cpu_exec_realizefn()->cpu_list_add().
> 
> It happens to match the index in possible_cpus, so take
> control over it and make the board initialize cpu_index
> to the possible_cpus index explicitly. This at least
> documents that the board is in control of it, and when
> '-device cpu' support comes it will keep cpu_index
> stable regardless of the order in which CPUs are created,
> so it won't break migration.
> Within this series it will be used for the internal
> conversion from cpu_index based NUMA node bitmaps
> to property based mapping with possible_cpus,
> and will allow mapping a cpu_index to a CPU entry in
> the possible_cpus array.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  hw/arm/virt.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 4de46b1..0cbcbc1 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1364,6 +1364,7 @@ static void machvirt_init(MachineState *machine)
>      mc->possible_cpu_arch_ids(machine);
>      for (n = 0; n < machine->possible_cpus->len; n++) {
>          Object *cpuobj;
> +        CPUState *cs;
>  
>          if (n >= smp_cpus) {
>              break;
> @@ -1373,6 +1374,9 @@ static void machvirt_init(MachineState *machine)
>          object_property_set_int(cpuobj, machine->possible_cpus->cpus[n].arch_id,
>                                  "mp-affinity", NULL);
>  
> +        cs = CPU(cpuobj);
> +        cs->cpu_index = n;
> +
>          if (!vms->secure) {
>              object_property_set_bool(cpuobj, false, "has_el3", NULL);
>          }
> -- 
> 2.7.4
> 
>

Reviewed-by: Andrew Jones <drjones@redhat.com>

* Re: [Qemu-devel] [PATCH for-2.10 03/23] hw/arm/virt: use machine->possible_cpus for storing possible topology info
  2017-04-25 14:28   ` Andrew Jones
@ 2017-04-25 14:36     ` Igor Mammedov
  0 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-04-25 14:36 UTC (permalink / raw)
  To: Andrew Jones
  Cc: qemu-devel, Peter Maydell, Eduardo Habkost, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Tue, 25 Apr 2017 16:28:27 +0200
Andrew Jones <drjones@redhat.com> wrote:

> On Wed, Mar 22, 2017 at 02:32:28PM +0100, Igor Mammedov wrote:
> > For now, precalculate and store mp_affinity in possible_cpus,
> > as ARM CPUs don't have socket/core/thread-id properties yet.
> > In follow-up patches possible_cpus will be used for storing
> > and setting the NUMA node mapping, replacing the legacy bitmap
> > based numa_info[node_id].node_cpu/numa_get_node_for_cpu().
> > 
> > For lack of a better idea, this patch cannibalizes
> > possible_cpus.cpus[x].props.thread_id so that the
> > *_cpu_index_to_props() callback can return a CPU addressable
> > by props, which will be used by machine_set_cpu_numa_node()
> > in follow-up patches to assign a CPU to a node.
> > Cannibalizing is fine for now, as that thread_id isn't exposed
> > to users (no hotpluggable_cpus callback support for ARM yet)
> > and it will be used only internally until 'device_add cpu'
> > is supported, at which point we can decide which properties to use.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  hw/arm/virt.c | 39 ++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 36 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 484754e..4de46b1 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -1237,6 +1237,7 @@ static void machvirt_init(MachineState *machine)
> >  {
> >      VirtMachineState *vms = VIRT_MACHINE(machine);
> >      VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
> > +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> >      qemu_irq pic[NUM_IRQS];
> >      MemoryRegion *sysmem = get_system_memory();
> >      MemoryRegion *secure_sysmem = NULL;
> > @@ -1360,10 +1361,16 @@ static void machvirt_init(MachineState *machine)
> >          exit(1);
> >      }
> >  
> > -    for (n = 0; n < smp_cpus; n++) {
> > -        Object *cpuobj = object_new(typename);
> > +    mc->possible_cpu_arch_ids(machine);
> 
> Hmm, this is a bit too subtle for my taste. Making this call, which by
> name implies it returns the possible CPU Arch-IDs (which indeed it does),
> but without assigning its return value to anything, looks wrong. I see
> the reason to do so, though, is because on first call it will do the
> allocation. I'd prefer a wrapper
> 
>  static void virt_alloc_possible_cpus(MachineState *ms)
>  {
>     /* First invocation does allocation and initialization. */
>     (void)MACHINE_GET_CLASS(ms)->possible_cpu_arch_ids(ms);
>  }
Ok, I'll make it generic
machine_alloc_possible_cpus(MachineState *ms)
and use it for x86/ppc as well
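A sketch of what such a generic wrapper relies on: the board's possible_cpu_arch_ids() hook allocates and fills the list on its first call and returns the cached list afterwards, so calling it once purely for the side effect is enough, and the wrapper's name makes that intent explicit. All *Sketch types and the alloc_count field are hypothetical simplifications of MachineState, MachineClass and CPUArchIdList.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for CPUArchIdList/MachineClass/MachineState. */
typedef struct {
    int len;
    unsigned long arch_id[];
} ArchIdListSketch;

typedef struct MachineSketch MachineSketch;

typedef struct {
    const ArchIdListSketch *(*possible_cpu_arch_ids)(MachineSketch *ms);
} MachineClassSketch;

struct MachineSketch {
    const MachineClassSketch *mc;
    int max_cpus;
    ArchIdListSketch *possible_cpus;
    int alloc_count; /* demonstration only: how many times we allocated */
};

/* Board hook: allocate on first call, return the cached list afterwards. */
static const ArchIdListSketch *board_possible_cpu_arch_ids(MachineSketch *ms)
{
    if (ms->possible_cpus) {
        assert(ms->possible_cpus->len == ms->max_cpus);
        return ms->possible_cpus;
    }
    ms->possible_cpus = calloc(1, sizeof(*ms->possible_cpus) +
                                  sizeof(unsigned long) * (size_t)ms->max_cpus);
    assert(ms->possible_cpus != NULL);
    ms->possible_cpus->len = ms->max_cpus;
    for (int n = 0; n < ms->max_cpus; n++) {
        ms->possible_cpus->arch_id[n] = (unsigned long)n;
    }
    ms->alloc_count++;
    return ms->possible_cpus;
}

/* Hypothetical generic wrapper: the name states the side effect we want. */
static void machine_alloc_possible_cpus(MachineSketch *ms)
{
    /* First invocation does allocation and initialization. */
    (void)ms->mc->possible_cpu_arch_ids(ms);
}
```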

> 
> > +    for (n = 0; n < machine->possible_cpus->len; n++) {
> > +        Object *cpuobj;
> >  
> > -        object_property_set_int(cpuobj, virt_idx2mp_affinity(vms, n),
> > +        if (n >= smp_cpus) {
> > +            break;
> > +        }
> > +
> > +        cpuobj = object_new(typename);
> > +        object_property_set_int(cpuobj, machine->possible_cpus->cpus[n].arch_id,
> >                                  "mp-affinity", NULL);
> >  
> >          if (!vms->secure) {
> > @@ -1543,6 +1550,31 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
> >      }
> >  }
> >  
> > +static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> > +{
> > +    int n;
> > +    VirtMachineState *vms = VIRT_MACHINE(ms);
> > +
> > +    if (ms->possible_cpus) {
> > +        assert(ms->possible_cpus->len == max_cpus);
> > +        return ms->possible_cpus;
> > +    }
> > +
> > +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> > +                                  sizeof(CPUArchId) * max_cpus);
> > +    ms->possible_cpus->len = max_cpus;
> > +    for (n = 0; n < ms->possible_cpus->len; n++) {
> > +        ms->possible_cpus->cpus[n].arch_id =
> > +            virt_idx2mp_affinity(vms, n);
> > +        ms->possible_cpus->cpus[n].props.has_thread_id = true;
> > +        ms->possible_cpus->cpus[n].props.thread_id = n;
> > +
> > +        /* TODO: add 'has_node/node' here to describe
> > +           to which node core belongs */
> > +    }
> > +    return ms->possible_cpus;
> > +}
> > +
> >  static void virt_machine_class_init(ObjectClass *oc, void *data)
> >  {
> >      MachineClass *mc = MACHINE_CLASS(oc);
> > @@ -1559,6 +1591,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
> >      mc->pci_allow_0_address = true;
> >      /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
> >      mc->minimum_page_bits = 12;
> > +    mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
> >  }
> >  
> >  static const TypeInfo virt_machine_info = {
> > -- 
> > 2.7.4
> > 
> > 
> 
> Thanks,
> drew

* Re: [Qemu-devel] [PATCH for-2.10 02/23] hw/arm/virt: extract mp-affinity calculation in separate function
  2017-04-25 14:09   ` Andrew Jones
@ 2017-04-25 14:39     ` Igor Mammedov
  0 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-04-25 14:39 UTC (permalink / raw)
  To: Andrew Jones
  Cc: qemu-devel, Peter Maydell, Eduardo Habkost, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Tue, 25 Apr 2017 16:09:26 +0200
Andrew Jones <drjones@redhat.com> wrote:

> On Wed, Mar 22, 2017 at 02:32:27PM +0100, Igor Mammedov wrote:
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  hw/arm/virt.c | 59 ++++++++++++++++++++++++++++++++++++++++++-----------------
> >  1 file changed, 42 insertions(+), 17 deletions(-)
> > 
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 5f62a03..484754e 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -1194,6 +1194,45 @@ void virt_machine_done(Notifier *notifier, void *data)
> >      virt_build_smbios(vms);
> >  }
> >  
> > +static uint64_t virt_idx2mp_affinity(VirtMachineState *vms, int idx)
> 
> I think I'd prefer virt_cpu_mp_affinity, or anything without a '2' in it
> for the name.
I'll fix it in v2

> 
> > +{
> > +    uint64_t mp_affinity;
> > +    uint8_t clustersz;
> > +    VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
> > +
> > +    if (!vmc->disallow_affinity_adjustment) {
> > +        uint8_t aff0, aff1;
> 
> The else part also needs aff0, aff1; might as well move them up.
sure

> 
> > +
> > +        if (vms->gic_version == 3) {
> > +            clustersz = GICV3_TARGETLIST_BITS;
> > +        } else {
> > +            clustersz = GIC_TARGETLIST_BITS;
> > +        }
> > +
> > +        /* Adjust MPIDR like 64-bit KVM hosts, which incorporate the
> > +         * GIC's target-list limitations. 32-bit KVM hosts currently
> > +         * always create clusters of 4 CPUs, but that is expected to
> > +         * change when they gain support for gicv3. When KVM is enabled
> > +         * it will override the changes we make here, therefore our
> > +         * purposes are to make TCG consistent (with 64-bit KVM hosts)
> > +         * and to improve SGI efficiency.
> > +         */
> > +        aff1 = idx / clustersz;
> > +        aff0 = idx % clustersz;
> > +        mp_affinity = (aff1 << ARM_AFF1_SHIFT) | aff0;
> > +    } else {
> > +        /* This cpu-id-to-MPIDR affinity is used only for TCG;
> > +         * KVM will override it. We don't support setting cluster ID
> > +         * ([16..23]) (known as Aff2 in later ARM ARM versions), or any of
> > +         * the higher affinity level fields, so these bits always RAZ.
> > +         */
> > +        uint32_t Aff1 = idx / ARM_DEFAULT_CPUS_PER_CLUSTER;
> > +        uint32_t Aff0 = idx % ARM_DEFAULT_CPUS_PER_CLUSTER;
> > +        mp_affinity = (Aff1 << ARM_AFF1_SHIFT) | Aff0;
> > +    }
> 
> Maybe we should create an ARM CPU function
> 
>  uint64_t arm_cpu_mp_affinity(int idx, uint8_t clustersz)
>  {
>    uint32_t Aff1 = idx / clustersz;
>    uint32_t Aff0 = idx % clustersz;
>    return (Aff1 << ARM_AFF1_SHIFT) | Aff0;
>  }
I'll add and use it in v2

> which we'd use in arm_cpu_realizefn() and here
> 
>  static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>  {
>    if (VIRT_MACHINE_GET_CLASS(vms)->disallow_affinity_adjustment) {
>       return arm_cpu_mp_affinity(idx, ARM_DEFAULT_CPUS_PER_CLUSTER);
>    }
> 
>    /* ...big comment... */
>    return arm_cpu_mp_affinity(idx, vms->gic_version == 3 ?
>                           GICV3_TARGETLIST_BITS : GIC_TARGETLIST_BITS);
>  }
> 
> > +    return mp_affinity;
> > +}
> > +
> >  static void machvirt_init(MachineState *machine)
> >  {
> >      VirtMachineState *vms = VIRT_MACHINE(machine);
> > @@ -1210,7 +1249,6 @@ static void machvirt_init(MachineState *machine)
> >      CPUClass *cc;
> >      Error *err = NULL;
> >      bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
> > -    uint8_t clustersz;
> >  
> >      if (!cpu_model) {
> >          cpu_model = "cortex-a15";
> > @@ -1263,10 +1301,8 @@ static void machvirt_init(MachineState *machine)
> >       */
> >      if (vms->gic_version == 3) {
> >          virt_max_cpus = vms->memmap[VIRT_GIC_REDIST].size / 0x20000;
> > -        clustersz = GICV3_TARGETLIST_BITS;
> >      } else {
> >          virt_max_cpus = GIC_NCPU;
> > -        clustersz = GIC_TARGETLIST_BITS;
> >      }
> >  
> >      if (max_cpus > virt_max_cpus) {
> > @@ -1326,20 +1362,9 @@ static void machvirt_init(MachineState *machine)
> >  
> >      for (n = 0; n < smp_cpus; n++) {
> >          Object *cpuobj = object_new(typename);
> > -        if (!vmc->disallow_affinity_adjustment) {
> > -            /* Adjust MPIDR like 64-bit KVM hosts, which incorporate the
> > -             * GIC's target-list limitations. 32-bit KVM hosts currently
> > -             * always create clusters of 4 CPUs, but that is expected to
> > -             * change when they gain support for gicv3. When KVM is enabled
> > -             * it will override the changes we make here, therefore our
> > -             * purposes are to make TCG consistent (with 64-bit KVM hosts)
> > -             * and to improve SGI efficiency.
> > -             */
> > -            uint8_t aff1 = n / clustersz;
> > -            uint8_t aff0 = n % clustersz;
> > -            object_property_set_int(cpuobj, (aff1 << ARM_AFF1_SHIFT) | aff0,
> > -                                    "mp-affinity", NULL);
> > -        }
> > +
> > +        object_property_set_int(cpuobj, virt_idx2mp_affinity(vms, n),
> > +                                "mp-affinity", NULL);
> >  
> >          if (!vms->secure) {
> >              object_property_set_bool(cpuobj, false, "has_el3", NULL);
> > -- 
> > 2.7.4
> > 
> >
> 
> Thanks,
> drew 

* Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards Igor Mammedov
  2017-03-23  6:10   ` Bharata B Rao
  2017-03-28  4:19   ` David Gibson
@ 2017-04-25 14:48   ` Andrew Jones
  2017-04-25 15:07     ` Igor Mammedov
  2 siblings, 1 reply; 77+ messages in thread
From: Andrew Jones @ 2017-04-25 14:48 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Eduardo Habkost, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Mar 22, 2017 at 02:32:30PM +0100, Igor Mammedov wrote:
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 0cbcbc1..8748d25 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1554,6 +1554,16 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
>      }
>  }
>  
> +static CpuInstanceProperties
> +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;;
> +}
> +
>  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>  {
>      int n;
> @@ -1573,8 +1583,12 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>          ms->possible_cpus->cpus[n].props.has_thread_id = true;
>          ms->possible_cpus->cpus[n].props.thread_id = n;
>  
> -        /* TODO: add 'has_node/node' here to describe
> -           to which node core belongs */
> +        /* default distribution of CPUs over NUMA nodes */
> +        if (nb_numa_nodes) {
> +            /* preset values but do not enable them i.e. 'has_node_id = false',
> +             * board will enable them if manual mapping wasn't present on CLI */
> +            ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;;

extra ;

drew

* Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
  2017-04-25 14:48   ` Andrew Jones
@ 2017-04-25 15:07     ` Igor Mammedov
  0 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-04-25 15:07 UTC (permalink / raw)
  To: Andrew Jones
  Cc: qemu-devel, Peter Maydell, Eduardo Habkost, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Tue, 25 Apr 2017 16:48:30 +0200
Andrew Jones <drjones@redhat.com> wrote:

> On Wed, Mar 22, 2017 at 02:32:30PM +0100, Igor Mammedov wrote:
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 0cbcbc1..8748d25 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -1554,6 +1554,16 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
> >      }
> >  }
> >  
> > +static CpuInstanceProperties
> > +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> > +{
> > +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> > +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> > +
> > +    assert(cpu_index < possible_cpus->len);
> > +    return possible_cpus->cpus[cpu_index].props;;
> > +}
> > +
> >  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> >  {
> >      int n;
> > @@ -1573,8 +1583,12 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> >          ms->possible_cpus->cpus[n].props.has_thread_id = true;
> >          ms->possible_cpus->cpus[n].props.thread_id = n;
> >  
> > -        /* TODO: add 'has_node/node' here to describe
> > -           to which node core belongs */
> > +        /* default distribution of CPUs over NUMA nodes */
> > +        if (nb_numa_nodes) {
> > +            /* preset values but do not enable them i.e. 'has_node_id = false',
> > +             * board will enable them if manual mapping wasn't present on CLI */
> > +            ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;;
> 
> extra ;
fixed in v2

> 
> drew

* Re: [Qemu-devel] [PATCH for-2.10 14/23] virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 14/23] virt-arm: " Igor Mammedov
@ 2017-04-25 17:06   ` Andrew Jones
  2017-04-26 10:54     ` Igor Mammedov
  0 siblings, 1 reply; 77+ messages in thread
From: Andrew Jones @ 2017-04-25 17:06 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Eduardo Habkost, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Mar 22, 2017 at 02:32:39PM +0100, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  hw/arm/virt-acpi-build.c | 19 +++++++------------
>  hw/arm/virt.c            | 13 +++++++------
>  2 files changed, 14 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 0835e59..ce7499c 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -486,30 +486,25 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      AcpiSystemResourceAffinityTable *srat;
>      AcpiSratProcessorGiccAffinity *core;
>      AcpiSratMemoryAffinity *numamem;
> -    int i, j, srat_start;
> +    int i, srat_start;
>      uint64_t mem_base;
> -    uint32_t *cpu_node = g_malloc0(vms->smp_cpus * sizeof(uint32_t));
> -
> -    for (i = 0; i < vms->smp_cpus; i++) {
> -        j = numa_get_node_for_cpu(i);
> -        if (j < nb_numa_nodes) {
> -                cpu_node[i] = j;
> -        }
> -    }
> +    MachineClass *mc = MACHINE_GET_CLASS(vms);
> +    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
>  
>      srat_start = table_data->len;
>      srat = acpi_data_push(table_data, sizeof(*srat));
>      srat->reserved1 = cpu_to_le32(1);
>  
> -    for (i = 0; i < vms->smp_cpus; ++i) {
> +    for (i = 0; i < cpu_list->len; ++i) {
> +        int node_id = cpu_list->cpus[i].props.has_node_id ?
> +            cpu_list->cpus[i].props.node_id : 0;
>          core = acpi_data_push(table_data, sizeof(*core));
>          core->type = ACPI_SRAT_PROCESSOR_GICC;
>          core->length = sizeof(*core);
> -        core->proximity = cpu_to_le32(cpu_node[i]);
> +        core->proximity = cpu_to_le32(node_id);
>          core->acpi_processor_uid = cpu_to_le32(i);
>          core->flags = cpu_to_le32(1);
>      }
> -    g_free(cpu_node);
>  
>      mem_base = vms->memmap[VIRT_MEM].base;
>      for (i = 0; i < nb_numa_nodes; ++i) {
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 68d44f3..0a75df5 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -338,7 +338,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
>  {
>      int cpu;
>      int addr_cells = 1;
> -    unsigned int i;
> +    const MachineState *ms = MACHINE(vms);
>  
>      /*
>       * From Documentation/devicetree/bindings/arm/cpus.txt
> @@ -369,6 +369,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
>      for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
>          char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
>          ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
> +        CPUState *cs = CPU(armcpu);
>  
>          qemu_fdt_add_subnode(vms->fdt, nodename);
>          qemu_fdt_setprop_string(vms->fdt, nodename, "device_type", "cpu");
> @@ -389,9 +390,9 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
>                                    armcpu->mp_affinity);
>          }
>  
> -        i = numa_get_node_for_cpu(cpu);
> -        if (i < nb_numa_nodes) {
> -            qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id", i);
> +        if (ms->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
> +            qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id",
> +                ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
>          }
>  
>          g_free(nodename);
> @@ -1378,8 +1379,8 @@ static void machvirt_init(MachineState *machine)
>          cs = CPU(cpuobj);
>          cs->cpu_index = n;
>  
> -        node_id = numa_get_node_for_cpu(cs->cpu_index);
> -        if (node_id == nb_numa_nodes) {
> +        node_id = machine->possible_cpus->cpus[cs->cpu_index].props.node_id;
> +        if (!machine->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
>              /* by default CPUState::numa_node was 0 if it's not set via CLI
>               * keep it this way for now but in future we probably should
>               * refuse to start up with incomplete numa mapping */
> -- 
> 2.7.4
> 
>

We now have many machine->possible_cpus->cpus[index].props.[has_]node_id
instances. I think we need inline accessors added to include/sysemu/numa.h
like

 static inline bool numa_has_node_id(MachineState *ms, int index)
 {
   return ms->possible_cpus->cpus[index].props.has_node_id;
 }

 static inline int numa_node_id(MachineState *ms, int index)
 {
   return ms->possible_cpus->cpus[index].props.node_id;
 }

 ...

to improve readability and maintainability.

Or, instead, we could provide macros to allow assignments, e.g.

 #define NUMA_HAS_NODE_ID(ms, index) \
   ((ms)->possible_cpus->cpus[index].props.has_node_id)
 #define NUMA_NODE_ID(ms, index) \
   ((ms)->possible_cpus->cpus[index].props.node_id)


Thanks,
drew
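This sketch combines read-only accessors like the ones proposed above with the node-0 fallback that the patch uses when filling SRAT entries. It is standalone. `Machine`, `CpuSlot`, `CpuSlotList` and `srat_proximity()` are simplified, hypothetical mirrors of QEMU's `MachineState`/`CPUArchIdList`, not the real types. The little-endian conversion (`cpu_to_le32()`) done by the real code is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical mirrors of the QEMU structures involved. */
typedef struct { bool has_node_id; int64_t node_id; } CpuInstanceProps;
typedef struct { CpuInstanceProps props; } CpuSlot;
typedef struct { int len; CpuSlot cpus[8]; } CpuSlotList;
typedef struct { CpuSlotList *possible_cpus; } Machine;

/* Read-only accessors in the spirit of the suggestion above. */
static inline bool numa_has_node_id(const Machine *ms, int index)
{
    assert(index >= 0 && index < ms->possible_cpus->len);
    return ms->possible_cpus->cpus[index].props.has_node_id;
}

static inline int64_t numa_node_id(const Machine *ms, int index)
{
    assert(index >= 0 && index < ms->possible_cpus->len);
    return ms->possible_cpus->cpus[index].props.node_id;
}

/* Proximity domain for a CPU's SRAT entry: CPUs without a mapping fall
 * back to node 0, mirroring the build_srat() hunk in the patch. */
static uint32_t srat_proximity(const Machine *ms, int index)
{
    return numa_has_node_id(ms, index) ? (uint32_t)numa_node_id(ms, index)
                                       : 0u;
}
```

Call sites then read as `numa_has_node_id(ms, i)` rather than the full `possible_cpus->cpus[i].props` chain. That is the readability argument made above.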

* Re: [Qemu-devel] [PATCH for-2.10 08/23] virt-arm: add node-id property to CPU
  2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 08/23] virt-arm: " Igor Mammedov
@ 2017-04-25 17:16   ` Andrew Jones
  2017-04-26 10:47     ` Igor Mammedov
  0 siblings, 1 reply; 77+ messages in thread
From: Andrew Jones @ 2017-04-25 17:16 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Eduardo Habkost, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Mar 22, 2017 at 02:32:33PM +0100, Igor Mammedov wrote:
> it will allow switching from cpu_index to property based
> numa mapping in follow up patches.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  hw/arm/virt.c    | 15 +++++++++++++++
>  target/arm/cpu.c |  1 +
>  2 files changed, 16 insertions(+)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 8748d25..68d44f3 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1365,6 +1365,7 @@ static void machvirt_init(MachineState *machine)
>      for (n = 0; n < machine->possible_cpus->len; n++) {
>          Object *cpuobj;
>          CPUState *cs;
> +        int node_id;
>  
>          if (n >= smp_cpus) {
>              break;
> @@ -1377,6 +1378,20 @@ static void machvirt_init(MachineState *machine)
>          cs = CPU(cpuobj);
>          cs->cpu_index = n;
>  
> +        node_id = numa_get_node_for_cpu(cs->cpu_index);
> +        if (node_id == nb_numa_nodes) {
> +            /* by default CPUState::numa_node was 0 if it's not set via CLI
> +             * keep it this way for now but in future we probably should
> +             * refuse to start up with incomplete numa mapping */
> +             node_id = 0;

Do other architectures already abort on incomplete numa? If so, it'd be
nice to do that for mach-virt sooner than later. I guess we just need
another compat variable for 2.9 and older machine types.  I think libvirt
always supplies all cpu-node mappings, if any, so there shouldn't be an
issue with it.

Thanks,
drew

* Re: [Qemu-devel] [PATCH for-2.10 08/23] virt-arm: add node-id property to CPU
  2017-04-25 17:16   ` Andrew Jones
@ 2017-04-26 10:47     ` Igor Mammedov
  0 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-04-26 10:47 UTC (permalink / raw)
  To: Andrew Jones
  Cc: qemu-devel, Peter Maydell, Eduardo Habkost, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Tue, 25 Apr 2017 19:16:13 +0200
Andrew Jones <drjones@redhat.com> wrote:

> On Wed, Mar 22, 2017 at 02:32:33PM +0100, Igor Mammedov wrote:
> > it will allow switching from cpu_index to property based
> > numa mapping in follow up patches.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  hw/arm/virt.c    | 15 +++++++++++++++
> >  target/arm/cpu.c |  1 +
> >  2 files changed, 16 insertions(+)
> > 
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 8748d25..68d44f3 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -1365,6 +1365,7 @@ static void machvirt_init(MachineState *machine)
> >      for (n = 0; n < machine->possible_cpus->len; n++) {
> >          Object *cpuobj;
> >          CPUState *cs;
> > +        int node_id;
> >  
> >          if (n >= smp_cpus) {
> >              break;
> > @@ -1377,6 +1378,20 @@ static void machvirt_init(MachineState *machine)
> >          cs = CPU(cpuobj);
> >          cs->cpu_index = n;
> >  
> > +        node_id = numa_get_node_for_cpu(cs->cpu_index);
> > +        if (node_id == nb_numa_nodes) {
> > +            /* by default CPUState::numa_node was 0 if it's not set via CLI
> > +             * keep it this way for now but in future we probably should
> > +             * refuse to start up with incomplete numa mapping */
> > +             node_id = 0;  
> 
> Do other architectures already abort on incomplete numa? If so, it'd be
> nice to do that for mach-virt sooner than later. I guess we just need
> another compat variable for 2.9 and older machine types.  I think libvirt
> always supplies all cpu-node mappings, if any, so there shouldn't be an
> issue with it.
So far we only print warning messages but allow startup to proceed (it's a global policy).
The intent was that further down the road we would obsolete this and turn the warnings
into hard errors (maybe without even keeping compat stuff).
I guess we can do it for all supported archs at the same time, 2-3 releases
after announcing it (patch 20/23 adds a message announcing it as obsolete).

> 
> Thanks,
> drew
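The warn-now, fail-later policy described above can be sketched as a single check with a strictness switch. `check_cpu_numa_mapping()` and its `strict` flag are hypothetical. The messages are not QEMU's. The sketch only illustrates the policy change, not where or when QEMU performs the check. In the patch under review, the virt board itself still falls back to node 0 for unmapped CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the node fields of CpuInstanceProperties. */
typedef struct {
    bool has_node_id;
    long node_id;
} CpuProps;

/* Returns false only if startup should be refused, which happens when
 * 'strict' is set and some CPUs have no node. Otherwise an incomplete
 * mapping only produces a warning (the current policy). */
static bool check_cpu_numa_mapping(const CpuProps *cpus, size_t n_cpus,
                                   bool strict)
{
    size_t unmapped = 0;

    for (size_t i = 0; i < n_cpus; i++) {
        if (!cpus[i].has_node_id) {
            unmapped++;
        }
    }
    if (unmapped == 0) {
        return true;
    }
    if (strict) {
        fprintf(stderr, "error: %zu CPU(s) not mapped to any NUMA node\n",
                unmapped);
        return false;
    }
    fprintf(stderr, "warning: %zu CPU(s) not mapped to any NUMA node; "
                    "this may become an error in a future release\n",
            unmapped);
    return true;
}
```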

* Re: [Qemu-devel] [PATCH for-2.10 14/23] virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  2017-04-25 17:06   ` Andrew Jones
@ 2017-04-26 10:54     ` Igor Mammedov
  2017-04-26 11:27       ` Andrew Jones
  0 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-04-26 10:54 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Peter Maydell, Eduardo Habkost, qemu-devel, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Tue, 25 Apr 2017 19:06:34 +0200
Andrew Jones <drjones@redhat.com> wrote:

> On Wed, Mar 22, 2017 at 02:32:39PM +0100, Igor Mammedov wrote:
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  hw/arm/virt-acpi-build.c | 19 +++++++------------
> >  hw/arm/virt.c            | 13 +++++++------
> >  2 files changed, 14 insertions(+), 18 deletions(-)
> > 
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index 0835e59..ce7499c 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -486,30 +486,25 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >      AcpiSystemResourceAffinityTable *srat;
> >      AcpiSratProcessorGiccAffinity *core;
> >      AcpiSratMemoryAffinity *numamem;
> > -    int i, j, srat_start;
> > +    int i, srat_start;
> >      uint64_t mem_base;
> > -    uint32_t *cpu_node = g_malloc0(vms->smp_cpus * sizeof(uint32_t));
> > -
> > -    for (i = 0; i < vms->smp_cpus; i++) {
> > -        j = numa_get_node_for_cpu(i);
> > -        if (j < nb_numa_nodes) {
> > -                cpu_node[i] = j;
> > -        }
> > -    }
> > +    MachineClass *mc = MACHINE_GET_CLASS(vms);
> > +    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
> >  
> >      srat_start = table_data->len;
> >      srat = acpi_data_push(table_data, sizeof(*srat));
> >      srat->reserved1 = cpu_to_le32(1);
> >  
> > -    for (i = 0; i < vms->smp_cpus; ++i) {
> > +    for (i = 0; i < cpu_list->len; ++i) {
> > +        int node_id = cpu_list->cpus[i].props.has_node_id ?
> > +            cpu_list->cpus[i].props.node_id : 0;
> >          core = acpi_data_push(table_data, sizeof(*core));
> >          core->type = ACPI_SRAT_PROCESSOR_GICC;
> >          core->length = sizeof(*core);
> > -        core->proximity = cpu_to_le32(cpu_node[i]);
> > +        core->proximity = cpu_to_le32(node_id);
> >          core->acpi_processor_uid = cpu_to_le32(i);
> >          core->flags = cpu_to_le32(1);
> >      }
> > -    g_free(cpu_node);
> >  
> >      mem_base = vms->memmap[VIRT_MEM].base;
> >      for (i = 0; i < nb_numa_nodes; ++i) {
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 68d44f3..0a75df5 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -338,7 +338,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
> >  {
> >      int cpu;
> >      int addr_cells = 1;
> > -    unsigned int i;
> > +    const MachineState *ms = MACHINE(vms);
> >  
> >      /*
> >       * From Documentation/devicetree/bindings/arm/cpus.txt
> > @@ -369,6 +369,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
> >      for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
> >          char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> >          ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
> > +        CPUState *cs = CPU(armcpu);
> >  
> >          qemu_fdt_add_subnode(vms->fdt, nodename);
> >          qemu_fdt_setprop_string(vms->fdt, nodename, "device_type", "cpu");
> > @@ -389,9 +390,9 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
> >                                    armcpu->mp_affinity);
> >          }
> >  
> > -        i = numa_get_node_for_cpu(cpu);
> > -        if (i < nb_numa_nodes) {
> > -            qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id", i);
> > +        if (ms->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
> > +            qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id",
> > +                ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
> >          }
> >  
> >          g_free(nodename);
> > @@ -1378,8 +1379,8 @@ static void machvirt_init(MachineState *machine)
> >          cs = CPU(cpuobj);
> >          cs->cpu_index = n;
> >  
> > -        node_id = numa_get_node_for_cpu(cs->cpu_index);
> > -        if (node_id == nb_numa_nodes) {
> > +        node_id = machine->possible_cpus->cpus[cs->cpu_index].props.node_id;
> > +        if (!machine->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
> >              /* by default CPUState::numa_node was 0 if it's not set via CLI
> >               * keep it this way for now but in future we probably should
> >               * refuse to start up with incomplete numa mapping */
> > -- 
> > 2.7.4
> > 
> >  
> 
> We now have many machine->possible_cpus->cpus[index].props.[has_]node_id
> instances. I think we need inline accessors added to include/sysemu/numa.h
> like
> 
>  static inline bool numa_has_node_id(MachineState *ms, int index)
>  {
>    return ms->possible_cpus->cpus[index].props.has_node_id;
>  }
> 
>  static inline int numa_node_id(MachineState *ms, int index)
>  {
>    return ms->possible_cpus->cpus[index].props.node_id;
>  }
> 
>  ...
> 
> to improve readability and maintainability.
I dislike this kind of one-line wrapper, as it hurts readability
and maintainability of the code for me: I'm forced to jump
around the code every time I see such a wrapper to recall what
it does and how. The code still fits in one line, so I'd like to keep
it wrapper-less in this case if you don't insist on the change.

> 
> Or, instead, we could provide macros to allow assignments, e.g.
> 
>  #define NUMA_HAS_NODE_ID(ms, index) \
>    ((ms)->possible_cpus->cpus[index].props.has_node_id)
>  #define NUMA_NODE_ID(ms, index) \
>    ((ms)->possible_cpus->cpus[index].props.node_id)
ditto + worse debuggability 

> 
> Thanks,
> drew
> 

* Re: [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus
  2017-04-19  9:31     ` Igor Mammedov
@ 2017-04-26 11:02       ` Eduardo Habkost
  0 siblings, 0 replies; 77+ messages in thread
From: Eduardo Habkost @ 2017-04-26 11:02 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Andrew Jones, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

Just noticed I didn't reply to this yet:

On Wed, Apr 19, 2017 at 11:31:05AM +0200, Igor Mammedov wrote:
> On Thu, 13 Apr 2017 10:58:05 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Wed, Mar 22, 2017 at 02:32:35PM +0100, Igor Mammedov wrote:
> > > Introduce machine_set_cpu_numa_node() helper that stores
> > > node mapping for CPU in MachineState::possible_cpus.
> > > The CPU and the node it belongs to are specified by the 'props' argument.
> > > 
> > > Patch doesn't remove old way of storing mapping in
> > > numa_info[X].node_cpu as removing it at the same time
> > > makes patch rather big. Instead it just mirrors mapping
> > > in possible_cpus and follow up per target patches will
> > > switch to possible_cpus and numa_info[X].node_cpu will
> > > be removed once there aren't any users left.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > >  include/hw/boards.h |  2 ++
> > >  hw/core/machine.c   | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  numa.c              |  8 +++++++
> > >  3 files changed, 78 insertions(+)
> > > 
> > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > index 1dd0fde..40f30f1 100644
> > > --- a/include/hw/boards.h
> > > +++ b/include/hw/boards.h
> > > @@ -42,6 +42,8 @@ bool machine_dump_guest_core(MachineState *machine);
> > >  bool machine_mem_merge(MachineState *machine);
> > >  void machine_register_compat_props(MachineState *machine);
> > >  HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine);
> > > +void machine_set_cpu_numa_node(MachineState *machine,
> > > +                               CpuInstanceProperties *props, Error **errp);
> > >  
> > >  /**
> > >   * CPUArchId:
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index 0d92672..6ff0b45 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -388,6 +388,74 @@ HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine)
> > >      return head;
> > >  }
> > >  
> > > +void machine_set_cpu_numa_node(MachineState *machine,
> > > +                               CpuInstanceProperties *props, Error **errp)  
> > 
> > If you change this to:
> > 
> >   void cpu_slot_set_numa_node(CPUArchId *slot, uint64_t node_id,
> >                               Error **errp);
> > 
> > and move the CPU slot lookup code from machine_set_cpu_numa_node()
> > to a helper:
> > 
> >   CPUArchId *machine_get_cpu_slot(MachineState *machine,
> >                                   CpuInstanceProperties *props, Error **errp);
> It would work in the case of an exact 1:1 lookup, but Paolo asked for
> wildcard support (i.e. -numa cpu,node-id=x,socket-id=y should set the
> mapping for all cpus in socket y).
> So I'd prefer to keep machine_set_cpu_numa_node() as is;
> with it, the series splits nicely into clean and bisectable patches
> without breaking anything in the middle.

Makes sense to me. And this is also a good reason we won't have a
1:1 mapping from query-hotpluggable-cpus output to -numa cpu
input. (So this answers my question about the intent to change
query-hotpluggable-cpus output)
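The wildcard semantics mentioned above (`-numa cpu,node-id=x,socket-id=y` mapping every CPU in socket y) amount to treating each omitted property as "match anything". A minimal sketch under that assumption follows. `Props`, `props_match()` and `set_cpu_numa_node()` are hypothetical and deliberately simpler than `machine_set_cpu_numa_node()`. For instance, they make no attempt to handle conflicting reassignments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of CpuInstanceProperties: each id carries a has_*
 * flag saying whether it was specified. */
typedef struct {
    bool has_node_id;   long node_id;
    bool has_socket_id; long socket_id;
    bool has_core_id;   long core_id;
    bool has_thread_id; long thread_id;
} Props;

/* An id missing from the request acts as a wildcard. */
static bool id_matches(bool req_has, long req_id, bool slot_has, long slot_id)
{
    return !req_has || (slot_has && slot_id == req_id);
}

static bool props_match(const Props *slot, const Props *req)
{
    return id_matches(req->has_socket_id, req->socket_id,
                      slot->has_socket_id, slot->socket_id) &&
           id_matches(req->has_core_id, req->core_id,
                      slot->has_core_id, slot->core_id) &&
           id_matches(req->has_thread_id, req->thread_id,
                      slot->has_thread_id, slot->thread_id);
}

/* Sets req->node_id on every matching slot and returns how many slots
 * were updated; 0 means the request matched no possible CPU. */
static size_t set_cpu_numa_node(Props *slots, size_t n_slots, const Props *req)
{
    size_t matched = 0;

    assert(req->has_node_id);
    for (size_t i = 0; i < n_slots; i++) {
        if (props_match(&slots[i], req)) {
            slots[i].has_node_id = true;
            slots[i].node_id = req->node_id;
            matched++;
        }
    }
    return matched;
}
```

An exact 1:1 lookup is just the special case where the request names every id.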

> 
> 
> > and change cpu_index_to_cpu_instance_props to return CPUArchId:
> > 
> >   CPUArchId *cpu_index_to_cpu_slot(MachineState *machine, int cpu_index);
> > 
> > We could simply have this on "-numa cpu" code:
> > 
> >     slot = machine_get_cpu_slot(machine, props);
> >     cpu_slot_set_numa_node(slot, node_id);
> > 
> > and this on the legacy "-numa node,cpu=..." code:
> > 
> >     slot = mc->cpu_index_to_cpu_slot(machine, i);
> >     cpu_slot_set_numa_node(slot, node_id);
> > 
> > I believe we will be able to reuse machine_get_cpu_slot() to
> > replace pc_find_cpu_slot() and spapr_find_cpu_slot() later.
> As I already replied to David, xxx_find_cpu_slot() possibly might
> be generalized but I'd like to postpone it until ARM topology
> is materialized and merged.

OK.

> 
> Another reason I'd like to postpone generalization is that
> xxx_find_cpu_slot() could be optimized in a target-specific way,
> replacing the CPU lookup in an array with a computed expression.
> It would make lookup an O(1) function, and it could be used as
> a better replacement for qemu_get_cpu()/cpu_exists(), but
> I haven't looked into this yet.

I would prefer optimizing generic code over adding machine-specific
code just for optimization. But both approaches are valid; let's
see how this evolves.
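The computed O(1) lookup discussed above works as follows. Suppose a board fills its possible-CPUs array in socket, core, thread order with a uniform topology. Then the slot index can be computed instead of searched. Both that layout and the helpers below (`find_slot_linear()`, `find_slot_computed()`, `fill_slots()`, `lookups_agree()`) are assumptions for illustration. Any such shortcut is only valid on targets that guarantee the layout.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int socket_id, core_id, thread_id;
} Topo;

/* Generic lookup: scan every slot, O(n). Returns -1 if not found. */
static int find_slot_linear(const Topo *slots, int n_slots, Topo want)
{
    for (int i = 0; i < n_slots; i++) {
        if (slots[i].socket_id == want.socket_id &&
            slots[i].core_id == want.core_id &&
            slots[i].thread_id == want.thread_id) {
            return i;
        }
    }
    return -1;
}

/* Target-specific lookup: O(1), valid only for the layout produced by
 * fill_slots(). Returns -1 for out-of-range ids. */
static int find_slot_computed(int sockets, int cores, int threads, Topo want)
{
    if (want.socket_id < 0 || want.socket_id >= sockets ||
        want.core_id < 0 || want.core_id >= cores ||
        want.thread_id < 0 || want.thread_id >= threads) {
        return -1;
    }
    return (want.socket_id * cores + want.core_id) * threads + want.thread_id;
}

/* Fills slots in socket -> core -> thread order. */
static void fill_slots(Topo *slots, int sockets, int cores, int threads)
{
    int i = 0;

    for (int s = 0; s < sockets; s++) {
        for (int c = 0; c < cores; c++) {
            for (int t = 0; t < threads; t++) {
                slots[i++] = (Topo){ s, c, t };
            }
        }
    }
}

/* Checks that both lookups agree for every slot of a small topology. */
static bool lookups_agree(int sockets, int cores, int threads)
{
    Topo slots[64];
    int n = sockets * cores * threads;

    assert(n <= 64);
    fill_slots(slots, sockets, cores, threads);
    for (int i = 0; i < n; i++) {
        if (find_slot_linear(slots, n, slots[i]) !=
            find_slot_computed(sockets, cores, threads, slots[i])) {
            return false;
        }
    }
    return true;
}
```

For 2 sockets x 3 cores x 2 threads, socket 1 / core 2 / thread 1 lands at index (1 * 3 + 2) * 2 + 1 = 11, the last of the 12 slots.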

> 
>  
> > (I also suggest renaming "CPUArchId" and "possible CPUs" to
> > "CPUSlot" and "CPU slots" in the code and comments. This would
> > help people reviewing the code, but it can be done later if you
> > prefer.)
> I'm fine with changing CPUArchId to CPUSlot but I'd leave
> "possible CPUs" as is, since it precisely describes what it is,
> "CPU slots" is too ambiguous.

No problem to me. We can discuss that later, anyway.

Thanks!

-- 
Eduardo

* Re: [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus
  2017-04-19  9:52     ` Igor Mammedov
@ 2017-04-26 11:04       ` Eduardo Habkost
  0 siblings, 0 replies; 77+ messages in thread
From: Eduardo Habkost @ 2017-04-26 11:04 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, Andrew Jones, qemu-devel, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Apr 19, 2017 at 11:52:45AM +0200, Igor Mammedov wrote:
> On Wed, 12 Apr 2017 18:15:29 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Wed, Mar 22, 2017 at 02:32:35PM +0100, Igor Mammedov wrote:
> > > Introduce machine_set_cpu_numa_node() helper that stores
> > > node mapping for CPU in MachineState::possible_cpus.
> > > The CPU and the node it belongs to are specified by the 'props' argument.
> > > 
> > > Patch doesn't remove old way of storing mapping in
> > > numa_info[X].node_cpu as removing it at the same time
> > > makes patch rather big. Instead it just mirrors mapping
> > > in possible_cpus and follow up per target patches will
> > > switch to possible_cpus and numa_info[X].node_cpu will
> > > be removed once there aren't any users left.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> > 
> > So, this patch is the one that makes "-numa" and "-numa cpu"
> > affect query-hotpluggable-cpus output.
> That was the intent behind the series.
> 
> [...]
> > As noted in another message, I am not sure we really should make
> > "-numa" affect query-hotpluggable-cpus output unconditionally (I
> > believe we shouldn't). But if we do, we need to document this very
> > clearly.
> What place would you suggest to document this at?

qemu-options.hx documentation for -numa cpu, NumaCpuOptions docs
on qapi-schema, and/or query-hotpluggable-cpus docs on
qapi-schema.

-- 
Eduardo

* Re: [Qemu-devel] [PATCH for-2.10 14/23] virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  2017-04-26 10:54     ` Igor Mammedov
@ 2017-04-26 11:27       ` Andrew Jones
  2017-04-27 13:24         ` Igor Mammedov
  0 siblings, 1 reply; 77+ messages in thread
From: Andrew Jones @ 2017-04-26 11:27 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, Eduardo Habkost, qemu-devel, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, Apr 26, 2017 at 12:54:33PM +0200, Igor Mammedov wrote:
> On Tue, 25 Apr 2017 19:06:34 +0200
> Andrew Jones <drjones@redhat.com> wrote:
> 
> > On Wed, Mar 22, 2017 at 02:32:39PM +0100, Igor Mammedov wrote:
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > >  hw/arm/virt-acpi-build.c | 19 +++++++------------
> > >  hw/arm/virt.c            | 13 +++++++------
> > >  2 files changed, 14 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > > index 0835e59..ce7499c 100644
> > > --- a/hw/arm/virt-acpi-build.c
> > > +++ b/hw/arm/virt-acpi-build.c
> > > @@ -486,30 +486,25 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> > >      AcpiSystemResourceAffinityTable *srat;
> > >      AcpiSratProcessorGiccAffinity *core;
> > >      AcpiSratMemoryAffinity *numamem;
> > > -    int i, j, srat_start;
> > > +    int i, srat_start;
> > >      uint64_t mem_base;
> > > -    uint32_t *cpu_node = g_malloc0(vms->smp_cpus * sizeof(uint32_t));
> > > -
> > > -    for (i = 0; i < vms->smp_cpus; i++) {
> > > -        j = numa_get_node_for_cpu(i);
> > > -        if (j < nb_numa_nodes) {
> > > -                cpu_node[i] = j;
> > > -        }
> > > -    }
> > > +    MachineClass *mc = MACHINE_GET_CLASS(vms);
> > > +    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
> > >  
> > >      srat_start = table_data->len;
> > >      srat = acpi_data_push(table_data, sizeof(*srat));
> > >      srat->reserved1 = cpu_to_le32(1);
> > >  
> > > -    for (i = 0; i < vms->smp_cpus; ++i) {
> > > +    for (i = 0; i < cpu_list->len; ++i) {
> > > +        int node_id = cpu_list->cpus[i].props.has_node_id ?
> > > +            cpu_list->cpus[i].props.node_id : 0;
> > >          core = acpi_data_push(table_data, sizeof(*core));
> > >          core->type = ACPI_SRAT_PROCESSOR_GICC;
> > >          core->length = sizeof(*core);
> > > -        core->proximity = cpu_to_le32(cpu_node[i]);
> > > +        core->proximity = cpu_to_le32(node_id);
> > >          core->acpi_processor_uid = cpu_to_le32(i);
> > >          core->flags = cpu_to_le32(1);
> > >      }
> > > -    g_free(cpu_node);
> > >  
> > >      mem_base = vms->memmap[VIRT_MEM].base;
> > >      for (i = 0; i < nb_numa_nodes; ++i) {
> > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > > index 68d44f3..0a75df5 100644
> > > --- a/hw/arm/virt.c
> > > +++ b/hw/arm/virt.c
> > > @@ -338,7 +338,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
> > >  {
> > >      int cpu;
> > >      int addr_cells = 1;
> > > -    unsigned int i;
> > > +    const MachineState *ms = MACHINE(vms);
> > >  
> > >      /*
> > >       * From Documentation/devicetree/bindings/arm/cpus.txt
> > > @@ -369,6 +369,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
> > >      for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
> > >          char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> > >          ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
> > > +        CPUState *cs = CPU(armcpu);
> > >  
> > >          qemu_fdt_add_subnode(vms->fdt, nodename);
> > >          qemu_fdt_setprop_string(vms->fdt, nodename, "device_type", "cpu");
> > > @@ -389,9 +390,9 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
> > >                                    armcpu->mp_affinity);
> > >          }
> > >  
> > > -        i = numa_get_node_for_cpu(cpu);
> > > -        if (i < nb_numa_nodes) {
> > > -            qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id", i);
> > > +        if (ms->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
> > > +            qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id",
> > > +                ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
> > >          }
> > >  
> > >          g_free(nodename);
> > > @@ -1378,8 +1379,8 @@ static void machvirt_init(MachineState *machine)
> > >          cs = CPU(cpuobj);
> > >          cs->cpu_index = n;
> > >  
> > > -        node_id = numa_get_node_for_cpu(cs->cpu_index);
> > > -        if (node_id == nb_numa_nodes) {
> > > +        node_id = machine->possible_cpus->cpus[cs->cpu_index].props.node_id;
> > > +        if (!machine->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
> > >              /* by default CPUState::numa_node was 0 if it's not set via CLI
> > >               * keep it this way for now but in future we probably should
> > >               * refuse to start up with incomplete numa mapping */
> > > -- 
> > > 2.7.4
> > > 
> > >  
> > 
> > We now have many machine->possible_cpus->cpus[index].props.[has_]node_id
> > instances. I think we need inline accessors added to include/sysemu/numa.h
> > like
> > 
> >  static inline bool numa_has_node_id(MachineState *ms, int index)
> >  {
> >    return ms->possible_cpus->cpus[index].props.has_node_id;
> >  }
> > 
> >  static inline int numa_node_id(MachineState *ms, int index)
> >  {
> >    return ms->possible_cpus->cpus[index].props.node_id;
> >  }
> > 
> >  ...
> > 
> > to improve readability and maintainability.
> I dislike this kind of one-line wrapper, as it hurts readability
> and maintainability of the code for me: I'm forced to jump
> around the code every time I see such a wrapper to recall what
> it does and how. The code still fits in one line, so I'd like to keep
> it wrapper-less in this case if you don't insist on the change.

I prefer to jump around in code (jump once, read many) to going
blind looking for one or two char differences in 50 char long
variable names. So, to whatever degree I can, I insist :-)

> 
> > 
> > Or, instead, we could provide macros to allow assignments, e.g.
> > 
> >  #define NUMA_HAS_NODE_ID(ms, index) \
> >    ((ms)->possible_cpus->cpus[index].props.has_node_id)
> >  #define NUMA_NODE_ID(ms, index) \
> >    ((ms)->possible_cpus->cpus[index].props.node_id)
> ditto + worse debuggability 

I prefer the functions myself. I think the assignments to these
properties are rare enough that we only need the read accessors.

Thanks,
drew

* Re: [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU
  2017-04-19 11:14     ` Igor Mammedov
@ 2017-04-26 12:21       ` Eduardo Habkost
  2017-04-27 13:14         ` Igor Mammedov
  0 siblings, 1 reply; 77+ messages in thread
From: Eduardo Habkost @ 2017-04-26 12:21 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Andrew Jones, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson


(Sorry for taking so long to continue the discussion. I thought I
was convinced setting node-id on HotpluggableCPU.props was
required, but after thinking more about it, it looks like it's
still not required.)

On Wed, Apr 19, 2017 at 01:14:58PM +0200, Igor Mammedov wrote:
> On Wed, 12 Apr 2017 18:02:39 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Wed, Mar 22, 2017 at 02:32:32PM +0100, Igor Mammedov wrote:
> > > it will allow switching from cpu_index to property based
> > > numa mapping in follow up patches.  
> > 
> > I am not sure I understand all the consequences of this, so I
> > will give it a try:
> > 
> > "node-id" is an existing field in CpuInstanceProperties.
> > CpuInstanceProperties is used on both query-hotpluggable-cpus
> > output and in MachineState::possible_cpus.
> > 
> > We will start using MachineState::possible_cpus to keep track of
> > NUMA CPU affinity, and that means query-hotpluggable-cpus will
> > start reporting a "node-id" property when a NUMA mapping is
> > configured.
> > 
> > To allow query-hotpluggable-cpus to report "node-id", the CPU
> > objects must have a "node-id" property that can be set. This
> > patch adds the "node-id" property to X86CPU.
> > 
> > Is this description accurate? Is the presence of "node-id" in
> > query-hotpluggable-cpus the only reason we really need this
> > patch, or is there something else that requires the "node-id"
> > property?
> That's an accurate description; node-id is in the same 'address'
> properties category as socket/core/thread-id. So if you have a
> NUMA-enabled machine, you'd see the node-id property in
> query-hotpluggable-cpus.

I agree that we can make -numa cpu affect query-hotpluggable-cpus
output (i.e. affect some field on HotpluggableCPU.props).

But it looks like we disagree about the purpose of
HotpluggableCPU.props:

I believe HotpluggableCPU.props is just an opaque identifier for
the location we want to plug the CPU, and the only requirement is
that it should be unique and have all the information device_add
needs. As socket IDs are already unique on our existing machines,
and socket<=>node mapping is already configured using -numa cpu,
node-id doesn't need to be in HotpluggableCPU.props. (see example
below)

I don't think clients should assume topology information in
HotpluggableCPU.props is always present, because the field has a
different purpose: letting clients know what the required
device_add arguments are. If we need introspection of CPU topology,
we can add new fields to HotpluggableCPU, outside 'props'.

> 
> 
> > Why exactly do we need to change the output of
> > query-hotpluggable-cpus for all machines to include "node-id", to
> > make "-numa cpu" work?
> It's for introspection as well as for consolidating topology data
> in a single place, and it complements the already output
> socket/core/thread-id address properties with the NUMA node-id.
> That way one doesn't need yet another command for introspecting
> the NUMA mapping of CPUs and can use the existing
> query-hotpluggable-cpus for a full topology description.

I don't disagree about including node-id in HotpluggableCPU
struct, I am just unsure about including it in
HotpluggableCPU.props.

My question is: if we use:

 -numa cpu,socket=2,core=1,thread=0,node-id=3

and then:

 device_add ...,socket=2,core=1,thread=0
 (omitting node-id on device_add)

won't it work exactly the same, and place the new CPU on NUMA
node 3?

In this case, if we don't need a node-id argument on device_add,
then node-id doesn't need to be in HotpluggableCPU.props.

> 
> >  Did you consider saving node_id inside
> > CPUArchId and outside CpuInstanceProperties, so
> > query-hotpluggable-cpus output won't be affected by "-numa cpu"?
> nope, the intent was to make node-id visible if NUMA is enabled, and
> I think that intent was there from the very beginning, when
> query-hotpluggable-cpus was introduced with CpuInstanceProperties
> having a node_id field that stayed unused since it was out of scope
> of CPU hotplug.
> 
> 
> > I'm asking this because I believe we will eventually need a
> > mechanism that lets management check what are the valid arguments
> > for "-numa cpu" for a given machine, and it looks like
> > query-hotpluggable-cpus is already the right mechanism for that.
> it's a problem similar to the one with -device cpu_foo,...

True.

> 
> > But we can't make query-hotpluggable-cpus output depend on "-numa
> > cpu" input, if the "-numa cpu" input will also depend on
> > query-hotpluggable-cpus output.
> I don't think that query-hotpluggable-cpus must be independent of
> '-numa' option.
> 
> query-hotpluggable-cpus is a function of -smp and machine type, and
> its output is dynamic and can change during runtime, so we've never
> promised to make it static. I think it's OK to make it depend
> on -numa as an extra input argument when present.

OK, I agree that we can make -numa cpu affect
query-hotpluggable-cpus output.

I also think it might be OK to make -numa affect
HotpluggableCPU.props, as clients should be prepared for that. I
just want to understand if we really _have_ to make it so.
Because not including it would help us avoid surprises, and even
simplify the code (making this series shorter).

> 
> It bothers me as well that '-numa cpu' as well as '-device cpu_foo'
> options depend on query-hotpluggable-cpus. When we considered
> generic '-device cpu' support, we thought that initially
> query-hotpluggable-cpus could be used to get the list of CPUs
> for a given -smp/machine combination, and then it could be used
> for composing a proper CLI. That forces mgmt to start QEMU twice
> when creating a configuration for the 1st time, but the resulting CLI
> could be reused without repeating the query step, provided
> topology/machine stays the same. The same applies to '-numa cpu'.
> 
> In the future, to avoid starting QEMU twice, we were thinking about
> configuring QEMU from QMP at runtime; that's where a preconfigure
> approach could help solve it:
> 
>   1. introduce pause before machine_init CLI option to allow
>      preconfig machine from qmp/monitor
>   2. make query-hotpluggable-cpus usable at preconfig time
>   3. start qemu with needed number of numa nodes and default mapping:
>          #qemu -smp ... -numa node,nodeid=0 -numa node,nodeid=1
>   4. get possible cpus list

This is where things can get tricky: if we have the default
mapping set, step 4 would return "node-id" already set on all
possible CPUs.

>   5. add qmp/monitor command variant for '-numa cpu' to set numa mapping

This is where I think we would make things simpler: if node-id
isn't present on 'props', we can simply document the arguments
that identify the CPU for the numa-cpu command as "just use the
properties you get on query-hotpluggable-cpus.props". Clients
would be able to treat CpuInstanceProperties as an opaque CPU
slot identifier.

i.e. I think this would be a better way to define and document
the interface:

##
# @NumaCpuOptions:
#
# Mapping of a given CPU (or a set of CPUs) to a NUMA node.
#
# @cpu: Properties identifying the CPU(s). Use the 'props' field of
#       query-hotpluggable-cpus for possible values for this
#       field.
#       TODO: describe what happens when 'cpu' matches
#       multiple slots.
# @node-id: NUMA node where the CPUs are going to be located.
##
{ 'struct': 'NumaCpuOptions',
  'data': {
   'cpu': 'CpuInstanceProperties',
   'node-id': 'int' } }

This separates "what identifies the CPU slot(s) we are
configuring" from "what identifies the node ID we are binding
to".
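One possible answer to the TODO about 'cpu' matching multiple slots is wildcard semantics (the cover letter mentions wildcard matching for "-numa cpu,..."): unset properties match anything and the mapping applies to every matching slot. The sketch below assumes exactly that; the C types are hypothetical renderings of the proposed schema, not generated QAPI code, and slots are assumed to have all identifying properties set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of CpuInstanceProperties (QAPI-style has_ flags). */
typedef struct {
    bool has_node_id;   int64_t node_id;
    bool has_socket_id; int64_t socket_id;
    bool has_core_id;   int64_t core_id;
    bool has_thread_id; int64_t thread_id;
} CpuInstanceProperties;

/* Hypothetical C view of the proposed NumaCpuOptions. */
typedef struct {
    CpuInstanceProperties cpu;  /* which slot(s) to configure */
    int64_t node_id;            /* which node to bind them to */
} NumaCpuOptions;

/* A slot matches if every property set in 'pattern' equals the slot's
 * value; unset properties act as wildcards. node-id is not a selector. */
static bool slot_matches(const CpuInstanceProperties *slot,
                         const CpuInstanceProperties *pattern)
{
    if (pattern->has_socket_id && pattern->socket_id != slot->socket_id) {
        return false;
    }
    if (pattern->has_core_id && pattern->core_id != slot->core_id) {
        return false;
    }
    if (pattern->has_thread_id && pattern->thread_id != slot->thread_id) {
        return false;
    }
    return true;
}

/* Binds every matching slot to opts->node_id; returns the match count so
 * a caller could reject a pattern that matches nothing. */
static size_t numa_cpu_apply(CpuInstanceProperties *slots, size_t n,
                             const NumaCpuOptions *opts)
{
    size_t matched = 0;
    for (size_t i = 0; i < n; i++) {
        if (slot_matches(&slots[i], &opts->cpu)) {
            slots[i].has_node_id = true;
            slots[i].node_id = opts->node_id;
            matched++;
        }
    }
    return matched;
}
```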

In case we have trouble making this struct work with QemuOpts, we
could do this (temporarily?):

##
# @NumaCpuOptions:
#
# Mapping of a given CPU (or a set of CPUs) to a NUMA node.
#
# @cpu: Properties identifying the CPU(s). Use the 'props' field of
#       query-hotpluggable-cpus for possible values for this
#       field.
#       TODO: describe what happens when 'cpu' matches
#       multiple slots.
# @node-id: NUMA node where the CPUs are going to be located.
#
# @socket-id: Shortcut for cpu.socket-id, to make this struct
#             friendly to QemuOpts.
# @core-id: Shortcut for cpu.core-id, to make this struct
#           friendly to QemuOpts.
# @thread-id: Shortcut for cpu.thread-id, to make this struct
#             friendly to QemuOpts.
##
{ 'struct': 'NumaCpuOptions',
  'data': {
   '*cpu': 'CpuInstanceProperties',
   '*socket-id': 'int',
   '*core-id': 'int',
   '*thread-id': 'int',
   'node-id': 'int' } }
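With the QemuOpts-friendly variant, the code consuming the options would presumably fold the shortcuts into 'cpu' first, so the rest of the code only has to deal with a single CpuInstanceProperties. A minimal sketch of that normalization step, assuming that giving both forms with different values is an error (a policy choice, not something the proposal specifies); all C names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C view of CpuInstanceProperties (subset). */
typedef struct {
    bool has_socket_id; int64_t socket_id;
    bool has_core_id;   int64_t core_id;
    bool has_thread_id; int64_t thread_id;
} CpuInstanceProperties;

/* Hypothetical C view of the QemuOpts-friendly NumaCpuOptions. */
typedef struct {
    bool has_cpu;       CpuInstanceProperties cpu;
    bool has_socket_id; int64_t socket_id;
    bool has_core_id;   int64_t core_id;
    bool has_thread_id; int64_t thread_id;
    int64_t node_id;
} NumaCpuOptions;

/* Copies one shortcut into the matching cpu.* field. Returns false if
 * both forms were given with different values. */
static bool merge_one(bool has_short, int64_t short_val,
                      bool *has_full, int64_t *full_val)
{
    if (!has_short) {
        return true;
    }
    if (*has_full && *full_val != short_val) {
        return false;   /* conflicting values for the same property */
    }
    *has_full = true;
    *full_val = short_val;
    return true;
}

/* Folds socket-id/core-id/thread-id shortcuts into opts->cpu. */
static bool numa_cpu_normalize(NumaCpuOptions *opts)
{
    if (!opts->has_cpu) {
        /* No explicit 'cpu' member: start from an empty property set. */
        opts->cpu = (CpuInstanceProperties){ .has_socket_id = false };
        opts->has_cpu = true;
    }
    CpuInstanceProperties *p = &opts->cpu;
    return merge_one(opts->has_socket_id, opts->socket_id,
                     &p->has_socket_id, &p->socket_id) &&
           merge_one(opts->has_core_id, opts->core_id,
                     &p->has_core_id, &p->core_id) &&
           merge_one(opts->has_thread_id, opts->thread_id,
                     &p->has_thread_id, &p->thread_id);
}
```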

>   6. optionally, set new numa mapping and get updated
>      possible cpus list with query-hotpluggable-cpus
>   7. optionally, add extra cpus with device_add using updated
>      cpus list and get updated cpus list as it's been changed again.
>   8. unpause preconfig stage and let qemu continue to execute
>      machine_init and the rest.
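The key ordering constraint in the preconfig steps above is that the NUMA mapping may only change before machine_init() consumes it. A minimal sketch of that rule as a command guard; the phase and command names are hypothetical (neither the pause option nor a QMP '-numa cpu' twin exists yet in this thread).

```c
#include <stdbool.h>

/* Hypothetical phases for the proposed "pause before machine_init" flow. */
typedef enum {
    PHASE_PRECONFIG,      /* steps 1-7: QMP configuration allowed */
    PHASE_MACHINE_INIT,   /* step 8: machine_init() running */
    PHASE_RUNNING,
} MachinePhase;

typedef enum {
    CMD_QUERY_HOTPLUGGABLE_CPUS,
    CMD_NUMA_CPU,         /* hypothetical QMP variant of -numa cpu */
    CMD_DEVICE_ADD_CPU,
    CMD_CONT_PRECONFIG,   /* hypothetical "unpause preconfig" command */
} Command;

/* Returns whether 'cmd' may run in 'phase'. */
static bool command_allowed(MachinePhase phase, Command cmd)
{
    switch (cmd) {
    case CMD_QUERY_HOTPLUGGABLE_CPUS:
    case CMD_DEVICE_ADD_CPU:
        /* Usable while preconfiguring and later for normal hotplug. */
        return phase == PHASE_PRECONFIG || phase == PHASE_RUNNING;
    case CMD_NUMA_CPU:
    case CMD_CONT_PRECONFIG:
        /* The mapping is frozen once machine_init() starts. */
        return phase == PHASE_PRECONFIG;
    }
    return false;
}
```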
> 
> Since we would need to implement QMP configuration for '-device cpu',
> we might as well reuse it for custom NUMA mapping.
> 
>  [...]

-- 
Eduardo


* Re: [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU
  2017-04-26 12:21       ` Eduardo Habkost
@ 2017-04-27 13:14         ` Igor Mammedov
  2017-04-27 16:32           ` Eduardo Habkost
  0 siblings, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-04-27 13:14 UTC
  To: Eduardo Habkost
  Cc: qemu-devel, Peter Maydell, Andrew Jones, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson, Peter Krempa

On Wed, 26 Apr 2017 09:21:38 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

adding Peter to CC list

[...]

> On Wed, Apr 19, 2017 at 01:14:58PM +0200, Igor Mammedov wrote:
> > On Wed, 12 Apr 2017 18:02:39 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Wed, Mar 22, 2017 at 02:32:32PM +0100, Igor Mammedov wrote:  
> > > > it will allow switching from cpu_index to property based
> > > > numa mapping in follow up patches.    
> > > 
> > > I am not sure I understand all the consequences of this, so I
> > > will give it a try:
> > > 
> > > "node-id" is an existing field in CpuInstanceProperties.
> > > CpuInstanceProperties is used on both query-hotpluggable-cpus
> > > output and in MachineState::possible_cpus.
> > > 
> > > We will start using MachineState::possible_cpus to keep track of
> > > NUMA CPU affinity, and that means query-hotpluggable-cpus will
> > > start reporting a "node-id" property when a NUMA mapping is
> > > configured.
> > > 
> > > To allow query-hotpluggable-cpus to report "node-id", the CPU
> > > objects must have a "node-id" property that can be set. This
> > > patch adds the "node-id" property to X86CPU.
> > > 
> > > Is this description accurate? Is the presence of "node-id" in
> > > query-hotpluggable-cpus the only reason we really need this
> > > patch, or is there something else that requires the "node-id"
> > > property?  
> > That's an accurate description; node-id is in the same 'address'
> > properties category as socket/core/thread-id. So if you have a
> > NUMA-enabled machine, you'd see the node-id property in
> > query-hotpluggable-cpus.
> 
> I agree that we can make -numa cpu affect query-hotpluggable-cpus
> output (i.e. affect some field on HotpluggableCPU.props).
> 
> But it looks like we disagree about the purpose of
> HotpluggableCPU.props:
> 
> I believe HotpluggableCPU.props is just an opaque identifier for
> the location we want to plug the CPU, and the only requirement is
> that it should be unique and have all the information device_add
> needs. As socket IDs are already unique on our existing machines,
> and socket<=>node mapping is already configured using -numa cpu,
> node-id doesn't need to be in HotpluggableCPU.props. (see example
> below)
node-id is also a location property which logically complements
socket/core/thread properties. Also, socket-id is not necessarily a
unique id that maps 1:1 to node-id from a generic point of view.
BTW, -numa cpu[s] is not the only way to specify the mapping;
it could be specified like we do with pc-dimm:
   device_add pc-dimm,node=x

Looking at it more generically, there could be the same
socket-ids on different nodes; then we would have to add
node-id to props anyway and end up with two node-ids, one in props
and another in the parent struct.
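To make the point about repeated socket-ids concrete: on a hypothetical machine that numbers sockets per node, socket-id alone matches more than one slot, while (node-id, socket-id) identifies exactly one. The Slot type and layout below are illustrative assumptions, not an existing QEMU machine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot description: socket-ids restart on every node. */
typedef struct {
    int64_t node_id;
    int64_t socket_id;
} Slot;

/* Counts slots matching socket_id, and node_id as well when use_node is
 * set. A count other than 1 means the key does not identify one slot. */
static size_t count_matches(const Slot *slots, size_t n, bool use_node,
                            int64_t node_id, int64_t socket_id)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (slots[i].socket_id == socket_id &&
            (!use_node || slots[i].node_id == node_id)) {
            hits++;
        }
    }
    return hits;
}
```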


> I don't think clients should assume topology information in
> HotpluggableCPU.props is always present, because the field has a
> different purpose: letting clients know what the required
> device_add arguments are. If we need introspection of CPU topology,
> we can add new fields to HotpluggableCPU, outside 'props'.
Looking at existing clients (libvirt), it doesn't treat 'props'
as an opaque set but parses it into topology information (my guess
is that this is because it's the sole source of such info from QEMU).
Actually, we never forbade this; the only requirement for
props was that mgmt should provide those properties to create
a CPU. Property names were designed in topology/location-friendly
terms so that clients could make sense of them.

So I wouldn't try now to reduce the meaning of 'props' to
an opaque set as you suggest.

[..]
> My question is: if we use:
> 
>  -numa cpu,socket=2,core=1,thread=0,node-id=3
> 
> and then:
> 
>  device_add ...,socket=2,core=1,thread=0
>  (omitting node-id on device_add)
> 
> won't it work exactly the same, and place the new CPU on NUMA
> node 3?
yep, it's allowed for compat reasons:
  1: to allow the migration destination to start with the old CLI
     variant, which didn't have node-id
  2: to let old libvirt hotplug CPUs: it doesn't treat 'props'
     as an opaque set that is just replayed to device_add;
     instead it composes the command from the topology info it got
     from QEMU, and unfortunately node-id is only read but is
     not emitted when device_add is composed
    (I consider this a bug, but it's out in the wild so we have to deal with it)

we can't enforce presence in these cases or at least have to
keep it relaxed for old machine types.
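The relaxed behaviour described above can be sketched as: if device_add omits node-id, inherit whatever the slot was mapped to; if it passes one, it must not contradict the mapping. The fallback to node 0 for unmapped slots mirrors the default used in the virt-arm hunks of this series; the function name and error policy are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for CpuInstanceProperties (node-id part only). */
typedef struct {
    bool has_node_id;
    int64_t node_id;
} CpuInstanceProperties;

/* Hypothetical helper: 'slot' is the possible_cpus entry, 'req' holds the
 * device_add arguments. Returns false on a conflicting node-id. */
static bool resolve_cpu_node(const CpuInstanceProperties *slot,
                             const CpuInstanceProperties *req,
                             int64_t *node_out)
{
    if (!req->has_node_id) {
        /* Compat path (old CLI, old libvirt): inherit the slot's mapping,
         * falling back to node 0 when nothing was mapped. */
        *node_out = slot->has_node_id ? slot->node_id : 0;
        return true;
    }
    if (slot->has_node_id && slot->node_id != req->node_id) {
        return false;   /* device_add contradicts -numa cpu */
    }
    *node_out = req->node_id;
    return true;
}
```

Note that the last branch also accepts a node-id for an unmapped slot, i.e. the pc-dimm style mapping that the series currently keeps disabled.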
 
> In this case, if we don't need a node-id argument on device_add,
> then node-id doesn't need to be in HotpluggableCPU.props.
I'd say we currently don't have to (for the above reasons), but
it doesn't hurt and actually allows using the pc-dimm way of
mapping CPUs to nodes, as David noted, i.e.:
  -device cpu-foo,node-id=x,...
without any of the -numa cpu[s] options on the CLI.
It's currently explicitly disabled but should work if one
doesn't care about hotplug or if the target doesn't care about
mapping at startup (sPAPR); it also might work for x86
using the _PXM method in ACPI.
(But that's out of scope of this series and needs more
testing, as some guest OSes might expect a populated SRAT
to work correctly.)

[...]
> > 
> > In the future, to avoid starting QEMU twice, we were thinking about
> > configuring QEMU from QMP at runtime; that's where a preconfigure
> > approach could help solve it:
> > 
> >   1. introduce pause before machine_init CLI option to allow
> >      preconfig machine from qmp/monitor
> >   2. make query-hotpluggable-cpus usable at preconfig time
> >   3. start qemu with needed number of numa nodes and default mapping:
> >          #qemu -smp ... -numa node,nodeid=0 -numa node,nodeid=1
> >   4. get possible cpus list  
> 
> This is where things can get tricky: if we have the default
> mapping set, step 4 would return "node-id" already set on all
> possible CPUs.
That would depend on the implementation:
 - display node-id with default preset values to override
 - do not set defaults and force user to do mapping
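The second option ("do not set defaults") implies a completeness check before the machine is allowed to proceed, similar to the "refuse to start up with incomplete numa mapping" idea in the virt-arm patch comment. A minimal sketch, with a hypothetical helper name and simplified types:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for CpuInstanceProperties (node-id part only). */
typedef struct {
    bool has_node_id;
    int64_t node_id;
} CpuInstanceProperties;

/* Returns the index of the first slot with no node-id or an out-of-range
 * one, or -1 if every possible CPU is mapped to a valid node. */
static long first_unmapped_cpu(const CpuInstanceProperties *slots,
                               size_t n, int64_t nb_numa_nodes)
{
    for (size_t i = 0; i < n; i++) {
        if (!slots[i].has_node_id ||
            slots[i].node_id < 0 || slots[i].node_id >= nb_numa_nodes) {
            return (long)i;
        }
    }
    return -1;
}
```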

> >   5. add qmp/monitor command variant for '-numa cpu' to set numa mapping  
> 
> This is where I think we would make things simpler: if node-id
> isn't present on 'props', we can simply document the arguments
> that identify the CPU for the numa-cpu command as "just use the
> properties you get on query-hotpluggable-cpus.props". Clients
> would be able to treat CpuInstanceProperties as an opaque CPU
> slot identifier.
> 
> i.e. I think this would be a better way to define and document
> the interface:
> 
> ##
> # @NumaCpuOptions:
> #
> # Mapping of a given CPU (or a set of CPUs) to a NUMA node.
> #
> # @cpu: Properties identifying the CPU(s). Use the 'props' field of
> #       query-hotpluggable-cpus for possible values for this
> #       field.
> #       TODO: describe what happens when 'cpu' matches
> #       multiple slots.
> # @node-id: NUMA node where the CPUs are going to be located.
> ##
> { 'struct': 'NumaCpuOptions',
>   'data': {
>    'cpu': 'CpuInstanceProperties',
>    'node-id': 'int' } }
> 
> This separates "what identifies the CPU slot(s) we are
> configuring" from "what identifies the node ID we are binding
> to".
Doesn't look any simpler to me; I'd rather document node-id usage
in props, like

 #
 # A discriminated record of NUMA options. (for OptsVisitor)
 #
+# For 'cpu' type as arguments use a set of cpu properties returned
+# by query-hotpluggable-cpus[].props, where node-id could be used
+# to override default node mapping. Since: 2.10
+#
 # Since: 2.1
 ##
 { 'union': 'NumaOptions',
   'base': { 'type': 'NumaOptionsType' },
   'discriminator': 'type',
   'data': {
-    'node': 'NumaNodeOptions' }}
+    'node': 'NumaNodeOptions',
+    'cpu' : 'CpuInstanceProperties' }}
 
 ##
 # @NumaNodeOptions:
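With the union variant, '-numa cpu,...' parses straight into CpuInstanceProperties, so whoever consumes it has to split node-id (the target) from the remaining fields (the selector). A minimal sketch of that split; treating a missing node-id as an error is an assumption, and the C types are hypothetical renderings of the schema.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C view of CpuInstanceProperties (QAPI-style has_ flags). */
typedef struct {
    bool has_node_id;   int64_t node_id;
    bool has_socket_id; int64_t socket_id;
    bool has_core_id;   int64_t core_id;
    bool has_thread_id; int64_t thread_id;
} CpuInstanceProperties;

/* Splits the parsed props into a slot selector and the target node.
 * Returns false if node-id is missing. */
static bool split_numa_cpu_props(const CpuInstanceProperties *in,
                                 CpuInstanceProperties *match,
                                 int64_t *node_id)
{
    if (!in->has_node_id) {
        return false;   /* nothing to map the CPU(s) to */
    }
    *node_id = in->node_id;
    *match = *in;
    match->has_node_id = false;   /* node-id is not a selector here */
    match->node_id = 0;
    return true;
}
```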


> In case we have trouble making this struct work with QemuOpts, we
> could do this (temporarily?):
> 
> ##
> # @NumaCpuOptions:
> #
> # Mapping of a given CPU (or a set of CPUs) to a NUMA node.
> #
> # @cpu: Properties identifying the CPU(s). Use the 'props' field of
> #       query-hotpluggable-cpus for possible values for this
> #       field.
> #       TODO: describe what happens when 'cpu' matches
> #       multiple slots.
> # @node-id: NUMA node where the CPUs are going to be located.
> #
> # @socket-id: Shortcut for cpu.socket-id, to make this struct
> #             friendly to QemuOpts.
> # @core-id: Shortcut for cpu.core-id, to make this struct
> #           friendly to QemuOpts.
> # @thread-id: Shortcut for cpu.thread-id, to make this struct
> #             friendly to QemuOpts.
> ##
> { 'struct': 'NumaCpuOptions',
>   'data': {
>    '*cpu': 'CpuInstanceProperties',
>    '*socket-id': 'int',
>    '*core-id': 'int',
>    '*thread-id': 'int',
>    'node-id': 'int' } }
> 
> >   6. optionally, set new numa mapping and get updated
> >      possible cpus list with query-hotpluggable-cpus
> >   7. optionally, add extra cpus with device_add using updated
> >      cpus list and get updated cpus list as it's been changed again.
> >   8. unpause preconfig stage and let qemu continue to execute
> >      machine_init and the rest.
> > 
> > Since we would need to implement QMP configuration for '-device cpu',
> > we might as well reuse it for custom NUMA mapping.
> > 
> >  [...]  
> 


* Re: [Qemu-devel] [PATCH for-2.10 14/23] virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()
  2017-04-26 11:27       ` Andrew Jones
@ 2017-04-27 13:24         ` Igor Mammedov
  0 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-04-27 13:24 UTC
  To: Andrew Jones
  Cc: Peter Maydell, Eduardo Habkost, qemu-devel, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson

On Wed, 26 Apr 2017 13:27:52 +0200
Andrew Jones <drjones@redhat.com> wrote:

> On Wed, Apr 26, 2017 at 12:54:33PM +0200, Igor Mammedov wrote:
> > On Tue, 25 Apr 2017 19:06:34 +0200
> > Andrew Jones <drjones@redhat.com> wrote:
> >   
> > > On Wed, Mar 22, 2017 at 02:32:39PM +0100, Igor Mammedov wrote:  
> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > ---
> > > >  hw/arm/virt-acpi-build.c | 19 +++++++------------
> > > >  hw/arm/virt.c            | 13 +++++++------
> > > >  2 files changed, 14 insertions(+), 18 deletions(-)
> > > > 
> > > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > > > index 0835e59..ce7499c 100644
> > > > --- a/hw/arm/virt-acpi-build.c
> > > > +++ b/hw/arm/virt-acpi-build.c
> > > > @@ -486,30 +486,25 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> > > >      AcpiSystemResourceAffinityTable *srat;
> > > >      AcpiSratProcessorGiccAffinity *core;
> > > >      AcpiSratMemoryAffinity *numamem;
> > > > -    int i, j, srat_start;
> > > > +    int i, srat_start;
> > > >      uint64_t mem_base;
> > > > -    uint32_t *cpu_node = g_malloc0(vms->smp_cpus * sizeof(uint32_t));
> > > > -
> > > > -    for (i = 0; i < vms->smp_cpus; i++) {
> > > > -        j = numa_get_node_for_cpu(i);
> > > > -        if (j < nb_numa_nodes) {
> > > > -                cpu_node[i] = j;
> > > > -        }
> > > > -    }
> > > > +    MachineClass *mc = MACHINE_GET_CLASS(vms);
> > > > +    const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
> > > >  
> > > >      srat_start = table_data->len;
> > > >      srat = acpi_data_push(table_data, sizeof(*srat));
> > > >      srat->reserved1 = cpu_to_le32(1);
> > > >  
> > > > -    for (i = 0; i < vms->smp_cpus; ++i) {
> > > > +    for (i = 0; i < cpu_list->len; ++i) {
> > > > +        int node_id = cpu_list->cpus[i].props.has_node_id ?
> > > > +            cpu_list->cpus[i].props.node_id : 0;
> > > >          core = acpi_data_push(table_data, sizeof(*core));
> > > >          core->type = ACPI_SRAT_PROCESSOR_GICC;
> > > >          core->length = sizeof(*core);
> > > > -        core->proximity = cpu_to_le32(cpu_node[i]);
> > > > +        core->proximity = cpu_to_le32(node_id);
> > > >          core->acpi_processor_uid = cpu_to_le32(i);
> > > >          core->flags = cpu_to_le32(1);
> > > >      }
> > > > -    g_free(cpu_node);
> > > >  
> > > >      mem_base = vms->memmap[VIRT_MEM].base;
> > > >      for (i = 0; i < nb_numa_nodes; ++i) {
> > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > > > index 68d44f3..0a75df5 100644
> > > > --- a/hw/arm/virt.c
> > > > +++ b/hw/arm/virt.c
> > > > @@ -338,7 +338,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
> > > >  {
> > > >      int cpu;
> > > >      int addr_cells = 1;
> > > > -    unsigned int i;
> > > > +    const MachineState *ms = MACHINE(vms);
> > > >  
> > > >      /*
> > > >       * From Documentation/devicetree/bindings/arm/cpus.txt
> > > > @@ -369,6 +369,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
> > > >      for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
> > > >          char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> > > >          ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
> > > > +        CPUState *cs = CPU(armcpu);
> > > >  
> > > >          qemu_fdt_add_subnode(vms->fdt, nodename);
> > > >          qemu_fdt_setprop_string(vms->fdt, nodename, "device_type", "cpu");
> > > > @@ -389,9 +390,9 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
> > > >                                    armcpu->mp_affinity);
> > > >          }
> > > >  
> > > > -        i = numa_get_node_for_cpu(cpu);
> > > > -        if (i < nb_numa_nodes) {
> > > > -            qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id", i);
> > > > +        if (ms->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
> > > > +            qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id",
> > > > +                ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
> > > >          }
> > > >  
> > > >          g_free(nodename);
> > > > @@ -1378,8 +1379,8 @@ static void machvirt_init(MachineState *machine)
> > > >          cs = CPU(cpuobj);
> > > >          cs->cpu_index = n;
> > > >  
> > > > -        node_id = numa_get_node_for_cpu(cs->cpu_index);
> > > > -        if (node_id == nb_numa_nodes) {
> > > > +        node_id = machine->possible_cpus->cpus[cs->cpu_index].props.node_id;
> > > > +        if (!machine->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
> > > >              /* by default CPUState::numa_node was 0 if it's not set via CLI
> > > >               * keep it this way for now but in future we probably should
> > > >               * refuse to start up with incomplete numa mapping */
> > > > -- 
> > > > 2.7.4
> > > > 
> > > >    
> > > 
> > > We now have many machine->possible_cpus->cpus[index].props.[has_]node_id
> > > instances. I think we need inline accessors added to include/sysemu/numa.h
> > > like
> > > 
> > >  static inline bool numa_has_node_id(MachineState *ms, int index)
> > >  {
> > >    return ms->possible_cpus->cpus[index].props.has_node_id;
> > >  }
> > > 
> > >  static inline int numa_node_id(MachineState *ms, int index)
> > >  {
> > >    return ms->possible_cpus->cpus[index].props.node_id;
> > >  }
> > > 
> > >  ...
> > > 
> > > to improve readability and maintainability.  
> > I dislike this kind of one-line wrapper as it hurts readability
> > and maintainability of code for me as I'm forced to jump
> > around code every time I see such wrapper to recall what and
> > how it does. Code still fits in one line so I'd like to keep
> > it wrapper-less in this case if you don't insist on the change.  
> 
> I prefer to jump around in code (jump once, read many) to going
> blind looking for one or two char differences in 50 char long
> variable names. So, to whatever degree I can, I insist :-)
Ok, I'll convert it to wrapper
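As an illustration of the readability argument, the proximity computation from the quoted build_srat() hunk reads as follows with the accessors. This is a sketch with simplified stand-in types and a hypothetical fill_srat_proximity() helper; the real code additionally converts with cpu_to_le32() and pushes ACPI structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for QEMU's types; only the fields used here. */
typedef struct { bool has_node_id; int64_t node_id; } CpuInstanceProperties;
typedef struct { CpuInstanceProperties props; } CPUArchId;
typedef struct { int len; CPUArchId cpus[4]; } CPUArchIdList;
typedef struct { CPUArchIdList *possible_cpus; } MachineState;

static inline bool numa_has_node_id(const MachineState *ms, int index)
{
    return ms->possible_cpus->cpus[index].props.has_node_id;
}

static inline int numa_node_id(const MachineState *ms, int index)
{
    return (int)ms->possible_cpus->cpus[index].props.node_id;
}

/* Same policy as the quoted hunk: CPUs without a mapping land in node 0. */
static void fill_srat_proximity(const MachineState *ms, uint32_t *proximity)
{
    for (int i = 0; i < ms->possible_cpus->len; i++) {
        proximity[i] = numa_has_node_id(ms, i) ?
                       (uint32_t)numa_node_id(ms, i) : 0;
    }
}
```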

> 
> >   
> > > 
> > > Or, instead, we could provide macros to allow assignments, e.g.
> > > 
> > >  #define NUMA_HAS_NODE_ID(ms, index) \
> > >    ((ms)->possible_cpus->cpus[index].props.has_node_id)
> > >  #define NUMA_NODE_ID(ms, index) \
> > >    ((ms)->possible_cpus->cpus[index].props.node_id)  
> > ditto + worse debuggability   
> 
> I prefer the functions myself. I think the assignments to these
> properties are rare enough that we only need the read accessors.
> 
> Thanks,
> drew


* Re: [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU
  2017-04-27 13:14         ` Igor Mammedov
@ 2017-04-27 16:32           ` Eduardo Habkost
  2017-04-27 17:25             ` Igor Mammedov
  2017-05-02  4:27             ` David Gibson
  0 siblings, 2 replies; 77+ messages in thread
From: Eduardo Habkost @ 2017-04-27 16:32 UTC
  To: Igor Mammedov
  Cc: qemu-devel, Peter Maydell, Andrew Jones, qemu-arm, qemu-ppc,
	Shannon Zhao, Paolo Bonzini, David Gibson, Peter Krempa

On Thu, Apr 27, 2017 at 03:14:06PM +0200, Igor Mammedov wrote:
> On Wed, 26 Apr 2017 09:21:38 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> adding Peter to CC list
> 
> [...]
> 
> > On Wed, Apr 19, 2017 at 01:14:58PM +0200, Igor Mammedov wrote:
> > > On Wed, 12 Apr 2017 18:02:39 -0300
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Wed, Mar 22, 2017 at 02:32:32PM +0100, Igor Mammedov wrote:  
> > > > > it will allow switching from cpu_index to property based
> > > > > numa mapping in follow up patches.    
> > > > 
> > > > I am not sure I understand all the consequences of this, so I
> > > > will give it a try:
> > > > 
> > > > "node-id" is an existing field in CpuInstanceProperties.
> > > > CpuInstanceProperties is used on both query-hotpluggable-cpus
> > > > output and in MachineState::possible_cpus.
> > > > 
> > > > We will start using MachineState::possible_cpus to keep track of
> > > > NUMA CPU affinity, and that means query-hotpluggable-cpus will
> > > > start reporting a "node-id" property when a NUMA mapping is
> > > > configured.
> > > > 
> > > > To allow query-hotpluggable-cpus to report "node-id", the CPU
> > > > objects must have a "node-id" property that can be set. This
> > > > patch adds the "node-id" property to X86CPU.
> > > > 
> > > > Is this description accurate? Is the presence of "node-id" in
> > > > query-hotpluggable-cpus the only reason we really need this
> > > > patch, or is there something else that requires the "node-id"
> > > > property?  
> > > That's an accurate description; node-id is in the same 'address'
> > > properties category as socket/core/thread-id. So if you have a
> > > NUMA-enabled machine, you'd see the node-id property in
> > > query-hotpluggable-cpus.
> > 
> > I agree that we can make -numa cpu affect query-hotpluggable-cpus
> > output (i.e. affect some field on HotpluggableCPU.props).
> > 
> > But it looks like we disagree about the purpose of
> > HotpluggableCPU.props:
> > 
> > I believe HotpluggableCPU.props is just an opaque identifier for
> > the location we want to plug the CPU, and the only requirement is
> > that it should be unique and have all the information device_add
> > needs. As socket IDs are already unique on our existing machines,
> > and socket<=>node mapping is already configured using -numa cpu,
> > node-id doesn't need to be in HotpluggableCPU.props. (see example
> > below)
> node-id is also a location property which logically complements
> socket/core/thread properties. Also, socket-id is not necessarily a
> unique id that maps 1:1 to node-id from a generic point of view.
> BTW, -numa cpu[s] is not the only way to specify the mapping;
> it could be specified like we do with pc-dimm:
>    device_add pc-dimm,node=x
> 
> Looking at it more generically, there could be the same
> socket-ids on different nodes; then we would have to add
> node-id to props anyway and end up with two node-ids, one in props
> and another in the parent struct.

This is where my expectations are different: I think
HotpluggableCPU.props is just an identifier property for CPU
slots that is used for device_add (and will be used for -numa
cpu), and isn't supposed to be interpreted by clients.

The problem I see is that the property has two completely
different purposes: identifying a given CPU slot for device_add
(and -numa cpu), and introspection of topology information about
the CPU slot. Today we are lucky and those goals don't conflict
with each other, but I worry this might cause trouble in the
future.

> 
> 
> > I don't think clients should assume topology information in
> > HotpluggableCPU.props is always present, because the field has a
> > different purpose: letting clients know what the required
> > device_add arguments are. If we need introspection of CPU topology,
> > we can add new fields to HotpluggableCPU, outside 'props'.
> Looking at existing clients (libvirt), it doesn't treat 'props'
> as an opaque set but parses it into topology information (my guess
> is that this is because it's the sole source of such info from QEMU).
> Actually, we never forbade this; the only requirement for
> props was that mgmt should provide those properties to create
> a CPU. Property names were designed in topology/location-friendly
> terms so that clients could make sense of them.
> 
> So I wouldn't try now to reduce the meaning of 'props' to
> an opaque set as you suggest.

I see. This means my expectation is not met even today. I am not
thrilled about it, but that's OK.

> 
> [..]
> > My question is: if we use:
> > 
> >  -numa cpu,socket=2,core=1,thread=0,node-id=3
> > 
> > and then:
> > 
> >  device_add ...,socket=2,core=1,thread=0
> >  (omitting node-id on device_add)
> > 
> > won't it work exactly the same, and place the new CPU on NUMA
> > node 3?
> yep, it's allowed for compat reasons:
>   1: to allow the migration destination to start with the old CLI
>      variant, which didn't have node-id
>   2: to let old libvirt hotplug CPUs: it doesn't treat 'props'
>      as an opaque set that is just replayed to device_add;
>      instead it composes the command from the topology info it got
>      from QEMU, and unfortunately node-id is only read but is
>      not emitted when device_add is composed
>     (I consider this a bug, but it's out in the wild so we have to deal with it)
> 
> we can't enforce presence in these cases or at least have to
> keep it relaxed for old machine types.

I see.

>  
> > In this case, if we don't need a node-id argument on device_add,
> > then node-id doesn't need to be in HotpluggableCPU.props.
> I'd say we currently don't have to (for the above reasons), but
> it doesn't hurt and actually allows using the pc-dimm way of
> mapping CPUs to nodes, as David noted, i.e.:
>   -device cpu-foo,node-id=x,...
> without any of the -numa cpu[s] options on the CLI.
> It's currently explicitly disabled but should work if one
> doesn't care about hotplug or if the target doesn't care about
> mapping at startup (sPAPR); it also might work for x86
> using the _PXM method in ACPI.
> (But that's out of scope of this series and needs more
> testing, as some guest OSes might expect a populated SRAT
> to work correctly.)

Yep. I understand that setting node-id is useful, I just didn't
expect it to be mandatory and included on HotpluggableCPU.props.

> 
> [...]
> > > 
> > > In the future, to avoid starting QEMU twice, we were thinking about
> > > configuring QEMU from QMP at runtime; that's where a preconfigure
> > > approach could help solve it:
> > > 
> > >   1. introduce pause before machine_init CLI option to allow
> > >      preconfig machine from qmp/monitor
> > >   2. make query-hotpluggable-cpus usable at preconfig time
> > >   3. start qemu with needed number of numa nodes and default mapping:
> > >          #qemu -smp ... -numa node,nodeid=0 -numa node,nodeid=1
> > >   4. get possible cpus list  
> > 
> > This is where things can get tricky: if we have the default
> > mapping set, step 4 would return "node-id" already set on all
> > possible CPUs.
> that would depend on the implementation:
>  - display node-id with default preset values to override
>  - do not set defaults and force user to do mapping

Right. We could choose to initialize default values much later,
and leave it uninitialized.
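A sketch of "initialize default values much later": each slot keeps an explicit has_node_id flag, and the board default is applied only to slots the user left unmapped, as late as possible (e.g. right before machine_init in a preconfig flow), so that query-hotpluggable-cpus could report node-id as unset until then. `struct cpu_slot` and `apply_default_numa_mapping()` are hypothetical, and the "socket-id modulo number of nodes" policy is only an assumed example of a default, not necessarily what any particular board uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down possible_cpus entry. */
struct cpu_slot {
    int socket_id, core_id, thread_id;
    bool has_node_id;  /* false until -numa cpu or the default sets it */
    int node_id;
};

/* Fill in node-id only for slots that were not mapped explicitly.
 * Assumed default policy: socket-id modulo the number of nodes. */
static void apply_default_numa_mapping(struct cpu_slot *slots, size_t n,
                                       int nb_nodes)
{
    assert(nb_nodes > 0);
    for (size_t i = 0; i < n; i++) {
        if (!slots[i].has_node_id) {
            slots[i].node_id = slots[i].socket_id % nb_nodes;
            slots[i].has_node_id = true;
        }
    }
}
```

Explicit mappings are never overwritten, so the order "user mapping first, defaults last" is what makes the late initialization safe.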

> 
> > >   5. add qmp/monitor command variant for '-numa cpu' to set numa mapping  
> > 
> > This is where I think we would make things simpler: if node-id
> > isn't present on 'props', we can simply document the arguments
> > that identify the CPU for the numa-cpu command as "just use the
> > properties you get on query-hotpluggable-cpus.props". Clients
> > would be able to treat CpuInstanceProperties as an opaque CPU
> > slot identifier.
> > 
> > i.e. I think this would be a better way to define and document
> > the interface:
> > 
> > ##
> > # @NumaCpuOptions:
> > #
> > # Mapping of a given CPU (or a set of CPUs) to a NUMA node.
> > #
> > # @cpu: Properties identifying the CPU(s). Use the 'props' field of
> > #       query-hotpluggable-cpus for possible values for this
> > #       field.
> > #       TODO: describe what happens when 'cpu' matches
> > #       multiple slots.
> > # @node-id: NUMA node where the CPUs are going to be located.
> > ##
> > { 'struct': 'NumaCpuOptions',
> >   'data': {
> >    'cpu': 'CpuInstanceProperties',
> >    'node-id': 'int' } }
> > 
> > This separates "what identifies the CPU slot(s) we are
> > configuring" from "what identifies the node ID we are binding
> > to".
> Doesn't look any simpler to me; I'd rather document node-id usage
> in props, like
> 

Well, it still looks simpler and more intuitive to me, but just
because it matches my initial expectations about the semantics of
query-hotpluggable-cpus and CpuInstanceProperties. If your
alternative is very clearly documented (like below), it is not a
problem to me.

>  #
>  # A discriminated record of NUMA options. (for OptsVisitor)
>  #
> +# For 'cpu' type as arguments use a set of cpu properties returned
> +# by query-hotpluggable-cpus[].props, where node-id could be used
> +# to override default node mapping. Since: 2.10
> +#
>  # Since: 2.1
>  ##
>  { 'union': 'NumaOptions',
>    'base': { 'type': 'NumaOptionsType' },
>    'discriminator': 'type',
>    'data': {
> -    'node': 'NumaNodeOptions' }}
> +    'node': 'NumaNodeOptions',
> +    'cpu' : 'CpuInstanceProperties' }}

I worry about not being able to add extra options to "-numa cpu"
in the future without affecting HotpluggableCPU.props too. Being
able to document the semantics of -numa cpu inside a dedicated
NumaCpuOptions struct would be nice too.

I believe this can be addressed by defining "NumaCpuOptions" with
"CpuInstanceProperties" as base:

 { 'union': 'NumaOptions',
   'base': { 'type': 'NumaOptionsType' },
   'discriminator': 'type',
   'data': {
     'node': 'NumaNodeOptions',
     'cpu' : 'NumaCpuOptions' }}

##
# Options for -numa cpu,...
#
# "-numa cpu" accepts the same set of cpu properties returned by
# query-hotpluggable-cpus[].props, where node-id could be used to
# override default node mapping.
#
# Since: 2.10
##
{ 'struct': 'NumaCpuOptions',
  'base': 'CpuInstanceProperties' }
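One possible reading of the open TODO about a 'cpu' value matching several slots, in line with the wildcard matching mentioned in the cover letter: any property omitted from "-numa cpu" matches every value, and all matching slots get the given node-id. This is a minimal sketch with a hypothetical, trimmed-down copy of CpuInstanceProperties (`struct cpu_props`); the matching rules are an assumption for illustration, not the series' actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down CpuInstanceProperties: has_* flags mark
 * which properties were given, mirroring QAPI optional members. */
struct cpu_props {
    bool has_node_id;   int node_id;
    bool has_socket_id; int socket_id;
    bool has_core_id;   int core_id;
    bool has_thread_id; int thread_id;
};

/* A property omitted from "-numa cpu" acts as a wildcard. */
static bool props_match(const struct cpu_props *want,
                        const struct cpu_props *slot)
{
    if (want->has_socket_id && want->socket_id != slot->socket_id) {
        return false;
    }
    if (want->has_core_id && want->core_id != slot->core_id) {
        return false;
    }
    if (want->has_thread_id && want->thread_id != slot->thread_id) {
        return false;
    }
    return true;
}

/* Apply one "-numa cpu,node-id=N,..." entry to every matching slot and
 * return how many slots were updated. */
static size_t numa_cpu_apply(struct cpu_props *slots, size_t n,
                             const struct cpu_props *want)
{
    size_t matched = 0;

    assert(want->has_node_id);
    for (size_t i = 0; i < n; i++) {
        if (props_match(want, &slots[i])) {
            slots[i].has_node_id = true;
            slots[i].node_id = want->node_id;
            matched++;
        }
    }
    return matched;
}
```

With this rule, `-numa cpu,node-id=1,socket-id=1` would map every core and thread of socket 1 to node 1; a caller might treat an entry that matches no slot as a user error.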

>  


>  ##
>  # @NumaNodeOptions:
> 
> 
> > In case we have trouble making this struct work with QemuOpts, we
> > could do this (temporarily?):
> > 
> > ##
> > # @NumaCpuOptions:
> > #
> > # Mapping of a given CPU (or a set of CPUs) to a NUMA node.
> > #
> > # @cpu: Properties identifying the CPU(s). Use the 'props' field of
> > #       query-hotpluggable-cpus for possible values for this
> > #       field.
> > #       TODO: describe what happens when 'cpu' matches
> > #       multiple slots.
> > # @node-id: NUMA node where the CPUs are going to be located.
> > #
> > # @socket-id: Shortcut for cpu.socket-id, to make this struct
> > #             friendly to QemuOpts.
> > # @core-id: Shortcut for cpu.core-id, to make this struct
> > #           friendly to QemuOpts.
> > # @thread-id: Shortcut for cpu.thread-id, to make this struct
> > #             friendly to QemuOpts.
> > ##
> > { 'struct': 'NumaCpuOptions',
> >   'data': {
> >    '*cpu': 'CpuInstanceProperties',
> >    '*socket-id': 'int',
> >    '*core-id': 'int',
> >    '*thread-id': 'int',
> >    'node-id': 'int' } }
> > 
> > >   6. optionally, set new numa mapping and get updated
> > >      possible cpus list with query-hotpluggable-cpus
> > >   7. optionally, add extra cpus with device_add using the updated
> > >      cpus list, then get the updated cpus list again since it has changed.
> > >   8. unpause preconfig stage and let qemu continue to execute
> > >      machine_init and the rest.
> > > 
> > > Since we would need to implement QMP configuration for '-device cpu',
> > > we might as well reuse it for custom numa mapping.
> > > 
> > >  [...]  
> > 
> 

-- 
Eduardo


* Re: [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU
  2017-04-27 16:32           ` Eduardo Habkost
@ 2017-04-27 17:25             ` Igor Mammedov
  2017-04-27 17:32               ` Eduardo Habkost
  2017-05-02  4:27             ` David Gibson
  1 sibling, 1 reply; 77+ messages in thread
From: Igor Mammedov @ 2017-04-27 17:25 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Peter Maydell, Andrew Jones, Peter Krempa, qemu-devel, qemu-arm,
	qemu-ppc, Shannon Zhao, Paolo Bonzini, David Gibson

On Thu, 27 Apr 2017 13:32:25 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Thu, Apr 27, 2017 at 03:14:06PM +0200, Igor Mammedov wrote:
> > On Wed, 26 Apr 2017 09:21:38 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > 
> > adding Peter to CC list
> > 
> > [...]
> > 
> > > On Wed, Apr 19, 2017 at 01:14:58PM +0200, Igor Mammedov wrote:
> > > > On Wed, 12 Apr 2017 18:02:39 -0300
> > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > >   
> > > > > On Wed, Mar 22, 2017 at 02:32:32PM +0100, Igor Mammedov wrote:  
> > > > > > it will allow switching from cpu_index to property based
> > > > > > numa mapping in follow up patches.    
> > > > > 
> > > > > I am not sure I understand all the consequences of this, so I
> > > > > will give it a try:
> > > > > 
> > > > > "node-id" is an existing field in CpuInstanceProperties.
> > > > > CpuInstanceProperties is used on both query-hotpluggable-cpus
> > > > > output and in MachineState::possible_cpus.
> > > > > 
> > > > > We will start using MachineState::possible_cpus to keep track of
> > > > > NUMA CPU affinity, and that means query-hotpluggable-cpus will
> > > > > start reporting a "node-id" property when a NUMA mapping is
> > > > > configured.
> > > > > 
> > > > > To allow query-hotpluggable-cpus to report "node-id", the CPU
> > > > > objects must have a "node-id" property that can be set. This
> > > > > patch adds the "node-id" property to X86CPU.
> > > > > 
> > > > > Is this description accurate? Is the presence of "node-id" in
> > > > > query-hotpluggable-cpus the only reason we really need this
> > > > > patch, or is there something else that requires the "node-id"
> > > > > property?  
> > > > That's an accurate description: node-id is in the same 'address'
> > > > properties category as socket/core/thread-id. So if you have a
> > > > NUMA-enabled machine, you'd see the node-id property in
> > > > query-hotpluggable-cpus.
> > > 
> > > I agree that we can make -numa cpu affect query-hotpluggable-cpus
> > > output (i.e. affect some field on HotpluggableCPU.props).
> > > 
> > > But it looks like we disagree about the purpose of
> > > HotpluggableCPU.props:
> > > 
> > > I believe HotpluggableCPU.props is just an opaque identifier for
> > > the location we want to plug the CPU, and the only requirement is
> > > that it should be unique and have all the information device_add
> > > needs. As socket IDs are already unique on our existing machines,
> > > and socket<=>node mapping is already configured using -numa cpu,
> > > node-id doesn't need to be in HotpluggableCPU.props. (see example
> > > below)
> > node-id is also a location property which logically complements
> > the socket/core/thread properties. Also, from a generic point of view,
> > socket-id is not necessarily a unique id that maps 1:1 to node-id.
> > BTW -numa cpu[s] is not the only way to specify mapping,
> > it could be specified like we do with pc-dimm:
> >    device_add pc-dimm,node=x
> > 
> > Looking at it more generically, there could be the same
> > socket-ids on different nodes; then we would have to add
> > node-id to props anyway and end up with two node-ids, one in props
> > and another in the parent struct.
> 
> This is where my expectations are different: I think
> HotpluggableCPU.props is just an identifier property for CPU
> slots that is used for device_add (and will be used for -numa
> cpu), and isn't supposed to be interpreted by clients.
> 
> The problem I see is that the property has two completely
> different purposes: identifying a given CPU slot for device_add
> (and -numa cpu), and introspection of topology information about
> the CPU slot. Today we are lucky and those goals don't conflict
> with each other, but I worry this might cause trouble in the
> future.
> 
> > 
> > 
> > > I don't think clients should assume topology information in
> > > HotpluggableCPU.props is always present, because the field has a
> > > different purpose: letting clients know what are the required
> > > device_add arguments. If we need introspection of CPU topology,
> > > we can add new fields to HotpluggableCPU, outside 'props'.
> > looking at existing clients (libvirt), they don't treat 'props'
> > as an opaque set, but parse it into topology information (my guess
> > is that's because it's the sole source of such info from QEMU).
> > Actually, we never forbade this; the only requirement for
> > props was that mgmt should provide those properties to create
> > a cpu. Property names were designed in topology/location
> > friendly terms so that clients could make sense of them.
> > 
> > So I wouldn't try now to reduce the meaning of 'props' to
> > an opaque identifier as you suggest.
> 
> I see. This means my expectation is not met even today. I am not
> thrilled about it, but that's OK.
> 
> > 
> > [..]
> > > My question is: if we use:
> > > 
> > >  -numa cpu,socket-id=2,core-id=1,thread-id=0,node-id=3
> > > 
> > > and then:
> > > 
> > >  device_add ...,socket-id=2,core-id=1,thread-id=0
> > >  (omitting node-id on device_add)
> > > 
> > > won't it work exactly the same, and place the new CPU on NUMA
> > > node 3?
> > yep, it's allowed for compat reasons:
> >   1: to allow the migration destination to start with the old CLI
> >      variant that didn't have node-id
> >   2: to let old libvirt hotplug CPUs; it doesn't treat 'props'
> >      as an opaque set that is just replayed to device_add;
> >      instead it composes the command from the topology info it got
> >      from QEMU, and unfortunately node-id is only read but is
> >      not emitted when device_add is composed
> >     (I consider this a bug, but it's out in the wild so we have to deal with it)
> > 
> > we can't enforce presence in these cases or at least have to
> > keep it relaxed for old machine types.
> 
> I see.
> 
> >  
> > > In this case, if we don't need a node-id argument on device_add,
> > > then node-id doesn't need to be in HotpluggableCPU.props.
> > I'd say we currently don't have to (for the above reasons), but
> > it doesn't hurt and actually allows using the pc-dimm way of
> > mapping CPUs to nodes, as David noted, i.e.:
> >   -device cpu-foo,node-id=x,...
> > without any of -numa cpu[s] options on CLI.
> > It's currently explicitly disabled but should work if one
> > doesn't care about hotplug or if target doesn't care about
> > mapping at startup (sPAPR), it also might work for x86 as
> > well using _PXM method in ACPI.
> > (But that's out of scope of this series and needs more
> > testing as some guest OSes might expect populated SRAT
> > to work correctly).
> 
> Yep. I understand that setting node-id is useful, I just didn't
> expect it to be mandatory and included on HotpluggableCPU.props.
> 
> > 
> > [...]
> > > > 
> > > > In the future, to avoid starting QEMU twice, we were thinking about
> > > > configuring QEMU from QMP at runtime; that's where a preconfig
> > > > approach could help:
> > > > 
> > > >   1. introduce a CLI option to pause before machine_init, to allow
> > > >      preconfiguring the machine from QMP/monitor
> > > >   2. make query-hotpluggable-cpus usable at preconfig time
> > > >   3. start QEMU with the needed number of NUMA nodes and default mapping:
> > > >          #qemu -smp ... -numa node,nodeid=0 -numa node,nodeid=1
> > > >   4. get possible cpus list  
> > > 
> > > This is where things can get tricky: if we have the default
> > > mapping set, step 4 would return "node-id" already set on all
> > > possible CPUs.
> > that would depend on the implementation:
> >  - display node-id with default preset values to override
> >  - do not set defaults and force user to do mapping
> 
> Right. We could choose to initialize default values much later,
> and leave it uninitialized.
> 
> > 
> > > >   5. add qmp/monitor command variant for '-numa cpu' to set numa mapping  
> > > 
> > > This is where I think we would make things simpler: if node-id
> > > isn't present on 'props', we can simply document the arguments
> > > that identify the CPU for the numa-cpu command as "just use the
> > > properties you get on query-hotpluggable-cpus.props". Clients
> > > would be able to treat CpuInstanceProperties as an opaque CPU
> > > slot identifier.
> > > 
> > > i.e. I think this would be a better way to define and document
> > > the interface:
> > > 
> > > ##
> > > # @NumaCpuOptions:
> > > #
> > > # Mapping of a given CPU (or a set of CPUs) to a NUMA node.
> > > #
> > > # @cpu: Properties identifying the CPU(s). Use the 'props' field of
> > > #       query-hotpluggable-cpus for possible values for this
> > > #       field.
> > > #       TODO: describe what happens when 'cpu' matches
> > > #       multiple slots.
> > > # @node-id: NUMA node where the CPUs are going to be located.
> > > ##
> > > { 'struct': 'NumaCpuOptions',
> > >   'data': {
> > >    'cpu': 'CpuInstanceProperties',
> > >    'node-id': 'int' } }
> > > 
> > > This separates "what identifies the CPU slot(s) we are
> > > configuring" from "what identifies the node ID we are binding
> > > to".
> > Doesn't look any simpler to me; I'd rather document node-id usage
> > in props, like
> > 
> 
> Well, it still looks simpler and more intuitive to me, but just
> because it matches my initial expectations about the semantics of
> query-hotpluggable-cpus and CpuInstanceProperties. If your
> alternative is very clearly documented (like below), it is not a
> problem to me.
> 
> >  #
> >  # A discriminated record of NUMA options. (for OptsVisitor)
> >  #
> > +# For 'cpu' type as arguments use a set of cpu properties returned
> > +# by query-hotpluggable-cpus[].props, where node-id could be used
> > +# to override default node mapping. Since: 2.10
> > +#
> >  # Since: 2.1
> >  ##
> >  { 'union': 'NumaOptions',
> >    'base': { 'type': 'NumaOptionsType' },
> >    'discriminator': 'type',
> >    'data': {
> > -    'node': 'NumaNodeOptions' }}
> > +    'node': 'NumaNodeOptions',
> > +    'cpu' : 'CpuInstanceProperties' }}
> 
> I worry about not being able to add extra options to "-numa cpu"
> in the future without affecting HotpluggableCPU.props too. Being
> able to document the semantics of -numa cpu inside a dedicated
> NumaCpuOptions struct would be nice too.
> 
> I believe this can be addressed by defining "NumaCpuOptions" with
> "CpuInstanceProperties" as base:
> 
>  { 'union': 'NumaOptions',
>    'base': { 'type': 'NumaOptionsType' },
>    'discriminator': 'type',
>    'data': {
>      'node': 'NumaNodeOptions',
>      'cpu' : 'NumaCpuOptions' }}
> 
> ##
> # Options for -numa cpu,...
> #
> # "-numa cpu" accepts the same set of cpu properties returned by
> # query-hotpluggable-cpus[].props, where node-id could be used to
> # override default node mapping.
> #
> # Since: 2.10
> ##
> { 'struct': 'NumaCpuOptions',
>   'base': 'CpuInstanceProperties' }
Is it inheritance or encapsulation?
If it's encapsulation, it wouldn't look nice, but we can
duplicate the fields from CpuInstanceProperties in NumaCpuOptions,
as you proposed below, and marshal them into CpuInstanceProperties
inside of parse_numa() where needed.

> 
> >  
> 
> 
> >  ##
> >  # @NumaNodeOptions:
> > 
> > 
> > > In case we have trouble making this struct work with QemuOpts, we
> > > could do this (temporarily?):
> > > 
> > > ##
> > > # @NumaCpuOptions:
> > > #
> > > # Mapping of a given CPU (or a set of CPUs) to a NUMA node.
> > > #
> > > # @cpu: Properties identifying the CPU(s). Use the 'props' field of
> > > #       query-hotpluggable-cpus for possible values for this
> > > #       field.
> > > #       TODO: describe what happens when 'cpu' matches
> > > #       multiple slots.
> > > # @node-id: NUMA node where the CPUs are going to be located.
> > > #
> > > # @socket-id: Shortcut for cpu.socket-id, to make this struct
> > > #             friendly to QemuOpts.
> > > # @core-id: Shortcut for cpu.core-id, to make this struct
> > > #           friendly to QemuOpts.
> > > # @thread-id: Shortcut for cpu.thread-id, to make this struct
> > > #             friendly to QemuOpts.
> > > ##
> > > { 'struct': 'NumaCpuOptions',
> > >   'data': {
> > >    '*cpu': 'CpuInstanceProperties',
> > >    '*socket-id': 'int',
> > >    '*core-id': 'int',
> > >    '*thread-id': 'int',
> > >    'node-id': 'int' } }
> > > 
> > > >   6. optionally, set new numa mapping and get updated
> > > >      possible cpus list with query-hotpluggable-cpus
> > > >   7. optionally, add extra cpus with device_add using the updated
> > > >      cpus list, then get the updated cpus list again since it has changed.
> > > >   8. unpause preconfig stage and let qemu continue to execute
> > > >      machine_init and the rest.
> > > > 
> > > > Since we would need to implement QMP configuration for '-device cpu',
> > > > we might as well reuse it for custom numa mapping.
> > > > 
> > > >  [...]  
> > > 
> > 
> 


* Re: [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU
  2017-04-27 17:25             ` Igor Mammedov
@ 2017-04-27 17:32               ` Eduardo Habkost
  0 siblings, 0 replies; 77+ messages in thread
From: Eduardo Habkost @ 2017-04-27 17:32 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, Andrew Jones, Peter Krempa, qemu-devel, qemu-arm,
	qemu-ppc, Shannon Zhao, Paolo Bonzini, David Gibson

On Thu, Apr 27, 2017 at 07:25:23PM +0200, Igor Mammedov wrote:
[...]
> > >  #
> > >  # A discriminated record of NUMA options. (for OptsVisitor)
> > >  #
> > > +# For 'cpu' type as arguments use a set of cpu properties returned
> > > +# by query-hotpluggable-cpus[].props, where node-id could be used
> > > +# to override default node mapping. Since: 2.10
> > > +#
> > >  # Since: 2.1
> > >  ##
> > >  { 'union': 'NumaOptions',
> > >    'base': { 'type': 'NumaOptionsType' },
> > >    'discriminator': 'type',
> > >    'data': {
> > > -    'node': 'NumaNodeOptions' }}
> > > +    'node': 'NumaNodeOptions',
> > > +    'cpu' : 'CpuInstanceProperties' }}
> > 
> > I worry about not being able to add extra options to "-numa cpu"
> > in the future without affecting HotpluggableCPU.props too. Being
> > able to document the semantics of -numa cpu inside a dedicated
> > NumaCpuOptions struct would be nice too.
> > 
> > I believe this can be addressed by defining "NumaCpuOptions" with
> > "CpuInstanceProperties" as base:
> > 
> >  { 'union': 'NumaOptions',
> >    'base': { 'type': 'NumaOptionsType' },
> >    'discriminator': 'type',
> >    'data': {
> >      'node': 'NumaNodeOptions',
> >      'cpu' : 'NumaCpuOptions' }}
> > 
> > ##
> > # Options for -numa cpu,...
> > #
> > # "-numa cpu" accepts the same set of cpu properties returned by
> > # query-hotpluggable-cpus[].props, where node-id could be used to
> > # override default node mapping.
> > #
> > # Since: 2.10
> > ##
> > { 'struct': 'NumaCpuOptions',
> >   'base': 'CpuInstanceProperties' }
> Is it inheritance or encapsulation?

If I understood the docs correctly, it's inheritance. I didn't
test it, though.

> If it's encapsulation, it wouldn't look nice, but we can
> duplicate the fields from CpuInstanceProperties in NumaCpuOptions,
> as you proposed below, and marshal them into CpuInstanceProperties
> inside of parse_numa() where needed.

I think inheritance will work. But if it doesn't, I don't mind
either: we can duplicate the fields like you suggest, or use
CpuInstanceProperties directly like you did above.
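To illustrate what 'base' means in QAPI (inheritance rather than encapsulation), assuming the generator behaves as its documentation describes: the base type's members are flattened into the derived struct ahead of its own members, so on the command line they appear as top-level keys (`-numa cpu,node-id=1,socket-id=0`), and a pointer to the derived struct can be upcast to the base. The hand-written, trimmed-down types below only mimic that layout; the generator emits a similar `qapi_<Type>_base()` helper, but this is not the generated code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down base type (the real one has more members). */
typedef struct CpuInstanceProperties {
    bool has_node_id;   int64_t node_id;
    bool has_socket_id; int64_t socket_id;
} CpuInstanceProperties;

/* Derived type: base members come first, in the same order. */
typedef struct NumaCpuOptions {
    /* Members inherited from CpuInstanceProperties: */
    bool has_node_id;   int64_t node_id;
    bool has_socket_id; int64_t socket_id;
    /* Options specific to -numa cpu could be added after this point
     * without touching CpuInstanceProperties/HotpluggableCPU.props. */
} NumaCpuOptions;

/* The upcast below is only valid because the leading layouts match. */
_Static_assert(offsetof(NumaCpuOptions, node_id) ==
               offsetof(CpuInstanceProperties, node_id), "layout mismatch");
_Static_assert(offsetof(NumaCpuOptions, socket_id) ==
               offsetof(CpuInstanceProperties, socket_id), "layout mismatch");

static inline CpuInstanceProperties *
qapi_NumaCpuOptions_base(const NumaCpuOptions *obj)
{
    assert(obj != NULL);
    return (CpuInstanceProperties *)obj;
}
```

This layout is what addresses the concern about extending "-numa cpu": new members go only into the derived struct, while code that needs CpuInstanceProperties can still take the upcast pointer.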

-- 
Eduardo


* Re: [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU
  2017-04-27 16:32           ` Eduardo Habkost
  2017-04-27 17:25             ` Igor Mammedov
@ 2017-05-02  4:27             ` David Gibson
  2017-05-02  8:28               ` Igor Mammedov
  1 sibling, 1 reply; 77+ messages in thread
From: David Gibson @ 2017-05-02  4:27 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Igor Mammedov, qemu-devel, Peter Maydell, Andrew Jones, qemu-arm,
	qemu-ppc, Shannon Zhao, Paolo Bonzini, Peter Krempa


On Thu, Apr 27, 2017 at 01:32:25PM -0300, Eduardo Habkost wrote:
> On Thu, Apr 27, 2017 at 03:14:06PM +0200, Igor Mammedov wrote:
> > On Wed, 26 Apr 2017 09:21:38 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > 
> > adding Peter to CC list
> > 
> > [...]
> > 
> > > On Wed, Apr 19, 2017 at 01:14:58PM +0200, Igor Mammedov wrote:
> > > > On Wed, 12 Apr 2017 18:02:39 -0300
> > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > >   
> > > > > On Wed, Mar 22, 2017 at 02:32:32PM +0100, Igor Mammedov wrote:  
> > > > > > it will allow switching from cpu_index to property based
> > > > > > numa mapping in follow up patches.    
> > > > > 
> > > > > I am not sure I understand all the consequences of this, so I
> > > > > will give it a try:
> > > > > 
> > > > > "node-id" is an existing field in CpuInstanceProperties.
> > > > > CpuInstanceProperties is used on both query-hotpluggable-cpus
> > > > > output and in MachineState::possible_cpus.
> > > > > 
> > > > > We will start using MachineState::possible_cpus to keep track of
> > > > > NUMA CPU affinity, and that means query-hotpluggable-cpus will
> > > > > start reporting a "node-id" property when a NUMA mapping is
> > > > > configured.
> > > > > 
> > > > > To allow query-hotpluggable-cpus to report "node-id", the CPU
> > > > > objects must have a "node-id" property that can be set. This
> > > > > patch adds the "node-id" property to X86CPU.
> > > > > 
> > > > > Is this description accurate? Is the presence of "node-id" in
> > > > > query-hotpluggable-cpus the only reason we really need this
> > > > > patch, or is there something else that requires the "node-id"
> > > > > property?  
> > > > That's an accurate description: node-id is in the same 'address'
> > > > properties category as socket/core/thread-id. So if you have a
> > > > NUMA-enabled machine, you'd see the node-id property in
> > > > query-hotpluggable-cpus.
> > > 
> > > I agree that we can make -numa cpu affect query-hotpluggable-cpus
> > > output (i.e. affect some field on HotpluggableCPU.props).
> > > 
> > > But it looks like we disagree about the purpose of
> > > HotpluggableCPU.props:
> > > 
> > > I believe HotpluggableCPU.props is just an opaque identifier for
> > > the location we want to plug the CPU, and the only requirement is
> > > that it should be unique and have all the information device_add
> > > needs. As socket IDs are already unique on our existing machines,
> > > and socket<=>node mapping is already configured using -numa cpu,
> > > node-id doesn't need to be in HotpluggableCPU.props. (see example
> > > below)
> > node-id is also a location property which logically complements
> > the socket/core/thread properties. Also, from a generic point of view,
> > socket-id is not necessarily a unique id that maps 1:1 to node-id.
> > BTW -numa cpu[s] is not the only way to specify mapping,
> > it could be specified like we do with pc-dimm:
> >    device_add pc-dimm,node=x
> > 
> > Looking at it more generically, there could be the same
> > socket-ids on different nodes; then we would have to add
> > node-id to props anyway and end up with two node-ids, one in props
> > and another in the parent struct.
> 
> This is where my expectations are different: I think
> HotpluggableCPU.props is just an identifier property for CPU
> slots that is used for device_add (and will be used for -numa
> cpu), and isn't supposed to be interpreted by clients.
> 
> The problem I see is that the property has two completely
> different purposes: identifying a given CPU slot for device_add
> (and -numa cpu), and introspection of topology information about
> the CPU slot. Today we are lucky and those goals don't conflict
> with each other, but I worry this might cause trouble in the
> future.

Yeah, I share your concern.  And even if we allow that the topology
information may be read by the user, at the moment the
socket/core/thread values are "read only" in the sense that the client
should do nothing but read them from the query (possibly look at them
for its own interest) and echo them back verbatim to device_add.

node-id is different because it's something the user/management might
want to actually choose.  So it seems dubious to me that it's in the
same structure.
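The "echo them back verbatim" model means a client treats query-hotpluggable-cpus[].props as a list of name/value pairs and replays all of them, node-id included, instead of rebuilding the argument list from its own topology knowledge (which is how node-id gets dropped). A small C sketch of that client-side step; `struct prop` and `compose_device_add()` are hypothetical, and the CPU type name used below is only an example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One name/value pair from query-hotpluggable-cpus[].props. */
struct prop {
    const char *name;
    long long value;
};

/*
 * Build "TYPE,name=value,..." for device_add by replaying every
 * property exactly as reported, without interpreting it.
 * Returns the string length, or -1 if it does not fit in buf.
 */
static int compose_device_add(char *buf, size_t len, const char *type,
                              const struct prop *props, size_t n)
{
    assert(buf != NULL && type != NULL && strlen(type) > 0);

    int off = snprintf(buf, len, "%s", type);
    if (off < 0 || (size_t)off >= len) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + off, len - (size_t)off, ",%s=%lld",
                         props[i].name, props[i].value);
        if (w < 0 || (size_t)w >= len - (size_t)off) {
            return -1;
        }
        off += w;
    }
    return off;
}
```

For props {node-id=1, socket-id=1, core-id=0, thread-id=0} and type qemu64-x86_64-cpu this yields `qemu64-x86_64-cpu,node-id=1,socket-id=1,core-id=0,thread-id=0`; because the client never inspects the names, a property added later is forwarded without client changes.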

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU
  2017-05-02  4:27             ` David Gibson
@ 2017-05-02  8:28               ` Igor Mammedov
  0 siblings, 0 replies; 77+ messages in thread
From: Igor Mammedov @ 2017-05-02  8:28 UTC (permalink / raw)
  To: David Gibson
  Cc: Eduardo Habkost, qemu-devel, Peter Maydell, Andrew Jones,
	qemu-arm, qemu-ppc, Shannon Zhao, Paolo Bonzini, Peter Krempa

On Tue, 2 May 2017 14:27:12 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Apr 27, 2017 at 01:32:25PM -0300, Eduardo Habkost wrote:
> > On Thu, Apr 27, 2017 at 03:14:06PM +0200, Igor Mammedov wrote:  
> > > On Wed, 26 Apr 2017 09:21:38 -0300
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > 
> > > adding Peter to CC list
> > > 
> > > [...]
> > >   
> > > > On Wed, Apr 19, 2017 at 01:14:58PM +0200, Igor Mammedov wrote:  
> > > > > On Wed, 12 Apr 2017 18:02:39 -0300
> > > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > > >     
> > > > > > On Wed, Mar 22, 2017 at 02:32:32PM +0100, Igor Mammedov wrote:    
> > > > > > > it will allow switching from cpu_index to property based
> > > > > > > numa mapping in follow up patches.      
> > > > > > 
> > > > > > I am not sure I understand all the consequences of this, so I
> > > > > > will give it a try:
> > > > > > 
> > > > > > "node-id" is an existing field in CpuInstanceProperties.
> > > > > > CpuInstanceProperties is used on both query-hotpluggable-cpus
> > > > > > output and in MachineState::possible_cpus.
> > > > > > 
> > > > > > We will start using MachineState::possible_cpus to keep track of
> > > > > > NUMA CPU affinity, and that means query-hotpluggable-cpus will
> > > > > > start reporting a "node-id" property when a NUMA mapping is
> > > > > > configured.
> > > > > > 
> > > > > > To allow query-hotpluggable-cpus to report "node-id", the CPU
> > > > > > objects must have a "node-id" property that can be set. This
> > > > > > patch adds the "node-id" property to X86CPU.
> > > > > > 
> > > > > > Is this description accurate? Is the presence of "node-id" in
> > > > > > query-hotpluggable-cpus the only reason we really need this
> > > > > > patch, or is there something else that requires the "node-id"
> > > > > > property?    
> > > > > That's an accurate description: node-id is in the same 'address'
> > > > > properties category as socket/core/thread-id. So if you have a
> > > > > NUMA-enabled machine, you'd see the node-id property in
> > > > > query-hotpluggable-cpus.
> > > > 
> > > > I agree that we can make -numa cpu affect query-hotpluggable-cpus
> > > > output (i.e. affect some field on HotpluggableCPU.props).
> > > > 
> > > > But it looks like we disagree about the purpose of
> > > > HotpluggableCPU.props:
> > > > 
> > > > I believe HotpluggableCPU.props is just an opaque identifier for
> > > > the location we want to plug the CPU, and the only requirement is
> > > > that it should be unique and have all the information device_add
> > > > needs. As socket IDs are already unique on our existing machines,
> > > > and socket<=>node mapping is already configured using -numa cpu,
> > > > node-id doesn't need to be in HotpluggableCPU.props. (see example
> > > > below)  
> > > node-id is also a location property which logically complements
> > > the socket/core/thread properties. Also, from a generic point of view,
> > > socket-id is not necessarily a unique id that maps 1:1 to node-id.
> > > BTW -numa cpu[s] is not the only way to specify mapping,
> > > it could be specified like we do with pc-dimm:
> > >    device_add pc-dimm,node=x
> > > 
> > > Looking at it more generically, there could be the same
> > > socket-ids on different nodes; then we would have to add
> > > node-id to props anyway and end up with two node-ids, one in props
> > > and another in the parent struct.
> > 
> > This is where my expectations are different: I think
> > HotpluggableCPU.props is just an identifier property for CPU
> > slots that is used for device_add (and will be used for -numa
> > cpu), and isn't supposed to be interpreted by clients.
> > 
> > The problem I see is that the property has two completely
> > different purposes: identifying a given CPU slot for device_add
> > (and -numa cpu), and introspection of topology information about
> > the CPU slot. Today we are lucky and those goals don't conflict
> > with each other, but I worry this might cause trouble in the
> > future.  
> 
> Yeah, I share your concern.  And even if we allow that the topology
> information may be read by the user, at the moment the
> socket/core/thread values are "read only" in the sense that the client
> should do nothing but read them from the query (possibly look at them
> for its own interest) and echo them back verbatim to device_add.
> 
> node-id is different because it's something the user/management might
> want to actually choose.  So it seems dubious to me that it's in the
> same structure.
node-id is 'read only' when it comes to device_add so far.


Thread overview: 77+ messages
2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 01/23] tests: add CPUs to numa node mapping test Igor Mammedov
2017-03-27  0:31   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 02/23] hw/arm/virt: extract mp-affinity calculation in separate function Igor Mammedov
2017-04-25 14:09   ` Andrew Jones
2017-04-25 14:39     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 03/23] hw/arm/virt: use machine->possible_cpus for storing possible topology info Igor Mammedov
2017-04-25 14:28   ` Andrew Jones
2017-04-25 14:36     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 04/23] hw/arm/virt: explicitly allocate cpu_index for cpus Igor Mammedov
2017-04-25 14:33   ` Andrew Jones
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards Igor Mammedov
2017-03-23  6:10   ` Bharata B Rao
2017-03-23  8:48     ` Igor Mammedov
2017-03-28  4:19   ` David Gibson
2017-03-28 10:53     ` Igor Mammedov
2017-03-29  2:24       ` David Gibson
2017-03-29 11:48         ` Igor Mammedov
2017-04-20 14:29     ` Igor Mammedov
2017-04-25 14:48   ` Andrew Jones
2017-04-25 15:07     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 06/23] spapr: add node-id property to sPAPR core Igor Mammedov
2017-03-28  4:23   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU Igor Mammedov
2017-04-12 21:02   ` Eduardo Habkost
2017-04-19 11:14     ` Igor Mammedov
2017-04-26 12:21       ` Eduardo Habkost
2017-04-27 13:14         ` Igor Mammedov
2017-04-27 16:32           ` Eduardo Habkost
2017-04-27 17:25             ` Igor Mammedov
2017-04-27 17:32               ` Eduardo Habkost
2017-05-02  4:27             ` David Gibson
2017-05-02  8:28               ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 08/23] virt-arm: " Igor Mammedov
2017-04-25 17:16   ` Andrew Jones
2017-04-26 10:47     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 09/23] numa: add check that board supports cpu_index to node mapping Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus Igor Mammedov
2017-03-28  4:44   ` David Gibson
2017-04-12 21:15   ` Eduardo Habkost
2017-04-19  9:52     ` Igor Mammedov
2017-04-26 11:04       ` Eduardo Habkost
2017-04-13 13:58   ` Eduardo Habkost
2017-04-19  9:31     ` Igor Mammedov
2017-04-26 11:02       ` Eduardo Habkost
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 11/23] numa: do default mapping based on possible_cpus instead of node_cpu bitmaps Igor Mammedov
2017-03-28  4:46   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 12/23] pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 13/23] spapr: " Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 14/23] virt-arm: " Igor Mammedov
2017-04-25 17:06   ` Andrew Jones
2017-04-26 10:54     ` Igor Mammedov
2017-04-26 11:27       ` Andrew Jones
2017-04-27 13:24         ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 15/23] QMP: include CpuInstanceProperties into query_cpus output output Igor Mammedov
2017-03-23 13:19   ` Eric Blake
2017-03-24 12:20     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 16/23] tests: numa: add case for QMP command query-cpus Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 17/23] numa: remove no longer used numa_get_node_for_cpu() Igor Mammedov
2017-03-28  4:54   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 18/23] numa: remove no longer need numa_post_machine_init() Igor Mammedov
2017-03-28  4:55   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 19/23] machine: call machine init from wrapper Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 20/23] numa: use possible_cpus for not mapped CPUs check Igor Mammedov
2017-03-28  5:13   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 21/23] numa: remove node_cpu bitmaps as they are no longer used Igor Mammedov
2017-03-28  5:13   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping Igor Mammedov
2017-03-23 13:23   ` Eric Blake
2017-03-24 13:29     ` Igor Mammedov
2017-03-28  5:16   ` David Gibson
2017-03-28 11:09     ` Igor Mammedov
2017-03-29  2:27       ` David Gibson
2017-03-29 12:08         ` Igor Mammedov
2017-04-03  4:40           ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 23/23] tests: check -numa node, cpu=props_list usecase Igor Mammedov
2017-04-12 20:18 ` [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Eduardo Habkost