All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/5] numa: code consolidation and fixes
@ 2017-05-23 14:38 Igor Mammedov
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 1/5] numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr Igor Mammedov
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Igor Mammedov @ 2017-05-23 14:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-arm, qemu-ppc, Eduardo Habkost, David Gibson, Andrew Jones

changelog since v1:
  (Eduardo)
     - user error_abort in numa_cpu_pre_plug()
     - make default_mapping boolean
     - redo default mapping detection loop as a combo of for/if
     - return back lost TODO comment
     - new patch removing numa_node from generic CPUState
  - drop silence test patch as it's already in pull req on list
  - new patch [3/5] to fix missing _PXM/fdt nodes for implicitly mapped CPUs
  - new patch dropping fallback to node 0


git repo for testing:
   https://github.com/imammedo/qemu.git cphp_numa_cfg_follow_up_v3_cleanups_v2

CC: qemu-arm@nongnu.org
CC: qemu-ppc@nongnu.org
CC: Eduardo Habkost <ehabkost@redhat.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Andrew Jones <drjones@redhat.com>

Igor Mammedov (5):
  numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr
  numa: move default mapping init to machine
  numa: make sure that all cpus in has has_node_id set if numa is
    enabled
  numa: fallback to default NUMA node instead of node 0
  numa: move numa_node from CPUState into target specific classes

 include/qom/cpu.h        |  2 --
 include/sysemu/numa.h    |  1 +
 target/arm/cpu.h         |  2 ++
 target/i386/cpu.h        |  1 +
 target/ppc/cpu.h         |  1 +
 hw/arm/virt-acpi-build.c |  4 +---
 hw/arm/virt.c            | 16 ++--------------
 hw/core/machine.c        | 34 ++++++++++++++++++++++++----------
 hw/i386/acpi-build.c     |  3 +--
 hw/i386/pc.c             | 21 ++-------------------
 hw/ppc/spapr.c           | 41 ++++++++++++-----------------------------
 hw/ppc/spapr_cpu_core.c  |  4 +++-
 monitor.c                | 11 +++++++----
 numa.c                   | 43 +++++++++++++++++--------------------------
 target/arm/cpu.c         |  2 +-
 target/i386/cpu.c        |  2 +-
 16 files changed, 76 insertions(+), 112 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 1/5] numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr
  2017-05-23 14:38 [Qemu-devel] [PATCH v2 0/5] numa: code consolidation and fixes Igor Mammedov
@ 2017-05-23 14:38 ` Igor Mammedov
  2017-05-26 15:01   ` Eduardo Habkost
  2017-05-26 15:33   ` Eduardo Habkost
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 2/5] numa: move default mapping init to machine Igor Mammedov
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Igor Mammedov @ 2017-05-23 14:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-arm, qemu-ppc, Eduardo Habkost, David Gibson, Andrew Jones

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
v2:
  user error_abort in numa_cpu_pre_plug()
     Eduardo Habkost <ehabkost@redhat.com>
---
 include/sysemu/numa.h |  1 +
 hw/arm/virt.c         | 16 ++--------------
 hw/i386/pc.c          | 17 +----------------
 hw/ppc/spapr.c        | 17 +----------------
 numa.c                | 22 ++++++++++++++++++++++
 5 files changed, 27 insertions(+), 46 deletions(-)

diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 7ffde5b..610eece 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -35,4 +35,5 @@ void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
                                  int nb_nodes, ram_addr_t size);
 void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
                                   int nb_nodes, ram_addr_t size);
+void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp);
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c7c8159..ce676df 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1351,7 +1351,6 @@ static void machvirt_init(MachineState *machine)
     for (n = 0; n < possible_cpus->len; n++) {
         Object *cpuobj;
         CPUState *cs;
-        int node_id;
 
         if (n >= smp_cpus) {
             break;
@@ -1364,19 +1363,8 @@ static void machvirt_init(MachineState *machine)
         cs = CPU(cpuobj);
         cs->cpu_index = n;
 
-        node_id = possible_cpus->cpus[cs->cpu_index].props.node_id;
-        if (!possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
-            /* by default CPUState::numa_node was 0 if it's not set via CLI
-             * keep it this way for now but in future we probably should
-             * refuse to start up with incomplete numa mapping */
-             node_id = 0;
-        }
-        if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
-            cs->numa_node = node_id;
-        } else {
-            /* CPU isn't device_add compatible yet, this shouldn't happen */
-            error_setg(&error_abort, "user set node-id not implemented");
-        }
+        numa_cpu_pre_plug(&possible_cpus->cpus[cs->cpu_index], DEVICE(cpuobj),
+                          &error_fatal);
 
         if (!vms->secure) {
             object_property_set_bool(cpuobj, false, "has_el3", NULL);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 816bfa8..cf09949 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1893,7 +1893,6 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
                             DeviceState *dev, Error **errp)
 {
     int idx;
-    int node_id;
     CPUState *cs;
     CPUArchId *cpu_slot;
     X86CPUTopoInfo topo;
@@ -1984,21 +1983,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
     cs = CPU(cpu);
     cs->cpu_index = idx;
 
-    node_id = cpu_slot->props.node_id;
-    if (!cpu_slot->props.has_node_id) {
-        /* by default CPUState::numa_node was 0 if it's not set via CLI
-         * keep it this way for now but in future we probably should
-         * refuse to start up with incomplete numa mapping */
-        node_id = 0;
-    }
-    if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
-        cs->numa_node = node_id;
-    } else if (cs->numa_node != node_id) {
-            error_setg(errp, "node-id %d must match numa node specified"
-                "with -numa option for cpu-index %d",
-                cs->numa_node, cs->cpu_index);
-            return;
-    }
+    numa_cpu_pre_plug(cpu_slot, dev, errp);
 }
 
 static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0980d73..c7fee8b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2831,11 +2831,9 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
     Error *local_err = NULL;
     CPUCore *cc = CPU_CORE(dev);
-    sPAPRCPUCore *sc = SPAPR_CPU_CORE(dev);
     char *base_core_type = spapr_get_cpu_core_type(machine->cpu_model);
     const char *type = object_get_typename(OBJECT(dev));
     CPUArchId *core_slot;
-    int node_id;
     int index;
 
     if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
@@ -2870,20 +2868,7 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         goto out;
     }
 
-    node_id = core_slot->props.node_id;
-    if (!core_slot->props.has_node_id) {
-        /* by default CPUState::numa_node was 0 if it's not set via CLI
-         * keep it this way for now but in future we probably should
-         * refuse to start up with incomplete numa mapping */
-        node_id = 0;
-    }
-    if (sc->node_id == CPU_UNSET_NUMA_NODE_ID) {
-        sc->node_id = node_id;
-    } else if (sc->node_id != node_id) {
-        error_setg(&local_err, "node-id %d must match numa node specified"
-            "with -numa option for cpu-index %d", sc->node_id, cc->core_id);
-        goto out;
-    }
+    numa_cpu_pre_plug(core_slot, dev, &local_err);
 
 out:
     g_free(base_core_type);
diff --git a/numa.c b/numa.c
index ca73145..723f5f8 100644
--- a/numa.c
+++ b/numa.c
@@ -534,6 +534,28 @@ void parse_numa_opts(MachineState *ms)
     }
 }
 
+void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
+{
+    int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
+
+    if (node_id == CPU_UNSET_NUMA_NODE_ID) {
+        /* by default CPUState::numa_node was 0 if it wasn't set explicitly
+         * TODO: make it error when incomplete numa mapping support is removed
+         */
+        node_id = 0;
+
+        /* due to bug in libvirt, it doesn't pass node-id from props on
+         * device_add as expected, so we have to fix it up here */
+        if (slot->props.has_node_id) {
+            node_id = slot->props.node_id;
+        }
+        object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
+    } else if (node_id != slot->props.node_id) {
+        error_setg(errp, "node-id=%d must match numa node specified "
+                   "with -numa option", node_id);
+    }
+}
+
 static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
                                            const char *name,
                                            uint64_t ram_size)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 2/5] numa: move default mapping init to machine
  2017-05-23 14:38 [Qemu-devel] [PATCH v2 0/5] numa: code consolidation and fixes Igor Mammedov
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 1/5] numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr Igor Mammedov
@ 2017-05-23 14:38 ` Igor Mammedov
  2017-05-26 15:02   ` Eduardo Habkost
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled Igor Mammedov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2017-05-23 14:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-arm, qemu-ppc, Eduardo Habkost, David Gibson, Andrew Jones

there is no need use cpu_index_to_instance_props() for setting
default cpu -> node mapping. Generic machine code can do it
without cpu_index by just enabling already preset defaults
in possible_cpus.

PS:
as bonus it makes one less user of cpu_index_to_instance_props()

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v2:
  - make default_mapping boolean, Eduardo Habkost <ehabkost@redhat.com>
  - redo default mapping detection loop as a combo of
    for/if, Eduardo Habkost <ehabkost@redhat.com>
  - return back lost TODO comment, Eduardo Habkost <ehabkost@redhat.com>
---
 hw/core/machine.c | 33 +++++++++++++++++++++++----------
 numa.c            | 26 --------------------------
 2 files changed, 23 insertions(+), 36 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index fd6a436..aaf3cff 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -700,26 +700,39 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
     return g_string_free(s, false);
 }
 
-static void machine_numa_validate(MachineState *machine)
+static void machine_numa_finish_init(MachineState *machine)
 {
     int i;
+    bool default_mapping;
     GString *s = g_string_new(NULL);
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
 
     assert(nb_numa_nodes);
     for (i = 0; i < possible_cpus->len; i++) {
+        if (possible_cpus->cpus[i].props.has_node_id) {
+            break;
+        }
+    }
+    default_mapping = (i == possible_cpus->len);
+
+    for (i = 0; i < possible_cpus->len; i++) {
         const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
 
-        /* at this point numa mappings are initilized by CLI options
-         * or with default mappings so it's sufficient to list
-         * all not yet mapped CPUs here */
-        /* TODO: make it hard error in future */
         if (!cpu_slot->props.has_node_id) {
-            char *cpu_str = cpu_slot_to_string(cpu_slot);
-            g_string_append_printf(s, "%sCPU %d [%s]", s->len ? ", " : "", i,
-                                   cpu_str);
-            g_free(cpu_str);
+            if (default_mapping) {
+                /* fetch default mapping from board and enable it */
+                CpuInstanceProperties props = cpu_slot->props;
+                props.has_node_id = true;
+                machine_set_cpu_numa_node(machine, &props, &error_fatal);
+            } else {
+                /* record slots with not set mapping,
+                 * TODO: make it hard error in future */
+                char *cpu_str = cpu_slot_to_string(cpu_slot);
+                g_string_append_printf(s, "%sCPU %d [%s]",
+                                       s->len ? ", " : "", i, cpu_str);
+                g_free(cpu_str);
+            }
         }
     }
     if (s->len) {
@@ -737,7 +750,7 @@ void machine_run_board_init(MachineState *machine)
     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
     if (nb_numa_nodes) {
-        machine_numa_validate(machine);
+        machine_numa_finish_init(machine);
     }
     machine_class->init(machine);
 }
diff --git a/numa.c b/numa.c
index 723f5f8..e30702e 100644
--- a/numa.c
+++ b/numa.c
@@ -427,7 +427,6 @@ void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
 void parse_numa_opts(MachineState *ms)
 {
     int i;
-    const CPUArchIdList *possible_cpus;
     MachineClass *mc = MACHINE_GET_CLASS(ms);
 
     if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, NULL)) {
@@ -485,31 +484,6 @@ void parse_numa_opts(MachineState *ms)
 
         numa_set_mem_ranges();
 
-        /* assign CPUs to nodes using board provided default mapping */
-        if (!mc->cpu_index_to_instance_props || !mc->possible_cpu_arch_ids) {
-            error_report("default CPUs to NUMA node mapping isn't supported");
-            exit(1);
-        }
-
-        possible_cpus = mc->possible_cpu_arch_ids(ms);
-        for (i = 0; i < possible_cpus->len; i++) {
-            if (possible_cpus->cpus[i].props.has_node_id) {
-                break;
-            }
-        }
-
-        /* no CPUs are assigned to NUMA nodes */
-        if (i == possible_cpus->len) {
-            for (i = 0; i < max_cpus; i++) {
-                CpuInstanceProperties props;
-                /* fetch default mapping from board and enable it */
-                props = mc->cpu_index_to_instance_props(ms, i);
-                props.has_node_id = true;
-
-                machine_set_cpu_numa_node(ms, &props, &error_fatal);
-            }
-        }
-
         /* QEMU needs at least all unique node pair distances to build
          * the whole NUMA distance table. QEMU treats the distance table
          * as symmetric by default, i.e. distance A->B == distance B->A.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled
  2017-05-23 14:38 [Qemu-devel] [PATCH v2 0/5] numa: code consolidation and fixes Igor Mammedov
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 1/5] numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr Igor Mammedov
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 2/5] numa: move default mapping init to machine Igor Mammedov
@ 2017-05-23 14:38 ` Igor Mammedov
  2017-05-26 16:06   ` Eduardo Habkost
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 4/5] numa: fallback to default NUMA node instead of node 0 Igor Mammedov
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 5/5] numa: move numa_node from CPUState into target specific classes Igor Mammedov
  4 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2017-05-23 14:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-arm, qemu-ppc, Eduardo Habkost, David Gibson, Andrew Jones

It fixes/add missing _PXM object for non mapped CPU (x86)
and missing fdt node (virt-arm).

It ensures that possible_cpus contains complete mapping if
numa is enabled by the time machine_init() is executed.

As result non completely mapped CPUs:
 1) appear in ACPI/fdt blobs
 2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs
 3) allows to drop checks for has_node_id in numa only code,
   reducing number of invariants incomplete mapping could produce
 4) moves fixup/implicit node init from runtime numa_cpu_pre_plug()
   (when CPU object is created) to machine_numa_finish_init() which
   helps to fix [1, 2] and make possible_cpus complete source
   of numa mapping available even before CPUs are created.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/arm/virt-acpi-build.c |  4 +---
 hw/core/machine.c        | 16 ++++++++++------
 hw/i386/acpi-build.c     |  3 +--
 hw/i386/pc.c             |  4 +---
 numa.c                   |  7 +------
 5 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index e585206..977a794 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -496,12 +496,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     srat->reserved1 = cpu_to_le32(1);
 
     for (i = 0; i < cpu_list->len; ++i) {
-        int node_id = cpu_list->cpus[i].props.has_node_id ?
-            cpu_list->cpus[i].props.node_id : 0;
         core = acpi_data_push(table_data, sizeof(*core));
         core->type = ACPI_SRAT_PROCESSOR_GICC;
         core->length = sizeof(*core);
-        core->proximity = cpu_to_le32(node_id);
+        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
         core->acpi_processor_uid = cpu_to_le32(i);
         core->flags = cpu_to_le32(1);
     }
diff --git a/hw/core/machine.c b/hw/core/machine.c
index aaf3cff..964b75d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -720,19 +720,23 @@ static void machine_numa_finish_init(MachineState *machine)
         const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
 
         if (!cpu_slot->props.has_node_id) {
-            if (default_mapping) {
-                /* fetch default mapping from board and enable it */
-                CpuInstanceProperties props = cpu_slot->props;
-                props.has_node_id = true;
-                machine_set_cpu_numa_node(machine, &props, &error_fatal);
-            } else {
+            /* fetch default mapping from board and enable it */
+            CpuInstanceProperties props = cpu_slot->props;
+
+            if (!default_mapping) {
                 /* record slots with not set mapping,
                  * TODO: make it hard error in future */
                 char *cpu_str = cpu_slot_to_string(cpu_slot);
                 g_string_append_printf(s, "%sCPU %d [%s]",
                                        s->len ? ", " : "", i, cpu_str);
                 g_free(cpu_str);
+
+                /* non mapped cpus used to fallback to node 0 */
+                props.node_id = 0;
             }
+
+            props.has_node_id = true;
+            machine_set_cpu_numa_node(machine, &props, &error_fatal);
         }
     }
     if (s->len) {
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index afcadac..873880d 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2335,8 +2335,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
     srat->reserved1 = cpu_to_le32(1);
 
     for (i = 0; i < apic_ids->len; i++) {
-        int node_id = apic_ids->cpus[i].props.has_node_id ?
-            apic_ids->cpus[i].props.node_id : 0;
+        int node_id = apic_ids->cpus[i].props.node_id;
         uint32_t apic_id = apic_ids->cpus[i].arch_id;
 
         if (apic_id < 255) {
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index cf09949..84f0a6f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -788,9 +788,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
     for (i = 0; i < cpus->len; i++) {
         unsigned int apic_id = cpus->cpus[i].arch_id;
         assert(apic_id < pcms->apic_id_limit);
-        if (cpus->cpus[i].props.has_node_id) {
-            numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
-        }
+        numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
     }
     for (i = 0; i < nb_numa_nodes; i++) {
         numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
diff --git a/numa.c b/numa.c
index e30702e..4ef6dea 100644
--- a/numa.c
+++ b/numa.c
@@ -513,17 +513,12 @@ void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
     int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
 
     if (node_id == CPU_UNSET_NUMA_NODE_ID) {
-        /* by default CPUState::numa_node was 0 if it wasn't set explicitly
-         * TODO: make it error when incomplete numa mapping support is removed
-         */
-        node_id = 0;
-
         /* due to bug in libvirt, it doesn't pass node-id from props on
          * device_add as expected, so we have to fix it up here */
         if (slot->props.has_node_id) {
             node_id = slot->props.node_id;
+            object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
         }
-        object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
     } else if (node_id != slot->props.node_id) {
         error_setg(errp, "node-id=%d must match numa node specified "
                    "with -numa option", node_id);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 4/5] numa: fallback to default NUMA node instead of node 0
  2017-05-23 14:38 [Qemu-devel] [PATCH v2 0/5] numa: code consolidation and fixes Igor Mammedov
                   ` (2 preceding siblings ...)
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled Igor Mammedov
@ 2017-05-23 14:38 ` Igor Mammedov
  2017-05-23 14:48   ` Eduardo Habkost
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 5/5] numa: move numa_node from CPUState into target specific classes Igor Mammedov
  4 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2017-05-23 14:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-arm, qemu-ppc, Eduardo Habkost, David Gibson, Andrew Jones

Do the same as we did in commit
  (57924bcd8 numa: introduce machine callback for VCPU to node mapping)
but only for incomplete mapping usecase, falling back to board
provided default cpu to node mapping if user hasn't provided
mapping for CPU explicitly.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/core/machine.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 964b75d..b8df15f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -730,9 +730,6 @@ static void machine_numa_finish_init(MachineState *machine)
                 g_string_append_printf(s, "%sCPU %d [%s]",
                                        s->len ? ", " : "", i, cpu_str);
                 g_free(cpu_str);
-
-                /* non mapped cpus used to fallback to node 0 */
-                props.node_id = 0;
             }
 
             props.has_node_id = true;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 5/5] numa: move numa_node from CPUState into target specific classes
  2017-05-23 14:38 [Qemu-devel] [PATCH v2 0/5] numa: code consolidation and fixes Igor Mammedov
                   ` (3 preceding siblings ...)
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 4/5] numa: fallback to default NUMA node instead of node 0 Igor Mammedov
@ 2017-05-23 14:38 ` Igor Mammedov
  2017-05-26 18:25   ` Eduardo Habkost
  4 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2017-05-23 14:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-arm, qemu-ppc, Eduardo Habkost, David Gibson, Andrew Jones

Move vcpu's assocciated numa_node field out of generic CPUState
into inherited classes that actually care about cpu<->numa mapping,
i.e: ARMCPU, PowerPCCPU, X86CPU.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 include/qom/cpu.h       |  2 --
 target/arm/cpu.h        |  2 ++
 target/i386/cpu.h       |  1 +
 target/ppc/cpu.h        |  1 +
 hw/ppc/spapr.c          | 24 +++++++++++-------------
 hw/ppc/spapr_cpu_core.c |  4 +++-
 monitor.c               | 11 +++++++----
 target/arm/cpu.c        |  2 +-
 target/i386/cpu.c       |  2 +-
 9 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 55214ce..89ddb68 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -265,7 +265,6 @@ struct qemu_work_item;
  * @cpu_index: CPU index (informative).
  * @nr_cores: Number of cores within this CPU package.
  * @nr_threads: Number of threads within this CPU.
- * @numa_node: NUMA node this CPU is belonging to.
  * @host_tid: Host thread ID.
  * @running: #true if CPU is currently running (lockless).
  * @has_waiter: #true if a CPU is currently waiting for the cpu_exec_end;
@@ -314,7 +313,6 @@ struct CPUState {
 
     int nr_cores;
     int nr_threads;
-    int numa_node;
 
     struct QemuThread *thread;
 #ifdef _WIN32
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 048faed..5ffc9d8 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -703,6 +703,8 @@ struct ARMCPU {
 
     ARMELChangeHook *el_change_hook;
     void *el_change_hook_opaque;
+
+    int32_t node_id; /* NUMA node this CPU is belonging to */
 };
 
 static inline ARMCPU *arm_env_get_cpu(CPUARMState *env)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c4602ca..00f35a0 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1278,6 +1278,7 @@ struct X86CPU {
     int32_t socket_id;
     int32_t core_id;
     int32_t thread_id;
+    int32_t node_id; /* NUMA node this CPU is belonging to */
 };
 
 static inline X86CPU *x86_env_get_cpu(CPUX86State *env)
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 401e10e..31c052d 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1205,6 +1205,7 @@ struct PowerPCCPU {
     uint32_t compat_pvr;
     PPCVirtualHypervisor *vhyp;
     Object *intc;
+    int32_t node_id; /* NUMA node this CPU is belonging to */
 
     /* Fields related to migration compatibility hacks */
     bool pre_2_8_migration;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c7fee8b..96a2a74 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -178,25 +178,19 @@ static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
     return ret;
 }
 
-static int spapr_fixup_cpu_numa_dt(void *fdt, int offset, CPUState *cs)
+static int spapr_fixup_cpu_numa_dt(void *fdt, int offset, PowerPCCPU *cpu)
 {
-    int ret = 0;
-    PowerPCCPU *cpu = POWERPC_CPU(cs);
     int index = ppc_get_vcpu_dt_id(cpu);
     uint32_t associativity[] = {cpu_to_be32(0x5),
                                 cpu_to_be32(0x0),
                                 cpu_to_be32(0x0),
                                 cpu_to_be32(0x0),
-                                cpu_to_be32(cs->numa_node),
+                                cpu_to_be32(cpu->node_id),
                                 cpu_to_be32(index)};
 
     /* Advertise NUMA via ibm,associativity */
-    if (nb_numa_nodes > 1) {
-        ret = fdt_setprop(fdt, offset, "ibm,associativity", associativity,
+    return fdt_setprop(fdt, offset, "ibm,associativity", associativity,
                           sizeof(associativity));
-    }
-
-    return ret;
 }
 
 /* Populate the "ibm,pa-features" property */
@@ -321,9 +315,11 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPRMachineState *spapr)
             return ret;
         }
 
-        ret = spapr_fixup_cpu_numa_dt(fdt, offset, cs);
-        if (ret < 0) {
-            return ret;
+        if (nb_numa_nodes > 1) {
+            ret = spapr_fixup_cpu_numa_dt(fdt, offset, cpu);
+            if (ret < 0) {
+                return ret;
+            }
         }
 
         ret = spapr_fixup_cpu_smt_dt(fdt, offset, cpu, compat_smt);
@@ -538,7 +534,9 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
     _FDT((fdt_setprop(fdt, offset, "ibm,pft-size",
                       pft_size_prop, sizeof(pft_size_prop))));
 
-    _FDT(spapr_fixup_cpu_numa_dt(fdt, offset, cs));
+    if (nb_numa_nodes > 1) {
+        _FDT(spapr_fixup_cpu_numa_dt(fdt, offset, cpu));
+    }
 
     _FDT(spapr_fixup_cpu_smt_dt(fdt, offset, cpu, compat_smt));
 
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index a17ea07..60baf02 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -183,15 +183,17 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
     for (i = 0; i < cc->nr_threads; i++) {
         char id[32];
         CPUState *cs;
+        PowerPCCPU *cpu;
 
         obj = sc->threads + i * size;
 
         object_initialize(obj, size, typename);
         cs = CPU(obj);
+        cpu = POWERPC_CPU(cs);
         cs->cpu_index = cc->core_id + i;
 
         /* Set NUMA node for the threads belonged to core  */
-        cs->numa_node = sc->node_id;
+        cpu->node_id = sc->node_id;
 
         snprintf(id, sizeof(id), "thread[%d]", i);
         object_property_add_child(OBJECT(sc), id, obj, &local_err);
diff --git a/monitor.c b/monitor.c
index afbacfe..b053c40 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1697,23 +1697,26 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
 static void hmp_info_numa(Monitor *mon, const QDict *qdict)
 {
     int i;
-    CPUState *cpu;
     uint64_t *node_mem;
+    CpuInfoList *cpu_list, *cpu;
 
+    cpu_list = qmp_query_cpus(&error_abort);
     node_mem = g_new0(uint64_t, nb_numa_nodes);
     query_numa_node_mem(node_mem);
     monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
     for (i = 0; i < nb_numa_nodes; i++) {
         monitor_printf(mon, "node %d cpus:", i);
-        CPU_FOREACH(cpu) {
-            if (cpu->numa_node == i) {
-                monitor_printf(mon, " %d", cpu->cpu_index);
+        for (cpu = cpu_list; cpu; cpu = cpu->next) {
+            if (cpu->value->has_props && cpu->value->props->has_node_id &&
+                cpu->value->props->node_id == i) {
+                monitor_printf(mon, " %" PRIi64, cpu->value->CPU);
             }
         }
         monitor_printf(mon, "\n");
         monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
                        node_mem[i] >> 20);
     }
+    qapi_free_CpuInfoList(cpu_list);
     g_free(node_mem);
 }
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index c185eb1..09ef3a6 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1573,7 +1573,7 @@ static Property arm_cpu_properties[] = {
     DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
     DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
                         mp_affinity, ARM64_AFFINITY_INVALID),
-    DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
+    DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
     DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a41d595..ffb5267 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3986,7 +3986,7 @@ static Property x86_cpu_properties[] = {
     DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
     DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
 #endif
-    DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
+    DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID),
     DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
     { .name  = "hv-spinlocks", .info  = &qdev_prop_spinlocks },
     DEFINE_PROP_BOOL("hv-relaxed", X86CPU, hyperv_relaxed_timing, false),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/5] numa: fallback to default NUMA node instead of node 0
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 4/5] numa: fallback to default NUMA node instead of node 0 Igor Mammedov
@ 2017-05-23 14:48   ` Eduardo Habkost
  2017-05-23 15:44     ` Igor Mammedov
  0 siblings, 1 reply; 19+ messages in thread
From: Eduardo Habkost @ 2017-05-23 14:48 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, qemu-arm, qemu-ppc, David Gibson, Andrew Jones

On Tue, May 23, 2017 at 04:38:49PM +0200, Igor Mammedov wrote:
> Do the same as we did in commit
>   (57924bcd8 numa: introduce machine callback for VCPU to node mapping)
> but only for incomplete mapping usecase, falling back to board
> provided default cpu to node mapping if user hasn't provided
> mapping for CPU explicitly.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

This breaks migration compatibility, doesn't it? I understand
this use case is not supported, but if are going to break stuff
for people using incomplete mappings (by rejecting the
configuration on a future release), I would prefer to break it
only once.

> ---
>  hw/core/machine.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 964b75d..b8df15f 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -730,9 +730,6 @@ static void machine_numa_finish_init(MachineState *machine)
>                  g_string_append_printf(s, "%sCPU %d [%s]",
>                                         s->len ? ", " : "", i, cpu_str);
>                  g_free(cpu_str);
> -
> -                /* non mapped cpus used to fallback to node 0 */
> -                props.node_id = 0;
>              }
>  
>              props.has_node_id = true;
> -- 
> 2.7.4
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/5] numa: fallback to default NUMA node instead of node 0
  2017-05-23 14:48   ` Eduardo Habkost
@ 2017-05-23 15:44     ` Igor Mammedov
  2017-05-26 14:58       ` Eduardo Habkost
  0 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2017-05-23 15:44 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: qemu-devel, qemu-arm, qemu-ppc, David Gibson, Andrew Jones

On Tue, 23 May 2017 11:48:54 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, May 23, 2017 at 04:38:49PM +0200, Igor Mammedov wrote:
> > Do the same as we did in commit
> >   (57924bcd8 numa: introduce machine callback for VCPU to node mapping)
> > but only for incomplete mapping usecase, falling back to board
> > provided default cpu to node mapping if user hasn't provided
> > mapping for CPU explicitly.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> 
> This breaks migration compatibility, doesn't it? I understand
> this use case is not supported, but if are going to break stuff
> for people using incomplete mappings (by rejecting the
> configuration on a future release), I would prefer to break it
> only once.
it's firmware change and we generally don't care nor maintain firmware compat stuff
the same as with above mentioned 
 (57924bcd8 numa: introduce machine callback for VCPU to node mapping)

> 
> > ---
> >  hw/core/machine.c | 3 ---
> >  1 file changed, 3 deletions(-)
> > 
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index 964b75d..b8df15f 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -730,9 +730,6 @@ static void machine_numa_finish_init(MachineState *machine)
> >                  g_string_append_printf(s, "%sCPU %d [%s]",
> >                                         s->len ? ", " : "", i, cpu_str);
> >                  g_free(cpu_str);
> > -
> > -                /* non mapped cpus used to fallback to node 0 */
> > -                props.node_id = 0;
> >              }
> >  
> >              props.has_node_id = true;
> > -- 
> > 2.7.4
> > 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/5] numa: fallback to default NUMA node instead of node 0
  2017-05-23 15:44     ` Igor Mammedov
@ 2017-05-26 14:58       ` Eduardo Habkost
  2017-05-29  9:12         ` Igor Mammedov
  2017-05-29 12:11         ` Igor Mammedov
  0 siblings, 2 replies; 19+ messages in thread
From: Eduardo Habkost @ 2017-05-26 14:58 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: Andrew Jones, qemu-arm, qemu-ppc, qemu-devel, David Gibson

On Tue, May 23, 2017 at 05:44:03PM +0200, Igor Mammedov wrote:
> On Tue, 23 May 2017 11:48:54 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Tue, May 23, 2017 at 04:38:49PM +0200, Igor Mammedov wrote:
> > > Do the same as we did in commit
> > >   (57924bcd8 numa: introduce machine callback for VCPU to node mapping)
> > > but only for incomplete mapping usecase, falling back to board
> > > provided default cpu to node mapping if user hasn't provided
> > > mapping for CPU explicitly.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > 
> > This breaks migration compatibility, doesn't it? I understand
> > this use case is not supported, but if are going to break stuff
> > for people using incomplete mappings (by rejecting the
> > configuration on a future release), I would prefer to break it
> > only once.
> it's firmware change and we generally don't care nor maintain firmware compat stuff
> the same as with above mentioned 
>  (57924bcd8 numa: introduce machine callback for VCPU to node mapping)


We normally keep resulting NUMA topology compatible, even if NUMA
information is implemented only by firmware.  See how we set
MachineClass::numa_auto_assign_ram and
MachineClass::numa_mem_align_shift in older machine-types, for
example.

Sometimes we didn't keep compatibility, though (e.g. on commit
fb43b73b92 "pc: fix default VCPU to NUMA node mapping").

I wouldn't disagree with this change if we planned to support
incomplete mappings forever.  But if we are going to remove
support for incomplete mappings soon, I would prefer to break
user expectations only once.

> 
> > 
> > > ---
> > >  hw/core/machine.c | 3 ---
> > >  1 file changed, 3 deletions(-)
> > > 
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index 964b75d..b8df15f 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -730,9 +730,6 @@ static void machine_numa_finish_init(MachineState *machine)
> > >                  g_string_append_printf(s, "%sCPU %d [%s]",
> > >                                         s->len ? ", " : "", i, cpu_str);
> > >                  g_free(cpu_str);
> > > -
> > > -                /* non mapped cpus used to fallback to node 0 */
> > > -                props.node_id = 0;
> > >              }
> > >  
> > >              props.has_node_id = true;
> > > -- 
> > > 2.7.4
> > > 
> > 
> 
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 1/5] numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr Igor Mammedov
@ 2017-05-26 15:01   ` Eduardo Habkost
  2017-05-26 15:33   ` Eduardo Habkost
  1 sibling, 0 replies; 19+ messages in thread
From: Eduardo Habkost @ 2017-05-26 15:01 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, Andrew Jones, qemu-arm, qemu-ppc, David Gibson

On Tue, May 23, 2017 at 04:38:46PM +0200, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Queued on machine-next, thanks.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/5] numa: move default mapping init to machine
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 2/5] numa: move default mapping init to machine Igor Mammedov
@ 2017-05-26 15:02   ` Eduardo Habkost
  0 siblings, 0 replies; 19+ messages in thread
From: Eduardo Habkost @ 2017-05-26 15:02 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, Andrew Jones, qemu-arm, qemu-ppc, David Gibson

On Tue, May 23, 2017 at 04:38:47PM +0200, Igor Mammedov wrote:
> there is no need use cpu_index_to_instance_props() for setting
> default cpu -> node mapping. Generic machine code can do it
> without cpu_index by just enabling already preset defaults
> in possible_cpus.
> 
> PS:
> as bonus it makes one less user of cpu_index_to_instance_props()
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Queued on numa-next, thanks!

-- 
Eduardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/5] numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 1/5] numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr Igor Mammedov
  2017-05-26 15:01   ` Eduardo Habkost
@ 2017-05-26 15:33   ` Eduardo Habkost
  1 sibling, 0 replies; 19+ messages in thread
From: Eduardo Habkost @ 2017-05-26 15:33 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, Andrew Jones, qemu-arm, qemu-ppc, David Gibson

On Tue, May 23, 2017 at 04:38:46PM +0200, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> v2:
>   user error_abort in numa_cpu_pre_plug()
>      Eduardo Habkost <ehabkost@redhat.com>
> ---
>  include/sysemu/numa.h |  1 +
>  hw/arm/virt.c         | 16 ++--------------
>  hw/i386/pc.c          | 17 +----------------
>  hw/ppc/spapr.c        | 17 +----------------
>  numa.c                | 22 ++++++++++++++++++++++
>  5 files changed, 27 insertions(+), 46 deletions(-)
> 
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 7ffde5b..610eece 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -35,4 +35,5 @@ void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
>                                   int nb_nodes, ram_addr_t size);
>  void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
>                                    int nb_nodes, ram_addr_t size);
> +void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp);
>  #endif
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index c7c8159..ce676df 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1351,7 +1351,6 @@ static void machvirt_init(MachineState *machine)
>      for (n = 0; n < possible_cpus->len; n++) {
>          Object *cpuobj;
>          CPUState *cs;
> -        int node_id;
>  
>          if (n >= smp_cpus) {
>              break;
> @@ -1364,19 +1363,8 @@ static void machvirt_init(MachineState *machine)
>          cs = CPU(cpuobj);
>          cs->cpu_index = n;
>  
> -        node_id = possible_cpus->cpus[cs->cpu_index].props.node_id;
> -        if (!possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
> -            /* by default CPUState::numa_node was 0 if it's not set via CLI
> -             * keep it this way for now but in future we probably should
> -             * refuse to start up with incomplete numa mapping */
> -             node_id = 0;
> -        }
> -        if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
> -            cs->numa_node = node_id;
> -        } else {
> -            /* CPU isn't device_add compatible yet, this shouldn't happen */
> -            error_setg(&error_abort, "user set node-id not implemented");
> -        }
> +        numa_cpu_pre_plug(&possible_cpus->cpus[cs->cpu_index], DEVICE(cpuobj),
> +                          &error_fatal);
>  
>          if (!vms->secure) {
>              object_property_set_bool(cpuobj, false, "has_el3", NULL);
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 816bfa8..cf09949 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1893,7 +1893,6 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>                              DeviceState *dev, Error **errp)
>  {
>      int idx;
> -    int node_id;
>      CPUState *cs;
>      CPUArchId *cpu_slot;
>      X86CPUTopoInfo topo;
> @@ -1984,21 +1983,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>      cs = CPU(cpu);
>      cs->cpu_index = idx;
>  
> -    node_id = cpu_slot->props.node_id;
> -    if (!cpu_slot->props.has_node_id) {
> -        /* by default CPUState::numa_node was 0 if it's not set via CLI
> -         * keep it this way for now but in future we probably should
> -         * refuse to start up with incomplete numa mapping */
> -        node_id = 0;
> -    }
> -    if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
> -        cs->numa_node = node_id;
> -    } else if (cs->numa_node != node_id) {
> -            error_setg(errp, "node-id %d must match numa node specified"
> -                "with -numa option for cpu-index %d",
> -                cs->numa_node, cs->cpu_index);
> -            return;
> -    }
> +    numa_cpu_pre_plug(cpu_slot, dev, errp);
>  }
>  
>  static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0980d73..c7fee8b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2831,11 +2831,9 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
>      Error *local_err = NULL;
>      CPUCore *cc = CPU_CORE(dev);
> -    sPAPRCPUCore *sc = SPAPR_CPU_CORE(dev);
>      char *base_core_type = spapr_get_cpu_core_type(machine->cpu_model);
>      const char *type = object_get_typename(OBJECT(dev));
>      CPUArchId *core_slot;
> -    int node_id;
>      int index;
>  
>      if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
> @@ -2870,20 +2868,7 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          goto out;
>      }
>  
> -    node_id = core_slot->props.node_id;
> -    if (!core_slot->props.has_node_id) {
> -        /* by default CPUState::numa_node was 0 if it's not set via CLI
> -         * keep it this way for now but in future we probably should
> -         * refuse to start up with incomplete numa mapping */
> -        node_id = 0;
> -    }
> -    if (sc->node_id == CPU_UNSET_NUMA_NODE_ID) {
> -        sc->node_id = node_id;
> -    } else if (sc->node_id != node_id) {
> -        error_setg(&local_err, "node-id %d must match numa node specified"
> -            "with -numa option for cpu-index %d", sc->node_id, cc->core_id);
> -        goto out;
> -    }
> +    numa_cpu_pre_plug(core_slot, dev, &local_err);
>  
>  out:
>      g_free(base_core_type);
> diff --git a/numa.c b/numa.c
> index ca73145..723f5f8 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -534,6 +534,28 @@ void parse_numa_opts(MachineState *ms)
>      }
>  }
>  
> +void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
> +{
> +    int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
> +
> +    if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> +        /* by default CPUState::numa_node was 0 if it wasn't set explicitly
> +         * TODO: make it error when incomplete numa mapping support is removed
> +         */
> +        node_id = 0;
> +
> +        /* due to bug in libvirt, it doesn't pass node-id from props on
> +         * device_add as expected, so we have to fix it up here */
> +        if (slot->props.has_node_id) {
> +            node_id = slot->props.node_id;
> +        }
> +        object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
> +    } else if (node_id != slot->props.node_id) {
> +        error_setg(errp, "node-id=%d must match numa node specified "
> +                   "with -numa option", node_id);
> +    }

Current qemu.git master:

  $ ./x86_64-softmmu/qemu-system-x86_64 -smp 1,maxcpus=3 -numa node -numa node -numa node,cpus=2 -monitor stdio -display none
  (qemu) device_add qemu64-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,node-id=1
  node-id 1 must match numa node specifiedwith -numa option for cpu-index 1

With this patch, there's no error:

  $ ./x86_64-softmmu/qemu-system-x86_64 -smp 1,maxcpus=3 -numa node -numa node -numa node,cpus=2 -monitor stdio -display none
  (qemu) device_add qemu64-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,node-id=1
  (qemu) 

Then, patch 3/5 reverses this behavior change, and we see an error on
device_add again.

I suggest keeping the existing logic (that will be cleaned up on patch 3/5
anyway), for bisectability.

I'm removing patches 1/5 and 2/5 from numa-next, in the meantime.

> +}
> +
>  static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
>                                             const char *name,
>                                             uint64_t ram_size)
> -- 
> 2.7.4
> 
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled Igor Mammedov
@ 2017-05-26 16:06   ` Eduardo Habkost
  2017-05-29 11:45     ` Igor Mammedov
  0 siblings, 1 reply; 19+ messages in thread
From: Eduardo Habkost @ 2017-05-26 16:06 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, Andrew Jones, qemu-arm, qemu-ppc, David Gibson

On Tue, May 23, 2017 at 04:38:48PM +0200, Igor Mammedov wrote:
> It fixes/add missing _PXM object for non mapped CPU (x86)
> and missing fdt node (virt-arm).
> 
> It ensures that possible_cpus contains complete mapping if
> numa is enabled by the time machine_init() is executed.
> 
> As result non completely mapped CPUs:
>  1) appear in ACPI/fdt blobs
>  2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs
>  3) allows to drop checks for has_node_id in numa only code,
>    reducing number of invariants incomplete mapping could produce
>  4) moves fixup/implicit node init from runtime numa_cpu_pre_plug()
>    (when CPU object is created) to machine_numa_finish_init() which
>    helps to fix [1, 2] and make possible_cpus complete source
>    of numa mapping available even before CPUs are created.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Code that will be affected, but will keep the same behavior:

* pre_plug handler when node-id is omitted on device_add: it
  already sets node_id=0 if !has_node_id
  * However, pre_plug behavior is being changed when node-id is
    set. See below.
* build_srat() (both x86 and arm): already sets node_id = 0 if
  !has_node_id
* bochs_bios_init(): already uses g_new0(), so already had
  omitted entries set to 0.

Affected code that will change behavior with this patch:

* pre_plug handler when node-id is set:
  * See my reply to patch 1/5.  This patch restores the old
    behavior that rejected node-id != 0 on omitted CPUs.
* ACPI _PXM initialization
* arm/virt.c: numa-node-id will now be set for all CPUs
* QMP/HMP query-hotpluggable-cpus output
  * Desirable change, as now the QMP command will reflect what
    the guest is really seeing

Due to the _PXM and FDT changes, I have the same objection I have
for patch 4/5: if we are going to break user expectations when
using incomplete NUMA mappings, let's do that only once (when we
start rejecting incomplete NUMA mappings).

A few extra comments about the has_node_id checks below:

> ---
>  hw/arm/virt-acpi-build.c |  4 +---
>  hw/core/machine.c        | 16 ++++++++++------
>  hw/i386/acpi-build.c     |  3 +--
>  hw/i386/pc.c             |  4 +---
>  numa.c                   |  7 +------
>  5 files changed, 14 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index e585206..977a794 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -496,12 +496,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      srat->reserved1 = cpu_to_le32(1);
>  
>      for (i = 0; i < cpu_list->len; ++i) {
> -        int node_id = cpu_list->cpus[i].props.has_node_id ?
> -            cpu_list->cpus[i].props.node_id : 0;
>          core = acpi_data_push(table_data, sizeof(*core));
>          core->type = ACPI_SRAT_PROCESSOR_GICC;
>          core->length = sizeof(*core);
> -        core->proximity = cpu_to_le32(node_id);
> +        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
>          core->acpi_processor_uid = cpu_to_le32(i);
>          core->flags = cpu_to_le32(1);
>      }
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index aaf3cff..964b75d 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -720,19 +720,23 @@ static void machine_numa_finish_init(MachineState *machine)
>          const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
>  
>          if (!cpu_slot->props.has_node_id) {
> -            if (default_mapping) {
> -                /* fetch default mapping from board and enable it */
> -                CpuInstanceProperties props = cpu_slot->props;
> -                props.has_node_id = true;
> -                machine_set_cpu_numa_node(machine, &props, &error_fatal);
> -            } else {
> +            /* fetch default mapping from board and enable it */
> +            CpuInstanceProperties props = cpu_slot->props;
> +
> +            if (!default_mapping) {
>                  /* record slots with not set mapping,
>                   * TODO: make it hard error in future */
>                  char *cpu_str = cpu_slot_to_string(cpu_slot);
>                  g_string_append_printf(s, "%sCPU %d [%s]",
>                                         s->len ? ", " : "", i, cpu_str);
>                  g_free(cpu_str);
> +
> +                /* non mapped cpus used to fallback to node 0 */
> +                props.node_id = 0;
>              }
> +
> +            props.has_node_id = true;
> +            machine_set_cpu_numa_node(machine, &props, &error_fatal);
>          }
>      }
>      if (s->len) {
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index afcadac..873880d 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2335,8 +2335,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>      srat->reserved1 = cpu_to_le32(1);
>  
>      for (i = 0; i < apic_ids->len; i++) {
> -        int node_id = apic_ids->cpus[i].props.has_node_id ?
> -            apic_ids->cpus[i].props.node_id : 0;
> +        int node_id = apic_ids->cpus[i].props.node_id;

What about an assert(props.has_node_id) here?

>          uint32_t apic_id = apic_ids->cpus[i].arch_id;
>  
>          if (apic_id < 255) {
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index cf09949..84f0a6f 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -788,9 +788,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>      for (i = 0; i < cpus->len; i++) {
>          unsigned int apic_id = cpus->cpus[i].arch_id;
>          assert(apic_id < pcms->apic_id_limit);
> -        if (cpus->cpus[i].props.has_node_id) {
> -            numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
> -        }
> +        numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);

Ditto.

>      }
>      for (i = 0; i < nb_numa_nodes; i++) {
>          numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
> diff --git a/numa.c b/numa.c
> index e30702e..4ef6dea 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -513,17 +513,12 @@ void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
>      int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
>  
>      if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> -        /* by default CPUState::numa_node was 0 if it wasn't set explicitly
> -         * TODO: make it error when incomplete numa mapping support is removed
> -         */
> -        node_id = 0;
> -
>          /* due to bug in libvirt, it doesn't pass node-id from props on
>           * device_add as expected, so we have to fix it up here */
>          if (slot->props.has_node_id) {
>              node_id = slot->props.node_id;
> +            object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
>          }
> -        object_property_set_int(OBJECT(dev), node_id, "node-id", errp);

Why not removing the has_node_id check completely, here?

In case this patch get respun, I suggest also removing the
has_node_id check at: build_cpus_aml() _PXM code.

>      } else if (node_id != slot->props.node_id) {
>          error_setg(errp, "node-id=%d must match numa node specified "
>                     "with -numa option", node_id);
> -- 
> 2.7.4
> 
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/5] numa: move numa_node from CPUState into target specific classes
  2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 5/5] numa: move numa_node from CPUState into target specific classes Igor Mammedov
@ 2017-05-26 18:25   ` Eduardo Habkost
  2017-05-29 12:12     ` Igor Mammedov
  0 siblings, 1 reply; 19+ messages in thread
From: Eduardo Habkost @ 2017-05-26 18:25 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, qemu-arm, qemu-ppc, David Gibson, Andrew Jones

On Tue, May 23, 2017 at 04:38:50PM +0200, Igor Mammedov wrote:
> Move vcpu's assocciated numa_node field out of generic CPUState
> into inherited classes that actually care about cpu<->numa mapping,
> i.e: ARMCPU, PowerPCCPU, X86CPU.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  include/qom/cpu.h       |  2 --
>  target/arm/cpu.h        |  2 ++
>  target/i386/cpu.h       |  1 +
>  target/ppc/cpu.h        |  1 +
>  hw/ppc/spapr.c          | 24 +++++++++++-------------
>  hw/ppc/spapr_cpu_core.c |  4 +++-
>  monitor.c               | 11 +++++++----
>  target/arm/cpu.c        |  2 +-
>  target/i386/cpu.c       |  2 +-
>  9 files changed, 27 insertions(+), 22 deletions(-)
> 
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 55214ce..89ddb68 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -265,7 +265,6 @@ struct qemu_work_item;
>   * @cpu_index: CPU index (informative).
>   * @nr_cores: Number of cores within this CPU package.
>   * @nr_threads: Number of threads within this CPU.
> - * @numa_node: NUMA node this CPU is belonging to.
>   * @host_tid: Host thread ID.
>   * @running: #true if CPU is currently running (lockless).
>   * @has_waiter: #true if a CPU is currently waiting for the cpu_exec_end;
> @@ -314,7 +313,6 @@ struct CPUState {
>  
>      int nr_cores;
>      int nr_threads;
> -    int numa_node;
>  
>      struct QemuThread *thread;
>  #ifdef _WIN32
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 048faed..5ffc9d8 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -703,6 +703,8 @@ struct ARMCPU {
>  
>      ARMELChangeHook *el_change_hook;
>      void *el_change_hook_opaque;
> +
> +    int32_t node_id; /* NUMA node this CPU is belonging to */
>  };
>  
>  static inline ARMCPU *arm_env_get_cpu(CPUARMState *env)
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index c4602ca..00f35a0 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1278,6 +1278,7 @@ struct X86CPU {
>      int32_t socket_id;
>      int32_t core_id;
>      int32_t thread_id;
> +    int32_t node_id; /* NUMA node this CPU is belonging to */

I suggest placing this before socket_id, so it follows the
logical topology order: node, socket, core, thread.

>  };
>  
>  static inline X86CPU *x86_env_get_cpu(CPUX86State *env)
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 401e10e..31c052d 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1205,6 +1205,7 @@ struct PowerPCCPU {
>      uint32_t compat_pvr;
>      PPCVirtualHypervisor *vhyp;
>      Object *intc;
> +    int32_t node_id; /* NUMA node this CPU is belonging to */
>  
>      /* Fields related to migration compatibility hacks */
>      bool pre_2_8_migration;
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index c7fee8b..96a2a74 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -178,25 +178,19 @@ static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
>      return ret;
>  }
>  
> -static int spapr_fixup_cpu_numa_dt(void *fdt, int offset, CPUState *cs)
> +static int spapr_fixup_cpu_numa_dt(void *fdt, int offset, PowerPCCPU *cpu)
>  {
> -    int ret = 0;
> -    PowerPCCPU *cpu = POWERPC_CPU(cs);
>      int index = ppc_get_vcpu_dt_id(cpu);
>      uint32_t associativity[] = {cpu_to_be32(0x5),
>                                  cpu_to_be32(0x0),
>                                  cpu_to_be32(0x0),
>                                  cpu_to_be32(0x0),
> -                                cpu_to_be32(cs->numa_node),
> +                                cpu_to_be32(cpu->node_id),
>                                  cpu_to_be32(index)};
>  
>      /* Advertise NUMA via ibm,associativity */
> -    if (nb_numa_nodes > 1) {
> -        ret = fdt_setprop(fdt, offset, "ibm,associativity", associativity,

Commit message says we're just moving fields to other structs.
If you want to change the existing logic, please do it in a
separate patch.

> +    return fdt_setprop(fdt, offset, "ibm,associativity", associativity,
>                            sizeof(associativity));
> -    }
> -
> -    return ret;
>  }
>  
>  /* Populate the "ibm,pa-features" property */
> @@ -321,9 +315,11 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPRMachineState *spapr)
>              return ret;
>          }
>  
> -        ret = spapr_fixup_cpu_numa_dt(fdt, offset, cs);
> -        if (ret < 0) {
> -            return ret;
> +        if (nb_numa_nodes > 1) {
> +            ret = spapr_fixup_cpu_numa_dt(fdt, offset, cpu);

Ditto.

> +            if (ret < 0) {
> +                return ret;
> +            }
>          }
>  
>          ret = spapr_fixup_cpu_smt_dt(fdt, offset, cpu, compat_smt);
> @@ -538,7 +534,9 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
>      _FDT((fdt_setprop(fdt, offset, "ibm,pft-size",
>                        pft_size_prop, sizeof(pft_size_prop))));
>  
> -    _FDT(spapr_fixup_cpu_numa_dt(fdt, offset, cs));
> +    if (nb_numa_nodes > 1) {

Ditto.

> +        _FDT(spapr_fixup_cpu_numa_dt(fdt, offset, cpu));
> +    }
>  
>      _FDT(spapr_fixup_cpu_smt_dt(fdt, offset, cpu, compat_smt));
>  
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index a17ea07..60baf02 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -183,15 +183,17 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
>      for (i = 0; i < cc->nr_threads; i++) {
>          char id[32];
>          CPUState *cs;
> +        PowerPCCPU *cpu;
>  
>          obj = sc->threads + i * size;
>  
>          object_initialize(obj, size, typename);
>          cs = CPU(obj);
> +        cpu = POWERPC_CPU(cs);
>          cs->cpu_index = cc->core_id + i;
>  
>          /* Set NUMA node for the threads belonged to core  */
> -        cs->numa_node = sc->node_id;
> +        cpu->node_id = sc->node_id;
>  
>          snprintf(id, sizeof(id), "thread[%d]", i);
>          object_property_add_child(OBJECT(sc), id, obj, &local_err);
> diff --git a/monitor.c b/monitor.c
> index afbacfe..b053c40 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -1697,23 +1697,26 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
>  static void hmp_info_numa(Monitor *mon, const QDict *qdict)
>  {
>      int i;
> -    CPUState *cpu;
>      uint64_t *node_mem;
> +    CpuInfoList *cpu_list, *cpu;
>  
> +    cpu_list = qmp_query_cpus(&error_abort);
>      node_mem = g_new0(uint64_t, nb_numa_nodes);
>      query_numa_node_mem(node_mem);
>      monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
>      for (i = 0; i < nb_numa_nodes; i++) {
>          monitor_printf(mon, "node %d cpus:", i);
> -        CPU_FOREACH(cpu) {
> -            if (cpu->numa_node == i) {
> -                monitor_printf(mon, " %d", cpu->cpu_index);
> +        for (cpu = cpu_list; cpu; cpu = cpu->next) {
> +            if (cpu->value->has_props && cpu->value->props->has_node_id &&
> +                cpu->value->props->node_id == i) {
> +                monitor_printf(mon, " %" PRIi64, cpu->value->CPU);

I suggest moving this to a separate patch, too.

>              }
>          }
>          monitor_printf(mon, "\n");
>          monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
>                         node_mem[i] >> 20);
>      }
> +    qapi_free_CpuInfoList(cpu_list);
>      g_free(node_mem);
>  }
>  
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index c185eb1..09ef3a6 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -1573,7 +1573,7 @@ static Property arm_cpu_properties[] = {
>      DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
>      DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
>                          mp_affinity, ARM64_AFFINITY_INVALID),
> -    DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
> +    DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
>      DEFINE_PROP_END_OF_LIST()
>  };
>  
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index a41d595..ffb5267 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -3986,7 +3986,7 @@ static Property x86_cpu_properties[] = {
>      DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
>      DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
>  #endif
> -    DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
> +    DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID),
>      DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
>      { .name  = "hv-spinlocks", .info  = &qdev_prop_spinlocks },
>      DEFINE_PROP_BOOL("hv-relaxed", X86CPU, hyperv_relaxed_timing, false),
> -- 
> 2.7.4
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/5] numa: fallback to default NUMA node instead of node 0
  2017-05-26 14:58       ` Eduardo Habkost
@ 2017-05-29  9:12         ` Igor Mammedov
  2017-05-29 12:11         ` Igor Mammedov
  1 sibling, 0 replies; 19+ messages in thread
From: Igor Mammedov @ 2017-05-29  9:12 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Andrew Jones, qemu-arm, qemu-ppc, qemu-devel, David Gibson

On Fri, 26 May 2017 11:58:14 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, May 23, 2017 at 05:44:03PM +0200, Igor Mammedov wrote:
> > On Tue, 23 May 2017 11:48:54 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Tue, May 23, 2017 at 04:38:49PM +0200, Igor Mammedov wrote:  
> > > > Do the same as we did in commit
> > > >   (57924bcd8 numa: introduce machine callback for VCPU to node mapping)
> > > > but only for incomplete mapping usecase, falling back to board
> > > > provided default cpu to node mapping if user hasn't provided
> > > > mapping for CPU explicitly.
> > > > 
> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> > > 
> > > This breaks migration compatibility, doesn't it? I understand
> > > this use case is not supported, but if are going to break stuff
> > > for people using incomplete mappings (by rejecting the
> > > configuration on a future release), I would prefer to break it
> > > only once.  
> > it's firmware change and we generally don't care nor maintain firmware compat stuff
> > the same as with above mentioned 
> >  (57924bcd8 numa: introduce machine callback for VCPU to node mapping)  
> 
> 
> We normally keep resulting NUMA topology compatible, even if NUMA
> information is implemented only by firmware.  See how we set
> MachineClass::numa_auto_assign_ram and
> MachineClass::numa_mem_align_shift in older machine-types, for
> example.
> 
> Sometimes we didn't keep compatibility, though (e.g. on commit
> fb43b73b92 "pc: fix default VCPU to NUMA node mapping").
> 
> I wouldn't disagree with this change if we planned to support
> incomplete mappings forever.  But if we are going to remove
> support for incomplete mappings soon, I would prefer to break
> user expectations only once.
ok, I'll keep fix up to 0 and drop this patch.

> >   
> > >   
> > > > ---
> > > >  hw/core/machine.c | 3 ---
> > > >  1 file changed, 3 deletions(-)
> > > > 
> > > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > > index 964b75d..b8df15f 100644
> > > > --- a/hw/core/machine.c
> > > > +++ b/hw/core/machine.c
> > > > @@ -730,9 +730,6 @@ static void machine_numa_finish_init(MachineState *machine)
> > > >                  g_string_append_printf(s, "%sCPU %d [%s]",
> > > >                                         s->len ? ", " : "", i, cpu_str);
> > > >                  g_free(cpu_str);
> > > > -
> > > > -                /* non mapped cpus used to fallback to node 0 */
> > > > -                props.node_id = 0;
> > > >              }
> > > >  
> > > >              props.has_node_id = true;
> > > > -- 
> > > > 2.7.4
> > > >   
> > >   
> > 
> >   
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled
  2017-05-26 16:06   ` Eduardo Habkost
@ 2017-05-29 11:45     ` Igor Mammedov
  2017-05-29 13:22       ` Eduardo Habkost
  0 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2017-05-29 11:45 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: qemu-devel, Andrew Jones, qemu-arm, qemu-ppc, David Gibson

On Fri, 26 May 2017 13:06:30 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, May 23, 2017 at 04:38:48PM +0200, Igor Mammedov wrote:
> > It fixes/add missing _PXM object for non mapped CPU (x86)
> > and missing fdt node (virt-arm).
> > 
> > It ensures that possible_cpus contains complete mapping if
> > numa is enabled by the time machine_init() is executed.
> > 
> > As result non completely mapped CPUs:
> >  1) appear in ACPI/fdt blobs
> >  2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs
> >  3) allows to drop checks for has_node_id in numa only code,
> >    reducing number of invariants incomplete mapping could produce
> >  4) moves fixup/implicit node init from runtime numa_cpu_pre_plug()
> >    (when CPU object is created) to machine_numa_finish_init() which
> >    helps to fix [1, 2] and make possible_cpus complete source
> >    of numa mapping available even before CPUs are created.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> 
> Code that will be affected, but will keep the same behavior:
> 
> * pre_plug handler when node-id is omitted on device_add: it
>   already sets node_id=0 if !has_node_id
>   * However, pre_plug behavior is being changed when node-id is
>     set. See below.
> * build_srat() (both x86 and arm): already sets node_id = 0 if
>   !has_node_id
> * bochs_bios_init(): already uses g_new0(), so already had
>   omitted entries set to 0.
> 
> Affected code that will change behavior with this patch:
> 
> * pre_plug handler when node-id is set:
>   * See my reply to patch 1/5.  This patch restores the old
>     behavior that rejected node-id != 0 on omitted CPUs.
I'll look into it.

> * ACPI _PXM initialization
> * arm/virt.c: numa-node-id will now be set for all CPUs
> * QMP/HMP query-hotpluggable-cpus output
>   * Desirable change, as now the QMP command will reflect what
>     the guest is really seeing
> 
> Due to the _PXM and FDT changes, I have the same objection I have
> for patch 4/5: if we are going to break user expectations when
> using incomplete NUMA mappings, let's do that only once (when we
> start rejecting incomplete NUMA mappings).

FDT/PXM change is bugfix to firmware that I insist on.
Firmware should properly describe all CPUs instead of repeating
partial CPU mapping nonsense.

Wrt _PXM change see
(27111931 acpi: provide _PXM method for CPU devices if QEMU is started numa enabled)
provided we keep 0 node fallback by dropping 4/5
linux guest will behave the same as without _PXM.

And again there wasn't any compat code around that
and guest saw node-id change when it's been restarted.

Wrt FDT change, it's the same and I also talked with
Drew about it, his reply was that for ARM target
they don't care much about NUMA as not all pieces are
there to make pinning work on host side yet.

PS:
There wasn't any promise made to keep firmware side
bug compatible or versioned so users should not have
expectations about it in the first place.

Pls don't push for bug compat stuff in the firmware
as it will rise false expectations (at least until
we decide to care about it by default).

(I'd use firmware compat stuff only in cases where
it breaks guest horribly, i.e. crash/fail boot)


> A few extra comments about the has_node_id checks below:
> 
> > ---
> >  hw/arm/virt-acpi-build.c |  4 +---
> >  hw/core/machine.c        | 16 ++++++++++------
> >  hw/i386/acpi-build.c     |  3 +--
> >  hw/i386/pc.c             |  4 +---
> >  numa.c                   |  7 +------
> >  5 files changed, 14 insertions(+), 20 deletions(-)
> > 
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index e585206..977a794 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -496,12 +496,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >      srat->reserved1 = cpu_to_le32(1);
> >  
> >      for (i = 0; i < cpu_list->len; ++i) {
> > -        int node_id = cpu_list->cpus[i].props.has_node_id ?
> > -            cpu_list->cpus[i].props.node_id : 0;
> >          core = acpi_data_push(table_data, sizeof(*core));
> >          core->type = ACPI_SRAT_PROCESSOR_GICC;
> >          core->length = sizeof(*core);
> > -        core->proximity = cpu_to_le32(node_id);
> > +        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
> >          core->acpi_processor_uid = cpu_to_le32(i);
> >          core->flags = cpu_to_le32(1);
> >      }
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index aaf3cff..964b75d 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -720,19 +720,23 @@ static void machine_numa_finish_init(MachineState *machine)
> >          const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
> >  
> >          if (!cpu_slot->props.has_node_id) {
> > -            if (default_mapping) {
> > -                /* fetch default mapping from board and enable it */
> > -                CpuInstanceProperties props = cpu_slot->props;
> > -                props.has_node_id = true;
> > -                machine_set_cpu_numa_node(machine, &props, &error_fatal);
> > -            } else {
> > +            /* fetch default mapping from board and enable it */
> > +            CpuInstanceProperties props = cpu_slot->props;
> > +
> > +            if (!default_mapping) {
> >                  /* record slots with not set mapping,
> >                   * TODO: make it hard error in future */
> >                  char *cpu_str = cpu_slot_to_string(cpu_slot);
> >                  g_string_append_printf(s, "%sCPU %d [%s]",
> >                                         s->len ? ", " : "", i, cpu_str);
> >                  g_free(cpu_str);
> > +
> > +                /* non mapped cpus used to fallback to node 0 */
> > +                props.node_id = 0;
> >              }
> > +
> > +            props.has_node_id = true;
> > +            machine_set_cpu_numa_node(machine, &props, &error_fatal);
> >          }
> >      }
> >      if (s->len) {
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index afcadac..873880d 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -2335,8 +2335,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
> >      srat->reserved1 = cpu_to_le32(1);
> >  
> >      for (i = 0; i < apic_ids->len; i++) {
> > -        int node_id = apic_ids->cpus[i].props.has_node_id ?
> > -            apic_ids->cpus[i].props.node_id : 0;
> > +        int node_id = apic_ids->cpus[i].props.node_id;  
> 
> What about an assert(props.has_node_id) here?
I considered it but dropped idea, as it's mostly useless
since build_srat() is called only when numa is enabled.

and sticking asserts in every place just in case doesn't
seem to me like good idea as it would lead to absurd 
asserts after every line.

Considering the new code behaves the same in case !has_node_id
I don't see any reason to put assert here, the same goes
for 'Ditto" comment in the next hunk.

But if you insist I can' add assert().

> 
> >          uint32_t apic_id = apic_ids->cpus[i].arch_id;
> >  
> >          if (apic_id < 255) {
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index cf09949..84f0a6f 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -788,9 +788,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
> >      for (i = 0; i < cpus->len; i++) {
> >          unsigned int apic_id = cpus->cpus[i].arch_id;
> >          assert(apic_id < pcms->apic_id_limit);
> > -        if (cpus->cpus[i].props.has_node_id) {
> > -            numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
> > -        }
> > +        numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);  
> 
> Ditto.
> 
> >      }
> >      for (i = 0; i < nb_numa_nodes; i++) {
> >          numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
> > diff --git a/numa.c b/numa.c
> > index e30702e..4ef6dea 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -513,17 +513,12 @@ void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
> >      int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
> >  
> >      if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> > -        /* by default CPUState::numa_node was 0 if it wasn't set explicitly
> > -         * TODO: make it error when incomplete numa mapping support is removed
> > -         */
> > -        node_id = 0;
> > -
> >          /* due to bug in libvirt, it doesn't pass node-id from props on
> >           * device_add as expected, so we have to fix it up here */
> >          if (slot->props.has_node_id) {
> >              node_id = slot->props.node_id;
> > +            object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
> >          }
> > -        object_property_set_int(OBJECT(dev), node_id, "node-id", errp);  
> 
> Why not removing the has_node_id check completely, here?
because numa_cpu_pre_plug() is called unconditionally
As alternative, I can replace has_node_id check with assert
and do following from callers:

 if (nb_numa_nodes) {
     numa_cpu_pre_plug()
 }


> In case this patch get respun, I suggest also removing the
> has_node_id check at: build_cpus_aml() _PXM code.
that would be wrong, build_cpus_aml() is called for both
numa and non numa use-cases, there shouldn't be _PXM
object if numa is not enabled, hence the check.

 
> >      } else if (node_id != slot->props.node_id) {
> >          error_setg(errp, "node-id=%d must match numa node specified "
> >                     "with -numa option", node_id);
> > -- 
> > 2.7.4
> > 
> >   
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/5] numa: fallback to default NUMA node instead of node 0
  2017-05-26 14:58       ` Eduardo Habkost
  2017-05-29  9:12         ` Igor Mammedov
@ 2017-05-29 12:11         ` Igor Mammedov
  1 sibling, 0 replies; 19+ messages in thread
From: Igor Mammedov @ 2017-05-29 12:11 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Andrew Jones, qemu-arm, qemu-ppc, qemu-devel, David Gibson

On Fri, 26 May 2017 11:58:14 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, May 23, 2017 at 05:44:03PM +0200, Igor Mammedov wrote:
> > On Tue, 23 May 2017 11:48:54 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Tue, May 23, 2017 at 04:38:49PM +0200, Igor Mammedov wrote:  
> > > > Do the same as we did in commit
> > > >   (57924bcd8 numa: introduce machine callback for VCPU to node mapping)
> > > > but only for incomplete mapping usecase, falling back to board
> > > > provided default cpu to node mapping if user hasn't provided
> > > > mapping for CPU explicitly.
> > > > 
> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> > > 
> > > This breaks migration compatibility, doesn't it? I understand
> > > this use case is not supported, but if are going to break stuff
> > > for people using incomplete mappings (by rejecting the
> > > configuration on a future release), I would prefer to break it
> > > only once.  
> > it's firmware change and we generally don't care nor maintain firmware compat stuff
> > the same as with above mentioned 
> >  (57924bcd8 numa: introduce machine callback for VCPU to node mapping)  
> 
> 
> We normally keep resulting NUMA topology compatible, even if NUMA
> information is implemented only by firmware.  See how we set
> MachineClass::numa_auto_assign_ram and
> MachineClass::numa_mem_align_shift in older machine-types, for
> example.
it's relative new and pure firmaware change wrt arm/i386,
but it might influence ABI for spapr, my guess is that's why
it was compat flags were introduced
(otherwise they shouldn't have been added).
So I wouldn't consider this precedent as policy change
wrt firmware changes.

 
> Sometimes we didn't keep compatibility, though (e.g. on commit
> fb43b73b92 "pc: fix default VCPU to NUMA node mapping").
and more
(27111931 acpi: provide _PXM method for CPU devices if QEMU is started numa enabled)
and this sometimes is more consistent with other firmware
changes that are not versioned and the general approach
(I don't see why numa is any more special that any other
subsystem).

> I wouldn't disagree with this change if we planned to support
> incomplete mappings forever.  But if we are going to remove
> support for incomplete mappings soon, I would prefer to break
> user expectations only once.
My take on why we can't remove incomplete mappings without
obsoleting them first is not because of firmware changes but
that it breaks wrong CLI (i.e. scripts with incomplete mapping
will fail to start guest). So it's not the same vs firmware
behavior change.

I'm agreeing to drop this patch just because I don't care much
about this particular fixup as it's small isolated part in one place
and doesn't affect the rest of the code.


> > >   
> > > > ---
> > > >  hw/core/machine.c | 3 ---
> > > >  1 file changed, 3 deletions(-)
> > > > 
> > > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > > index 964b75d..b8df15f 100644
> > > > --- a/hw/core/machine.c
> > > > +++ b/hw/core/machine.c
> > > > @@ -730,9 +730,6 @@ static void machine_numa_finish_init(MachineState *machine)
> > > >                  g_string_append_printf(s, "%sCPU %d [%s]",
> > > >                                         s->len ? ", " : "", i, cpu_str);
> > > >                  g_free(cpu_str);
> > > > -
> > > > -                /* non mapped cpus used to fallback to node 0 */
> > > > -                props.node_id = 0;
> > > >              }
> > > >  
> > > >              props.has_node_id = true;
> > > > -- 
> > > > 2.7.4
> > > >   
> > >   
> > 
> >   
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/5] numa: move numa_node from CPUState into target specific classes
  2017-05-26 18:25   ` Eduardo Habkost
@ 2017-05-29 12:12     ` Igor Mammedov
  0 siblings, 0 replies; 19+ messages in thread
From: Igor Mammedov @ 2017-05-29 12:12 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: qemu-devel, qemu-arm, qemu-ppc, David Gibson, Andrew Jones

On Fri, 26 May 2017 15:25:22 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, May 23, 2017 at 04:38:50PM +0200, Igor Mammedov wrote:
> > Move vcpu's assocciated numa_node field out of generic CPUState
> > into inherited classes that actually care about cpu<->numa mapping,
> > i.e: ARMCPU, PowerPCCPU, X86CPU.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  include/qom/cpu.h       |  2 --
> >  target/arm/cpu.h        |  2 ++
> >  target/i386/cpu.h       |  1 +
> >  target/ppc/cpu.h        |  1 +
> >  hw/ppc/spapr.c          | 24 +++++++++++-------------
> >  hw/ppc/spapr_cpu_core.c |  4 +++-
> >  monitor.c               | 11 +++++++----
> >  target/arm/cpu.c        |  2 +-
> >  target/i386/cpu.c       |  2 +-
> >  9 files changed, 27 insertions(+), 22 deletions(-)
> > 
> > diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> > index 55214ce..89ddb68 100644
> > --- a/include/qom/cpu.h
> > +++ b/include/qom/cpu.h
> > @@ -265,7 +265,6 @@ struct qemu_work_item;
> >   * @cpu_index: CPU index (informative).
> >   * @nr_cores: Number of cores within this CPU package.
> >   * @nr_threads: Number of threads within this CPU.
> > - * @numa_node: NUMA node this CPU is belonging to.
> >   * @host_tid: Host thread ID.
> >   * @running: #true if CPU is currently running (lockless).
> >   * @has_waiter: #true if a CPU is currently waiting for the cpu_exec_end;
> > @@ -314,7 +313,6 @@ struct CPUState {
> >  
> >      int nr_cores;
> >      int nr_threads;
> > -    int numa_node;
> >  
> >      struct QemuThread *thread;
> >  #ifdef _WIN32
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index 048faed..5ffc9d8 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -703,6 +703,8 @@ struct ARMCPU {
> >  
> >      ARMELChangeHook *el_change_hook;
> >      void *el_change_hook_opaque;
> > +
> > +    int32_t node_id; /* NUMA node this CPU is belonging to */
> >  };
> >  
> >  static inline ARMCPU *arm_env_get_cpu(CPUARMState *env)
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index c4602ca..00f35a0 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -1278,6 +1278,7 @@ struct X86CPU {
> >      int32_t socket_id;
> >      int32_t core_id;
> >      int32_t thread_id;
> > +    int32_t node_id; /* NUMA node this CPU is belonging to */  
> 
> I suggest placing this before socket_id, so it follows the
> logical topology order: node, socket, core, thread.
ok

> 
> >  };
> >  
> >  static inline X86CPU *x86_env_get_cpu(CPUX86State *env)
> > diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> > index 401e10e..31c052d 100644
> > --- a/target/ppc/cpu.h
> > +++ b/target/ppc/cpu.h
> > @@ -1205,6 +1205,7 @@ struct PowerPCCPU {
> >      uint32_t compat_pvr;
> >      PPCVirtualHypervisor *vhyp;
> >      Object *intc;
> > +    int32_t node_id; /* NUMA node this CPU is belonging to */
> >  
> >      /* Fields related to migration compatibility hacks */
> >      bool pre_2_8_migration;
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index c7fee8b..96a2a74 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -178,25 +178,19 @@ static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
> >      return ret;
> >  }
> >  
> > -static int spapr_fixup_cpu_numa_dt(void *fdt, int offset, CPUState *cs)
> > +static int spapr_fixup_cpu_numa_dt(void *fdt, int offset, PowerPCCPU *cpu)
> >  {
> > -    int ret = 0;
> > -    PowerPCCPU *cpu = POWERPC_CPU(cs);
> >      int index = ppc_get_vcpu_dt_id(cpu);
> >      uint32_t associativity[] = {cpu_to_be32(0x5),
> >                                  cpu_to_be32(0x0),
> >                                  cpu_to_be32(0x0),
> >                                  cpu_to_be32(0x0),
> > -                                cpu_to_be32(cs->numa_node),
> > +                                cpu_to_be32(cpu->node_id),
> >                                  cpu_to_be32(index)};
> >  
> >      /* Advertise NUMA via ibm,associativity */
> > -    if (nb_numa_nodes > 1) {
> > -        ret = fdt_setprop(fdt, offset, "ibm,associativity", associativity,  
> 
> Commit message says we're just moving fields to other structs.
> If you want to change the existing logic, please do it in a
> separate patch.
ok, I'll split the patch.

> 
> > +    return fdt_setprop(fdt, offset, "ibm,associativity", associativity,
> >                            sizeof(associativity));
> > -    }
> > -
> > -    return ret;
> >  }
> >  
> >  /* Populate the "ibm,pa-features" property */
> > @@ -321,9 +315,11 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPRMachineState *spapr)
> >              return ret;
> >          }
> >  
> > -        ret = spapr_fixup_cpu_numa_dt(fdt, offset, cs);
> > -        if (ret < 0) {
> > -            return ret;
> > +        if (nb_numa_nodes > 1) {
> > +            ret = spapr_fixup_cpu_numa_dt(fdt, offset, cpu);  
> 
> Ditto.
> 
> > +            if (ret < 0) {
> > +                return ret;
> > +            }
> >          }
> >  
> >          ret = spapr_fixup_cpu_smt_dt(fdt, offset, cpu, compat_smt);
> > @@ -538,7 +534,9 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset,
> >      _FDT((fdt_setprop(fdt, offset, "ibm,pft-size",
> >                        pft_size_prop, sizeof(pft_size_prop))));
> >  
> > -    _FDT(spapr_fixup_cpu_numa_dt(fdt, offset, cs));
> > +    if (nb_numa_nodes > 1) {  
> 
> Ditto.
> 
> > +        _FDT(spapr_fixup_cpu_numa_dt(fdt, offset, cpu));
> > +    }
> >  
> >      _FDT(spapr_fixup_cpu_smt_dt(fdt, offset, cpu, compat_smt));
> >  
> > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> > index a17ea07..60baf02 100644
> > --- a/hw/ppc/spapr_cpu_core.c
> > +++ b/hw/ppc/spapr_cpu_core.c
> > @@ -183,15 +183,17 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
> >      for (i = 0; i < cc->nr_threads; i++) {
> >          char id[32];
> >          CPUState *cs;
> > +        PowerPCCPU *cpu;
> >  
> >          obj = sc->threads + i * size;
> >  
> >          object_initialize(obj, size, typename);
> >          cs = CPU(obj);
> > +        cpu = POWERPC_CPU(cs);
> >          cs->cpu_index = cc->core_id + i;
> >  
> >          /* Set NUMA node for the threads belonged to core  */
> > -        cs->numa_node = sc->node_id;
> > +        cpu->node_id = sc->node_id;
> >  
> >          snprintf(id, sizeof(id), "thread[%d]", i);
> >          object_property_add_child(OBJECT(sc), id, obj, &local_err);
> > diff --git a/monitor.c b/monitor.c
> > index afbacfe..b053c40 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -1697,23 +1697,26 @@ static void hmp_info_mtree(Monitor *mon, const QDict *qdict)
> >  static void hmp_info_numa(Monitor *mon, const QDict *qdict)
> >  {
> >      int i;
> > -    CPUState *cpu;
> >      uint64_t *node_mem;
> > +    CpuInfoList *cpu_list, *cpu;
> >  
> > +    cpu_list = qmp_query_cpus(&error_abort);
> >      node_mem = g_new0(uint64_t, nb_numa_nodes);
> >      query_numa_node_mem(node_mem);
> >      monitor_printf(mon, "%d nodes\n", nb_numa_nodes);
> >      for (i = 0; i < nb_numa_nodes; i++) {
> >          monitor_printf(mon, "node %d cpus:", i);
> > -        CPU_FOREACH(cpu) {
> > -            if (cpu->numa_node == i) {
> > -                monitor_printf(mon, " %d", cpu->cpu_index);
> > +        for (cpu = cpu_list; cpu; cpu = cpu->next) {
> > +            if (cpu->value->has_props && cpu->value->props->has_node_id &&
> > +                cpu->value->props->node_id == i) {
> > +                monitor_printf(mon, " %" PRIi64, cpu->value->CPU);  
> 
> I suggest moving this to a separate patch, too.
ok

> 
> >              }
> >          }
> >          monitor_printf(mon, "\n");
> >          monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i,
> >                         node_mem[i] >> 20);
> >      }
> > +    qapi_free_CpuInfoList(cpu_list);
> >      g_free(node_mem);
> >  }
> >  
> > diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> > index c185eb1..09ef3a6 100644
> > --- a/target/arm/cpu.c
> > +++ b/target/arm/cpu.c
> > @@ -1573,7 +1573,7 @@ static Property arm_cpu_properties[] = {
> >      DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
> >      DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
> >                          mp_affinity, ARM64_AFFINITY_INVALID),
> > -    DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
> > +    DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
> >      DEFINE_PROP_END_OF_LIST()
> >  };
> >  
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index a41d595..ffb5267 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -3986,7 +3986,7 @@ static Property x86_cpu_properties[] = {
> >      DEFINE_PROP_INT32("core-id", X86CPU, core_id, -1),
> >      DEFINE_PROP_INT32("socket-id", X86CPU, socket_id, -1),
> >  #endif
> > -    DEFINE_PROP_INT32("node-id", CPUState, numa_node, CPU_UNSET_NUMA_NODE_ID),
> > +    DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID),
> >      DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
> >      { .name  = "hv-spinlocks", .info  = &qdev_prop_spinlocks },
> >      DEFINE_PROP_BOOL("hv-relaxed", X86CPU, hyperv_relaxed_timing, false),
> > -- 
> > 2.7.4
> >   
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled
  2017-05-29 11:45     ` Igor Mammedov
@ 2017-05-29 13:22       ` Eduardo Habkost
  0 siblings, 0 replies; 19+ messages in thread
From: Eduardo Habkost @ 2017-05-29 13:22 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, Andrew Jones, qemu-arm, qemu-ppc, David Gibson

On Mon, May 29, 2017 at 01:45:36PM +0200, Igor Mammedov wrote:
> On Fri, 26 May 2017 13:06:30 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Tue, May 23, 2017 at 04:38:48PM +0200, Igor Mammedov wrote:
> > > It fixes/add missing _PXM object for non mapped CPU (x86)
> > > and missing fdt node (virt-arm).
> > > 
> > > It ensures that possible_cpus contains complete mapping if
> > > numa is enabled by the time machine_init() is executed.
> > > 
> > > As result non completely mapped CPUs:
> > >  1) appear in ACPI/fdt blobs
> > >  2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs
> > >  3) allows to drop checks for has_node_id in numa only code,
> > >    reducing number of invariants incomplete mapping could produce
> > >  4) moves fixup/implicit node init from runtime numa_cpu_pre_plug()
> > >    (when CPU object is created) to machine_numa_finish_init() which
> > >    helps to fix [1, 2] and make possible_cpus complete source
> > >    of numa mapping available even before CPUs are created.
> > > 
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> > 
> > Code that will be affected, but will keep the same behavior:
> > 
> > * pre_plug handler when node-id is omitted on device_add: it
> >   already sets node_id=0 if !has_node_id
> >   * However, pre_plug behavior is being changed when node-id is
> >     set. See below.
> > * build_srat() (both x86 and arm): already sets node_id = 0 if
> >   !has_node_id
> > * bochs_bios_init(): already uses g_new0(), so already had
> >   omitted entries set to 0.
> > 
> > Affected code that will change behavior with this patch:
> > 
> > * pre_plug handler when node-id is set:
> >   * See my reply to patch 1/5.  This patch restores the old
> >     behavior that rejected node-id != 0 on omitted CPUs.
> I'll look into it.
> 
> > * ACPI _PXM initialization
> > * arm/virt.c: numa-node-id will now be set for all CPUs
> > * QMP/HMP query-hotpluggable-cpus output
> >   * Desirable change, as now the QMP command will reflect what
> >     the guest is really seeing
> > 
> > Due to the _PXM and FDT changes, I have the same objection I have
> > for patch 4/5: if we are going to break user expectations when
> > using incomplete NUMA mappings, let's do that only once (when we
> > start rejecting incomplete NUMA mappings).
> 
> FDT/PXM change is bugfix to firmware that I insist on.
> Firmware should properly describe all CPUs instead of repeating
> partial CPU mapping nonsense.
> 
> Wrt _PXM change see
> (27111931 acpi: provide _PXM method for CPU devices if QEMU is started numa enabled)
> provided we keep 0 node fallback by dropping 4/5
> linux guest will behave the same as without _PXM.

If it is a bug fix, then I agree with it.  Also, this doesn't
change the topology of the virtual hardware, so it shouldn't
break user expectations anyway.

> 
> And again there wasn't any compat code around that
> and guest saw node-id change when it's been restarted.
> 
> Wrt FDT change, it's the same and I also talked with
> Drew about it, his reply was that for ARM target
> they don't care much about NUMA as not all pieces are
> there to make pinning work on host side yet.
> 
> PS:
> There wasn't any promise made to keep firmware side
> bug compatible or versioned so users should not have
> expectations about it in the first place.

You are right about this.

> 
> Pls don't push for bug compat stuff in the firmware
> as it will rise false expectations (at least until
> we decide to care about it by default).

I won't.  Patch 4/5 is different because it makes firmware
describe different hardware.  But this one is firmware-only, so
you are right.

> 
> (I'd use firmware compat stuff only in cases where
> it breaks guest horribly, i.e. crash/fail boot)
> 
> 
> > A few extra comments about the has_node_id checks below:
> > 
> > > ---
> > >  hw/arm/virt-acpi-build.c |  4 +---
> > >  hw/core/machine.c        | 16 ++++++++++------
> > >  hw/i386/acpi-build.c     |  3 +--
> > >  hw/i386/pc.c             |  4 +---
> > >  numa.c                   |  7 +------
> > >  5 files changed, 14 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > > index e585206..977a794 100644
> > > --- a/hw/arm/virt-acpi-build.c
> > > +++ b/hw/arm/virt-acpi-build.c
> > > @@ -496,12 +496,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> > >      srat->reserved1 = cpu_to_le32(1);
> > >  
> > >      for (i = 0; i < cpu_list->len; ++i) {
> > > -        int node_id = cpu_list->cpus[i].props.has_node_id ?
> > > -            cpu_list->cpus[i].props.node_id : 0;
> > >          core = acpi_data_push(table_data, sizeof(*core));
> > >          core->type = ACPI_SRAT_PROCESSOR_GICC;
> > >          core->length = sizeof(*core);
> > > -        core->proximity = cpu_to_le32(node_id);
> > > +        core->proximity = cpu_to_le32(cpu_list->cpus[i].props.node_id);
> > >          core->acpi_processor_uid = cpu_to_le32(i);
> > >          core->flags = cpu_to_le32(1);
> > >      }
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index aaf3cff..964b75d 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -720,19 +720,23 @@ static void machine_numa_finish_init(MachineState *machine)
> > >          const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
> > >  
> > >          if (!cpu_slot->props.has_node_id) {
> > > -            if (default_mapping) {
> > > -                /* fetch default mapping from board and enable it */
> > > -                CpuInstanceProperties props = cpu_slot->props;
> > > -                props.has_node_id = true;
> > > -                machine_set_cpu_numa_node(machine, &props, &error_fatal);
> > > -            } else {
> > > +            /* fetch default mapping from board and enable it */
> > > +            CpuInstanceProperties props = cpu_slot->props;
> > > +
> > > +            if (!default_mapping) {
> > >                  /* record slots with not set mapping,
> > >                   * TODO: make it hard error in future */
> > >                  char *cpu_str = cpu_slot_to_string(cpu_slot);
> > >                  g_string_append_printf(s, "%sCPU %d [%s]",
> > >                                         s->len ? ", " : "", i, cpu_str);
> > >                  g_free(cpu_str);
> > > +
> > > +                /* non mapped cpus used to fallback to node 0 */
> > > +                props.node_id = 0;
> > >              }
> > > +
> > > +            props.has_node_id = true;
> > > +            machine_set_cpu_numa_node(machine, &props, &error_fatal);
> > >          }
> > >      }
> > >      if (s->len) {
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index afcadac..873880d 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -2335,8 +2335,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
> > >      srat->reserved1 = cpu_to_le32(1);
> > >  
> > >      for (i = 0; i < apic_ids->len; i++) {
> > > -        int node_id = apic_ids->cpus[i].props.has_node_id ?
> > > -            apic_ids->cpus[i].props.node_id : 0;
> > > +        int node_id = apic_ids->cpus[i].props.node_id;  
> > 
> > What about an assert(props.has_node_id) here?
> I considered it but dropped idea, as it's mostly useless
> since build_srat() is called only when numa is enabled.

I just suggested it because I would be more confident that the
new code that ensures has_node_id is set for all slots is working
and won't break if we refactor it.  But not a big deal.

> 
> and sticking asserts in every place just in case doesn't
> seem to me like good idea as it would lead to absurd 
> asserts after every line.
> 
> Considering the new code behaves the same in case !has_node_id
> I don't see any reason to put assert here, the same goes
> for 'Ditto" comment in the next hunk.
> 
> But if you insist I can' add assert().

I won't mind if you decide to not add it.

> 
[...]
> > >      }
> > >      for (i = 0; i < nb_numa_nodes; i++) {
> > >          numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
> > > diff --git a/numa.c b/numa.c
> > > index e30702e..4ef6dea 100644
> > > --- a/numa.c
> > > +++ b/numa.c
> > > @@ -513,17 +513,12 @@ void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
> > >      int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
> > >  
> > >      if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> > > -        /* by default CPUState::numa_node was 0 if it wasn't set explicitly
> > > -         * TODO: make it error when incomplete numa mapping support is removed
> > > -         */
> > > -        node_id = 0;
> > > -
> > >          /* due to bug in libvirt, it doesn't pass node-id from props on
> > >           * device_add as expected, so we have to fix it up here */
> > >          if (slot->props.has_node_id) {
> > >              node_id = slot->props.node_id;
> > > +            object_property_set_int(OBJECT(dev), node_id, "node-id", errp);
> > >          }
> > > -        object_property_set_int(OBJECT(dev), node_id, "node-id", errp);  
> > 
> > Why not removing the has_node_id check completely, here?
> because numa_cpu_pre_plug() is called unconditionally
> As alternative, I can replace has_node_id check with assert
> and do following from callers:
> 
>  if (nb_numa_nodes) {
>      numa_cpu_pre_plug()
>  }
> 
> 
> > In case this patch get respun, I suggest also removing the
> > has_node_id check at: build_cpus_aml() _PXM code.
> that would be wrong, build_cpus_aml() is called for both
> numa and non numa use-cases, there shouldn't be _PXM
> object if numa is not enabled, hence the check.
> 
>  

Oh, I forgot about the non-NUMA cases.  You're right.


-- 
Eduardo

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-05-29 13:22 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-23 14:38 [Qemu-devel] [PATCH v2 0/5] numa: code consolidation and fixes Igor Mammedov
2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 1/5] numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr Igor Mammedov
2017-05-26 15:01   ` Eduardo Habkost
2017-05-26 15:33   ` Eduardo Habkost
2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 2/5] numa: move default mapping init to machine Igor Mammedov
2017-05-26 15:02   ` Eduardo Habkost
2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 3/5] numa: make sure that all cpus in has has_node_id set if numa is enabled Igor Mammedov
2017-05-26 16:06   ` Eduardo Habkost
2017-05-29 11:45     ` Igor Mammedov
2017-05-29 13:22       ` Eduardo Habkost
2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 4/5] numa: fallback to default NUMA node instead of node 0 Igor Mammedov
2017-05-23 14:48   ` Eduardo Habkost
2017-05-23 15:44     ` Igor Mammedov
2017-05-26 14:58       ` Eduardo Habkost
2017-05-29  9:12         ` Igor Mammedov
2017-05-29 12:11         ` Igor Mammedov
2017-05-23 14:38 ` [Qemu-devel] [PATCH v2 5/5] numa: move numa_node from CPUState into target specific classes Igor Mammedov
2017-05-26 18:25   ` Eduardo Habkost
2017-05-29 12:12     ` Igor Mammedov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.