* [PATCH v2 0/7] pseries NUMA distance rework
@ 2020-09-01 12:56 Daniel Henrique Barboza
  2020-09-01 12:56 ` [PATCH v2 1/7] ppc: introducing spapr_numa.c NUMA code helper Daniel Henrique Barboza
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-01 12:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, david

Hi,

Following the reviews of the first version [1], especially this
reply from David [2], I decided to take a step back and refactor
all the code in hw/ppc/spapr* that operates on ibm,associativity,
ibm,associativity-reference-points and ibm,max-associativity-domains.

A new file named 'spapr_numa.c' was created to gather all the
associativity related code into helpers that write NUMA/associativity
related info to the FDT. These helpers are then used in other
spapr_* files. This allows us to change NUMA related code in a
single location, instead of searching every file to see where and
how associativity is being written, and all the soon-to-be more
complex logic can be contained in spapr_numa.c. I consider
the end result to be better than what I ended up doing in v1.

Unlike v1, there is no NUMA distance change being done in this series.
Later on, the hub of the new NUMA distance calculation will be
spapr_numa_associativity_init(), where we'll take into consideration
user input from numa_states, adapt the sizes to what the PAPR kernel
understands and establish associativity domains between the NUMA nodes.
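
Just to illustrate the direction (a rough sketch only, none of this is
in this series, and the helper name and the distance threshold below
are made up):

    static void spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
    {
        MachineState *ms = MACHINE(spapr);
        SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
        int src, dst;

        for (src = 0; src < ms->numa_state->num_nodes; src++) {
            for (dst = src + 1; dst < ms->numa_state->num_nodes; dst++) {
                /* user supplied distance, 10 meaning 'local' */
                uint8_t distance = ms->numa_state->nodes[src].distance[dst];

                /*
                 * The shorter the distance, the more associativity
                 * domains src and dst should share. The exact mapping
                 * is what the follow-up series will define.
                 */
                if (distance <= 20) {
                    /* make dst share src's third level domain (illustrative) */
                    smc->numa_assoc_array[dst][MAX_DISTANCE_REF_POINTS - 1] =
                        smc->numa_assoc_array[src][MAX_DISTANCE_REF_POINTS - 1];
                }
            }
        }
    }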


Changes from v1:
- all the patches that made guest-visible changes were removed. They
will be re-submitted in a follow-up series after this one.
- patch 02 from v1 will be reworked and reposted in the follow-up
series as well.
- version 1 link:
https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg03169.html



These patches were rebased using David's ppc-for-5.2 tree. Github
repo with the patches applied:

https://github.com/danielhb/qemu/tree/spapr_numa_v2


[1] https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg03169.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg04661.html
 
Daniel Henrique Barboza (7):
  ppc: introducing spapr_numa.c NUMA code helper
  ppc/spapr_nvdimm: turn spapr_dt_nvdimm() static
  spapr: introduce SpaprMachineClass::numa_assoc_array
  spapr, spapr_numa: handle vcpu ibm,associativity
  spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c
  spapr_numa: move NVLink2 associativity handling to spapr_numa.c
  spapr_hcall: h_home_node_associativity now reads numa_assoc_array

 hw/ppc/meson.build            |   3 +-
 hw/ppc/spapr.c                |  91 +++---------------
 hw/ppc/spapr_hcall.c          |  16 +++-
 hw/ppc/spapr_numa.c           | 172 ++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_nvdimm.c         |  37 ++++----
 hw/ppc/spapr_pci.c            |   9 +-
 hw/ppc/spapr_pci_nvlink2.c    |  19 +---
 include/hw/ppc/spapr.h        |  13 ++-
 include/hw/ppc/spapr_numa.h   |  32 +++++++
 include/hw/ppc/spapr_nvdimm.h |   3 +-
 10 files changed, 268 insertions(+), 127 deletions(-)
 create mode 100644 hw/ppc/spapr_numa.c
 create mode 100644 include/hw/ppc/spapr_numa.h

-- 
2.26.2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/7] ppc: introducing spapr_numa.c NUMA code helper
  2020-09-01 12:56 [PATCH v2 0/7] pseries NUMA distance rework Daniel Henrique Barboza
@ 2020-09-01 12:56 ` Daniel Henrique Barboza
  2020-09-01 12:56 ` [PATCH v2 2/7] ppc/spapr_nvdimm: turn spapr_dt_nvdimm() static Daniel Henrique Barboza
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-01 12:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, david

We're going to make changes in how spapr handles all
ibm,associativity* related properties to enhance our current NUMA
support.

At this moment we have associativity code scattered all around
spapr_* files, with hardcoded values and array sizes. This
makes it harder to change any NUMA specific parameters in
the future. Having everything in the same place allows not
only easier tuning but also easier understanding, since all
NUMA related code is in the same file.

This patch introduces a new file to gather all NUMA/associativity
handling code in spapr, spapr_numa.c. To get things started, let's
move the associativity-reference-points and max-associativity-domains
code from spapr_dt_rtas() into a new helper called spapr_numa_write_rtas_dt().
This will decouple spapr_dt_rtas() from the NUMA changes that
are going to happen to those two properties.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/meson.build          |  3 ++-
 hw/ppc/spapr.c              | 26 ++-----------------
 hw/ppc/spapr_numa.c         | 50 +++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_numa.h | 20 +++++++++++++++
 4 files changed, 74 insertions(+), 25 deletions(-)
 create mode 100644 hw/ppc/spapr_numa.c
 create mode 100644 include/hw/ppc/spapr_numa.h

diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
index 918969b320..ffa2ec37fa 100644
--- a/hw/ppc/meson.build
+++ b/hw/ppc/meson.build
@@ -25,7 +25,8 @@ ppc_ss.add(when: 'CONFIG_PSERIES', if_true: files(
   'spapr_irq.c',
   'spapr_tpm_proxy.c',
   'spapr_nvdimm.c',
-  'spapr_rtas_ddw.c'
+  'spapr_rtas_ddw.c',
+  'spapr_numa.c',
 ))
 ppc_ss.add(when: 'CONFIG_SPAPR_RNG', if_true: files('spapr_rng.c'))
 ppc_ss.add(when: ['CONFIG_PSERIES', 'CONFIG_LINUX'], if_true: files(
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b0a04443fb..a45912acac 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -81,6 +81,7 @@
 #include "hw/mem/memory-device.h"
 #include "hw/ppc/spapr_tpm_proxy.h"
 #include "hw/ppc/spapr_nvdimm.h"
+#include "hw/ppc/spapr_numa.h"
 
 #include "monitor/monitor.h"
 
@@ -891,16 +892,9 @@ static int spapr_dt_rng(void *fdt)
 static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
 {
     MachineState *ms = MACHINE(spapr);
-    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
     int rtas;
     GString *hypertas = g_string_sized_new(256);
     GString *qemu_hypertas = g_string_sized_new(256);
-    uint32_t refpoints[] = {
-        cpu_to_be32(0x4),
-        cpu_to_be32(0x4),
-        cpu_to_be32(0x2),
-    };
-    uint32_t nr_refpoints = ARRAY_SIZE(refpoints);
     uint64_t max_device_addr = MACHINE(spapr)->device_memory->base +
         memory_region_size(&MACHINE(spapr)->device_memory->mr);
     uint32_t lrdr_capacity[] = {
@@ -910,14 +904,6 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
         cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE & 0xffffffff),
         cpu_to_be32(ms->smp.max_cpus / ms->smp.threads),
     };
-    uint32_t maxdomain = cpu_to_be32(spapr->gpu_numa_id > 1 ? 1 : 0);
-    uint32_t maxdomains[] = {
-        cpu_to_be32(4),
-        maxdomain,
-        maxdomain,
-        maxdomain,
-        cpu_to_be32(spapr->gpu_numa_id),
-    };
 
     _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
 
@@ -953,15 +939,7 @@ static void spapr_dt_rtas(SpaprMachineState *spapr, void *fdt)
                      qemu_hypertas->str, qemu_hypertas->len));
     g_string_free(qemu_hypertas, TRUE);
 
-    if (smc->pre_5_1_assoc_refpoints) {
-        nr_refpoints = 2;
-    }
-
-    _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points",
-                     refpoints, nr_refpoints * sizeof(refpoints[0])));
-
-    _FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains",
-                     maxdomains, sizeof(maxdomains)));
+    spapr_numa_write_rtas_dt(spapr, fdt, rtas);
 
     /*
      * FWNMI reserves RTAS_ERROR_LOG_MAX for the machine check error log,
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
new file mode 100644
index 0000000000..cdf3288cbd
--- /dev/null
+++ b/hw/ppc/spapr_numa.c
@@ -0,0 +1,50 @@
+/*
+ * QEMU PowerPC pSeries Logical Partition NUMA associativity handling
+ *
+ * Copyright IBM Corp. 2020
+ *
+ * Authors:
+ *  Daniel Henrique Barboza      <danielhb413@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "hw/ppc/spapr_numa.h"
+#include "hw/ppc/fdt.h"
+
+/*
+ * Helper that writes ibm,associativity-reference-points and
+ * max-associativity-domains in the RTAS pointed by @rtas
+ * in the DT @fdt.
+ */
+void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
+{
+    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    uint32_t refpoints[] = {
+        cpu_to_be32(0x4),
+        cpu_to_be32(0x4),
+        cpu_to_be32(0x2),
+    };
+    uint32_t nr_refpoints = ARRAY_SIZE(refpoints);
+    uint32_t maxdomain = cpu_to_be32(spapr->gpu_numa_id > 1 ? 1 : 0);
+    uint32_t maxdomains[] = {
+        cpu_to_be32(4),
+        maxdomain,
+        maxdomain,
+        maxdomain,
+        cpu_to_be32(spapr->gpu_numa_id),
+    };
+
+    if (smc->pre_5_1_assoc_refpoints) {
+        nr_refpoints = 2;
+    }
+
+    _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points",
+                     refpoints, nr_refpoints * sizeof(refpoints[0])));
+
+    _FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains",
+                     maxdomains, sizeof(maxdomains)));
+}
diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
new file mode 100644
index 0000000000..7a370a8768
--- /dev/null
+++ b/include/hw/ppc/spapr_numa.h
@@ -0,0 +1,20 @@
+/*
+ * QEMU PowerPC pSeries Logical Partition NUMA associativity handling
+ *
+ * Copyright IBM Corp. 2020
+ *
+ * Authors:
+ *  Daniel Henrique Barboza      <danielhb413@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_SPAPR_NUMA_H
+#define HW_SPAPR_NUMA_H
+
+#include "hw/ppc/spapr.h"
+
+void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas);
+
+#endif /* HW_SPAPR_NUMA_H */
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/7] ppc/spapr_nvdimm: turn spapr_dt_nvdimm() static
  2020-09-01 12:56 [PATCH v2 0/7] pseries NUMA distance rework Daniel Henrique Barboza
  2020-09-01 12:56 ` [PATCH v2 1/7] ppc: introducing spapr_numa.c NUMA code helper Daniel Henrique Barboza
@ 2020-09-01 12:56 ` Daniel Henrique Barboza
  2020-09-01 12:56 ` [PATCH v2 3/7] spapr: introduce SpaprMachineClass::numa_assoc_array Daniel Henrique Barboza
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-01 12:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, david

This function is only used inside spapr_nvdimm.c.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr_nvdimm.c         | 22 +++++++++++-----------
 include/hw/ppc/spapr_nvdimm.h |  1 -
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
index 95cbc30528..5188e2f503 100644
--- a/hw/ppc/spapr_nvdimm.c
+++ b/hw/ppc/spapr_nvdimm.c
@@ -106,16 +106,6 @@ void spapr_add_nvdimm(DeviceState *dev, uint64_t slot, Error **errp)
     }
 }
 
-int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
-                           void *fdt, int *fdt_start_offset, Error **errp)
-{
-    NVDIMMDevice *nvdimm = NVDIMM(drc->dev);
-
-    *fdt_start_offset = spapr_dt_nvdimm(fdt, 0, nvdimm);
-
-    return 0;
-}
-
 void spapr_create_nvdimm_dr_connectors(SpaprMachineState *spapr)
 {
     MachineState *machine = MACHINE(spapr);
@@ -127,7 +117,7 @@ void spapr_create_nvdimm_dr_connectors(SpaprMachineState *spapr)
 }
 
 
-int spapr_dt_nvdimm(void *fdt, int parent_offset,
+static int spapr_dt_nvdimm(void *fdt, int parent_offset,
                            NVDIMMDevice *nvdimm)
 {
     int child_offset;
@@ -184,6 +174,16 @@ int spapr_dt_nvdimm(void *fdt, int parent_offset,
     return child_offset;
 }
 
+int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
+                           void *fdt, int *fdt_start_offset, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(drc->dev);
+
+    *fdt_start_offset = spapr_dt_nvdimm(fdt, 0, nvdimm);
+
+    return 0;
+}
+
 void spapr_dt_persistent_memory(void *fdt)
 {
     int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
diff --git a/include/hw/ppc/spapr_nvdimm.h b/include/hw/ppc/spapr_nvdimm.h
index fd1736634c..10a6d9dbbc 100644
--- a/include/hw/ppc/spapr_nvdimm.h
+++ b/include/hw/ppc/spapr_nvdimm.h
@@ -27,7 +27,6 @@ QEMU_BUILD_BUG_ON(SPAPR_MINIMUM_SCM_BLOCK_SIZE % SPAPR_MEMORY_BLOCK_SIZE);
 
 int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
                            void *fdt, int *fdt_start_offset, Error **errp);
-int spapr_dt_nvdimm(void *fdt, int parent_offset, NVDIMMDevice *nvdimm);
 void spapr_dt_persistent_memory(void *fdt);
 void spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
                            uint64_t size, Error **errp);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/7] spapr: introduce SpaprMachineClass::numa_assoc_array
  2020-09-01 12:56 [PATCH v2 0/7] pseries NUMA distance rework Daniel Henrique Barboza
  2020-09-01 12:56 ` [PATCH v2 1/7] ppc: introducing spapr_numa.c NUMA code helper Daniel Henrique Barboza
  2020-09-01 12:56 ` [PATCH v2 2/7] ppc/spapr_nvdimm: turn spapr_dt_nvdimm() static Daniel Henrique Barboza
@ 2020-09-01 12:56 ` Daniel Henrique Barboza
  2020-09-03  1:51   ` David Gibson
  2020-09-01 12:56 ` [PATCH v2 4/7] spapr, spapr_numa: handle vcpu ibm,associativity Daniel Henrique Barboza
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-01 12:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, david

The next step to centralize all NUMA/associativity handling in
the spapr machine is to create a 'one stop place' for all
things ibm,associativity.

This patch introduces numa_assoc_array, a two-dimensional array
that will store the ibm,associativity arrays of all NUMA nodes.
This array is initialized in a new spapr_numa_associativity_init()
function, called from spapr_machine_init(). It is initialized
with the same values used in other ibm,associativity properties
around the spapr files (i.e. all zeros, with node_id as the last
value). The idea is to remove all hardcoded definitions and FDT
writes of ibm,associativity arrays, replacing them with calls to
the new spapr_numa_write_associativity_dt() helper, which will
write the DT with the correct values.

We'll start small, handling the trivial cases first. The
remaining instances of ibm,associativity will be handled
next.
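
For illustration, with this initialization a NUMA node 3 ends up with
the following property in the FDT (size first, zeroed associativity
domains, node_id last):

    ibm,associativity = <4 0 0 0 3>;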

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr.c                | 23 ++++++++++-------------
 hw/ppc/spapr_numa.c           | 32 ++++++++++++++++++++++++++++++++
 hw/ppc/spapr_nvdimm.c         | 19 +++++++------------
 hw/ppc/spapr_pci.c            |  9 ++-------
 include/hw/ppc/spapr.h        | 13 ++++++++++++-
 include/hw/ppc/spapr_numa.h   |  5 +++++
 include/hw/ppc/spapr_nvdimm.h |  2 +-
 7 files changed, 69 insertions(+), 34 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a45912acac..fb9b2572fe 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -314,14 +314,9 @@ static void add_str(GString *s, const gchar *s1)
     g_string_append_len(s, s1, strlen(s1) + 1);
 }
 
-static int spapr_dt_memory_node(void *fdt, int nodeid, hwaddr start,
-                                hwaddr size)
+static int spapr_dt_memory_node(SpaprMachineState *spapr, void *fdt, int nodeid,
+                                hwaddr start, hwaddr size)
 {
-    uint32_t associativity[] = {
-        cpu_to_be32(0x4), /* length */
-        cpu_to_be32(0x0), cpu_to_be32(0x0),
-        cpu_to_be32(0x0), cpu_to_be32(nodeid)
-    };
     char mem_name[32];
     uint64_t mem_reg_property[2];
     int off;
@@ -335,8 +330,7 @@ static int spapr_dt_memory_node(void *fdt, int nodeid, hwaddr start,
     _FDT((fdt_setprop_string(fdt, off, "device_type", "memory")));
     _FDT((fdt_setprop(fdt, off, "reg", mem_reg_property,
                       sizeof(mem_reg_property))));
-    _FDT((fdt_setprop(fdt, off, "ibm,associativity", associativity,
-                      sizeof(associativity))));
+    spapr_numa_write_associativity_dt(spapr, fdt, off, nodeid);
     return off;
 }
 
@@ -649,7 +643,7 @@ static int spapr_dt_memory(SpaprMachineState *spapr, void *fdt)
         if (!mem_start) {
             /* spapr_machine_init() checks for rma_size <= node0_size
              * already */
-            spapr_dt_memory_node(fdt, i, 0, spapr->rma_size);
+            spapr_dt_memory_node(spapr, fdt, i, 0, spapr->rma_size);
             mem_start += spapr->rma_size;
             node_size -= spapr->rma_size;
         }
@@ -661,7 +655,7 @@ static int spapr_dt_memory(SpaprMachineState *spapr, void *fdt)
                 sizetmp = 1ULL << ctzl(mem_start);
             }
 
-            spapr_dt_memory_node(fdt, i, mem_start, sizetmp);
+            spapr_dt_memory_node(spapr, fdt, i, mem_start, sizetmp);
             node_size -= sizetmp;
             mem_start += sizetmp;
         }
@@ -1275,7 +1269,7 @@ void *spapr_build_fdt(SpaprMachineState *spapr, bool reset, size_t space)
 
     /* NVDIMM devices */
     if (mc->nvdimm_supported) {
-        spapr_dt_persistent_memory(fdt);
+        spapr_dt_persistent_memory(spapr, fdt);
     }
 
     return fdt;
@@ -2810,6 +2804,9 @@ static void spapr_machine_init(MachineState *machine)
      */
     spapr->gpu_numa_id = MAX(1, machine->numa_state->num_nodes);
 
+    /* Init numa_assoc_array */
+    spapr_numa_associativity_init(machine);
+
     if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
         ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, 0,
                               spapr->max_compat_pvr)) {
@@ -3394,7 +3391,7 @@ int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
     addr = spapr_drc_index(drc) * SPAPR_MEMORY_BLOCK_SIZE;
     node = object_property_get_uint(OBJECT(drc->dev), PC_DIMM_NODE_PROP,
                                     &error_abort);
-    *fdt_start_offset = spapr_dt_memory_node(fdt, node, addr,
+    *fdt_start_offset = spapr_dt_memory_node(spapr, fdt, node, addr,
                                              SPAPR_MEMORY_BLOCK_SIZE);
     return 0;
 }
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index cdf3288cbd..2cfe13eaed 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -15,6 +15,38 @@
 #include "hw/ppc/spapr_numa.h"
 #include "hw/ppc/fdt.h"
 
+
+void spapr_numa_associativity_init(MachineState *machine)
+{
+    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
+    int nb_numa_nodes = machine->numa_state->num_nodes;
+    int i;
+
+    /*
+     * For all associativity arrays: first position is the size,
+     * position MAX_DISTANCE_REF_POINTS is always the numa_id,
+     * represented by the index 'i'.
+     *
+     * This will break on sparse NUMA setups, when/if QEMU starts
+     * to support it, because there will be no more guarantee that
+     * 'i' will be a valid node_id set by the user.
+     */
+    for (i = 0; i < nb_numa_nodes; i++) {
+        smc->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
+        smc->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
+    }
+}
+
+void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
+                                       int offset, int nodeid)
+{
+    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+
+    _FDT((fdt_setprop(fdt, offset, "ibm,associativity",
+                      smc->numa_assoc_array[nodeid],
+                      sizeof(smc->numa_assoc_array[nodeid]))));
+}
+
 /*
  * Helper that writes ibm,associativity-reference-points and
  * max-associativity-domains in the RTAS pointed by @rtas
diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
index 5188e2f503..63872054f3 100644
--- a/hw/ppc/spapr_nvdimm.c
+++ b/hw/ppc/spapr_nvdimm.c
@@ -31,6 +31,7 @@
 #include "hw/ppc/fdt.h"
 #include "qemu/range.h"
 #include "sysemu/sysemu.h"
+#include "hw/ppc/spapr_numa.h"
 
 void spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
                            uint64_t size, Error **errp)
@@ -117,8 +118,8 @@ void spapr_create_nvdimm_dr_connectors(SpaprMachineState *spapr)
 }
 
 
-static int spapr_dt_nvdimm(void *fdt, int parent_offset,
-                           NVDIMMDevice *nvdimm)
+static int spapr_dt_nvdimm(SpaprMachineState *spapr, void *fdt,
+                           int parent_offset, NVDIMMDevice *nvdimm)
 {
     int child_offset;
     char *buf;
@@ -128,11 +129,6 @@ static int spapr_dt_nvdimm(void *fdt, int parent_offset,
                                              &error_abort);
     uint64_t slot = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_SLOT_PROP,
                                              &error_abort);
-    uint32_t associativity[] = {
-        cpu_to_be32(0x4), /* length */
-        cpu_to_be32(0x0), cpu_to_be32(0x0),
-        cpu_to_be32(0x0), cpu_to_be32(node)
-    };
     uint64_t lsize = nvdimm->label_size;
     uint64_t size = object_property_get_int(OBJECT(nvdimm), PC_DIMM_SIZE_PROP,
                                             NULL);
@@ -152,8 +148,7 @@ static int spapr_dt_nvdimm(void *fdt, int parent_offset,
     _FDT((fdt_setprop_string(fdt, child_offset, "compatible", "ibm,pmemory")));
     _FDT((fdt_setprop_string(fdt, child_offset, "device_type", "ibm,pmemory")));
 
-    _FDT((fdt_setprop(fdt, child_offset, "ibm,associativity", associativity,
-                      sizeof(associativity))));
+    spapr_numa_write_associativity_dt(spapr, fdt, child_offset, node);
 
     buf = qemu_uuid_unparse_strdup(&nvdimm->uuid);
     _FDT((fdt_setprop_string(fdt, child_offset, "ibm,unit-guid", buf)));
@@ -179,12 +174,12 @@ int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
 {
     NVDIMMDevice *nvdimm = NVDIMM(drc->dev);
 
-    *fdt_start_offset = spapr_dt_nvdimm(fdt, 0, nvdimm);
+    *fdt_start_offset = spapr_dt_nvdimm(spapr, fdt, 0, nvdimm);
 
     return 0;
 }
 
-void spapr_dt_persistent_memory(void *fdt)
+void spapr_dt_persistent_memory(SpaprMachineState *spapr, void *fdt)
 {
     int offset = fdt_subnode_offset(fdt, 0, "persistent-memory");
     GSList *iter, *nvdimms = nvdimm_get_device_list();
@@ -202,7 +197,7 @@ void spapr_dt_persistent_memory(void *fdt)
     for (iter = nvdimms; iter; iter = iter->next) {
         NVDIMMDevice *nvdimm = iter->data;
 
-        spapr_dt_nvdimm(fdt, offset, nvdimm);
+        spapr_dt_nvdimm(spapr, fdt, offset, nvdimm);
     }
     g_slist_free(nvdimms);
 
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 0a418f1e67..4d97ff6c70 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -52,6 +52,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/hostmem.h"
 #include "sysemu/numa.h"
+#include "hw/ppc/spapr_numa.h"
 
 /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
 #define RTAS_QUERY_FN           0
@@ -2321,11 +2322,6 @@ int spapr_dt_phb(SpaprMachineState *spapr, SpaprPhbState *phb,
         cpu_to_be32(1),
         cpu_to_be32(RTAS_IBM_RESET_PE_DMA_WINDOW)
     };
-    uint32_t associativity[] = {cpu_to_be32(0x4),
-                                cpu_to_be32(0x0),
-                                cpu_to_be32(0x0),
-                                cpu_to_be32(0x0),
-                                cpu_to_be32(phb->numa_node)};
     SpaprTceTable *tcet;
     SpaprDrc *drc;
     Error *err = NULL;
@@ -2358,8 +2354,7 @@ int spapr_dt_phb(SpaprMachineState *spapr, SpaprPhbState *phb,
 
     /* Advertise NUMA via ibm,associativity */
     if (phb->numa_node != -1) {
-        _FDT(fdt_setprop(fdt, bus_off, "ibm,associativity", associativity,
-                         sizeof(associativity)));
+        spapr_numa_write_associativity_dt(spapr, fdt, bus_off, phb->numa_node);
     }
 
     /* Build the interrupt-map, this must matches what is done
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a1e230ad39..140914f9a1 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -105,6 +105,16 @@ typedef enum {
 
 #define FDT_MAX_SIZE                    0x100000
 
+/*
+ * NUMA related macros. MAX_DISTANCE_REF_POINTS was taken
+ * from the Linux kernel arch/powerpc/mm/numa.h.
+ *
+ * NUMA_ASSOC_SIZE is the base array size of an ibm,associativity
+ * array for any non-CPU resource.
+ */
+#define MAX_DISTANCE_REF_POINTS    4
+#define NUMA_ASSOC_SIZE            (MAX_DISTANCE_REF_POINTS + 1)
+
 typedef struct SpaprCapabilities SpaprCapabilities;
 struct SpaprCapabilities {
     uint8_t caps[SPAPR_CAP_NUM];
@@ -131,9 +141,10 @@ struct SpaprMachineClass {
     bool smp_threads_vsmt; /* set VSMT to smp_threads by default */
     hwaddr rma_limit;          /* clamp the RMA to this size */
     bool pre_5_1_assoc_refpoints;
+    uint32_t numa_assoc_array[MAX_NODES][NUMA_ASSOC_SIZE];
 
     void (*phb_placement)(SpaprMachineState *spapr, uint32_t index,
-                          uint64_t *buid, hwaddr *pio, 
+                          uint64_t *buid, hwaddr *pio,
                           hwaddr *mmio32, hwaddr *mmio64,
                           unsigned n_dma, uint32_t *liobns, hwaddr *nv2gpa,
                           hwaddr *nv2atsd, Error **errp);
diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
index 7a370a8768..2625e3db67 100644
--- a/include/hw/ppc/spapr_numa.h
+++ b/include/hw/ppc/spapr_numa.h
@@ -13,8 +13,13 @@
 #ifndef HW_SPAPR_NUMA_H
 #define HW_SPAPR_NUMA_H
 
+#include "hw/boards.h"
 #include "hw/ppc/spapr.h"
 
+void spapr_numa_associativity_init(MachineState *machine);
 void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas);
+void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
+                                       int offset, int nodeid);
+
 
 #endif /* HW_SPAPR_NUMA_H */
diff --git a/include/hw/ppc/spapr_nvdimm.h b/include/hw/ppc/spapr_nvdimm.h
index 10a6d9dbbc..3eb344e8e9 100644
--- a/include/hw/ppc/spapr_nvdimm.h
+++ b/include/hw/ppc/spapr_nvdimm.h
@@ -27,7 +27,7 @@ QEMU_BUILD_BUG_ON(SPAPR_MINIMUM_SCM_BLOCK_SIZE % SPAPR_MEMORY_BLOCK_SIZE);
 
 int spapr_pmem_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
                            void *fdt, int *fdt_start_offset, Error **errp);
-void spapr_dt_persistent_memory(void *fdt);
+void spapr_dt_persistent_memory(SpaprMachineState *spapr, void *fdt);
 void spapr_nvdimm_validate(HotplugHandler *hotplug_dev, NVDIMMDevice *nvdimm,
                            uint64_t size, Error **errp);
 void spapr_add_nvdimm(DeviceState *dev, uint64_t slot, Error **errp);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/7] spapr, spapr_numa: handle vcpu ibm,associativity
  2020-09-01 12:56 [PATCH v2 0/7] pseries NUMA distance rework Daniel Henrique Barboza
                   ` (2 preceding siblings ...)
  2020-09-01 12:56 ` [PATCH v2 3/7] spapr: introduce SpaprMachineClass::numa_assoc_array Daniel Henrique Barboza
@ 2020-09-01 12:56 ` Daniel Henrique Barboza
  2020-09-01 12:56 ` [PATCH v2 5/7] spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c Daniel Henrique Barboza
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-01 12:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, david

Vcpus have an additional parameter appended to their associativity,
vcpu_id. This also changes the size of the property itself, which is
represented in index 0 of numa_assoc_array[cpu->node_id],
and defaults to MAX_DISTANCE_REF_POINTS for all cases but
vcpus.

All this logic makes more sense in spapr_numa.c, where we handle
everything NUMA/associativity related. A new helper, spapr_numa_fixup_cpu_dt(),
was added, and spapr.c uses it the same way it used the former
spapr_fixup_cpu_numa_dt().
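
As an illustration, a vcpu with vcpu_id 8 sitting on NUMA node 1 now
gets the node associativity from numa_assoc_array[1], with the size
bumped to 5 and the vcpu_id appended:

    ibm,associativity = <5 0 0 0 1 8>;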

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr.c              | 17 +----------------
 hw/ppc/spapr_numa.c         | 28 ++++++++++++++++++++++++++++
 include/hw/ppc/spapr_numa.h |  2 ++
 3 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index fb9b2572fe..172f965fe0 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -202,21 +202,6 @@ static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
     return ret;
 }
 
-static int spapr_fixup_cpu_numa_dt(void *fdt, int offset, PowerPCCPU *cpu)
-{
-    int index = spapr_get_vcpu_id(cpu);
-    uint32_t associativity[] = {cpu_to_be32(0x5),
-                                cpu_to_be32(0x0),
-                                cpu_to_be32(0x0),
-                                cpu_to_be32(0x0),
-                                cpu_to_be32(cpu->node_id),
-                                cpu_to_be32(index)};
-
-    /* Advertise NUMA via ibm,associativity */
-    return fdt_setprop(fdt, offset, "ibm,associativity", associativity,
-                          sizeof(associativity));
-}
-
 static void spapr_dt_pa_features(SpaprMachineState *spapr,
                                  PowerPCCPU *cpu,
                                  void *fdt, int offset)
@@ -785,7 +770,7 @@ static void spapr_dt_cpu(CPUState *cs, void *fdt, int offset,
                       pft_size_prop, sizeof(pft_size_prop))));
 
     if (ms->numa_state->num_nodes > 1) {
-        _FDT(spapr_fixup_cpu_numa_dt(fdt, offset, cpu));
+        _FDT(spapr_numa_fixup_cpu_dt(spapr, fdt, offset, cpu));
     }
 
     _FDT(spapr_fixup_cpu_smt_dt(fdt, offset, cpu, compat_smt));
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 2cfe13eaed..b8882d209e 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -47,6 +47,34 @@ void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
                       sizeof(smc->numa_assoc_array[nodeid]))));
 }
 
+int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
+                            int offset, PowerPCCPU *cpu)
+{
+    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    uint vcpu_assoc_size = NUMA_ASSOC_SIZE + 1;
+    uint32_t vcpu_assoc[vcpu_assoc_size];
+    int index = spapr_get_vcpu_id(cpu);
+    int i;
+
+    /*
+     * VCPUs have an extra 'cpu_id' value in ibm,associativity
+     * compared to other resources. Increment the size at index
+     * 0, copy all associativity domains already set, then put
+     * cpu_id last.
+     */
+    vcpu_assoc[0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS + 1);
+
+    for (i = 1; i <= MAX_DISTANCE_REF_POINTS; i++) {
+        vcpu_assoc[i] = smc->numa_assoc_array[cpu->node_id][i];
+    }
+
+    vcpu_assoc[vcpu_assoc_size - 1] = cpu_to_be32(index);
+
+    /* Advertise NUMA via ibm,associativity */
+    return fdt_setprop(fdt, offset, "ibm,associativity",
+                       vcpu_assoc, sizeof(vcpu_assoc));
+}
+
 /*
  * Helper that writes ibm,associativity-reference-points and
  * max-associativity-domains in the RTAS pointed by @rtas
diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
index 2625e3db67..f92fb4f28a 100644
--- a/include/hw/ppc/spapr_numa.h
+++ b/include/hw/ppc/spapr_numa.h
@@ -20,6 +20,8 @@ void spapr_numa_associativity_init(MachineState *machine);
 void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas);
 void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
                                        int offset, int nodeid);
+int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
+                            int offset, PowerPCCPU *cpu);
 
 
 #endif /* HW_SPAPR_NUMA_H */
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 5/7] spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c
  2020-09-01 12:56 [PATCH v2 0/7] pseries NUMA distance rework Daniel Henrique Barboza
                   ` (3 preceding siblings ...)
  2020-09-01 12:56 ` [PATCH v2 4/7] spapr, spapr_numa: handle vcpu ibm,associativity Daniel Henrique Barboza
@ 2020-09-01 12:56 ` Daniel Henrique Barboza
  2020-09-03  1:34   ` David Gibson
  2020-09-01 12:56 ` [PATCH v2 6/7] spapr_numa: move NVLink2 associativity " Daniel Henrique Barboza
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-01 12:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, david

In a similar fashion as the previous patch, let's move the
handling of ibm,associativity-lookup-arrays from spapr.c to
spapr_numa.c. A spapr_numa_write_assoc_lookup_arrays() helper was
created, and spapr_dt_dynamic_reconfiguration_memory() can now
use it to advertise the lookup-arrays.
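
As an illustration, with two NUMA nodes and the current all-zeroes
associativity domains, the generated property reads (number of arrays,
entries per array, then the domain values of each node):

    ibm,associativity-lookup-arrays = <2 4  0 0 0 0  0 0 0 1>;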

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr.c              | 25 ++----------------------
 hw/ppc/spapr_numa.c         | 39 +++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_numa.h |  2 ++
 3 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 172f965fe0..65d2ccd578 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -535,13 +535,10 @@ static int spapr_dt_dynamic_reconfiguration_memory(SpaprMachineState *spapr,
                                                    void *fdt)
 {
     MachineState *machine = MACHINE(spapr);
-    int nb_numa_nodes = machine->numa_state->num_nodes;
-    int ret, i, offset;
+    int ret, offset;
     uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
     uint32_t prop_lmb_size[] = {cpu_to_be32(lmb_size >> 32),
                                 cpu_to_be32(lmb_size & 0xffffffff)};
-    uint32_t *int_buf, *cur_index, buf_len;
-    int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
     MemoryDeviceInfoList *dimms = NULL;
 
     /*
@@ -582,25 +579,7 @@ static int spapr_dt_dynamic_reconfiguration_memory(SpaprMachineState *spapr,
         return ret;
     }
 
-    /* ibm,associativity-lookup-arrays */
-    buf_len = (nr_nodes * 4 + 2) * sizeof(uint32_t);
-    cur_index = int_buf = g_malloc0(buf_len);
-    int_buf[0] = cpu_to_be32(nr_nodes);
-    int_buf[1] = cpu_to_be32(4); /* Number of entries per associativity list */
-    cur_index += 2;
-    for (i = 0; i < nr_nodes; i++) {
-        uint32_t associativity[] = {
-            cpu_to_be32(0x0),
-            cpu_to_be32(0x0),
-            cpu_to_be32(0x0),
-            cpu_to_be32(i)
-        };
-        memcpy(cur_index, associativity, sizeof(associativity));
-        cur_index += 4;
-    }
-    ret = fdt_setprop(fdt, offset, "ibm,associativity-lookup-arrays", int_buf,
-            (cur_index - int_buf) * sizeof(uint32_t));
-    g_free(int_buf);
+    ret = spapr_numa_write_assoc_lookup_arrays(spapr, fdt, offset);
 
     return ret;
 }
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index b8882d209e..9eb4bdbe80 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -75,6 +75,45 @@ int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
                        vcpu_assoc, sizeof(vcpu_assoc));
 }
 
+
+int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
+                                         int offset)
+{
+    MachineState *machine = MACHINE(spapr);
+    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    int nb_numa_nodes = machine->numa_state->num_nodes;
+    int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
+    uint32_t *int_buf, *cur_index, buf_len;
+    int ret, i, j;
+
+    /* ibm,associativity-lookup-arrays */
+    buf_len = (nr_nodes * MAX_DISTANCE_REF_POINTS + 2) * sizeof(uint32_t);
+    cur_index = int_buf = g_malloc0(buf_len);
+    int_buf[0] = cpu_to_be32(nr_nodes);
+     /* Number of entries per associativity list */
+    int_buf[1] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
+    cur_index += 2;
+    for (i = 0; i < nr_nodes; i++) {
+        /*
+         * For the lookup-array we use the ibm,associativity array,
+         * from numa_assoc_array. without the first element (size).
+         */
+        uint32_t associativity[MAX_DISTANCE_REF_POINTS];
+
+        for (j = 0; j < MAX_DISTANCE_REF_POINTS; j++) {
+            associativity[j] = smc->numa_assoc_array[i][j + 1];
+        }
+
+        memcpy(cur_index, associativity, sizeof(associativity));
+        cur_index += 4;
+    }
+    ret = fdt_setprop(fdt, offset, "ibm,associativity-lookup-arrays", int_buf,
+                      (cur_index - int_buf) * sizeof(uint32_t));
+    g_free(int_buf);
+
+    return ret;
+}
+
 /*
  * Helper that writes ibm,associativity-reference-points and
  * max-associativity-domains in the RTAS pointed by @rtas
diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
index f92fb4f28a..f6127501a6 100644
--- a/include/hw/ppc/spapr_numa.h
+++ b/include/hw/ppc/spapr_numa.h
@@ -22,6 +22,8 @@ void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
                                        int offset, int nodeid);
 int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
                             int offset, PowerPCCPU *cpu);
+int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
+                                         int offset);
 
 
 #endif /* HW_SPAPR_NUMA_H */
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 6/7] spapr_numa: move NVLink2 associativity handling to spapr_numa.c
  2020-09-01 12:56 [PATCH v2 0/7] pseries NUMA distance rework Daniel Henrique Barboza
                   ` (4 preceding siblings ...)
  2020-09-01 12:56 ` [PATCH v2 5/7] spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c Daniel Henrique Barboza
@ 2020-09-01 12:56 ` Daniel Henrique Barboza
  2020-09-03  1:56   ` David Gibson
  2020-09-01 12:56 ` [PATCH v2 7/7] spapr_hcall: h_home_node_associativity now reads numa_assoc_array Daniel Henrique Barboza
  2020-09-03  1:35 ` [PATCH v2 0/7] pseries NUMA distance rework David Gibson
  7 siblings, 1 reply; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-01 12:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, david

This patch adds a new spapr_numa_write_assoc_nvlink2() helper
to handle the ibm,associativity for NVLink2 GPUs.
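
As an illustration, a GPU memory node with numa_id 5 ends up with
ibm,associativity = <4 5 5 5 5>, or <4 1 1 1 5> when the sphb
pre_5_1_assoc compat flag is set.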

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr_numa.c         | 23 +++++++++++++++++++++++
 hw/ppc/spapr_pci_nvlink2.c  | 19 ++-----------------
 include/hw/ppc/spapr_numa.h |  3 +++
 3 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 9eb4bdbe80..785cc24624 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -15,6 +15,8 @@
 #include "hw/ppc/spapr_numa.h"
 #include "hw/ppc/fdt.h"
 
+/* Moved from hw/ppc/spapr_pci_nvlink2.c */
+#define SPAPR_GPU_NUMA_ID           (cpu_to_be32(1))
 
 void spapr_numa_associativity_init(MachineState *machine)
 {
@@ -114,6 +116,27 @@ int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
     return ret;
 }
 
+void spapr_numa_write_assoc_nvlink2(void *fdt, int offset, int numa_id,
+                                    SpaprPhbState *sphb)
+{
+    uint32_t associativity[NUMA_ASSOC_SIZE];
+    int i;
+
+    associativity[0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
+    for (i = 1; i < NUMA_ASSOC_SIZE; i++) {
+        associativity[i] = cpu_to_be32(numa_id);
+    };
+
+    if (sphb->pre_5_1_assoc) {
+        associativity[1] = SPAPR_GPU_NUMA_ID;
+        associativity[2] = SPAPR_GPU_NUMA_ID;
+        associativity[3] = SPAPR_GPU_NUMA_ID;
+    }
+
+    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
+                      sizeof(associativity))));
+}
+
 /*
  * Helper that writes ibm,associativity-reference-points and
  * max-associativity-domains in the RTAS pointed by @rtas
diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c
index 76ae77ebc8..662a0af990 100644
--- a/hw/ppc/spapr_pci_nvlink2.c
+++ b/hw/ppc/spapr_pci_nvlink2.c
@@ -29,6 +29,7 @@
 #include "qemu/error-report.h"
 #include "hw/ppc/fdt.h"
 #include "hw/pci/pci_bridge.h"
+#include "hw/ppc/spapr_numa.h"
 
 #define PHANDLE_PCIDEV(phb, pdev)    (0x12000000 | \
                                      (((phb)->index) << 16) | ((pdev)->devfn))
@@ -37,8 +38,6 @@
 #define PHANDLE_NVLINK(phb, gn, nn)  (0x00130000 | (((phb)->index) << 8) | \
                                      ((gn) << 4) | (nn))
 
-#define SPAPR_GPU_NUMA_ID           (cpu_to_be32(1))
-
 typedef struct SpaprPhbPciNvGpuSlot {
         uint64_t tgt;
         uint64_t gpa;
@@ -360,13 +359,6 @@ void spapr_phb_nvgpu_ram_populate_dt(SpaprPhbState *sphb, void *fdt)
         Object *nv_mrobj = object_property_get_link(OBJECT(nvslot->gpdev),
                                                     "nvlink2-mr[0]",
                                                     &error_abort);
-        uint32_t associativity[] = {
-            cpu_to_be32(0x4),
-            cpu_to_be32(nvslot->numa_id),
-            cpu_to_be32(nvslot->numa_id),
-            cpu_to_be32(nvslot->numa_id),
-            cpu_to_be32(nvslot->numa_id)
-        };
         uint64_t size = object_property_get_uint(nv_mrobj, "size", NULL);
         uint64_t mem_reg[2] = { cpu_to_be64(nvslot->gpa), cpu_to_be64(size) };
         char *mem_name = g_strdup_printf("memory@%"PRIx64, nvslot->gpa);
@@ -376,14 +368,7 @@ void spapr_phb_nvgpu_ram_populate_dt(SpaprPhbState *sphb, void *fdt)
         _FDT((fdt_setprop_string(fdt, off, "device_type", "memory")));
         _FDT((fdt_setprop(fdt, off, "reg", mem_reg, sizeof(mem_reg))));
 
-        if (sphb->pre_5_1_assoc) {
-            associativity[1] = SPAPR_GPU_NUMA_ID;
-            associativity[2] = SPAPR_GPU_NUMA_ID;
-            associativity[3] = SPAPR_GPU_NUMA_ID;
-        }
-
-        _FDT((fdt_setprop(fdt, off, "ibm,associativity", associativity,
-                          sizeof(associativity))));
+        spapr_numa_write_assoc_nvlink2(fdt, off, nvslot->numa_id, sphb);
 
         _FDT((fdt_setprop_string(fdt, off, "compatible",
                                  "ibm,coherent-device-memory")));
diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
index f6127501a6..b6e0721b07 100644
--- a/include/hw/ppc/spapr_numa.h
+++ b/include/hw/ppc/spapr_numa.h
@@ -15,6 +15,7 @@
 
 #include "hw/boards.h"
 #include "hw/ppc/spapr.h"
+#include "hw/pci-host/spapr.h"
 
 void spapr_numa_associativity_init(MachineState *machine);
 void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas);
@@ -24,6 +25,8 @@ int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
                             int offset, PowerPCCPU *cpu);
 int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
                                          int offset);
+void spapr_numa_write_assoc_nvlink2(void *fdt, int offset, int numa_id,
+                                    SpaprPhbState *sphb);
 
 
 #endif /* HW_SPAPR_NUMA_H */
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 7/7] spapr_hcall: h_home_node_associativity now reads numa_assoc_array
  2020-09-01 12:56 [PATCH v2 0/7] pseries NUMA distance rework Daniel Henrique Barboza
                   ` (5 preceding siblings ...)
  2020-09-01 12:56 ` [PATCH v2 6/7] spapr_numa: move NVLink2 associativity " Daniel Henrique Barboza
@ 2020-09-01 12:56 ` Daniel Henrique Barboza
  2020-09-03  1:46   ` David Gibson
  2020-09-03  1:35 ` [PATCH v2 0/7] pseries NUMA distance rework David Gibson
  7 siblings, 1 reply; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-01 12:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, david

home_node_associativity reply now uses the associativity
values for tcpu->node_id provided by numa_assoc_array.

This will avoid further changes in this code when numa_assoc_array
changes values, but it won't be enough to prevent further changes
if (note to self: if the array size changes, this function has to be
touched as well; the associativity() macro makes automating all of
this rather unreadable)

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr_hcall.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index c1d01228c6..2ec30efdcb 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1878,9 +1878,13 @@ static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
                                               target_ulong opcode,
                                               target_ulong *args)
 {
+    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
     target_ulong flags = args[0];
     target_ulong procno = args[1];
     PowerPCCPU *tcpu;
+    uint32_t assoc_domain1;
+    uint32_t assoc_domain2;
+    uint32_t assoc_domain3;
     int idx;
 
     /* only support procno from H_REGISTER_VPA */
@@ -1893,13 +1897,21 @@ static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
         return H_P2;
     }
 
+    /*
+     * Index 0 is the ibm,associativity size of the node,
+     * which isn't relevant here.
+     */
+    assoc_domain1 = smc->numa_assoc_array[tcpu->node_id][1];
+    assoc_domain2 = smc->numa_assoc_array[tcpu->node_id][2];
+    assoc_domain3 = smc->numa_assoc_array[tcpu->node_id][3];
+
     /* sequence is the same as in the "ibm,associativity" property */
 
     idx = 0;
 #define ASSOCIATIVITY(a, b) (((uint64_t)(a) << 32) | \
                              ((uint64_t)(b) & 0xffffffff))
-    args[idx++] = ASSOCIATIVITY(0, 0);
-    args[idx++] = ASSOCIATIVITY(0, tcpu->node_id);
+    args[idx++] = ASSOCIATIVITY(assoc_domain1, assoc_domain2);
+    args[idx++] = ASSOCIATIVITY(assoc_domain3, tcpu->node_id);
     args[idx++] = ASSOCIATIVITY(procno, -1);
     for ( ; idx < 6; idx++) {
         args[idx] = -1;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/7] spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c
  2020-09-01 12:56 ` [PATCH v2 5/7] spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c Daniel Henrique Barboza
@ 2020-09-03  1:34   ` David Gibson
  2020-09-03 11:22     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2020-09-03  1:34 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel


On Tue, Sep 01, 2020 at 09:56:43AM -0300, Daniel Henrique Barboza wrote:
> In a similar fashion as the previous patch, let's move the
> handling of ibm,associativity-lookup-arrays from spapr.c to
> spapr_numa.c. A spapr_numa_write_assoc_lookup_arrays() helper was
> created, and spapr_dt_dynamic_reconfiguration_memory() can now
> use it to advertise the lookup-arrays.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr.c              | 25 ++----------------------
>  hw/ppc/spapr_numa.c         | 39 +++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_numa.h |  2 ++
>  3 files changed, 43 insertions(+), 23 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 172f965fe0..65d2ccd578 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -535,13 +535,10 @@ static int spapr_dt_dynamic_reconfiguration_memory(SpaprMachineState *spapr,
>                                                     void *fdt)
>  {
>      MachineState *machine = MACHINE(spapr);
> -    int nb_numa_nodes = machine->numa_state->num_nodes;
> -    int ret, i, offset;
> +    int ret, offset;
>      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>      uint32_t prop_lmb_size[] = {cpu_to_be32(lmb_size >> 32),
>                                  cpu_to_be32(lmb_size & 0xffffffff)};
> -    uint32_t *int_buf, *cur_index, buf_len;
> -    int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
>      MemoryDeviceInfoList *dimms = NULL;
>  
>      /*
> @@ -582,25 +579,7 @@ static int spapr_dt_dynamic_reconfiguration_memory(SpaprMachineState *spapr,
>          return ret;
>      }
>  
> -    /* ibm,associativity-lookup-arrays */
> -    buf_len = (nr_nodes * 4 + 2) * sizeof(uint32_t);
> -    cur_index = int_buf = g_malloc0(buf_len);
> -    int_buf[0] = cpu_to_be32(nr_nodes);
> -    int_buf[1] = cpu_to_be32(4); /* Number of entries per associativity list */
> -    cur_index += 2;
> -    for (i = 0; i < nr_nodes; i++) {
> -        uint32_t associativity[] = {
> -            cpu_to_be32(0x0),
> -            cpu_to_be32(0x0),
> -            cpu_to_be32(0x0),
> -            cpu_to_be32(i)
> -        };
> -        memcpy(cur_index, associativity, sizeof(associativity));
> -        cur_index += 4;
> -    }
> -    ret = fdt_setprop(fdt, offset, "ibm,associativity-lookup-arrays", int_buf,
> -            (cur_index - int_buf) * sizeof(uint32_t));
> -    g_free(int_buf);
> +    ret = spapr_numa_write_assoc_lookup_arrays(spapr, fdt, offset);
>  
>      return ret;
>  }
> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> index b8882d209e..9eb4bdbe80 100644
> --- a/hw/ppc/spapr_numa.c
> +++ b/hw/ppc/spapr_numa.c
> @@ -75,6 +75,45 @@ int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
>                         vcpu_assoc, sizeof(vcpu_assoc));
>  }
>  
> +
> +int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
> +                                         int offset)
> +{
> +    MachineState *machine = MACHINE(spapr);
> +    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +    int nb_numa_nodes = machine->numa_state->num_nodes;
> +    int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> +    uint32_t *int_buf, *cur_index, buf_len;
> +    int ret, i, j;
> +
> +    /* ibm,associativity-lookup-arrays */
> +    buf_len = (nr_nodes * MAX_DISTANCE_REF_POINTS + 2) * sizeof(uint32_t);
> +    cur_index = int_buf = g_malloc0(buf_len);
> +    int_buf[0] = cpu_to_be32(nr_nodes);
> +     /* Number of entries per associativity list */
> +    int_buf[1] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
> +    cur_index += 2;
> +    for (i = 0; i < nr_nodes; i++) {
> +        /*
> +         * For the lookup-array we use the ibm,associativity array,
> +         * from numa_assoc_array. without the first element (size).
> +         */
> +        uint32_t associativity[MAX_DISTANCE_REF_POINTS];
> +
> +        for (j = 0; j < MAX_DISTANCE_REF_POINTS; j++) {
> +            associativity[j] = smc->numa_assoc_array[i][j + 1];
> +        }
> +
> +        memcpy(cur_index, associativity, sizeof(associativity));

AFAICT, you could just use a single memcpy() to copy from the
numa_assoc_array() into the property here, rather than using a loop
and temporary array.
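
Something like this (untested) ought to do it:

    memcpy(cur_index, &smc->numa_assoc_array[i][1],
           MAX_DISTANCE_REF_POINTS * sizeof(uint32_t));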

> +        cur_index += 4;

Shouldn't this be  += MAX_DISTANCE_REF_POINTS?

> +    }
> +    ret = fdt_setprop(fdt, offset, "ibm,associativity-lookup-arrays", int_buf,
> +                      (cur_index - int_buf) * sizeof(uint32_t));
> +    g_free(int_buf);
> +
> +    return ret;
> +}
> +
>  /*
>   * Helper that writes ibm,associativity-reference-points and
>   * max-associativity-domains in the RTAS pointed by @rtas
> diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
> index f92fb4f28a..f6127501a6 100644
> --- a/include/hw/ppc/spapr_numa.h
> +++ b/include/hw/ppc/spapr_numa.h
> @@ -22,6 +22,8 @@ void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
>                                         int offset, int nodeid);
>  int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
>                              int offset, PowerPCCPU *cpu);
> +int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
> +                                         int offset);
>  
>  
>  #endif /* HW_SPAPR_NUMA_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/7] pseries NUMA distance rework
  2020-09-01 12:56 [PATCH v2 0/7] pseries NUMA distance rework Daniel Henrique Barboza
                   ` (6 preceding siblings ...)
  2020-09-01 12:56 ` [PATCH v2 7/7] spapr_hcall: h_home_node_associativity now reads numa_assoc_array Daniel Henrique Barboza
@ 2020-09-03  1:35 ` David Gibson
  2020-09-03  1:49   ` David Gibson
  7 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2020-09-03  1:35 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel


On Tue, Sep 01, 2020 at 09:56:38AM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> Following the reviews of the first version [1], especially this
> reply from David [2], I decided to take a step back and refactor
> all the code in hw/ppc/spapr* that operates on ibm,associativity,
> ibm,associativity-reference-points and ibm,max-associativity-domains.
> 
> A new file named 'spapr_numa.c' was created to gather all the
> associativity related code into helpers that write NUMA/associativity
> related info to the FDT. These helpers are then used in other
> spapr_* files. This allows us to change NUMA related code in a
> single location, instead of searching every file to see where and
> how associativity is being written, and all the soon-to-be more
> complex logic can be contained in spapr_numa.c. I consider
> the end result to be better than what I ended up doing in v1.
> 
> Unlike v1, there is no NUMA distance change being done in this series.
> Later on, the hub of the new NUMA distance calculation will be
> spapr_numa_associativity_init(), where we'll take into consideration
> user input from numa_states, adapt the sizes to what the PAPR kernel
> understands and establish associativity domains between the NUMA
> nodes.

Patches 1..4 applied to ppc-for-5.2.  Patch 5 has some nits I've
commented on.

> 
> 
> Changes from v1:
> - all the patches that made guest-visible changes were removed. They
> will be re-submitted in a follow-up series after this one.
> - patch 02 from v1 will be reworked and reposted in the follow-up
> series as well.
> - version 1 link:
> https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg03169.html
> 
> 
> 
> These patches were rebased using David's ppc-for-5.2 tree. Github
> repo with the patches applied:
> 
> https://github.com/danielhb/qemu/tree/spapr_numa_v2
> 
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg03169.html
> [2] https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg04661.html
>  
> Daniel Henrique Barboza (7):
>   ppc: introducing spapr_numa.c NUMA code helper
>   ppc/spapr_nvdimm: turn spapr_dt_nvdimm() static
>   spapr: introduce SpaprMachineClass::numa_assoc_array
>   spapr, spapr_numa: handle vcpu ibm,associativity
>   spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c
>   spapr_numa: move NVLink2 associativity handling to spapr_numa.c
>   spapr_hcall: h_home_node_associativity now reads numa_assoc_array
> 
>  hw/ppc/meson.build            |   3 +-
>  hw/ppc/spapr.c                |  91 +++---------------
>  hw/ppc/spapr_hcall.c          |  16 +++-
>  hw/ppc/spapr_numa.c           | 172 ++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_nvdimm.c         |  37 ++++----
>  hw/ppc/spapr_pci.c            |   9 +-
>  hw/ppc/spapr_pci_nvlink2.c    |  19 +---
>  include/hw/ppc/spapr.h        |  13 ++-
>  include/hw/ppc/spapr_numa.h   |  32 +++++++
>  include/hw/ppc/spapr_nvdimm.h |   3 +-
>  10 files changed, 268 insertions(+), 127 deletions(-)
>  create mode 100644 hw/ppc/spapr_numa.c
>  create mode 100644 include/hw/ppc/spapr_numa.h
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 7/7] spapr_hcall: h_home_node_associativity now reads numa_assoc_array
  2020-09-01 12:56 ` [PATCH v2 7/7] spapr_hcall: h_home_node_associativity now reads numa_assoc_array Daniel Henrique Barboza
@ 2020-09-03  1:46   ` David Gibson
  2020-09-03 11:17     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2020-09-03  1:46 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3459 bytes --]

On Tue, Sep 01, 2020 at 09:56:45AM -0300, Daniel Henrique Barboza wrote:
> home_node_associativity reply now uses the associativity
> values for tcpu->node_id provided by numa_assoc_array.
> 
> This will avoid further changes in this code when numa_assoc_array
> changes values, but it won't be enough to prevent further changes
> if (falar aqui q se mudar o tamanho do array tem q mexer nessa
> funcao tambem, falar q a macro associativity() deixa a automacao
> de tudo mto unreadable)

Uh.. I'm guessing that was a note to yourself you intended to
translate before publishing?

> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr_hcall.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index c1d01228c6..2ec30efdcb 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1878,9 +1878,13 @@ static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
>                                                target_ulong opcode,
>                                                target_ulong *args)

You could move this function to spapr_numa.c as well (the name's a bit
misleading, but spapr_hcall.c isn't really intended to hold *all*
hcall implementations, just the ones that don't have somewhere better
to live).

>  {
> +    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>      target_ulong flags = args[0];
>      target_ulong procno = args[1];
>      PowerPCCPU *tcpu;
> +    uint32_t assoc_domain1;
> +    uint32_t assoc_domain2;
> +    uint32_t assoc_domain3;
>      int idx;
>  
>      /* only support procno from H_REGISTER_VPA */
> @@ -1893,13 +1897,21 @@ static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
>          return H_P2;
>      }
>  
> +    /*
> +     * Index 0 is the ibm,associativity size of the node,
> +     * which isn't relevant here.
> +     */
> +    assoc_domain1 = smc->numa_assoc_array[tcpu->node_id][1];
> +    assoc_domain2 = smc->numa_assoc_array[tcpu->node_id][2];
> +    assoc_domain3 = smc->numa_assoc_array[tcpu->node_id][3];

Using all these temporaries is a little ugly.  Maybe do something like:
	uint32_t *assoc = smc->numa_assoc_array[tcpu->node_id];

Then just use assoc[1], assoc[2] etc. below.

> +
>      /* sequence is the same as in the "ibm,associativity" property */
>  
>      idx = 0;
>  #define ASSOCIATIVITY(a, b) (((uint64_t)(a) << 32) | \
>                               ((uint64_t)(b) & 0xffffffff))
> -    args[idx++] = ASSOCIATIVITY(0, 0);
> -    args[idx++] = ASSOCIATIVITY(0, tcpu->node_id);
> +    args[idx++] = ASSOCIATIVITY(assoc_domain1, assoc_domain2);
> +    args[idx++] = ASSOCIATIVITY(assoc_domain3, tcpu->node_id);
>      args[idx++] = ASSOCIATIVITY(procno, -1);
>      for ( ; idx < 6; idx++) {
>          args[idx] = -1;

Better yet would be to make this handle an arbitrary length of assoc
array, further isolating this from the specifics of how we construct
the arrays.  Ideally, you'd call a common path with
spapr_numa_fixup_cpu_dt() to get the assoc array for a cpu, then here
just translate it into the in-register format the hcall expects.
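
Something like this, maybe (rough, untested sketch; spapr_numa_get_vcpu_assoc()
is a made-up name for whatever common helper ends up returning the same
big-endian array spapr_numa_fixup_cpu_dt() writes, with the element count in
position 0):

    uint32_t *assoc = spapr_numa_get_vcpu_assoc(spapr, tcpu);
    int assoc_size = be32_to_cpu(assoc[0]);
    int i, idx;

    /* pre-fill the six return registers with -1, as the hcall expects */
    for (idx = 0; idx < 6; idx++) {
        args[idx] = -1;
    }

    /* pack two 32-bit domains per register; element 0 is the size */
    for (i = 1; i <= assoc_size && (i - 1) / 2 < 6; i++) {
        /* numa_assoc_array stores cpu_to_be32() values, convert back */
        uint64_t dom = be32_to_cpu(assoc[i]);

        idx = (i - 1) / 2;
        if (i & 1) {
            args[idx] = (dom << 32) | (args[idx] & 0xffffffff);
        } else {
            args[idx] = (args[idx] & ~(uint64_t)0xffffffff) | dom;
        }
    }

That way the hcall no longer cares how many reference points we advertise.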

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/7] pseries NUMA distance rework
  2020-09-03  1:35 ` [PATCH v2 0/7] pseries NUMA distance rework David Gibson
@ 2020-09-03  1:49   ` David Gibson
  0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2020-09-03  1:49 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1794 bytes --]

On Thu, Sep 03, 2020 at 11:35:39AM +1000, David Gibson wrote:
> On Tue, Sep 01, 2020 at 09:56:38AM -0300, Daniel Henrique Barboza wrote:
> > Hi,
> > 
> > Following the reviews of the first version [1], specially this
> > reply from David [2], I decided to take a step back and refactor
> > all the code in hw/ppc/spapr* that operates with ibm,associativity,
> > ibm,associativity-reference-points and ibm,max-associativity-domains.
> > 
> > A new file named 'spapr_numa.c' was created to gather all the
> > associativity related code into helpers that write NUMA/associativity
> > related info to the FDT. These helpers are then used in other
> > spapr_* files. This allows us to change NUMA related code in a
> > single location, instead of searching every file to see where is
> > associativity being written and how, and all the soon to get
> > more complex logic can be contained in spapr_numa.c. I consider
> > the end result to be better than what I ended up doing in v1.
> > 
> > Unlike v1, there is no NUMA distance change being done in this series.
> > Later on, the hub of the new NUMA distance calculation will be
> > spapr_numa_associativity_init(), where we'll take into consideration
> > user input from numa_states, handle sizes to what the PAPR kernel
> > understands and establish assoaciativity domains between the NUMA
> > nodes.
> 
> Patches 1..4 applied to ppc-for-5.2.  Patch 5 has some nits I've
> commented on.

Ah, sorry, I realised I missed something.  Patches 1..2 are still
applied, but patch 3 has a nit large enough to call for a respin.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/7] spapr: introduce SpaprMachineClass::numa_assoc_array
  2020-09-01 12:56 ` [PATCH v2 3/7] spapr: introduce SpaprMachineClass::numa_assoc_array Daniel Henrique Barboza
@ 2020-09-03  1:51   ` David Gibson
  2020-09-03 11:28     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2020-09-03  1:51 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2538 bytes --]

On Tue, Sep 01, 2020 at 09:56:41AM -0300, Daniel Henrique Barboza wrote:
> The next step to centralize all NUMA/associativity handling in
> the spapr machine is to create a 'one stop place' for all
> things ibm,associativity.
> 
> This patch introduces numa_assoc_array, a 2 dimensional array
> that will store all ibm,associativity arrays of all NUMA nodes.
> This array is initialized in a new spapr_numa_associativity_init()
> function, called in spapr_machine_init(). It is being initialized
> with the same values used in other ibm,associativity properties
> around spapr files (i.e. all zeros, last value is node_id).
> The idea is to remove all hardcoded definitions and FDT writes
> of ibm,associativity arrays, doing instead a call to the new
> helper spapr_numa_write_associativity_dt() helper, that will
> be able to write the DT with the correct values.
> 
> We'll start small, handling the trivial cases first. The
> remaining instances of ibm,associativity will be handled
> next.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

The idea is great, but there's one small but significant problem here:

> +void spapr_numa_associativity_init(MachineState *machine)
> +{
> +    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
> +    int nb_numa_nodes = machine->numa_state->num_nodes;
> +    int i;
> +
> +    /*
> +     * For all associativity arrays: first position is the size,
> +     * position MAX_DISTANCE_REF_POINTS is always the numa_id,
> +     * represented by the index 'i'.
> +     *
> +     * This will break on sparse NUMA setups, when/if QEMU starts
> +     * to support it, because there will be no more guarantee that
> +     * 'i' will be a valid node_id set by the user.
> +     */
> +    for (i = 0; i < nb_numa_nodes; i++) {
> +        smc->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
> +        smc->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);

This initialization is called on a machine *instance*, which means it
should treat the machine class as read-only.  i.e. the
numa_assoc_array should be in the SpaprMachineState, rather than the
class.

I mean, we'd get away with it in practice, since there's only ever
likely to be a single machine instance, but still we should correct
this.
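
i.e. roughly this shape (sketch only; MAX_NODES as the bound and the extra
MachineState parameter are just one way to plumb it, up to you):

    /* include/hw/ppc/spapr.h */
    struct SpaprMachineState {
        /* ...existing fields... */
        uint32_t numa_assoc_array[MAX_NODES][NUMA_ASSOC_SIZE];
    };

    /* hw/ppc/spapr_numa.c */
    void spapr_numa_associativity_init(SpaprMachineState *spapr,
                                       MachineState *machine)
    {
        int nb_numa_nodes = machine->numa_state->num_nodes;
        int i;

        for (i = 0; i < nb_numa_nodes; i++) {
            spapr->numa_assoc_array[i][0] =
                cpu_to_be32(MAX_DISTANCE_REF_POINTS);
            spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] =
                cpu_to_be32(i);
        }
    }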

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 6/7] spapr_numa: move NVLink2 associativity handling to spapr_numa.c
  2020-09-01 12:56 ` [PATCH v2 6/7] spapr_numa: move NVLink2 associativity " Daniel Henrique Barboza
@ 2020-09-03  1:56   ` David Gibson
  2020-09-03 14:20     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2020-09-03  1:56 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 5684 bytes --]

On Tue, Sep 01, 2020 at 09:56:44AM -0300, Daniel Henrique Barboza wrote:
> This patch adds a new spapr_numa_write_assoc_nvlink2() helper
> to handle the ibm,associativity for NVLink2 GPUs.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

It might be nice to "precompute" the assoc arrays for the gpus as you
now do for the regular numa nodes.  That can be a later revision, though.

> ---
>  hw/ppc/spapr_numa.c         | 23 +++++++++++++++++++++++
>  hw/ppc/spapr_pci_nvlink2.c  | 19 ++-----------------
>  include/hw/ppc/spapr_numa.h |  3 +++
>  3 files changed, 28 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> index 9eb4bdbe80..785cc24624 100644
> --- a/hw/ppc/spapr_numa.c
> +++ b/hw/ppc/spapr_numa.c
> @@ -15,6 +15,8 @@
>  #include "hw/ppc/spapr_numa.h"
>  #include "hw/ppc/fdt.h"
>  
> +/* Moved from hw/ppc/spapr_pci_nvlink2.c */
> +#define SPAPR_GPU_NUMA_ID           (cpu_to_be32(1))
>  
>  void spapr_numa_associativity_init(MachineState *machine)
>  {
> @@ -114,6 +116,27 @@ int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
>      return ret;
>  }
>  
> +void spapr_numa_write_assoc_nvlink2(void *fdt, int offset, int numa_id,
> +                                    SpaprPhbState *sphb)
> +{
> +    uint32_t associativity[NUMA_ASSOC_SIZE];
> +    int i;
> +
> +    associativity[0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
> +    for (i = 1; i < NUMA_ASSOC_SIZE; i++) {
> +        associativity[i] = cpu_to_be32(numa_id);
> +    };
> +
> +    if (sphb->pre_5_1_assoc) {
> +        associativity[1] = SPAPR_GPU_NUMA_ID;
> +        associativity[2] = SPAPR_GPU_NUMA_ID;
> +        associativity[3] = SPAPR_GPU_NUMA_ID;
> +    }
> +
> +    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
> +                      sizeof(associativity))));
> +}
> +
>  /*
>   * Helper that writes ibm,associativity-reference-points and
>   * max-associativity-domains in the RTAS pointed by @rtas
> diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c
> index 76ae77ebc8..662a0af990 100644
> --- a/hw/ppc/spapr_pci_nvlink2.c
> +++ b/hw/ppc/spapr_pci_nvlink2.c
> @@ -29,6 +29,7 @@
>  #include "qemu/error-report.h"
>  #include "hw/ppc/fdt.h"
>  #include "hw/pci/pci_bridge.h"
> +#include "hw/ppc/spapr_numa.h"
>  
>  #define PHANDLE_PCIDEV(phb, pdev)    (0x12000000 | \
>                                       (((phb)->index) << 16) | ((pdev)->devfn))
> @@ -37,8 +38,6 @@
>  #define PHANDLE_NVLINK(phb, gn, nn)  (0x00130000 | (((phb)->index) << 8) | \
>                                       ((gn) << 4) | (nn))
>  
> -#define SPAPR_GPU_NUMA_ID           (cpu_to_be32(1))
> -
>  typedef struct SpaprPhbPciNvGpuSlot {
>          uint64_t tgt;
>          uint64_t gpa;
> @@ -360,13 +359,6 @@ void spapr_phb_nvgpu_ram_populate_dt(SpaprPhbState *sphb, void *fdt)
>          Object *nv_mrobj = object_property_get_link(OBJECT(nvslot->gpdev),
>                                                      "nvlink2-mr[0]",
>                                                      &error_abort);
> -        uint32_t associativity[] = {
> -            cpu_to_be32(0x4),
> -            cpu_to_be32(nvslot->numa_id),
> -            cpu_to_be32(nvslot->numa_id),
> -            cpu_to_be32(nvslot->numa_id),
> -            cpu_to_be32(nvslot->numa_id)
> -        };
>          uint64_t size = object_property_get_uint(nv_mrobj, "size", NULL);
>          uint64_t mem_reg[2] = { cpu_to_be64(nvslot->gpa), cpu_to_be64(size) };
>          char *mem_name = g_strdup_printf("memory@%"PRIx64, nvslot->gpa);
> @@ -376,14 +368,7 @@ void spapr_phb_nvgpu_ram_populate_dt(SpaprPhbState *sphb, void *fdt)
>          _FDT((fdt_setprop_string(fdt, off, "device_type", "memory")));
>          _FDT((fdt_setprop(fdt, off, "reg", mem_reg, sizeof(mem_reg))));
>  
> -        if (sphb->pre_5_1_assoc) {
> -            associativity[1] = SPAPR_GPU_NUMA_ID;
> -            associativity[2] = SPAPR_GPU_NUMA_ID;
> -            associativity[3] = SPAPR_GPU_NUMA_ID;
> -        }
> -
> -        _FDT((fdt_setprop(fdt, off, "ibm,associativity", associativity,
> -                          sizeof(associativity))));
> +        spapr_numa_write_assoc_nvlink2(fdt, off, nvslot->numa_id, sphb);
>  
>          _FDT((fdt_setprop_string(fdt, off, "compatible",
>                                   "ibm,coherent-device-memory")));
> diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
> index f6127501a6..b6e0721b07 100644
> --- a/include/hw/ppc/spapr_numa.h
> +++ b/include/hw/ppc/spapr_numa.h
> @@ -15,6 +15,7 @@
>  
>  #include "hw/boards.h"
>  #include "hw/ppc/spapr.h"
> +#include "hw/pci-host/spapr.h"
>  
>  void spapr_numa_associativity_init(MachineState *machine);
>  void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas);
> @@ -24,6 +25,8 @@ int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
>                              int offset, PowerPCCPU *cpu);
>  int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
>                                           int offset);
> +void spapr_numa_write_assoc_nvlink2(void *fdt, int offset, int numa_id,
> +                                    SpaprPhbState *sphb);
>  
>  
>  #endif /* HW_SPAPR_NUMA_H */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 7/7] spapr_hcall: h_home_node_associativity now reads numa_assoc_array
  2020-09-03  1:46   ` David Gibson
@ 2020-09-03 11:17     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-03 11:17 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel



On 9/2/20 10:46 PM, David Gibson wrote:
> On Tue, Sep 01, 2020 at 09:56:45AM -0300, Daniel Henrique Barboza wrote:
>> home_node_associativity reply now uses the associativity
>> values for tcpu->node_id provided by numa_assoc_array.
>>
>> This will avoid further changes in this code when numa_assoc_array
>> changes values, but it won't be enough to prevent further changes
>> if (falar aqui q se mudar o tamanho do array tem q mexer nessa
>> funcao tambem, falar q a macro associativity() deixa a automacao
>> de tudo mto unreadable)
> 
> Uh.. I'm guessing that was a note to yourself you intended to
> translate before publishing?

Oops! Forgot to edit it :|

> 
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   hw/ppc/spapr_hcall.c | 16 ++++++++++++++--
>>   1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index c1d01228c6..2ec30efdcb 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -1878,9 +1878,13 @@ static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
>>                                                 target_ulong opcode,
>>                                                 target_ulong *args)
> 
> You could move this function to spapr_numa.c as well (the name's a bit
> misleading, but spapr_hcall.c isn't really intended to hold *all*
> hcall implementations, just the ones that don't have somewhere better
> to live).

Ok!


> 
>>   {
>> +    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>>       target_ulong flags = args[0];
>>       target_ulong procno = args[1];
>>       PowerPCCPU *tcpu;
>> +    uint32_t assoc_domain1;
>> +    uint32_t assoc_domain2;
>> +    uint32_t assoc_domain3;
>>       int idx;
>>   
>>       /* only support procno from H_REGISTER_VPA */
>> @@ -1893,13 +1897,21 @@ static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
>>           return H_P2;
>>       }
>>   
>> +    /*
>> +     * Index 0 is the ibm,associativity size of the node,
>> +     * which isn't relevant here.
>> +     */
>> +    assoc_domain1 = smc->numa_assoc_array[tcpu->node_id][1];
>> +    assoc_domain2 = smc->numa_assoc_array[tcpu->node_id][2];
>> +    assoc_domain3 = smc->numa_assoc_array[tcpu->node_id][3];
> 
> Using all these temporaries is a little ugly.  Maybe do something like:
> 	uint32_t *assoc = smc->numa_assoc_array[tcpu->node_id];
> 
> Then just use assoc[1], assoc[2] etc. below.
> 
>> +
>>       /* sequence is the same as in the "ibm,associativity" property */
>>   
>>       idx = 0;
>>   #define ASSOCIATIVITY(a, b) (((uint64_t)(a) << 32) | \
>>                                ((uint64_t)(b) & 0xffffffff))
>> -    args[idx++] = ASSOCIATIVITY(0, 0);
>> -    args[idx++] = ASSOCIATIVITY(0, tcpu->node_id);
>> +    args[idx++] = ASSOCIATIVITY(assoc_domain1, assoc_domain2);
>> +    args[idx++] = ASSOCIATIVITY(assoc_domain3, tcpu->node_id);
>>       args[idx++] = ASSOCIATIVITY(procno, -1);
>>       for ( ; idx < 6; idx++) {
>>           args[idx] = -1;
> 
> Better yet would be to make this handle an arbitrary length of assoc
> array, further isolating this from the specifics of how we construct
> the arrays.  Ideally, you'd call a common path with
> spapr_numa_fixup_cpu_dt() to get the assoc array for a cpu, then here
> just translate it into the in-register format the hcall expects.


Since we're moving this to spapr_numa.c, I guess it's ok to add the
extra juggling needed to handle an arbitrary size. I was getting a bit
nervous about adding too much stuff here in spapr_hcall.c.


Thanks,


DHB

> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/7] spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c
  2020-09-03  1:34   ` David Gibson
@ 2020-09-03 11:22     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-03 11:22 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel



On 9/2/20 10:34 PM, David Gibson wrote:
> On Tue, Sep 01, 2020 at 09:56:43AM -0300, Daniel Henrique Barboza wrote:
>> In a similar fashion as the previous patch, let's move the
>> handling of ibm,associativity-lookup-arrays from spapr.c to
>> spapr_numa.c. A spapr_numa_write_assoc_lookup_arrays() helper was
>> created, and spapr_dt_dynamic_reconfiguration_memory() can now
>> use it to advertise the lookup-arrays.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   hw/ppc/spapr.c              | 25 ++----------------------
>>   hw/ppc/spapr_numa.c         | 39 +++++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr_numa.h |  2 ++
>>   3 files changed, 43 insertions(+), 23 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 172f965fe0..65d2ccd578 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -535,13 +535,10 @@ static int spapr_dt_dynamic_reconfiguration_memory(SpaprMachineState *spapr,
>>                                                      void *fdt)
>>   {
>>       MachineState *machine = MACHINE(spapr);
>> -    int nb_numa_nodes = machine->numa_state->num_nodes;
>> -    int ret, i, offset;
>> +    int ret, offset;
>>       uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
>>       uint32_t prop_lmb_size[] = {cpu_to_be32(lmb_size >> 32),
>>                                   cpu_to_be32(lmb_size & 0xffffffff)};
>> -    uint32_t *int_buf, *cur_index, buf_len;
>> -    int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
>>       MemoryDeviceInfoList *dimms = NULL;
>>   
>>       /*
>> @@ -582,25 +579,7 @@ static int spapr_dt_dynamic_reconfiguration_memory(SpaprMachineState *spapr,
>>           return ret;
>>       }
>>   
>> -    /* ibm,associativity-lookup-arrays */
>> -    buf_len = (nr_nodes * 4 + 2) * sizeof(uint32_t);
>> -    cur_index = int_buf = g_malloc0(buf_len);
>> -    int_buf[0] = cpu_to_be32(nr_nodes);
>> -    int_buf[1] = cpu_to_be32(4); /* Number of entries per associativity list */
>> -    cur_index += 2;
>> -    for (i = 0; i < nr_nodes; i++) {
>> -        uint32_t associativity[] = {
>> -            cpu_to_be32(0x0),
>> -            cpu_to_be32(0x0),
>> -            cpu_to_be32(0x0),
>> -            cpu_to_be32(i)
>> -        };
>> -        memcpy(cur_index, associativity, sizeof(associativity));
>> -        cur_index += 4;
>> -    }
>> -    ret = fdt_setprop(fdt, offset, "ibm,associativity-lookup-arrays", int_buf,
>> -            (cur_index - int_buf) * sizeof(uint32_t));
>> -    g_free(int_buf);
>> +    ret = spapr_numa_write_assoc_lookup_arrays(spapr, fdt, offset);
>>   
>>       return ret;
>>   }
>> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
>> index b8882d209e..9eb4bdbe80 100644
>> --- a/hw/ppc/spapr_numa.c
>> +++ b/hw/ppc/spapr_numa.c
>> @@ -75,6 +75,45 @@ int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
>>                          vcpu_assoc, sizeof(vcpu_assoc));
>>   }
>>   
>> +
>> +int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
>> +                                         int offset)
>> +{
>> +    MachineState *machine = MACHINE(spapr);
>> +    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>> +    int nb_numa_nodes = machine->numa_state->num_nodes;
>> +    int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
>> +    uint32_t *int_buf, *cur_index, buf_len;
>> +    int ret, i, j;
>> +
>> +    /* ibm,associativity-lookup-arrays */
>> +    buf_len = (nr_nodes * MAX_DISTANCE_REF_POINTS + 2) * sizeof(uint32_t);
>> +    cur_index = int_buf = g_malloc0(buf_len);
>> +    int_buf[0] = cpu_to_be32(nr_nodes);
>> +     /* Number of entries per associativity list */
>> +    int_buf[1] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
>> +    cur_index += 2;
>> +    for (i = 0; i < nr_nodes; i++) {
>> +        /*
>> +         * For the lookup-array we use the ibm,associativity array,
>> +         * from numa_assoc_array. without the first element (size).
>> +         */
>> +        uint32_t associativity[MAX_DISTANCE_REF_POINTS];
>> +
>> +        for (j = 0; j < MAX_DISTANCE_REF_POINTS; j++) {
>> +            associativity[j] = smc->numa_assoc_array[i][j + 1];
>> +        }
>> +
>> +        memcpy(cur_index, associativity, sizeof(associativity));
> 
> AFAICT, you could just use a single memcpy() to copy from the
> numa_assoc_array() into the property here, rather than using a loop
> and temporary array.

I remember having some weird problems with memcpy() and numa_assoc_array,
and this is how I got around it. I'll try to sort it out again.
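
At a glance, the direct copy should just need to skip element 0 (the size),
something like this (untested, keeping smc-> as in the current patch):

    memcpy(cur_index, &smc->numa_assoc_array[i][1],
           MAX_DISTANCE_REF_POINTS * sizeof(uint32_t));
    cur_index += MAX_DISTANCE_REF_POINTS;

which would also take care of making the increment MAX_DISTANCE_REF_POINTS
instead of 4, as you point out below.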

> 
>> +        cur_index += 4;
> 
> Shouldn't this be  += MAX_DISTANCE_REF_POINTS?


Yeah it should. Good catch.

> 
>> +    }
>> +    ret = fdt_setprop(fdt, offset, "ibm,associativity-lookup-arrays", int_buf,
>> +                      (cur_index - int_buf) * sizeof(uint32_t));
>> +    g_free(int_buf);
>> +
>> +    return ret;
>> +}
>> +
>>   /*
>>    * Helper that writes ibm,associativity-reference-points and
>>    * max-associativity-domains in the RTAS pointed by @rtas
>> diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
>> index f92fb4f28a..f6127501a6 100644
>> --- a/include/hw/ppc/spapr_numa.h
>> +++ b/include/hw/ppc/spapr_numa.h
>> @@ -22,6 +22,8 @@ void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
>>                                          int offset, int nodeid);
>>   int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
>>                               int offset, PowerPCCPU *cpu);
>> +int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
>> +                                         int offset);
>>   
>>   
>>   #endif /* HW_SPAPR_NUMA_H */
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/7] spapr: introduce SpaprMachineClass::numa_assoc_array
  2020-09-03  1:51   ` David Gibson
@ 2020-09-03 11:28     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-03 11:28 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel



On 9/2/20 10:51 PM, David Gibson wrote:
> On Tue, Sep 01, 2020 at 09:56:41AM -0300, Daniel Henrique Barboza wrote:
>> The next step to centralize all NUMA/associativity handling in
>> the spapr machine is to create a 'one stop place' for all
>> things ibm,associativity.
>>
>> This patch introduces numa_assoc_array, a 2 dimensional array
>> that will store all ibm,associativity arrays of all NUMA nodes.
>> This array is initialized in a new spapr_numa_associativity_init()
>> function, called in spapr_machine_init(). It is being initialized
>> with the same values used in other ibm,associativity properties
>> around spapr files (i.e. all zeros, last value is node_id).
>> The idea is to remove all hardcoded definitions and FDT writes
>> of ibm,associativity arrays, doing instead a call to the new
>> helper spapr_numa_write_associativity_dt() helper, that will
>> be able to write the DT with the correct values.
>>
>> We'll start small, handling the trivial cases first. The
>> remaining instances of ibm,associativity will be handled
>> next.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> 
> The idea is great, but there's one small but significant problem here:
> 
>> +void spapr_numa_associativity_init(MachineState *machine)
>> +{
>> +    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
>> +    int nb_numa_nodes = machine->numa_state->num_nodes;
>> +    int i;
>> +
>> +    /*
>> +     * For all associativity arrays: first position is the size,
>> +     * position MAX_DISTANCE_REF_POINTS is always the numa_id,
>> +     * represented by the index 'i'.
>> +     *
>> +     * This will break on sparse NUMA setups, when/if QEMU starts
>> +     * to support it, because there will be no more guarantee that
>> +     * 'i' will be a valid node_id set by the user.
>> +     */
>> +    for (i = 0; i < nb_numa_nodes; i++) {
>> +        smc->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
>> +        smc->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
> 
> This initialization is called on a machine *instance*, which means it
> should treat the machine class as read-only.  i.e. the
> numa_assoc_array should be in the SpaprMachineState, rather than the
> class.
> 
> I mean, we'd get away with it in practice, since there's only ever
> likely to be a single machine instance, but still we should correct
> this.

Got it. I'll move it to SpaprMachineState. This will also spare a handful of lines
everywhere else, since I was instantiating the class just to manipulate the matrix
(and, in hindsight, that should have been a warning sign about the weirdness
of what I was doing).


Thanks,


DHB

> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 6/7] spapr_numa: move NVLink2 associativity handling to spapr_numa.c
  2020-09-03  1:56   ` David Gibson
@ 2020-09-03 14:20     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Henrique Barboza @ 2020-09-03 14:20 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel



On 9/2/20 10:56 PM, David Gibson wrote:
> On Tue, Sep 01, 2020 at 09:56:44AM -0300, Daniel Henrique Barboza wrote:
>> This patch adds a new spapr_numa_write_assoc_nvlink2() helper
>> to handle the ibm,associativity for NVLink2 GPUs.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> It might be nice to "precompute" the assoc arrays for the gpus as you
> now do for the regular numa nodes.  That can be a later revision, though.

Hmm ... I have the follow-up series with the NUMA calculation ready, and
one of the steps I had to take there was to initialize all the associativity
arrays with 'node_id' instead of leaving them uninitialized (reason: the kernel
makes associativity matches on zeroes). In the end I'm initializing every
numa node the same way we do for GPUs.

I'll bring some of this future code to this series and handle GPUs like
a regular numa node as you suggested. Let's see how it goes.
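
Roughly, the init would become something like this (untested sketch of the
follow-up direction; NVGPU_MAX_NUM as the GPU bound and the id layout, with
GPU ids right after the user-declared nodes, are still assumptions):

    void spapr_numa_associativity_init(SpaprMachineState *spapr,
                                       MachineState *machine)
    {
        int nb_numa_nodes = machine->numa_state->num_nodes;
        /* reserve ids for GPU nodes right after the regular NUMA nodes */
        int max_nodes_with_gpus = nb_numa_nodes + NVGPU_MAX_NUM;
        int i, j;

        for (i = 0; i < max_nodes_with_gpus; i++) {
            spapr->numa_assoc_array[i][0] =
                cpu_to_be32(MAX_DISTANCE_REF_POINTS);
            for (j = 1; j < NUMA_ASSOC_SIZE; j++) {
                spapr->numa_assoc_array[i][j] = cpu_to_be32(i);
            }
        }
    }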



Thanks,

DHB

> 
>> ---
>>   hw/ppc/spapr_numa.c         | 23 +++++++++++++++++++++++
>>   hw/ppc/spapr_pci_nvlink2.c  | 19 ++-----------------
>>   include/hw/ppc/spapr_numa.h |  3 +++
>>   3 files changed, 28 insertions(+), 17 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
>> index 9eb4bdbe80..785cc24624 100644
>> --- a/hw/ppc/spapr_numa.c
>> +++ b/hw/ppc/spapr_numa.c
>> @@ -15,6 +15,8 @@
>>   #include "hw/ppc/spapr_numa.h"
>>   #include "hw/ppc/fdt.h"
>>   
>> +/* Moved from hw/ppc/spapr_pci_nvlink2.c */
>> +#define SPAPR_GPU_NUMA_ID           (cpu_to_be32(1))
>>   
>>   void spapr_numa_associativity_init(MachineState *machine)
>>   {
>> @@ -114,6 +116,27 @@ int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
>>       return ret;
>>   }
>>   
>> +void spapr_numa_write_assoc_nvlink2(void *fdt, int offset, int numa_id,
>> +                                    SpaprPhbState *sphb)
>> +{
>> +    uint32_t associativity[NUMA_ASSOC_SIZE];
>> +    int i;
>> +
>> +    associativity[0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
>> +    for (i = 1; i < NUMA_ASSOC_SIZE; i++) {
>> +        associativity[i] = cpu_to_be32(numa_id);
>> +    };
>> +
>> +    if (sphb->pre_5_1_assoc) {
>> +        associativity[1] = SPAPR_GPU_NUMA_ID;
>> +        associativity[2] = SPAPR_GPU_NUMA_ID;
>> +        associativity[3] = SPAPR_GPU_NUMA_ID;
>> +    }
>> +
>> +    _FDT((fdt_setprop(fdt, offset, "ibm,associativity", associativity,
>> +                      sizeof(associativity))));
>> +}
>> +
>>   /*
>>    * Helper that writes ibm,associativity-reference-points and
>>    * max-associativity-domains in the RTAS pointed by @rtas
>> diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c
>> index 76ae77ebc8..662a0af990 100644
>> --- a/hw/ppc/spapr_pci_nvlink2.c
>> +++ b/hw/ppc/spapr_pci_nvlink2.c
>> @@ -29,6 +29,7 @@
>>   #include "qemu/error-report.h"
>>   #include "hw/ppc/fdt.h"
>>   #include "hw/pci/pci_bridge.h"
>> +#include "hw/ppc/spapr_numa.h"
>>   
>>   #define PHANDLE_PCIDEV(phb, pdev)    (0x12000000 | \
>>                                        (((phb)->index) << 16) | ((pdev)->devfn))
>> @@ -37,8 +38,6 @@
>>   #define PHANDLE_NVLINK(phb, gn, nn)  (0x00130000 | (((phb)->index) << 8) | \
>>                                        ((gn) << 4) | (nn))
>>   
>> -#define SPAPR_GPU_NUMA_ID           (cpu_to_be32(1))
>> -
>>   typedef struct SpaprPhbPciNvGpuSlot {
>>           uint64_t tgt;
>>           uint64_t gpa;
>> @@ -360,13 +359,6 @@ void spapr_phb_nvgpu_ram_populate_dt(SpaprPhbState *sphb, void *fdt)
>>           Object *nv_mrobj = object_property_get_link(OBJECT(nvslot->gpdev),
>>                                                       "nvlink2-mr[0]",
>>                                                       &error_abort);
>> -        uint32_t associativity[] = {
>> -            cpu_to_be32(0x4),
>> -            cpu_to_be32(nvslot->numa_id),
>> -            cpu_to_be32(nvslot->numa_id),
>> -            cpu_to_be32(nvslot->numa_id),
>> -            cpu_to_be32(nvslot->numa_id)
>> -        };
>>           uint64_t size = object_property_get_uint(nv_mrobj, "size", NULL);
>>           uint64_t mem_reg[2] = { cpu_to_be64(nvslot->gpa), cpu_to_be64(size) };
>>           char *mem_name = g_strdup_printf("memory@%"PRIx64, nvslot->gpa);
>> @@ -376,14 +368,7 @@ void spapr_phb_nvgpu_ram_populate_dt(SpaprPhbState *sphb, void *fdt)
>>           _FDT((fdt_setprop_string(fdt, off, "device_type", "memory")));
>>           _FDT((fdt_setprop(fdt, off, "reg", mem_reg, sizeof(mem_reg))));
>>   
>> -        if (sphb->pre_5_1_assoc) {
>> -            associativity[1] = SPAPR_GPU_NUMA_ID;
>> -            associativity[2] = SPAPR_GPU_NUMA_ID;
>> -            associativity[3] = SPAPR_GPU_NUMA_ID;
>> -        }
>> -
>> -        _FDT((fdt_setprop(fdt, off, "ibm,associativity", associativity,
>> -                          sizeof(associativity))));
>> +        spapr_numa_write_assoc_nvlink2(fdt, off, nvslot->numa_id, sphb);
>>   
>>           _FDT((fdt_setprop_string(fdt, off, "compatible",
>>                                    "ibm,coherent-device-memory")));
>> diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
>> index f6127501a6..b6e0721b07 100644
>> --- a/include/hw/ppc/spapr_numa.h
>> +++ b/include/hw/ppc/spapr_numa.h
>> @@ -15,6 +15,7 @@
>>   
>>   #include "hw/boards.h"
>>   #include "hw/ppc/spapr.h"
>> +#include "hw/pci-host/spapr.h"
>>   
>>   void spapr_numa_associativity_init(MachineState *machine);
>>   void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas);
>> @@ -24,6 +25,8 @@ int spapr_numa_fixup_cpu_dt(SpaprMachineState *spapr, void *fdt,
>>                               int offset, PowerPCCPU *cpu);
>>   int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState *spapr, void *fdt,
>>                                            int offset);
>> +void spapr_numa_write_assoc_nvlink2(void *fdt, int offset, int numa_id,
>> +                                    SpaprPhbState *sphb);
>>   
>>   
>>   #endif /* HW_SPAPR_NUMA_H */
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-09-03 14:21 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-01 12:56 [PATCH v2 0/7] pseries NUMA distance rework Daniel Henrique Barboza
2020-09-01 12:56 ` [PATCH v2 1/7] ppc: introducing spapr_numa.c NUMA code helper Daniel Henrique Barboza
2020-09-01 12:56 ` [PATCH v2 2/7] ppc/spapr_nvdimm: turn spapr_dt_nvdimm() static Daniel Henrique Barboza
2020-09-01 12:56 ` [PATCH v2 3/7] spapr: introduce SpaprMachineClass::numa_assoc_array Daniel Henrique Barboza
2020-09-03  1:51   ` David Gibson
2020-09-03 11:28     ` Daniel Henrique Barboza
2020-09-01 12:56 ` [PATCH v2 4/7] spapr, spapr_numa: handle vcpu ibm,associativity Daniel Henrique Barboza
2020-09-01 12:56 ` [PATCH v2 5/7] spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c Daniel Henrique Barboza
2020-09-03  1:34   ` David Gibson
2020-09-03 11:22     ` Daniel Henrique Barboza
2020-09-01 12:56 ` [PATCH v2 6/7] spapr_numa: move NVLink2 associativity " Daniel Henrique Barboza
2020-09-03  1:56   ` David Gibson
2020-09-03 14:20     ` Daniel Henrique Barboza
2020-09-01 12:56 ` [PATCH v2 7/7] spapr_hcall: h_home_node_associativity now reads numa_assoc_array Daniel Henrique Barboza
2020-09-03  1:46   ` David Gibson
2020-09-03 11:17     ` Daniel Henrique Barboza
2020-09-03  1:35 ` [PATCH v2 0/7] pseries NUMA distance rework David Gibson
2020-09-03  1:49   ` David Gibson
