All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/4] tcg: support heterogenous CPU clusters
@ 2019-01-08 16:30 Peter Maydell
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 1/4] hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it Peter Maydell
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Peter Maydell @ 2019-01-08 16:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: patches, Richard Henderson, Peter Crosthwaite, Paolo Bonzini,
	Alistair Francis, Edgar E. Iglesias, Eduardo Habkost,
	Marcel Apfelbaum, Emilio G . Cota

Currently, TCG implicitly assumes that all CPUs are alike,
because we have a single cache of generated TBs and we
don't account for which CPU generated the code or is looking
for the TB when adding or searching for generated TBs.
This can go wrong in two situations:
 (1) two CPUs have different physical address spaces
     (eg CPU 1 has one lot of RAM/ROM, and CPU 2 has
     different RAM/ROM): the physical address alone is
     then not sufficient to distinguish what code to run
 (2) two CPUs have different features (eg FPU vs no FPU):
     since our TCG frontends bake assumptions into the
     generated code about the presence/absence of features,
     if a CPU with FPU picks up a TB for one generated
     without an FPU it will behave wrongly
     
This is unfortunate, because we already have one board in
the tree which has a heterogenous setup: the xlnx-zcu102.
This board has 4xCortex-A53 and 2xCortex-R5F. Although
most "normal" guest code doesn't run into this, it is
possible to demonstrate the bug with it. There's a test case
at http://people.linaro.org/~peter.maydell/xlnx-het.tgz
which arranges to run a fragment of code in AArch32 which
should behave differently on the two CPUs, but does not
(either the A53 gets the behaviour the R5 should have, or
vice-versa).

This patchset adds the "cluster ID" to the set of things
we include in the TCG TB hash, which means that the two
sets of CPUs in this board can no longer accidentally
get TBs generated by the other cluster. It fixes the
bug demonstrated by the test case.

Design notes:

 * Adding cluster ID to the hash is RTH's suggestion. It might
   also be possible to make each cluster have a TCGContext
   code generation context, and have the CPUs use the right
   one for the cluster, but that would be a lot more code
   rework compared to this approach.
 * I put the cluster number in the existing cflags, since
   we have plenty of spare bits there. If in future we
   need either more than 256 clusters (unlikely) or want
   to reclaim bits in the cflags field for some other
   purpose we can always extend our xxhash from 28 to 32 bytes.
   (I actually wrote the patch to do that before realising
   I didn't need it...)
 * The cluster object now fills in the CPU object's
   cluster_index field when the cluster object is realized.
   This imposes an ordering constraint that all CPUs must
   be added to a cluster before realizing the cluster. That's
   not a big deal, except that unfortunately QOM provides
   no way that I could find to enforce this. If anybody
   has a suggestion for a better approach that would be great.
   Patch 1 therefore makes sure our only current user of the
   cluster feature obeys the ordering constraint.
 * Patch 4 is a tidyup to the gdbstub code which is possible
   now that the CPUState struct has the cluster ID in it.

I believe that this fix is basically all we need to support
heterogenous setups (assuming of course that you just mean
"within an architecture family" like Arm, rather than
systems with say both an Arm and a Microblaze core in them).
The other things I have considered are:

 * code that calls cpu_physical_memory_read() and friends,
   uses address_space_memory, or otherwise assumes there's
   only a single view of "CPU memory": I sent some patches
   out before Christmas that fixed these in generic code
   like the monitor and disassembler. There are also some
   uses in target-arch or device-specific code, which I
   think we can in practice ignore unless/until we come to
   implement a board that is heterogenous and also uses those
   devices or target architectures.
 * handling of TB invalidation: I think this should Just Work,
   because tb_invalidate_phys_addr() takes an AddressSpace+hwaddr,
   which it converts into a ramaddr for (the badly misnamed)
   tb_invalidate_phys_page_range(). So the TB caching all works
   on ramaddrs, and if CPU 1 writes to some ram X that's mapped
   at physaddr A for it but at physaddr B for CPU 2, we will
   still correctly invalidate any TBs that CPU 2 had for code
   that from its point of view lives at physaddr B.
   Note that translate-all.c has a single l1_map[] structure
   which will have PageDesc entries for pages for all CPUs in
   the system. I believe this is OK because the only thing
   we use them for is TB invalidation.
 * dirty memory tracking: like TB invalidation, this is OK
   because it all works either on MemoryRegions or on ramaddrs,
   not on physaddrs.

Have I missed anything that we also need to fix ?

thanks
-- PMM

Peter Maydell (4):
  hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it
  qom/cpu: Add cluster_index to CPUState
  accel/tcg: Add cluster number to TCG TB hash
  gdbstub: Simplify gdb_get_cpu_pid() to use cpu->cluster_index

 include/exec/exec-all.h   |  4 +++-
 include/hw/cpu/cluster.h  | 19 ++++++++++++++++
 include/qom/cpu.h         |  7 ++++++
 accel/tcg/cpu-exec.c      |  4 ++++
 accel/tcg/translate-all.c |  3 +++
 gdbstub.c                 | 48 ++++-----------------------------------
 hw/arm/xlnx-zynqmp.c      |  4 ++--
 hw/cpu/cluster.c          | 33 +++++++++++++++++++++++++++
 qom/cpu.c                 |  1 +
 9 files changed, 77 insertions(+), 46 deletions(-)

-- 
2.19.2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 1/4] hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it
  2019-01-08 16:30 [Qemu-devel] [PATCH 0/4] tcg: support heterogenous CPU clusters Peter Maydell
@ 2019-01-08 16:30 ` Peter Maydell
  2019-01-10 15:11   ` Luc Michel
  2019-01-10 19:52   ` Alistair Francis
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 2/4] qom/cpu: Add cluster_index to CPUState Peter Maydell
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: Peter Maydell @ 2019-01-08 16:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: patches, Richard Henderson, Peter Crosthwaite, Paolo Bonzini,
	Alistair Francis, Edgar E. Iglesias, Eduardo Habkost,
	Marcel Apfelbaum, Emilio G . Cota

Currently the cluster implementation doesn't have any constraints
on the ordering of realizing the TYPE_CPU_CLUSTER and populating it
with child objects. We want to impose a constraint that realize
must happen only after all the child objects are added, so move
the realize of rpu_cluster. (The apu_cluster is already
realized after child population.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-zynqmp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index c67ac2e64ac..370b0e44a38 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -183,8 +183,6 @@ static void xlnx_zynqmp_create_rpu(XlnxZynqMPState *s, const char *boot_cpu,
                             &error_abort, NULL);
     qdev_prop_set_uint32(DEVICE(&s->rpu_cluster), "cluster-id", 1);
 
-    qdev_init_nofail(DEVICE(&s->rpu_cluster));
-
     for (i = 0; i < num_rpus; i++) {
         char *name;
 
@@ -212,6 +210,8 @@ static void xlnx_zynqmp_create_rpu(XlnxZynqMPState *s, const char *boot_cpu,
             return;
         }
     }
+
+    qdev_init_nofail(DEVICE(&s->rpu_cluster));
 }
 
 static void xlnx_zynqmp_init(Object *obj)
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 2/4] qom/cpu: Add cluster_index to CPUState
  2019-01-08 16:30 [Qemu-devel] [PATCH 0/4] tcg: support heterogenous CPU clusters Peter Maydell
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 1/4] hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it Peter Maydell
@ 2019-01-08 16:30 ` Peter Maydell
  2019-01-10 15:13   ` Luc Michel
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash Peter Maydell
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2019-01-08 16:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: patches, Richard Henderson, Peter Crosthwaite, Paolo Bonzini,
	Alistair Francis, Edgar E. Iglesias, Eduardo Habkost,
	Marcel Apfelbaum, Emilio G . Cota

For TCG we want to distinguish which cluster a CPU is in, and
we need to do it quickly. Cache the cluster index in the CPUState
struct, by having the cluster object set cpu->cluster_index for
each CPU child when it is realized.

This means that board/SoC code must add all CPUs to the cluster
before realizing the cluster object. Regrettably QOM provides no
way to prevent adding children to a realized object and no way for
the parent to be notified when a new child is added to it, so
we don't have any way to enforce/assert this constraint; all
we can do is document it in a comment.

The restriction on how many clusters can exist in the system
is imposed by TCG code which will be added in a subsequent commit,
but the check to enforce it in cluster.c fits better in this one.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/cpu/cluster.h | 19 +++++++++++++++++++
 include/qom/cpu.h        |  7 +++++++
 hw/cpu/cluster.c         | 33 +++++++++++++++++++++++++++++++++
 qom/cpu.c                |  1 +
 4 files changed, 60 insertions(+)

diff --git a/include/hw/cpu/cluster.h b/include/hw/cpu/cluster.h
index 73818232437..d1bef315d10 100644
--- a/include/hw/cpu/cluster.h
+++ b/include/hw/cpu/cluster.h
@@ -34,12 +34,31 @@
  * Arm big.LITTLE system) they should be in different clusters. If the CPUs do
  * not have the same view of memory (for example the main CPU and a management
  * controller processor) they should be in different clusters.
+ *
+ * A cluster is created by creating an object of TYPE_CPU_CLUSTER, and then
+ * adding the CPUs to it as QOM child objects (e.g. using the
+ * object_initialize_child() or object_property_add_child() functions).
+ * All CPUs must be added as children before the cluster is realized.
+ * (Regrettably QOM provides no way to prevent adding children to a realized
+ * object and no way for the parent to be notified when a new child is added
+ * to it, so this restriction is not checked for, but the system will not
+ * behave correctly if it is not adhered to.)
+ *
+ * A CPU which is not put into any cluster will be considered implicitly
+ * to be in a cluster with all the other "loose" CPUs, so all CPUs that are
+ * not assigned to clusters must be identical.
  */
 
 #define TYPE_CPU_CLUSTER "cpu-cluster"
 #define CPU_CLUSTER(obj) \
     OBJECT_CHECK(CPUClusterState, (obj), TYPE_CPU_CLUSTER)
 
+/*
+ * This limit is imposed by TCG, which puts the cluster ID into an
+ * 8 bit field (and uses all-1s for the default "not in any cluster").
+ */
+#define MAX_CLUSTERS 255
+
 /**
  * CPUClusterState:
  * @cluster_id: The cluster ID. This value is for internal use only and should
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 1396f53e5b5..844becbcedc 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -279,6 +279,11 @@ struct qemu_work_item;
 /**
  * CPUState:
  * @cpu_index: CPU index (informative).
+ * @cluster_index: Identifies which cluster this CPU is in.
+ *   For boards which don't define clusters or for "loose" CPUs not assigned
+ *   to a cluster this will be UNASSIGNED_CLUSTER_INDEX; otherwise it will
+ *   be the same as the cluster-id property of the CPU object's TYPE_CPU_CLUSTER
+ *   QOM parent.
  * @nr_cores: Number of cores within this CPU package.
  * @nr_threads: Number of threads within this CPU.
  * @running: #true if CPU is currently running (lockless).
@@ -404,6 +409,7 @@ struct CPUState {
 
     /* TODO Move common fields from CPUArchState here. */
     int cpu_index;
+    int cluster_index;
     uint32_t halted;
     uint32_t can_do_io;
     int32_t exception_index;
@@ -1109,5 +1115,6 @@ extern const struct VMStateDescription vmstate_cpu_common;
 #endif /* NEED_CPU_H */
 
 #define UNASSIGNED_CPU_INDEX -1
+#define UNASSIGNED_CLUSTER_INDEX -1
 
 #endif
diff --git a/hw/cpu/cluster.c b/hw/cpu/cluster.c
index 9d50a235d5c..d672f54a620 100644
--- a/hw/cpu/cluster.c
+++ b/hw/cpu/cluster.c
@@ -20,19 +20,52 @@
 
 #include "qemu/osdep.h"
 #include "hw/cpu/cluster.h"
+#include "qom/cpu.h"
 #include "qapi/error.h"
 #include "qemu/module.h"
+#include "qemu/cutils.h"
 
 static Property cpu_cluster_properties[] = {
     DEFINE_PROP_UINT32("cluster-id", CPUClusterState, cluster_id, 0),
     DEFINE_PROP_END_OF_LIST()
 };
 
+static void cpu_cluster_realize(DeviceState *dev, Error **errp)
+{
+    /* Iterate through all our CPU children and set their cluster_index */
+    CPUClusterState *cluster = CPU_CLUSTER(dev);
+    ObjectPropertyIterator iter;
+    ObjectProperty *prop;
+    Object *cluster_obj = OBJECT(dev);
+
+    if (cluster->cluster_id >= MAX_CLUSTERS) {
+        error_setg(errp, "cluster-id must be less than %d", MAX_CLUSTERS);
+        return;
+    }
+
+    object_property_iter_init(&iter, cluster_obj);
+    while ((prop = object_property_iter_next(&iter))) {
+        Object *cpu_obj;
+        CPUState *cpu;
+
+        if (!strstart(prop->type, "child<", NULL)) {
+            continue;
+        }
+        cpu_obj = object_property_get_link(cluster_obj, prop->name, NULL);
+        cpu = (CPUState *)object_dynamic_cast(cpu_obj, TYPE_CPU);
+        if (!cpu) {
+            continue;
+        }
+        cpu->cluster_index = cluster->cluster_id;
+    }
+}
+
 static void cpu_cluster_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     dc->props = cpu_cluster_properties;
+    dc->realize = cpu_cluster_realize;
 }
 
 static const TypeInfo cpu_cluster_type_info = {
diff --git a/qom/cpu.c b/qom/cpu.c
index 5442a7323be..f5579b1cd50 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -364,6 +364,7 @@ static void cpu_common_initfn(Object *obj)
     CPUClass *cc = CPU_GET_CLASS(obj);
 
     cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+    cpu->cluster_index = UNASSIGNED_CLUSTER_INDEX;
     cpu->gdb_num_regs = cpu->gdb_num_g_regs = cc->gdb_num_core_regs;
     /* *-user doesn't have configurable SMP topology */
     /* the default value is changed by qemu_init_vcpu() for softmmu */
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash
  2019-01-08 16:30 [Qemu-devel] [PATCH 0/4] tcg: support heterogenous CPU clusters Peter Maydell
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 1/4] hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it Peter Maydell
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 2/4] qom/cpu: Add cluster_index to CPUState Peter Maydell
@ 2019-01-08 16:30 ` Peter Maydell
  2019-01-10 15:14   ` Luc Michel
                     ` (2 more replies)
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 4/4] gdbstub: Simplify gdb_get_cpu_pid() to use cpu->cluster_index Peter Maydell
  2019-01-09 20:53 ` [Qemu-devel] [PATCH 0/4] tcg: support heterogenous CPU clusters Richard Henderson
  4 siblings, 3 replies; 18+ messages in thread
From: Peter Maydell @ 2019-01-08 16:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: patches, Richard Henderson, Peter Crosthwaite, Paolo Bonzini,
	Alistair Francis, Edgar E. Iglesias, Eduardo Habkost,
	Marcel Apfelbaum, Emilio G . Cota

Include the cluster number in the hash we use to look
up TBs. This is important because a TB that is valid
for one cluster at a given physical address and set
of CPU flags is not necessarily valid for another:
the two clusters may have different views of physical
memory, or may have different CPU features (eg FPU
present or absent).

We put the cluster number in the high 8 bits of the
TB cflags. This gives us up to 256 clusters, which should
be enough for anybody. If we ever need more, or need
more bits in cflags for other purposes, we could make
tb_hash_func() take more data (and expand qemu_xxhash7()
to qemu_xxhash8()).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/exec/exec-all.h   | 4 +++-
 accel/tcg/cpu-exec.c      | 4 ++++
 accel/tcg/translate-all.c | 3 +++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 815e5b1e838..aa7b81aaf01 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -351,9 +351,11 @@ struct TranslationBlock {
 #define CF_USE_ICOUNT  0x00020000
 #define CF_INVALID     0x00040000 /* TB is stale. Set with @jmp_lock held */
 #define CF_PARALLEL    0x00080000 /* Generate code for a parallel context */
+#define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
+#define CF_CLUSTER_SHIFT 24
 /* cflags' mask for hashing/comparison */
 #define CF_HASH_MASK   \
-    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL)
+    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL | CF_CLUSTER_MASK)
 
     /* Per-vCPU dynamic tracing state used to generate this TB */
     uint32_t trace_vcpu_dstate;
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 870027d4359..e578a1a3aee 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -336,6 +336,10 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
         return NULL;
     }
     desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
+
+    cf_mask &= ~CF_CLUSTER_MASK;
+    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
+
     h = tb_hash_func(phys_pc, pc, flags, cf_mask, *cpu->trace_dstate);
     return qht_lookup_custom(&tb_ctx.htable, &desc, h, tb_lookup_cmp);
 }
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 639f0b27287..ba27f5acc8c 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1692,6 +1692,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         cflags |= CF_NOCACHE | 1;
     }
 
+    cflags &= ~CF_CLUSTER_MASK;
+    cflags |= cpu->cluster_index << CF_CLUSTER_SHIFT;
+
  buffer_overflow:
     tb = tb_alloc(pc);
     if (unlikely(!tb)) {
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 4/4] gdbstub: Simplify gdb_get_cpu_pid() to use cpu->cluster_index
  2019-01-08 16:30 [Qemu-devel] [PATCH 0/4] tcg: support heterogenous CPU clusters Peter Maydell
                   ` (2 preceding siblings ...)
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash Peter Maydell
@ 2019-01-08 16:30 ` Peter Maydell
  2019-01-10 15:15   ` Luc Michel
  2019-01-09 20:53 ` [Qemu-devel] [PATCH 0/4] tcg: support heterogenous CPU clusters Richard Henderson
  4 siblings, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2019-01-08 16:30 UTC (permalink / raw)
  To: qemu-arm, qemu-devel
  Cc: patches, Richard Henderson, Peter Crosthwaite, Paolo Bonzini,
	Alistair Francis, Edgar E. Iglesias, Eduardo Habkost,
	Marcel Apfelbaum, Emilio G . Cota

Now we're keeping the cluster index in the CPUState, we don't
need to jump through hoops in gdb_get_cpu_pid() to find the
associated cluster object.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 gdbstub.c | 48 +++++-------------------------------------------
 1 file changed, 5 insertions(+), 43 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index bfc7afb5096..5d6cbea9d35 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -644,50 +644,12 @@ static int memtox(char *buf, const char *mem, int len)
 
 static uint32_t gdb_get_cpu_pid(const GDBState *s, CPUState *cpu)
 {
-#ifndef CONFIG_USER_ONLY
-    gchar *path, *name = NULL;
-    Object *obj;
-    CPUClusterState *cluster;
-    uint32_t ret;
-
-    path = object_get_canonical_path(OBJECT(cpu));
-
-    if (path == NULL) {
-        /* Return the default process' PID */
-        ret = s->processes[s->process_num - 1].pid;
-        goto out;
-    }
-
-    name = object_get_canonical_path_component(OBJECT(cpu));
-    assert(name != NULL);
-
-    /*
-     * Retrieve the CPU parent path by removing the last '/' and the CPU name
-     * from the CPU canonical path.
-     */
-    path[strlen(path) - strlen(name) - 1] = '\0';
-
-    obj = object_resolve_path_type(path, TYPE_CPU_CLUSTER, NULL);
-
-    if (obj == NULL) {
-        /* Return the default process' PID */
-        ret = s->processes[s->process_num - 1].pid;
-        goto out;
-    }
-
-    cluster = CPU_CLUSTER(obj);
-    ret = cluster->cluster_id + 1;
-
-out:
-    g_free(name);
-    g_free(path);
-
-    return ret;
-
-#else
     /* TODO: In user mode, we should use the task state PID */
-    return s->processes[s->process_num - 1].pid;
-#endif
+    if (cpu->cluster_index == UNASSIGNED_CLUSTER_INDEX) {
+        /* Return the default process' PID */
+        return s->processes[s->process_num - 1].pid;
+    }
+    return cpu->cluster_index + 1;
 }
 
 static GDBProcess *gdb_get_process(const GDBState *s, uint32_t pid)
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] tcg: support heterogenous CPU clusters
  2019-01-08 16:30 [Qemu-devel] [PATCH 0/4] tcg: support heterogenous CPU clusters Peter Maydell
                   ` (3 preceding siblings ...)
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 4/4] gdbstub: Simplify gdb_get_cpu_pid() to use cpu->cluster_index Peter Maydell
@ 2019-01-09 20:53 ` Richard Henderson
  4 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2019-01-09 20:53 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel
  Cc: patches, Peter Crosthwaite, Paolo Bonzini, Alistair Francis,
	Edgar E. Iglesias, Eduardo Habkost, Marcel Apfelbaum,
	Emilio G . Cota

On 1/9/19 3:30 AM, Peter Maydell wrote:
> Peter Maydell (4):
>   hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it
>   qom/cpu: Add cluster_index to CPUState
>   accel/tcg: Add cluster number to TCG TB hash
>   gdbstub: Simplify gdb_get_cpu_pid() to use cpu->cluster_index

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 1/4] hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it Peter Maydell
@ 2019-01-10 15:11   ` Luc Michel
  2019-01-10 19:52   ` Alistair Francis
  1 sibling, 0 replies; 18+ messages in thread
From: Luc Michel @ 2019-01-10 15:11 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel
  Cc: Eduardo Habkost, Peter Crosthwaite, Alistair Francis,
	Richard Henderson, Emilio G . Cota, patches, Edgar E. Iglesias,
	Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 1409 bytes --]

On 1/8/19 5:30 PM, Peter Maydell wrote:
> Currently the cluster implementation doesn't have any constraints
> on the ordering of realizing the TYPE_CPU_CLUSTER and populating it
> with child objects. We want to impose a constraint that realize
> must happen only after all the child objects are added, so move
> the realize of rpu_cluster. (The apu_cluster is already
> realized after child population.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>

> ---
>  hw/arm/xlnx-zynqmp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
> index c67ac2e64ac..370b0e44a38 100644
> --- a/hw/arm/xlnx-zynqmp.c
> +++ b/hw/arm/xlnx-zynqmp.c
> @@ -183,8 +183,6 @@ static void xlnx_zynqmp_create_rpu(XlnxZynqMPState *s, const char *boot_cpu,
>                              &error_abort, NULL);
>      qdev_prop_set_uint32(DEVICE(&s->rpu_cluster), "cluster-id", 1);
>  
> -    qdev_init_nofail(DEVICE(&s->rpu_cluster));
> -
>      for (i = 0; i < num_rpus; i++) {
>          char *name;
>  
> @@ -212,6 +210,8 @@ static void xlnx_zynqmp_create_rpu(XlnxZynqMPState *s, const char *boot_cpu,
>              return;
>          }
>      }
> +
> +    qdev_init_nofail(DEVICE(&s->rpu_cluster));
>  }
>  
>  static void xlnx_zynqmp_init(Object *obj)
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] qom/cpu: Add cluster_index to CPUState
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 2/4] qom/cpu: Add cluster_index to CPUState Peter Maydell
@ 2019-01-10 15:13   ` Luc Michel
  0 siblings, 0 replies; 18+ messages in thread
From: Luc Michel @ 2019-01-10 15:13 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel
  Cc: Eduardo Habkost, Peter Crosthwaite, Alistair Francis,
	Richard Henderson, Emilio G . Cota, patches, Edgar E. Iglesias,
	Paolo Bonzini

On 1/8/19 5:30 PM, Peter Maydell wrote:
> For TCG we want to distinguish which cluster a CPU is in, and
> we need to do it quickly. Cache the cluster index in the CPUState
> struct, by having the cluster object set cpu->cluster_index for
> each CPU child when it is realized.
> 
> This means that board/SoC code must add all CPUs to the cluster
> before realizing the cluster object. Regrettably QOM provides no
> way to prevent adding children to a realized object and no way for
> the parent to be notified when a new child is added to it, so
> we don't have any way to enforce/assert this constraint; all
> we can do is document it in a comment.
> 
> The restriction on how many clusters can exist in the system
> is imposed by TCG code which will be added in a subsequent commit,
> but the check to enforce it in cluster.c fits better in this one.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
> ---
>  include/hw/cpu/cluster.h | 19 +++++++++++++++++++
>  include/qom/cpu.h        |  7 +++++++
>  hw/cpu/cluster.c         | 33 +++++++++++++++++++++++++++++++++
>  qom/cpu.c                |  1 +
>  4 files changed, 60 insertions(+)
> 
> diff --git a/include/hw/cpu/cluster.h b/include/hw/cpu/cluster.h
> index 73818232437..d1bef315d10 100644
> --- a/include/hw/cpu/cluster.h
> +++ b/include/hw/cpu/cluster.h
> @@ -34,12 +34,31 @@
>   * Arm big.LITTLE system) they should be in different clusters. If the CPUs do
>   * not have the same view of memory (for example the main CPU and a management
>   * controller processor) they should be in different clusters.
> + *
> + * A cluster is created by creating an object of TYPE_CPU_CLUSTER, and then
> + * adding the CPUs to it as QOM child objects (e.g. using the
> + * object_initialize_child() or object_property_add_child() functions).
> + * All CPUs must be added as children before the cluster is realized.
> + * (Regrettably QOM provides no way to prevent adding children to a realized
> + * object and no way for the parent to be notified when a new child is added
> + * to it, so this restriction is not checked for, but the system will not
> + * behave correctly if it is not adhered to.)
> + *
> + * A CPU which is not put into any cluster will be considered implicitly
> + * to be in a cluster with all the other "loose" CPUs, so all CPUs that are
> + * not assigned to clusters must be identical.
>   */
>  
>  #define TYPE_CPU_CLUSTER "cpu-cluster"
>  #define CPU_CLUSTER(obj) \
>      OBJECT_CHECK(CPUClusterState, (obj), TYPE_CPU_CLUSTER)
>  
> +/*
> + * This limit is imposed by TCG, which puts the cluster ID into an
> + * 8 bit field (and uses all-1s for the default "not in any cluster").
> + */
> +#define MAX_CLUSTERS 255
> +
>  /**
>   * CPUClusterState:
>   * @cluster_id: The cluster ID. This value is for internal use only and should
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 1396f53e5b5..844becbcedc 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -279,6 +279,11 @@ struct qemu_work_item;
>  /**
>   * CPUState:
>   * @cpu_index: CPU index (informative).
> + * @cluster_index: Identifies which cluster this CPU is in.
> + *   For boards which don't define clusters or for "loose" CPUs not assigned
> + *   to a cluster this will be UNASSIGNED_CLUSTER_INDEX; otherwise it will
> + *   be the same as the cluster-id property of the CPU object's TYPE_CPU_CLUSTER
> + *   QOM parent.
>   * @nr_cores: Number of cores within this CPU package.
>   * @nr_threads: Number of threads within this CPU.
>   * @running: #true if CPU is currently running (lockless).
> @@ -404,6 +409,7 @@ struct CPUState {
>  
>      /* TODO Move common fields from CPUArchState here. */
>      int cpu_index;
> +    int cluster_index;
>      uint32_t halted;
>      uint32_t can_do_io;
>      int32_t exception_index;
> @@ -1109,5 +1115,6 @@ extern const struct VMStateDescription vmstate_cpu_common;
>  #endif /* NEED_CPU_H */
>  
>  #define UNASSIGNED_CPU_INDEX -1
> +#define UNASSIGNED_CLUSTER_INDEX -1
>  
>  #endif
> diff --git a/hw/cpu/cluster.c b/hw/cpu/cluster.c
> index 9d50a235d5c..d672f54a620 100644
> --- a/hw/cpu/cluster.c
> +++ b/hw/cpu/cluster.c
> @@ -20,19 +20,52 @@
>  
>  #include "qemu/osdep.h"
>  #include "hw/cpu/cluster.h"
> +#include "qom/cpu.h"
>  #include "qapi/error.h"
>  #include "qemu/module.h"
> +#include "qemu/cutils.h"
>  
>  static Property cpu_cluster_properties[] = {
>      DEFINE_PROP_UINT32("cluster-id", CPUClusterState, cluster_id, 0),
>      DEFINE_PROP_END_OF_LIST()
>  };
>  
> +static void cpu_cluster_realize(DeviceState *dev, Error **errp)
> +{
> +    /* Iterate through all our CPU children and set their cluster_index */
> +    CPUClusterState *cluster = CPU_CLUSTER(dev);
> +    ObjectPropertyIterator iter;
> +    ObjectProperty *prop;
> +    Object *cluster_obj = OBJECT(dev);
> +
> +    if (cluster->cluster_id >= MAX_CLUSTERS) {
> +        error_setg(errp, "cluster-id must be less than %d", MAX_CLUSTERS);
> +        return;
> +    }
> +
> +    object_property_iter_init(&iter, cluster_obj);
> +    while ((prop = object_property_iter_next(&iter))) {
> +        Object *cpu_obj;
> +        CPUState *cpu;
> +
> +        if (!strstart(prop->type, "child<", NULL)) {
> +            continue;
> +        }
> +        cpu_obj = object_property_get_link(cluster_obj, prop->name, NULL);
> +        cpu = (CPUState *)object_dynamic_cast(cpu_obj, TYPE_CPU);
> +        if (!cpu) {
> +            continue;
> +        }
> +        cpu->cluster_index = cluster->cluster_id;
> +    }
> +}
> +
>  static void cpu_cluster_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
>  
>      dc->props = cpu_cluster_properties;
> +    dc->realize = cpu_cluster_realize;
>  }
>  
>  static const TypeInfo cpu_cluster_type_info = {
> diff --git a/qom/cpu.c b/qom/cpu.c
> index 5442a7323be..f5579b1cd50 100644
> --- a/qom/cpu.c
> +++ b/qom/cpu.c
> @@ -364,6 +364,7 @@ static void cpu_common_initfn(Object *obj)
>      CPUClass *cc = CPU_GET_CLASS(obj);
>  
>      cpu->cpu_index = UNASSIGNED_CPU_INDEX;
> +    cpu->cluster_index = UNASSIGNED_CLUSTER_INDEX;
>      cpu->gdb_num_regs = cpu->gdb_num_g_regs = cc->gdb_num_core_regs;
>      /* *-user doesn't have configurable SMP topology */
>      /* the default value is changed by qemu_init_vcpu() for softmmu */
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash Peter Maydell
@ 2019-01-10 15:14   ` Luc Michel
  2019-01-10 18:26   ` Peter Maydell
  2019-01-14  1:08   ` Aleksandar Markovic
  2 siblings, 0 replies; 18+ messages in thread
From: Luc Michel @ 2019-01-10 15:14 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel
  Cc: Eduardo Habkost, Peter Crosthwaite, Alistair Francis,
	Richard Henderson, Emilio G . Cota, patches, Edgar E. Iglesias,
	Paolo Bonzini

On 1/8/19 5:30 PM, Peter Maydell wrote:
> Include the cluster number in the hash we use to look
> up TBs. This is important because a TB that is valid
> for one cluster at a given physical address and set
> of CPU flags is not necessarily valid for another:
> the two clusters may have different views of physical
> memory, or may have different CPU features (eg FPU
> present or absent).
> 
> We put the cluster number in the high 8 bits of the
> TB cflags. This gives us up to 256 clusters, which should
> be enough for anybody. If we ever need more, or need
> more bits in cflags for other purposes, we could make
> tb_hash_func() take more data (and expand qemu_xxhash7()
> to qemu_xxhash8()).
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
> ---
>  include/exec/exec-all.h   | 4 +++-
>  accel/tcg/cpu-exec.c      | 4 ++++
>  accel/tcg/translate-all.c | 3 +++
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 815e5b1e838..aa7b81aaf01 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -351,9 +351,11 @@ struct TranslationBlock {
>  #define CF_USE_ICOUNT  0x00020000
>  #define CF_INVALID     0x00040000 /* TB is stale. Set with @jmp_lock held */
>  #define CF_PARALLEL    0x00080000 /* Generate code for a parallel context */
> +#define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
> +#define CF_CLUSTER_SHIFT 24
>  /* cflags' mask for hashing/comparison */
>  #define CF_HASH_MASK   \
> -    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL)
> +    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL | CF_CLUSTER_MASK)
>  
>      /* Per-vCPU dynamic tracing state used to generate this TB */
>      uint32_t trace_vcpu_dstate;
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 870027d4359..e578a1a3aee 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -336,6 +336,10 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
>          return NULL;
>      }
>      desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
> +
> +    cf_mask &= ~CF_CLUSTER_MASK;
> +    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> +
>      h = tb_hash_func(phys_pc, pc, flags, cf_mask, *cpu->trace_dstate);
>      return qht_lookup_custom(&tb_ctx.htable, &desc, h, tb_lookup_cmp);
>  }
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 639f0b27287..ba27f5acc8c 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1692,6 +1692,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>          cflags |= CF_NOCACHE | 1;
>      }
>  
> +    cflags &= ~CF_CLUSTER_MASK;
> +    cflags |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> +
>   buffer_overflow:
>      tb = tb_alloc(pc);
>      if (unlikely(!tb)) {
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] gdbstub: Simplify gdb_get_cpu_pid() to use cpu->cluster_index
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 4/4] gdbstub: Simplify gdb_get_cpu_pid() to use cpu->cluster_index Peter Maydell
@ 2019-01-10 15:15   ` Luc Michel
  0 siblings, 0 replies; 18+ messages in thread
From: Luc Michel @ 2019-01-10 15:15 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel
  Cc: Eduardo Habkost, Peter Crosthwaite, Alistair Francis,
	Richard Henderson, Emilio G . Cota, patches, Edgar E. Iglesias,
	Paolo Bonzini

On 1/8/19 5:30 PM, Peter Maydell wrote:
> Now we're keeping the cluster index in the CPUState, we don't
> need to jump through hoops in gdb_get_cpu_pid() to find the
> associated cluster object.
Aah better :-)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
> ---
>  gdbstub.c | 48 +++++-------------------------------------------
>  1 file changed, 5 insertions(+), 43 deletions(-)
> 
> diff --git a/gdbstub.c b/gdbstub.c
> index bfc7afb5096..5d6cbea9d35 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -644,50 +644,12 @@ static int memtox(char *buf, const char *mem, int len)
>  
>  static uint32_t gdb_get_cpu_pid(const GDBState *s, CPUState *cpu)
>  {
> -#ifndef CONFIG_USER_ONLY
> -    gchar *path, *name = NULL;
> -    Object *obj;
> -    CPUClusterState *cluster;
> -    uint32_t ret;
> -
> -    path = object_get_canonical_path(OBJECT(cpu));
> -
> -    if (path == NULL) {
> -        /* Return the default process' PID */
> -        ret = s->processes[s->process_num - 1].pid;
> -        goto out;
> -    }
> -
> -    name = object_get_canonical_path_component(OBJECT(cpu));
> -    assert(name != NULL);
> -
> -    /*
> -     * Retrieve the CPU parent path by removing the last '/' and the CPU name
> -     * from the CPU canonical path.
> -     */
> -    path[strlen(path) - strlen(name) - 1] = '\0';
> -
> -    obj = object_resolve_path_type(path, TYPE_CPU_CLUSTER, NULL);
> -
> -    if (obj == NULL) {
> -        /* Return the default process' PID */
> -        ret = s->processes[s->process_num - 1].pid;
> -        goto out;
> -    }
> -
> -    cluster = CPU_CLUSTER(obj);
> -    ret = cluster->cluster_id + 1;
> -
> -out:
> -    g_free(name);
> -    g_free(path);
> -
> -    return ret;
> -
> -#else
>      /* TODO: In user mode, we should use the task state PID */
> -    return s->processes[s->process_num - 1].pid;
> -#endif
> +    if (cpu->cluster_index == UNASSIGNED_CLUSTER_INDEX) {
> +        /* Return the default process' PID */
> +        return s->processes[s->process_num - 1].pid;
> +    }
> +    return cpu->cluster_index + 1;
>  }
>  
>  static GDBProcess *gdb_get_process(const GDBState *s, uint32_t pid)
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash Peter Maydell
  2019-01-10 15:14   ` Luc Michel
@ 2019-01-10 18:26   ` Peter Maydell
  2019-01-11 12:49     ` Aleksandar Markovic
  2019-01-14  1:08   ` Aleksandar Markovic
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2019-01-10 18:26 UTC (permalink / raw)
  To: qemu-arm, QEMU Developers
  Cc: patches, Richard Henderson, Peter Crosthwaite, Paolo Bonzini,
	Alistair Francis, Edgar E. Iglesias, Eduardo Habkost,
	Marcel Apfelbaum, Emilio G . Cota

On Tue, 8 Jan 2019 at 16:30, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> Include the cluster number in the hash we use to look
> up TBs. This is important because a TB that is valid
> for one cluster at a given physical address and set
> of CPU flags is not necessarily valid for another:
> the two clusters may have different views of physical
> memory, or may have different CPU features (eg FPU
> present or absent).
>
> We put the cluster number in the high 8 bits of the
> TB cflags. This gives us up to 256 clusters, which should
> be enough for anybody. If we ever need more, or need
> more bits in cflags for other purposes, we could make
> tb_hash_func() take more data (and expand qemu_xxhash7()
> to qemu_xxhash8()).
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  include/exec/exec-all.h   | 4 +++-
>  accel/tcg/cpu-exec.c      | 4 ++++
>  accel/tcg/translate-all.c | 3 +++
>  3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 815e5b1e838..aa7b81aaf01 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -351,9 +351,11 @@ struct TranslationBlock {
>  #define CF_USE_ICOUNT  0x00020000
>  #define CF_INVALID     0x00040000 /* TB is stale. Set with @jmp_lock held */
>  #define CF_PARALLEL    0x00080000 /* Generate code for a parallel context */
> +#define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
> +#define CF_CLUSTER_SHIFT 24
>  /* cflags' mask for hashing/comparison */
>  #define CF_HASH_MASK   \
> -    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL)
> +    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL | CF_CLUSTER_MASK)
>
>      /* Per-vCPU dynamic tracing state used to generate this TB */
>      uint32_t trace_vcpu_dstate;
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 870027d4359..e578a1a3aee 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -336,6 +336,10 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
>          return NULL;
>      }
>      desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
> +
> +    cf_mask &= ~CF_CLUSTER_MASK;
> +    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> +

This hunk turns out not to be quite right -- it needs to move
to the top of the function, before the assignment
"desc.flags = flags;". Otherwise tb_lookup_cmp() will
spuriously fail, and execution becomes somewhat slower because
we have to keep retranslating TBs rather than reusing them.
(Surprisingly this is only noticeable in an ARM TFM image
I happen to have, not in Linux kernel boot...)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 1/4] hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it Peter Maydell
  2019-01-10 15:11   ` Luc Michel
@ 2019-01-10 19:52   ` Alistair Francis
  1 sibling, 0 replies; 18+ messages in thread
From: Alistair Francis @ 2019-01-10 19:52 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-arm, qemu-devel@nongnu.org Developers, Eduardo Habkost,
	Peter Crosthwaite, Alistair Francis, Richard Henderson,
	Emilio G . Cota, Patch Tracking, Edgar E. Iglesias,
	Paolo Bonzini

On Tue, Jan 8, 2019 at 8:30 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> Currently the cluster implementation doesn't have any constraints
> on the ordering of realizing the TYPE_CPU_CLUSTER and populating it
> with child objects. We want to impose a constraint that realize
> must happen only after all the child objects are added, so move
> the realize of rpu_cluster. (The apu_cluster is already
> realized after child population.)
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  hw/arm/xlnx-zynqmp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
> index c67ac2e64ac..370b0e44a38 100644
> --- a/hw/arm/xlnx-zynqmp.c
> +++ b/hw/arm/xlnx-zynqmp.c
> @@ -183,8 +183,6 @@ static void xlnx_zynqmp_create_rpu(XlnxZynqMPState *s, const char *boot_cpu,
>                              &error_abort, NULL);
>      qdev_prop_set_uint32(DEVICE(&s->rpu_cluster), "cluster-id", 1);
>
> -    qdev_init_nofail(DEVICE(&s->rpu_cluster));
> -
>      for (i = 0; i < num_rpus; i++) {
>          char *name;
>
> @@ -212,6 +210,8 @@ static void xlnx_zynqmp_create_rpu(XlnxZynqMPState *s, const char *boot_cpu,
>              return;
>          }
>      }
> +
> +    qdev_init_nofail(DEVICE(&s->rpu_cluster));
>  }
>
>  static void xlnx_zynqmp_init(Object *obj)
> --
> 2.19.2
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash
  2019-01-10 18:26   ` Peter Maydell
@ 2019-01-11 12:49     ` Aleksandar Markovic
  2019-01-11 13:00       ` Peter Maydell
  0 siblings, 1 reply; 18+ messages in thread
From: Aleksandar Markovic @ 2019-01-11 12:49 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-arm, QEMU Developers, Eduardo Habkost, Peter Crosthwaite,
	Alistair Francis, Richard Henderson, Emilio G . Cota, patches,
	Edgar E. Iglesias, Paolo Bonzini

Hello, Peter.

First of all, I want to tell you that I support this series and I salute
efforts in this and related areas. It is known that there have been strong
trends towards asymmetric multi-core systems now for some longish time -
and that QEMU support in that area will greately enhance QEMU itself.

However, being unfamilliar with relevant parts of QEMU internals, allow me
to ask a couple of basic questions:

1. What would be, in more detail, if possible in layman terms, the "bad
case" that this series fixes?

2. Let's suppose, hypothetically, and based on your example from one of
commit messages from this series, that we want to support two multicore
systems:
    A. Cluster 1: 1 core with FPU; cluster 2: 3 cores without FPU
    B. Cluster 1: 2 cores with FPU; cluster 2: 1 core without FPU
Is there an apparatus that would allow the end user specify these and
similar cpnfigurations through command line or acsimilar mean (so, without
QEMU explicitely supporting such core organization, but supporting the
single core in question, of course)? If not, would that be a desirable QEMU
feature?

3. Is there a possibility to have two layer clustering sheme, instead of
one layer? Cluster/subcluster/core instead of cluster/core? For MIPS, there
is a need for such organization. It looks to me 8 bits for cluster id, and
3 bits for subcluster id would be sufficient. In such scheme, the only
change for TCG would be to check both cluster and subcluster id rather than
only cluster id.

Regards,
Aleksandar


On Thursday, January 10, 2019, Peter Maydell <peter.maydell@linaro.org>
wrote:

> On Tue, 8 Jan 2019 at 16:30, Peter Maydell <peter.maydell@linaro.org>
> wrote:
> >
> > Include the cluster number in the hash we use to look
> > up TBs. This is important because a TB that is valid
> > for one cluster at a given physical address and set
> > of CPU flags is not necessarily valid for another:
> > the two clusters may have different views of physical
> > memory, or may have different CPU features (eg FPU
> > present or absent).
> >
> > We put the cluster number in the high 8 bits of the
> > TB cflags. This gives us up to 256 clusters, which should
> > be enough for anybody. If we ever need more, or need
> > more bits in cflags for other purposes, we could make
> > tb_hash_func() take more data (and expand qemu_xxhash7()
> > to qemu_xxhash8()).
> >
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > ---
> >  include/exec/exec-all.h   | 4 +++-
> >  accel/tcg/cpu-exec.c      | 4 ++++
> >  accel/tcg/translate-all.c | 3 +++
> >  3 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> > index 815e5b1e838..aa7b81aaf01 100644
> > --- a/include/exec/exec-all.h
> > +++ b/include/exec/exec-all.h
> > @@ -351,9 +351,11 @@ struct TranslationBlock {
> >  #define CF_USE_ICOUNT  0x00020000
> >  #define CF_INVALID     0x00040000 /* TB is stale. Set with @jmp_lock
> held */
> >  #define CF_PARALLEL    0x00080000 /* Generate code for a parallel
> context */
> > +#define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
> > +#define CF_CLUSTER_SHIFT 24
> >  /* cflags' mask for hashing/comparison */
> >  #define CF_HASH_MASK   \
> > -    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL)
> > +    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL |
> CF_CLUSTER_MASK)
> >
> >      /* Per-vCPU dynamic tracing state used to generate this TB */
> >      uint32_t trace_vcpu_dstate;
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 870027d4359..e578a1a3aee 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -336,6 +336,10 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu,
> target_ulong pc,
> >          return NULL;
> >      }
> >      desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
> > +
> > +    cf_mask &= ~CF_CLUSTER_MASK;
> > +    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> > +
>
> This hunk turns out not to be quite right -- it needs to move
> to the top of the function, before the assignment
> "desc.flags = flags;". Otherwise tb_lookup_cmp() will
> spuriously fail, and execution becomes somewhat slower because
> we have to keep retranslating TBs rather than reusing them.
> (Surprisingly this is only noticeable in an ARM TFM image
> I happen to have, not in Linux kernel boot...)
>
> thanks
> -- PMM
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash
  2019-01-11 12:49     ` Aleksandar Markovic
@ 2019-01-11 13:00       ` Peter Maydell
  2019-01-11 15:49         ` Aleksandar Markovic
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2019-01-11 13:00 UTC (permalink / raw)
  To: Aleksandar Markovic
  Cc: qemu-arm, QEMU Developers, Eduardo Habkost, Peter Crosthwaite,
	Alistair Francis, Richard Henderson, Emilio G . Cota, patches,
	Edgar E. Iglesias, Paolo Bonzini

On Fri, 11 Jan 2019 at 12:49, Aleksandar Markovic
<aleksandar.m.mail@gmail.com> wrote:
> 1. What would be, in more detail, if possible in layman terms,
> the "bad case" that this series fixes?

I describe this in the cover letter (which also has a link to
a tarball with a test case demonstrating it):
> TCG implicitly assumes that all CPUs are alike, because we have
> a single cache of generated TBs and we don't account for which
> CPU generated the code or is looking for the TB when adding or
> searching for generated TBs. This can go wrong in two situations:
> (1) two CPUs have different physical address spaces (eg CPU 1
> has one lot of RAM/ROM, and CPU 2 has different RAM/ROM): the
> physical address alone is then not sufficient to distinguish
> what code to run
> (2) two CPUs have different features (eg FPU
> vs no FPU): since our TCG frontends bake assumptions into the
> generated code about the presence/absence of features, if a
> CPU with FPU picks up a TB for one generated without an FPU
> it will behave wrongly

What happens is that CPU 1 picks up code that was generated
for CPU 2 and which is not correct for it, and thus does
not behave correctly. (In the test case, an instruction that
should UNDEF on the Cortex-R5F but not on the Cortex-A53 will
either UNDEF on the A53 or fail to UNDEF on the R5F, depending
on which CPU happened to get to the test code first.)

> 2. Let's suppose, hypothetically, and based on your example
> from one of commit messages from this series, that we want to
> support two multicore systems:
>     A. Cluster 1: 1 core with FPU; cluster 2: 3 cores without FPU
>     B. Cluster 1: 2 cores with FPU; cluster 2: 1 core without FPU
> Is there an apparatus that would allow the end user specify these
> and similar cpnfigurations through command line or acsimilar mean
> (so, without QEMU explicitely supporting such core organization,
> but supporting the single core in question, of course)?

The QEMU definition of "cluster" requires that all the CPUs
in the cluster must share (a) the same features (eg FPU)
and (b) the same view of physical memory -- this is what
defines that they are in the same cluster and not different
ones. So you'd model this as four clusters (assuming that
A and B have different views of physical memory. Otherwise
you could put all the with-FPU cores in one cluster and
the without-FPU cores in a second.)

Real hardware might choose to define what it calls a "cluster"
differently, but that doesn't matter.

> 3. Is there a possibility to have two layer clustering sheme,
> instead of one layer? Cluster/subcluster/core instead of
> cluster/core? For MIPS, there is a need for such organization.
> It looks to me 8 bits for cluster id, and 3 bits for subcluster
> id would be sufficient.

My view is that there is no need for the internal "cluster ID"
to match what the hardware happens to do with SMP CPU IDs
and NUMA architecture. What do you think we miss by this?
(Handling of NUMA architecture is a distinct bit of QEMU code,
unrelated to this.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash
  2019-01-11 13:00       ` Peter Maydell
@ 2019-01-11 15:49         ` Aleksandar Markovic
  0 siblings, 0 replies; 18+ messages in thread
From: Aleksandar Markovic @ 2019-01-11 15:49 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-arm, QEMU Developers, Eduardo Habkost, Peter Crosthwaite,
	Alistair Francis, Richard Henderson, Emilio G . Cota, patches,
	Edgar E. Iglesias, Paolo Bonzini

On Friday, January 11, 2019, Peter Maydell <peter.maydell@linaro.org> wrote:

> On Fri, 11 Jan 2019 at 12:49, Aleksandar Markovic
> <aleksandar.m.mail@gmail.com> wrote:
> > 1. What would be, in more detail, if possible in layman terms,
> > the "bad case" that this series fixes?
>
> I describe this in the cover letter (which also has a link to
> a tarball with a test case demonstrating it):
> > TCG implicitly assumes that all CPUs are alike, because we have
> > a single cache of generated TBs and we don't account for which
> > CPU generated the code or is looking for the TB when adding or
> > searching for generated TBs. This can go wrong in two situations:
> > (1) two CPUs have different physical address spaces (eg CPU 1
> > has one lot of RAM/ROM, and CPU 2 has different RAM/ROM): the
> > physical address alone is then not sufficient to distinguish
> > what code to run
> > (2) two CPUs have different features (eg FPU
> > vs no FPU): since our TCG frontends bake assumptions into the
> > generated code about the presence/absence of features, if a
> > CPU with FPU picks up a TB for one generated without an FPU
> > it will behave wrongly
>
> What happens is that CPU 1 picks up code that was generated
> for CPU 2 and which is not correct for it, and thus does
> not behave correctly. (In the test case, an instruction that
> should UNDEF on the Cortex-R5F but not on the Cortex-A53 will
> either UNDEF on the A53 or fail to UNDEF on the R5F, depending
> on which CPU happened to get to the test code first.)
>
>
Thanks, this example makes the intentions of the patch clearer to me.

If you don't mind, I may take a closer look at MIPS' (and perhaps some
other targets' a little) multi-core design details in few coming weeks, and
see if we could improve feightfulness of our emulation, or maybe make it
more flexible, or scalable.

Thanks again, and happy holiday season to all!!

Aleksandar




> > 2. Let's suppose, hypothetically, and based on your example
> > from one of commit messages from this series, that we want to
> > support two multicore systems:
> >     A. Cluster 1: 1 core with FPU; cluster 2: 3 cores without FPU
> >     B. Cluster 1: 2 cores with FPU; cluster 2: 1 core without FPU
> > Is there an apparatus that would allow the end user specify these
> > and similar cpnfigurations through command line or acsimilar mean
> > (so, without QEMU explicitely supporting such core organization,
> > but supporting the single core in question, of course)?
>
> The QEMU definition of "cluster" requires that all the CPUs
> in the cluster must share (a) the same features (eg FPU)
> and (b) the same view of physical memory -- this is what
> defines that they are in the same cluster and not different
> ones. So you'd model this as four clusters (assuming that
> A and B have different views of physical memory. Otherwise
> you could put all the with-FPU cores in one cluster and
> the without-FPU cores in a second.)
>
> Real hardware might choose to define what it calls a "cluster"
> differently, but that doesn't matter.
>
> > 3. Is there a possibility to have two layer clustering sheme,
> > instead of one layer? Cluster/subcluster/core instead of
> > cluster/core? For MIPS, there is a need for such organization.
> > It looks to me 8 bits for cluster id, and 3 bits for subcluster
> > id would be sufficient.
>
> My view is that there is no need for the internal "cluster ID"
> to match what the hardware happens to do with SMP CPU IDs
> and NUMA architecture. What do you think we miss by this?
> (Handling of NUMA architecture is a distinct bit of QEMU code,
> unrelated to this.)
>
> thanks
> -- PMM
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash
  2019-01-08 16:30 ` [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash Peter Maydell
  2019-01-10 15:14   ` Luc Michel
  2019-01-10 18:26   ` Peter Maydell
@ 2019-01-14  1:08   ` Aleksandar Markovic
  2019-01-14 10:27     ` Peter Maydell
  2019-01-14 11:53     ` Peter Maydell
  2 siblings, 2 replies; 18+ messages in thread
From: Aleksandar Markovic @ 2019-01-14  1:08 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-arm, qemu-devel, Eduardo Habkost, Peter Crosthwaite,
	Alistair Francis, Richard Henderson, Emilio G . Cota, patches,
	Edgar E. Iglesias, Paolo Bonzini

On Tuesday, January 8, 2019, Peter Maydell <peter.maydell@linaro.org> wrote:

> Include the cluster number in the hash we use to look
> up TBs. This is important because a TB that is valid
> for one cluster at a given physical address and set
> of CPU flags is not necessarily valid for another:
> the two clusters may have different views of physical
> memory, or may have different CPU features (eg FPU
> present or absent).
>
>
Hi, Peter.

I do understand the definition of cluster_index in the sense of this
series. However, it looks to me that the term "cluster" is generally
overused in areas where we work. This may lead to some confusion for future
developers, and let me suggest some other name, like "tcg_cluster_index" or
"tcg_group_id", or "translation_group_id". Admitedly, they all sound ugly
to me too. But having the name that would clearly separate this id from too
generic "cluster_index" IMHO would save lots of time during potential
related future development.

(Needled to say that,  for example, we in MIPS, for multi-core sustems,
group cores in clusters, that actually do not have anything to do with
clusters in TCG sense...)

Sincerely,

Aleksandar



> We put the cluster number in the high 8 bits of the
> TB cflags. This gives us up to 256 clusters, which should
> be enough for anybody. If we ever need more, or need
> more bits in cflags for other purposes, we could make
> tb_hash_func() take more data (and expand qemu_xxhash7()
> to qemu_xxhash8()).
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  include/exec/exec-all.h   | 4 +++-
>  accel/tcg/cpu-exec.c      | 4 ++++
>  accel/tcg/translate-all.c | 3 +++
>  3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 815e5b1e838..aa7b81aaf01 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -351,9 +351,11 @@ struct TranslationBlock {
>  #define CF_USE_ICOUNT  0x00020000
>  #define CF_INVALID     0x00040000 /* TB is stale. Set with @jmp_lock held
> */
>  #define CF_PARALLEL    0x00080000 /* Generate code for a parallel context
> */
> +#define CF_CLUSTER_MASK 0xff000000 /* Top 8 bits are cluster ID */
> +#define CF_CLUSTER_SHIFT 24
>  /* cflags' mask for hashing/comparison */
>  #define CF_HASH_MASK   \
> -    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL)
> +    (CF_COUNT_MASK | CF_LAST_IO | CF_USE_ICOUNT | CF_PARALLEL |
> CF_CLUSTER_MASK)
>
>      /* Per-vCPU dynamic tracing state used to generate this TB */
>      uint32_t trace_vcpu_dstate;
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 870027d4359..e578a1a3aee 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -336,6 +336,10 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu,
> target_ulong pc,
>          return NULL;
>      }
>      desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
> +
> +    cf_mask &= ~CF_CLUSTER_MASK;
> +    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> +
>      h = tb_hash_func(phys_pc, pc, flags, cf_mask, *cpu->trace_dstate);
>      return qht_lookup_custom(&tb_ctx.htable, &desc, h, tb_lookup_cmp);
>  }
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 639f0b27287..ba27f5acc8c 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1692,6 +1692,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>          cflags |= CF_NOCACHE | 1;
>      }
>
> +    cflags &= ~CF_CLUSTER_MASK;
> +    cflags |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> +
>   buffer_overflow:
>      tb = tb_alloc(pc);
>      if (unlikely(!tb)) {
> --
> 2.19.2
>
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash
  2019-01-14  1:08   ` Aleksandar Markovic
@ 2019-01-14 10:27     ` Peter Maydell
  2019-01-14 11:53     ` Peter Maydell
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Maydell @ 2019-01-14 10:27 UTC (permalink / raw)
  To: Aleksandar Markovic
  Cc: qemu-arm, qemu-devel, Eduardo Habkost, Peter Crosthwaite,
	Alistair Francis, Richard Henderson, Emilio G . Cota, patches,
	Edgar E. Iglesias, Paolo Bonzini

On Mon, 14 Jan 2019 at 01:08, Aleksandar Markovic
<aleksandar.m.mail@gmail.com> wrote:
>
>
>
> On Tuesday, January 8, 2019, Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> Include the cluster number in the hash we use to look
>> up TBs. This is important because a TB that is valid
>> for one cluster at a given physical address and set
>> of CPU flags is not necessarily valid for another:
>> the two clusters may have different views of physical
>> memory, or may have different CPU features (eg FPU
>> present or absent).

> I do understand the definition of cluster_index in the sense of this series. However, it looks to me that the term "cluster" is generally overused in areas where we work. This may lead to some confusion for future developers, and let me suggest some other name, like "tcg_cluster_index" or "tcg_group_id", or "translation_group_id". Admitedly, they all sound ugly to me too. But having the name that would clearly separate this id from too generic "cluster_index" IMHO would save lots of time during potential related future development.
>
> (Needled to say that,  for example, we in MIPS, for multi-core sustems, group cores in clusters, that actually do not have anything to do with clusters in TCG sense...)

Yeah, the term is a bit overloaded. Arm also has clusters
that are used more in the NUMA sense. However, the
term we have is what is in git master currently...

thanks
-- PMM

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash
  2019-01-14  1:08   ` Aleksandar Markovic
  2019-01-14 10:27     ` Peter Maydell
@ 2019-01-14 11:53     ` Peter Maydell
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Maydell @ 2019-01-14 11:53 UTC (permalink / raw)
  To: Aleksandar Markovic
  Cc: qemu-arm, qemu-devel, Eduardo Habkost, Peter Crosthwaite,
	Alistair Francis, Richard Henderson, Emilio G . Cota, patches,
	Edgar E. Iglesias, Paolo Bonzini

On Mon, 14 Jan 2019 at 01:08, Aleksandar Markovic
<aleksandar.m.mail@gmail.com> wrote:
> I do understand the definition of cluster_index in the sense
> of this series. However, it looks to me that the term
> "cluster" is generally overused in areas where we work.
> This may lead to some confusion for future developers, and
> let me suggest some other name, like "tcg_cluster_index" or
> "tcg_group_id", or "translation_group_id".

Note also that the cluster index is not purely a TCG
concept -- it also (in master at the moment) affects
the gdbstub interface. Different clusters appear as
separate processes in gdb, whereas different CPUs in
the same cluster are different threads in the same CPU.
(And by default only the first cluster's CPUs will
appear, unless you explicitly attach to the second cluster:
this is a limitation of how gdb's UI handles multiple
processes.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-01-14 11:53 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-08 16:30 [Qemu-devel] [PATCH 0/4] tcg: support heterogenous CPU clusters Peter Maydell
2019-01-08 16:30 ` [Qemu-devel] [PATCH 1/4] hw/arm/xlx-zynqmp: Realize cluster after putting RPUs in it Peter Maydell
2019-01-10 15:11   ` Luc Michel
2019-01-10 19:52   ` Alistair Francis
2019-01-08 16:30 ` [Qemu-devel] [PATCH 2/4] qom/cpu: Add cluster_index to CPUState Peter Maydell
2019-01-10 15:13   ` Luc Michel
2019-01-08 16:30 ` [Qemu-devel] [PATCH 3/4] accel/tcg: Add cluster number to TCG TB hash Peter Maydell
2019-01-10 15:14   ` Luc Michel
2019-01-10 18:26   ` Peter Maydell
2019-01-11 12:49     ` Aleksandar Markovic
2019-01-11 13:00       ` Peter Maydell
2019-01-11 15:49         ` Aleksandar Markovic
2019-01-14  1:08   ` Aleksandar Markovic
2019-01-14 10:27     ` Peter Maydell
2019-01-14 11:53     ` Peter Maydell
2019-01-08 16:30 ` [Qemu-devel] [PATCH 4/4] gdbstub: Simplify gdb_get_cpu_pid() to use cpu->cluster_index Peter Maydell
2019-01-10 15:15   ` Luc Michel
2019-01-09 20:53 ` [Qemu-devel] [PATCH 0/4] tcg: support heterogenous CPU clusters Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.