All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] target/ppc: initial SMT support in TCG
@ 2023-05-31  1:23 Nicholas Piggin
  2023-05-31  1:23 ` [RFC PATCH 1/5] target/ppc: gdbstub init spr gdb_id for all CPUs Nicholas Piggin
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Nicholas Piggin @ 2023-05-31  1:23 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Daniel Henrique Barboza

Hi,

I'm posting this now just to get some first thoughts. I wouldn't say
it's ready but it does actually work with some basic tests including
pseries booting a Linux distro. I have powernv booting too, it just
requires some more SPRs converted, nothing fundamentally different so
for the purpose of this RFC I leave it out.

A couple of things, I don't know the object model well enough to do
something nice with topology. Iterating siblings I would have thought
should be going to parent core then iterating its children CPUs. Should
that be done with the object model, or is it better to add direct
pointers in CPUs to core and core to CPUs? It is (semi) important for
performance so maybe that is better than object iterators. If we go that
way, the PnvCore and SpaprCore have pointers to the SMT threads already,
should those be abstracted go in the CPUCore?

The other thing is the serialisation of access. It's using the atomic
single stepping for this which... I guess should be sufficient? Is it
the best way to do it though? Can a lock be used somehow instead?

Thanks,
Nick

Nicholas Piggin (5):
  target/ppc: gdbstub init spr gdb_id for all CPUs
  target/ppc: Add initial flags and helpers for SMT support
  target/ppc: Add support for SMT CTRL register
  target/ppc: Add msgsnd/p and DPDES SMT support
  spapr: Allow up to 8 threads SMT configuration

 hw/ppc/ppc.c                                  |  6 ++
 hw/ppc/spapr.c                                |  4 +-
 hw/ppc/spapr_cpu_core.c                       |  7 +-
 include/hw/ppc/ppc.h                          |  1 +
 target/ppc/cpu.h                              | 16 +++-
 target/ppc/cpu_init.c                         |  5 +
 target/ppc/excp_helper.c                      | 86 ++++++++++++-----
 target/ppc/gdbstub.c                          | 32 ++++---
 target/ppc/helper.h                           |  4 +-
 target/ppc/misc_helper.c                      | 93 +++++++++++++++++--
 target/ppc/translate.c                        | 46 ++++++++-
 .../ppc/translate/processor-ctrl-impl.c.inc   |  2 +-
 12 files changed, 252 insertions(+), 50 deletions(-)

-- 
2.40.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 1/5] target/ppc: gdbstub init spr gdb_id for all CPUs
  2023-05-31  1:23 [RFC PATCH 0/5] target/ppc: initial SMT support in TCG Nicholas Piggin
@ 2023-05-31  1:23 ` Nicholas Piggin
  2023-05-31  5:49   ` Philippe Mathieu-Daudé
  2023-06-23  9:26   ` Cédric Le Goater
  2023-05-31  1:23 ` [RFC PATCH 2/5] target/ppc: Add initial flags and helpers for SMT support Nicholas Piggin
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Nicholas Piggin @ 2023-05-31  1:23 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Daniel Henrique Barboza

Make sure each CPU gets its state set up for gdb, not just the ones
before PowerPCCPUClass has had its gdb state set up.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/gdbstub.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
index 63c9abe4f1..ca39efdc35 100644
--- a/target/ppc/gdbstub.c
+++ b/target/ppc/gdbstub.c
@@ -327,6 +327,25 @@ void ppc_gdb_gen_spr_xml(PowerPCCPU *cpu)
     unsigned int num_regs = 0;
     int i;
 
+    for (i = 0; i < ARRAY_SIZE(env->spr_cb); i++) {
+        ppc_spr_t *spr = &env->spr_cb[i];
+
+        if (!spr->name) {
+            continue;
+        }
+
+        /*
+         * GDB identifies registers based on the order they are
+         * presented in the XML. These ids will not match QEMU's
+         * representation (which follows the PowerISA).
+         *
+         * Store the position of the current register description so
+         * we can make the correspondence later.
+         */
+        spr->gdb_id = num_regs;
+        num_regs++;
+    }
+
     if (pcc->gdb_spr_xml) {
         return;
     }
@@ -348,17 +367,6 @@ void ppc_gdb_gen_spr_xml(PowerPCCPU *cpu)
 
         g_string_append_printf(xml, " bitsize=\"%d\"", TARGET_LONG_BITS);
         g_string_append(xml, " group=\"spr\"/>");
-
-        /*
-         * GDB identifies registers based on the order they are
-         * presented in the XML. These ids will not match QEMU's
-         * representation (which follows the PowerISA).
-         *
-         * Store the position of the current register description so
-         * we can make the correspondence later.
-         */
-        spr->gdb_id = num_regs;
-        num_regs++;
     }
 
     g_string_append(xml, "</feature>");
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 2/5] target/ppc: Add initial flags and helpers for SMT support
  2023-05-31  1:23 [RFC PATCH 0/5] target/ppc: initial SMT support in TCG Nicholas Piggin
  2023-05-31  1:23 ` [RFC PATCH 1/5] target/ppc: gdbstub init spr gdb_id for all CPUs Nicholas Piggin
@ 2023-05-31  1:23 ` Nicholas Piggin
  2023-05-31  7:25   ` Cédric Le Goater
  2023-05-31  1:23 ` [RFC PATCH 3/5] target/ppc: Add support for SMT CTRL register Nicholas Piggin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2023-05-31  1:23 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Daniel Henrique Barboza

SMT TCG emulation needs to be able to iterate over siblings in a core,
and needs to serialise core access to shared SPRs and state.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/cpu.h       |  9 +++++++++
 target/ppc/cpu_init.c  |  5 +++++
 target/ppc/translate.c | 20 ++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 1f23b81e90..b594408a8d 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -672,6 +672,8 @@ enum {
     POWERPC_FLAG_TM       = 0x00100000,
     /* Has SCV (ISA 3.00)                                                    */
     POWERPC_FLAG_SCV      = 0x00200000,
+    /* Has >1 thread per core                                                */
+    POWERPC_FLAG_SMT      = 0x00400000,
 };
 
 /*
@@ -1266,6 +1268,13 @@ struct CPUArchState {
     uint64_t pmu_base_time;
 };
 
+#define _CORE_ID(cs)                                            \
+    (POWERPC_CPU(cs)->env.spr_cb[SPR_PIR].default_value & ~(cs->nr_threads - 1))
+
+#define THREAD_SIBLING_FOREACH(cs, cs_sibling)			\
+    CPU_FOREACH(cs_sibling)                                     \
+        if (_CORE_ID(cs) == _CORE_ID(cs_sibling))
+
 #define SET_FIT_PERIOD(a_, b_, c_, d_)          \
 do {                                            \
     env->fit_period[0] = (a_);                  \
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index aa364f36f6..5035f6dada 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6754,6 +6754,7 @@ static void ppc_cpu_realize(DeviceState *dev, Error **errp)
 {
     CPUState *cs = CPU(dev);
     PowerPCCPU *cpu = POWERPC_CPU(dev);
+    CPUPPCState *env = &cpu->env;
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
     Error *local_err = NULL;
 
@@ -6785,6 +6786,10 @@ static void ppc_cpu_realize(DeviceState *dev, Error **errp)
 
     pcc->parent_realize(dev, errp);
 
+    if (env_cpu(env)->nr_threads > 1) {
+        env->flags |= POWERPC_FLAG_SMT;
+    }
+
     return;
 
 unrealize:
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index b6bab4c234..72270c2163 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -227,6 +227,26 @@ struct opc_handler_t {
     void (*handler)(DisasContext *ctx);
 };
 
+static inline bool gen_serialize(DisasContext *ctx)
+{
+    if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+        /* Restart with exclusive lock.  */
+        gen_helper_exit_atomic(cpu_env);
+        ctx->base.is_jmp = DISAS_NORETURN;
+        return false;
+    }
+    return true;
+}
+
+static inline bool gen_serialize_core(DisasContext *ctx)
+{
+    if (ctx->flags & POWERPC_FLAG_SMT) {
+        return gen_serialize(ctx);
+    }
+
+    return true;
+}
+
 /* SPR load/store helpers */
 static inline void gen_load_spr(TCGv t, int reg)
 {
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 3/5] target/ppc: Add support for SMT CTRL register
  2023-05-31  1:23 [RFC PATCH 0/5] target/ppc: initial SMT support in TCG Nicholas Piggin
  2023-05-31  1:23 ` [RFC PATCH 1/5] target/ppc: gdbstub init spr gdb_id for all CPUs Nicholas Piggin
  2023-05-31  1:23 ` [RFC PATCH 2/5] target/ppc: Add initial flags and helpers for SMT support Nicholas Piggin
@ 2023-05-31  1:23 ` Nicholas Piggin
  2023-05-31  1:23 ` [RFC PATCH 4/5] target/ppc: Add msgsnd/p and DPDES SMT support Nicholas Piggin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Nicholas Piggin @ 2023-05-31  1:23 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Daniel Henrique Barboza

A relatively simple case to begin with, CTRL is a SMT shared register
where reads and writes need to synchronise against state changes by
other threads in the core.

Atomic serialisation operations are used to achieve this.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/helper.h      |  2 ++
 target/ppc/misc_helper.c | 33 +++++++++++++++++++++++++++++++++
 target/ppc/translate.c   | 18 +++++++++++++++++-
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 38efbc351c..fda40b8a60 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -704,6 +704,8 @@ DEF_HELPER_3(store_dcr, void, env, tl, tl)
 
 DEF_HELPER_2(load_dump_spr, void, env, i32)
 DEF_HELPER_2(store_dump_spr, void, env, i32)
+DEF_HELPER_3(spr_write_CTRL, void, env, i32, tl)
+
 DEF_HELPER_4(fscr_facility_check, void, env, i32, i32, i32)
 DEF_HELPER_4(msr_facility_check, void, env, i32, i32, i32)
 DEF_HELPER_FLAGS_1(load_tbl, TCG_CALL_NO_RWG, tl, env)
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index 40ddc5c08c..ffe54a4310 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -43,6 +43,39 @@ void helper_store_dump_spr(CPUPPCState *env, uint32_t sprn)
              env->spr[sprn]);
 }
 
+void helper_spr_write_CTRL(CPUPPCState *env, uint32_t sprn,
+                           target_ulong val)
+{
+    CPUState *cs = env_cpu(env);
+    CPUState *ccs;
+    uint32_t nr_threads = cs->nr_threads;
+    uint32_t core_id = env->spr[SPR_PIR] & ~(nr_threads - 1);
+    uint32_t run = val & 1;
+    uint32_t ts, ts_mask;
+
+    assert(sprn == SPR_CTRL);
+    assert(core_id == env->spr[SPR_PIR] - env->spr[SPR_TIR]);
+
+    env->spr[sprn] &= ~1U;
+    env->spr[sprn] |= run;
+
+    ts_mask = ~(1U << (8 + env->spr[SPR_TIR]));
+    ts = run << (8 + env->spr[SPR_TIR]);
+
+    CPU_FOREACH(ccs) {
+        CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
+        uint32_t ccore_id = cenv->spr[SPR_PIR] & ~(nr_threads - 1);
+
+        assert(ccore_id == cenv->spr[SPR_PIR] - cenv->spr[SPR_TIR]);
+
+        if (ccore_id == core_id) {
+            cenv->spr[sprn] &= ts_mask;
+            cenv->spr[sprn] |= ts;
+        }
+    }
+}
+
+
 #ifdef TARGET_PPC64
 static void raise_hv_fu_exception(CPUPPCState *env, uint32_t bit,
                                   const char *caller, uint32_t cause,
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 72270c2163..31821f92f5 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -443,7 +443,7 @@ void spr_write_generic32(DisasContext *ctx, int sprn, int gprn)
 #endif
 }
 
-void spr_write_CTRL(DisasContext *ctx, int sprn, int gprn)
+static void spr_write_CTRL_ST(DisasContext *ctx, int sprn, int gprn)
 {
     /* This does not implement >1 thread */
     TCGv t0 = tcg_temp_new();
@@ -452,6 +452,22 @@ void spr_write_CTRL(DisasContext *ctx, int sprn, int gprn)
     tcg_gen_shli_tl(t1, t0, 8); /* Duplicate the bit in TS */
     tcg_gen_or_tl(t1, t1, t0);
     gen_store_spr(sprn, t1);
+}
+
+void spr_write_CTRL(DisasContext *ctx, int sprn, int gprn)
+{
+    if (!(ctx->flags & POWERPC_FLAG_SMT)) {
+        spr_write_CTRL_ST(ctx, sprn, gprn);
+        goto out;
+    }
+
+    if (!gen_serialize(ctx)) {
+        return;
+    }
+
+    gen_helper_spr_write_CTRL(cpu_env, tcg_constant_i32(sprn),
+                              cpu_gpr[gprn]);
+out:
     spr_store_dump_spr(sprn);
 
     /*
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 4/5] target/ppc: Add msgsnd/p and DPDES SMT support
  2023-05-31  1:23 [RFC PATCH 0/5] target/ppc: initial SMT support in TCG Nicholas Piggin
                   ` (2 preceding siblings ...)
  2023-05-31  1:23 ` [RFC PATCH 3/5] target/ppc: Add support for SMT CTRL register Nicholas Piggin
@ 2023-05-31  1:23 ` Nicholas Piggin
  2023-06-01  7:13   ` Cédric Le Goater
  2023-05-31  1:23 ` [RFC PATCH 5/5] spapr: Allow up to 8 threads SMT configuration Nicholas Piggin
  2023-06-01  7:56 ` [RFC PATCH 0/5] target/ppc: initial SMT support in TCG Cédric Le Goater
  5 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2023-05-31  1:23 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Daniel Henrique Barboza

Doorbells in SMT need to coordinate msgsnd/msgclr and DPDES access from
multiple threads that affect the same state.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/ppc/ppc.c                                  |  6 ++
 include/hw/ppc/ppc.h                          |  1 +
 target/ppc/cpu.h                              |  7 +-
 target/ppc/excp_helper.c                      | 86 +++++++++++++------
 target/ppc/gdbstub.c                          |  2 +-
 target/ppc/helper.h                           |  2 +-
 target/ppc/misc_helper.c                      | 60 +++++++++++--
 target/ppc/translate.c                        |  8 ++
 .../ppc/translate/processor-ctrl-impl.c.inc   |  2 +-
 9 files changed, 140 insertions(+), 34 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 80b4706db2..e30853413b 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -1434,6 +1434,12 @@ int ppc_cpu_pir(PowerPCCPU *cpu)
     return env->spr_cb[SPR_PIR].default_value;
 }
 
+int ppc_cpu_tir(PowerPCCPU *cpu)
+{
+    CPUPPCState *env = &cpu->env;
+    return env->spr_cb[SPR_PIR].default_value;
+}
+
 PowerPCCPU *ppc_get_vcpu_by_pir(int pir)
 {
     CPUState *cs;
diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
index 02af03ada2..e095c002dc 100644
--- a/include/hw/ppc/ppc.h
+++ b/include/hw/ppc/ppc.h
@@ -6,6 +6,7 @@
 void ppc_set_irq(PowerPCCPU *cpu, int n_IRQ, int level);
 PowerPCCPU *ppc_get_vcpu_by_pir(int pir);
 int ppc_cpu_pir(PowerPCCPU *cpu);
+int ppc_cpu_tir(PowerPCCPU *cpu);
 
 /* PowerPC hardware exceptions management helpers */
 typedef void (*clk_setup_cb)(void *opaque, uint32_t freq);
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index b594408a8d..b04b309c71 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1056,7 +1056,12 @@ FIELD(FPSCR, FI, FPSCR_FI, 1)
 
 #define DBELL_TYPE_DBELL_SERVER        (0x05 << DBELL_TYPE_SHIFT)
 
-#define DBELL_BRDCAST                  PPC_BIT(37)
+/* XXX: make sure this does not break BookE */
+#define DBELL_BRDCAST_MASK             PPC_BITMASK(37, 38)
+#define DBELL_BRDCAST_SHIFT            25
+#define DBELL_BRDCAST_SUBPROC          (0x1 << DBELL_BRDCAST_SHIFT)
+#define DBELL_BRDCAST_CORE             (0x2 << DBELL_BRDCAST_SHIFT)
+
 #define DBELL_LPIDTAG_SHIFT            14
 #define DBELL_LPIDTAG_MASK             (0xfff << DBELL_LPIDTAG_SHIFT)
 #define DBELL_PIRTAG_MASK              0x3fff
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 4925996cf3..5fc2e17269 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -3085,7 +3085,7 @@ void helper_msgsnd(target_ulong rb)
         PowerPCCPU *cpu = POWERPC_CPU(cs);
         CPUPPCState *cenv = &cpu->env;
 
-        if ((rb & DBELL_BRDCAST) || (cenv->spr[SPR_BOOKE_PIR] == pir)) {
+        if ((rb & DBELL_BRDCAST_MASK) || (cenv->spr[SPR_BOOKE_PIR] == pir)) {
             ppc_set_irq(cpu, irq, 1);
         }
     }
@@ -3104,6 +3104,16 @@ static bool dbell_type_server(target_ulong rb)
     return (rb & DBELL_TYPE_MASK) == DBELL_TYPE_DBELL_SERVER;
 }
 
+static inline bool dbell_type_bcast_core(target_ulong rb)
+{
+    return (rb & DBELL_BRDCAST_MASK) == DBELL_BRDCAST_CORE;
+}
+
+static inline bool dbell_type_bcast_subproc(target_ulong rb)
+{
+    return (rb & DBELL_BRDCAST_MASK) == DBELL_BRDCAST_SUBPROC;
+}
+
 void helper_book3s_msgclr(CPUPPCState *env, target_ulong rb)
 {
     if (!dbell_type_server(rb)) {
@@ -3113,32 +3123,40 @@ void helper_book3s_msgclr(CPUPPCState *env, target_ulong rb)
     ppc_set_irq(env_archcpu(env), PPC_INTERRUPT_HDOORBELL, 0);
 }
 
-static void book3s_msgsnd_common(int pir, int irq)
+void helper_book3s_msgsnd(CPUPPCState *env, target_ulong rb)
 {
-    CPUState *cs;
-
-    qemu_mutex_lock_iothread();
-    CPU_FOREACH(cs) {
-        PowerPCCPU *cpu = POWERPC_CPU(cs);
-        CPUPPCState *cenv = &cpu->env;
+    int pir = rb & DBELL_PROCIDTAG_MASK;
+    int brdcast = rb & DBELL_BRDCAST_MASK;
+    CPUState *cs, *ccs;
+    PowerPCCPU *cpu;
 
-        /* TODO: broadcast message to all threads of the same  processor */
-        if (cenv->spr_cb[SPR_PIR].default_value == pir) {
-            ppc_set_irq(cpu, irq, 1);
-        }
+    if (!dbell_type_server(rb)) {
+        return;
     }
-    qemu_mutex_unlock_iothread();
-}
 
-void helper_book3s_msgsnd(target_ulong rb)
-{
-    int pir = rb & DBELL_PROCIDTAG_MASK;
+    cpu = ppc_get_vcpu_by_pir(pir);
+    if (!cpu) {
+        return;
+    }
+    cs = CPU(cpu);
 
-    if (!dbell_type_server(rb)) {
+    if (cs->nr_threads == 1 || !brdcast) {
+        ppc_set_irq(cpu, PPC_INTERRUPT_HDOORBELL, 1);
         return;
     }
 
-    book3s_msgsnd_common(pir, PPC_INTERRUPT_HDOORBELL);
+    /* WHy does iothread need to be locked for walking CPU list? */
+    /* Answer seems to be because ppc irq handling needs it, but it now takes
+     * the lock itself if needed. Could remove this then.
+     */
+    qemu_mutex_lock_iothread();
+    THREAD_SIBLING_FOREACH(cs, ccs) {
+        PowerPCCPU *ccpu = POWERPC_CPU(ccs);
+        if (cpu != ccpu) {
+            ppc_set_irq(ccpu, PPC_INTERRUPT_HDOORBELL, 1);
+        }
+    }
+    qemu_mutex_unlock_iothread();
 }
 
 #if defined(TARGET_PPC64)
@@ -3154,22 +3172,42 @@ void helper_book3s_msgclrp(CPUPPCState *env, target_ulong rb)
 }
 
 /*
- * sends a message to other threads that are on the same
+ * sends a message to another thread  on the same
  * multi-threaded processor
  */
 void helper_book3s_msgsndp(CPUPPCState *env, target_ulong rb)
 {
-    int pir = env->spr_cb[SPR_PIR].default_value;
+    CPUState *cs = env_cpu(env);
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
+    CPUState *ccs;
+    uint32_t nr_threads = cs->nr_threads;
+    int ttir = rb & PPC_BITMASK(57, 63);
 
     helper_hfscr_facility_check(env, HFSCR_MSGP, "msgsndp", HFSCR_IC_MSGP);
 
-    if (!dbell_type_server(rb)) {
+    if (!dbell_type_server(rb) || ttir >= nr_threads) {
+        return;
+    }
+
+    if (nr_threads == 1) {
+        ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, 1);
         return;
     }
 
-    /* TODO: TCG supports only one thread */
+    /* WHy does iothread need to be locked for walking CPU list? */
+    qemu_mutex_lock_iothread();
+    THREAD_SIBLING_FOREACH(cs, ccs) {
+        PowerPCCPU *ccpu = POWERPC_CPU(ccs);
+        uint32_t thread_id = ppc_cpu_tir(ccpu);
+
+        if (ttir == thread_id) {
+            ppc_set_irq(ccpu, PPC_INTERRUPT_DOORBELL, 1);
+            qemu_mutex_unlock_iothread();
+            return;
+        }
+    }
 
-    book3s_msgsnd_common(pir, PPC_INTERRUPT_DOORBELL);
+    assert(0);
 }
 #endif /* TARGET_PPC64 */
 
diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
index ca39efdc35..f0304e5bb6 100644
--- a/target/ppc/gdbstub.c
+++ b/target/ppc/gdbstub.c
@@ -117,7 +117,7 @@ void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len)
  * regs and PC, MSR, CR, and so forth.  We hack round this by giving
  * the FP regs zero size when talking to a newer gdb.
  */
-
+/* XXX: read/write dpdes correctly */
 int ppc_cpu_gdb_read_register(CPUState *cs, GByteArray *buf, int n)
 {
     PowerPCCPU *cpu = POWERPC_CPU(cs);
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index fda40b8a60..5ce49c7ebc 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -690,7 +690,7 @@ DEF_HELPER_FLAGS_3(store_sr, TCG_CALL_NO_RWG, void, env, tl, tl)
 
 DEF_HELPER_1(msgsnd, void, tl)
 DEF_HELPER_2(msgclr, void, env, tl)
-DEF_HELPER_1(book3s_msgsnd, void, tl)
+DEF_HELPER_2(book3s_msgsnd, void, env, tl)
 DEF_HELPER_2(book3s_msgclr, void, env, tl)
 #endif
 
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index ffe54a4310..ca84f1b134 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -192,14 +192,38 @@ void helper_store_pcr(CPUPPCState *env, target_ulong value)
  */
 target_ulong helper_load_dpdes(CPUPPCState *env)
 {
+    CPUState *cs = env_cpu(env);
+    CPUState *ccs;
+    uint32_t nr_threads = cs->nr_threads;
+    uint32_t core_id = env->spr[SPR_PIR] & ~(nr_threads - 1);
     target_ulong dpdes = 0;
 
+    assert(core_id == env->spr[SPR_PIR] - env->spr[SPR_TIR]);
+
     helper_hfscr_facility_check(env, HFSCR_MSGP, "load DPDES", HFSCR_IC_MSGP);
 
-    /* TODO: TCG supports only one thread */
-    if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
-        dpdes = 1;
+    if (nr_threads == 1) {
+        if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
+            dpdes = 1;
+        }
+        return dpdes;
+    }
+
+    qemu_mutex_lock_iothread();
+    CPU_FOREACH(ccs) {
+        CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
+        uint32_t ccore_id = cenv->spr[SPR_PIR] & ~(nr_threads - 1);
+        uint32_t thread_id = cenv->spr[SPR_TIR];
+
+        assert(ccore_id == cenv->spr[SPR_PIR] - cenv->spr[SPR_TIR]);
+
+        if (ccore_id == core_id) {
+            if (cenv->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
+                dpdes |= (0x1 << thread_id);
+            }
+        }
     }
+    qemu_mutex_unlock_iothread();
 
     return dpdes;
 }
@@ -207,17 +231,41 @@ target_ulong helper_load_dpdes(CPUPPCState *env)
 void helper_store_dpdes(CPUPPCState *env, target_ulong val)
 {
     PowerPCCPU *cpu = env_archcpu(env);
+    CPUState *cs = env_cpu(env);
+    CPUState *ccs;
+    uint32_t nr_threads = cs->nr_threads;
+    uint32_t core_id = env->spr[SPR_PIR] & ~(nr_threads - 1);
+
+    assert(core_id == env->spr[SPR_PIR] - env->spr[SPR_TIR]);
 
     helper_hfscr_facility_check(env, HFSCR_MSGP, "store DPDES", HFSCR_IC_MSGP);
 
-    /* TODO: TCG supports only one thread */
-    if (val & ~0x1) {
+    if (val & ~(nr_threads - 1)) {
         qemu_log_mask(LOG_GUEST_ERROR, "Invalid DPDES register value "
                       TARGET_FMT_lx"\n", val);
+        val &= ~(nr_threads - 1);
+        /* Ignore the invalid bits */
+    }
+
+    if (nr_threads == 1) {
+        /* XXX: don't need iothread lock? */
+        ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, val & 0x1);
         return;
     }
 
-    ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, val & 0x1);
+    qemu_mutex_lock_iothread();
+    CPU_FOREACH(ccs) {
+        CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
+        uint32_t ccore_id = cenv->spr[SPR_PIR] & ~(nr_threads - 1);
+        uint32_t thread_id = cenv->spr[SPR_TIR];
+
+        assert(ccore_id == cenv->spr[SPR_PIR] - cenv->spr[SPR_TIR]);
+
+        if (ccore_id == core_id) {
+            ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, val & (0x1 << thread_id));
+        }
+    }
+    qemu_mutex_unlock_iothread();
 }
 #endif /* defined(TARGET_PPC64) */
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 31821f92f5..0aa49323d3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -820,11 +820,19 @@ void spr_write_pcr(DisasContext *ctx, int sprn, int gprn)
 /* DPDES */
 void spr_read_dpdes(DisasContext *ctx, int gprn, int sprn)
 {
+    if (!gen_serialize_core(ctx)) {
+        return;
+    }
+
     gen_helper_load_dpdes(cpu_gpr[gprn], cpu_env);
 }
 
 void spr_write_dpdes(DisasContext *ctx, int sprn, int gprn)
 {
+    if (!gen_serialize_core(ctx)) {
+        return;
+    }
+
     gen_helper_store_dpdes(cpu_env, cpu_gpr[gprn]);
 }
 #endif
diff --git a/target/ppc/translate/processor-ctrl-impl.c.inc b/target/ppc/translate/processor-ctrl-impl.c.inc
index cc7a50d579..7dfbcd781f 100644
--- a/target/ppc/translate/processor-ctrl-impl.c.inc
+++ b/target/ppc/translate/processor-ctrl-impl.c.inc
@@ -59,7 +59,7 @@ static bool trans_MSGSND(DisasContext *ctx, arg_X_rb *a)
 
 #if !defined(CONFIG_USER_ONLY)
     if (is_book3s_arch2x(ctx)) {
-        gen_helper_book3s_msgsnd(cpu_gpr[a->rb]);
+        gen_helper_book3s_msgsnd(cpu_env, cpu_gpr[a->rb]);
     } else {
         gen_helper_msgsnd(cpu_gpr[a->rb]);
     }
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 5/5] spapr: Allow up to 8 threads SMT configuration
  2023-05-31  1:23 [RFC PATCH 0/5] target/ppc: initial SMT support in TCG Nicholas Piggin
                   ` (3 preceding siblings ...)
  2023-05-31  1:23 ` [RFC PATCH 4/5] target/ppc: Add msgsnd/p and DPDES SMT support Nicholas Piggin
@ 2023-05-31  1:23 ` Nicholas Piggin
  2023-06-01  7:20   ` Cédric Le Goater
  2023-06-01  7:56 ` [RFC PATCH 0/5] target/ppc: initial SMT support in TCG Cédric Le Goater
  5 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2023-05-31  1:23 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Daniel Henrique Barboza

TCG now supports multi-threaded configuration at least enough for
pseries to be functional enough to boot Linux.

This requires PIR and TIR be set, because that's how sibling thread
matching is done.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 hw/ppc/spapr.c          | 4 ++--
 hw/ppc/spapr_cpu_core.c | 7 +++++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index dcb7f1c70a..11074cefea 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2524,8 +2524,8 @@ static void spapr_set_vsmt_mode(SpaprMachineState *spapr, Error **errp)
     int ret;
     unsigned int smp_threads = ms->smp.threads;
 
-    if (!kvm_enabled() && (smp_threads > 1)) {
-        error_setg(errp, "TCG cannot support more than 1 thread/core "
+    if (!kvm_enabled() && (smp_threads > 8)) {
+        error_setg(errp, "TCG cannot support more than 8 threads/core "
                    "on a pseries machine");
         return;
     }
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 9b88dd549a..f35ee600f1 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -255,7 +255,7 @@ static void spapr_cpu_core_unrealize(DeviceState *dev)
 }
 
 static bool spapr_realize_vcpu(PowerPCCPU *cpu, SpaprMachineState *spapr,
-                               SpaprCpuCore *sc, Error **errp)
+                               SpaprCpuCore *sc, int thread_nr, Error **errp)
 {
     CPUPPCState *env = &cpu->env;
     CPUState *cs = CPU(cpu);
@@ -267,6 +267,9 @@ static bool spapr_realize_vcpu(PowerPCCPU *cpu, SpaprMachineState *spapr,
     cpu_ppc_set_vhyp(cpu, PPC_VIRTUAL_HYPERVISOR(spapr));
     kvmppc_set_papr(cpu);
 
+    env->spr_cb[SPR_PIR].default_value = cs->cpu_index;
+    env->spr_cb[SPR_TIR].default_value = thread_nr;
+
     /* Set time-base frequency to 512 MHz. vhyp must be set first. */
     cpu_ppc_tb_init(env, SPAPR_TIMEBASE_FREQ);
 
@@ -337,7 +340,7 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
     for (i = 0; i < cc->nr_threads; i++) {
         sc->threads[i] = spapr_create_vcpu(sc, i, errp);
         if (!sc->threads[i] ||
-            !spapr_realize_vcpu(sc->threads[i], spapr, sc, errp)) {
+            !spapr_realize_vcpu(sc->threads[i], spapr, sc, i, errp)) {
             spapr_cpu_core_unrealize(dev);
             return;
         }
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 1/5] target/ppc: gdbstub init spr gdb_id for all CPUs
  2023-05-31  1:23 ` [RFC PATCH 1/5] target/ppc: gdbstub init spr gdb_id for all CPUs Nicholas Piggin
@ 2023-05-31  5:49   ` Philippe Mathieu-Daudé
  2023-06-23  9:26   ` Cédric Le Goater
  1 sibling, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-05-31  5:49 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc
  Cc: qemu-devel, Daniel Henrique Barboza, Fabiano Rosas

On 31/5/23 03:23, Nicholas Piggin wrote:
> Make sure each CPU gets its state set up for gdb, not just the ones
> before PowerPCCPUClass has had its gdb state set up.
> 

Cc: qemu-stable@nongnu.org
Fixes: 707c7c2ee1 ("target/ppc: Enable reporting of SPRs to GDB")

> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   target/ppc/gdbstub.c | 30 +++++++++++++++++++-----------
>   1 file changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
> index 63c9abe4f1..ca39efdc35 100644
> --- a/target/ppc/gdbstub.c
> +++ b/target/ppc/gdbstub.c
> @@ -327,6 +327,25 @@ void ppc_gdb_gen_spr_xml(PowerPCCPU *cpu)
>       unsigned int num_regs = 0;
>       int i;
>   
> +    for (i = 0; i < ARRAY_SIZE(env->spr_cb); i++) {
> +        ppc_spr_t *spr = &env->spr_cb[i];
> +
> +        if (!spr->name) {
> +            continue;
> +        }
> +
> +        /*
> +         * GDB identifies registers based on the order they are
> +         * presented in the XML. These ids will not match QEMU's
> +         * representation (which follows the PowerISA).
> +         *
> +         * Store the position of the current register description so
> +         * we can make the correspondence later.
> +         */
> +        spr->gdb_id = num_regs;
> +        num_regs++;
> +    }
> +
>       if (pcc->gdb_spr_xml) {
>           return;
>       }
> @@ -348,17 +367,6 @@ void ppc_gdb_gen_spr_xml(PowerPCCPU *cpu)
>   
>           g_string_append_printf(xml, " bitsize=\"%d\"", TARGET_LONG_BITS);
>           g_string_append(xml, " group=\"spr\"/>");
> -
> -        /*
> -         * GDB identifies registers based on the order they are
> -         * presented in the XML. These ids will not match QEMU's
> -         * representation (which follows the PowerISA).
> -         *
> -         * Store the position of the current register description so
> -         * we can make the correspondence later.
> -         */
> -        spr->gdb_id = num_regs;
> -        num_regs++;
>       }
>   
>       g_string_append(xml, "</feature>");

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/5] target/ppc: Add initial flags and helpers for SMT support
  2023-05-31  1:23 ` [RFC PATCH 2/5] target/ppc: Add initial flags and helpers for SMT support Nicholas Piggin
@ 2023-05-31  7:25   ` Cédric Le Goater
  2023-06-02  6:54     ` Nicholas Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Cédric Le Goater @ 2023-05-31  7:25 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

On 5/31/23 03:23, Nicholas Piggin wrote:
> SMT TCG emulation needs to be able to iterate over siblings in a core,
> and needs to serialise core access to shared SPRs and state.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   target/ppc/cpu.h       |  9 +++++++++
>   target/ppc/cpu_init.c  |  5 +++++
>   target/ppc/translate.c | 20 ++++++++++++++++++++
>   3 files changed, 34 insertions(+)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 1f23b81e90..b594408a8d 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -672,6 +672,8 @@ enum {
>       POWERPC_FLAG_TM       = 0x00100000,
>       /* Has SCV (ISA 3.00)                                                    */
>       POWERPC_FLAG_SCV      = 0x00200000,
> +    /* Has >1 thread per core                                                */
> +    POWERPC_FLAG_SMT      = 0x00400000,
>   };
>   
>   /*
> @@ -1266,6 +1268,13 @@ struct CPUArchState {
>       uint64_t pmu_base_time;
>   };
>   
> +#define _CORE_ID(cs)                                            \
> +    (POWERPC_CPU(cs)->env.spr_cb[SPR_PIR].default_value & ~(cs->nr_threads - 1))
> +
> +#define THREAD_SIBLING_FOREACH(cs, cs_sibling)			\
> +    CPU_FOREACH(cs_sibling)                                     \
> +        if (_CORE_ID(cs) == _CORE_ID(cs_sibling))
> +


May be introduce these helpers when needed (and if needed).

>   #define SET_FIT_PERIOD(a_, b_, c_, d_)          \
>   do {                                            \
>       env->fit_period[0] = (a_);                  \
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index aa364f36f6..5035f6dada 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -6754,6 +6754,7 @@ static void ppc_cpu_realize(DeviceState *dev, Error **errp)
>   {
>       CPUState *cs = CPU(dev);
>       PowerPCCPU *cpu = POWERPC_CPU(dev);
> +    CPUPPCState *env = &cpu->env;
>       PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>       Error *local_err = NULL;
>   
> @@ -6785,6 +6786,10 @@ static void ppc_cpu_realize(DeviceState *dev, Error **errp)
>   
>       pcc->parent_realize(dev, errp);
>   
> +    if (env_cpu(env)->nr_threads > 1) {
> +        env->flags |= POWERPC_FLAG_SMT;
> +    }
> +
>       return;
>   
>   unrealize:
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index b6bab4c234..72270c2163 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -227,6 +227,26 @@ struct opc_handler_t {
>       void (*handler)(DisasContext *ctx);
>   };
>   
> +static inline bool gen_serialize(DisasContext *ctx)
> +{
> +    if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +        /* Restart with exclusive lock.  */
> +        gen_helper_exit_atomic(cpu_env);
> +        ctx->base.is_jmp = DISAS_NORETURN;
> +        return false;
> +    }
> +    return true;
> +}
> +
> +static inline bool gen_serialize_core(DisasContext *ctx)
> +{
> +    if (ctx->flags & POWERPC_FLAG_SMT) {
> +        return gen_serialize(ctx);
> +    }
> +
> +    return true;
> +}
> +
>   /* SPR load/store helpers */
>   static inline void gen_load_spr(TCGv t, int reg)
>   {



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 4/5] target/ppc: Add msgsnd/p and DPDES SMT support
  2023-05-31  1:23 ` [RFC PATCH 4/5] target/ppc: Add msgsnd/p and DPDES SMT support Nicholas Piggin
@ 2023-06-01  7:13   ` Cédric Le Goater
  2023-06-02  6:56     ` Nicholas Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Cédric Le Goater @ 2023-06-01  7:13 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

On 5/31/23 03:23, Nicholas Piggin wrote:
> Doorbells in SMT need to coordinate msgsnd/msgclr and DPDES access from
> multiple threads that affect the same state.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   hw/ppc/ppc.c                                  |  6 ++
>   include/hw/ppc/ppc.h                          |  1 +
>   target/ppc/cpu.h                              |  7 +-
>   target/ppc/excp_helper.c                      | 86 +++++++++++++------
>   target/ppc/gdbstub.c                          |  2 +-
>   target/ppc/helper.h                           |  2 +-
>   target/ppc/misc_helper.c                      | 60 +++++++++++--
>   target/ppc/translate.c                        |  8 ++
>   .../ppc/translate/processor-ctrl-impl.c.inc   |  2 +-
>   9 files changed, 140 insertions(+), 34 deletions(-)
> 
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 80b4706db2..e30853413b 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -1434,6 +1434,12 @@ int ppc_cpu_pir(PowerPCCPU *cpu)
>       return env->spr_cb[SPR_PIR].default_value;
>   }
>   
> +int ppc_cpu_tir(PowerPCCPU *cpu)
> +{
> +    CPUPPCState *env = &cpu->env;
> +    return env->spr_cb[SPR_PIR].default_value;

PIR or TIR ?

> +}
> +
>   PowerPCCPU *ppc_get_vcpu_by_pir(int pir)
>   {
>       CPUState *cs;
> diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
> index 02af03ada2..e095c002dc 100644
> --- a/include/hw/ppc/ppc.h
> +++ b/include/hw/ppc/ppc.h
> @@ -6,6 +6,7 @@
>   void ppc_set_irq(PowerPCCPU *cpu, int n_IRQ, int level);
>   PowerPCCPU *ppc_get_vcpu_by_pir(int pir);
>   int ppc_cpu_pir(PowerPCCPU *cpu);
> +int ppc_cpu_tir(PowerPCCPU *cpu);
>   
>   /* PowerPC hardware exceptions management helpers */
>   typedef void (*clk_setup_cb)(void *opaque, uint32_t freq);
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index b594408a8d..b04b309c71 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1056,7 +1056,12 @@ FIELD(FPSCR, FI, FPSCR_FI, 1)
>   
>   #define DBELL_TYPE_DBELL_SERVER        (0x05 << DBELL_TYPE_SHIFT)
>   
> -#define DBELL_BRDCAST                  PPC_BIT(37)
> +/* XXX: make sure this does not break BookE */
> +#define DBELL_BRDCAST_MASK             PPC_BITMASK(37, 38)
> +#define DBELL_BRDCAST_SHIFT            25
> +#define DBELL_BRDCAST_SUBPROC          (0x1 << DBELL_BRDCAST_SHIFT)
> +#define DBELL_BRDCAST_CORE             (0x2 << DBELL_BRDCAST_SHIFT)
> +
>   #define DBELL_LPIDTAG_SHIFT            14
>   #define DBELL_LPIDTAG_MASK             (0xfff << DBELL_LPIDTAG_SHIFT)
>   #define DBELL_PIRTAG_MASK              0x3fff
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 4925996cf3..5fc2e17269 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -3085,7 +3085,7 @@ void helper_msgsnd(target_ulong rb)
>           PowerPCCPU *cpu = POWERPC_CPU(cs);
>           CPUPPCState *cenv = &cpu->env;
>   
> -        if ((rb & DBELL_BRDCAST) || (cenv->spr[SPR_BOOKE_PIR] == pir)) {
> +        if ((rb & DBELL_BRDCAST_MASK) || (cenv->spr[SPR_BOOKE_PIR] == pir)) {
>               ppc_set_irq(cpu, irq, 1);
>           }
>       }
> @@ -3104,6 +3104,16 @@ static bool dbell_type_server(target_ulong rb)
>       return (rb & DBELL_TYPE_MASK) == DBELL_TYPE_DBELL_SERVER;
>   }
>   
> +static inline bool dbell_type_bcast_core(target_ulong rb)
> +{
> +    return (rb & DBELL_BRDCAST_MASK) == DBELL_BRDCAST_CORE;
> +}
> +
> +static inline bool dbell_type_bcast_subproc(target_ulong rb)
> +{
> +    return (rb & DBELL_BRDCAST_MASK) == DBELL_BRDCAST_SUBPROC;
> +}
> +
>   void helper_book3s_msgclr(CPUPPCState *env, target_ulong rb)
>   {
>       if (!dbell_type_server(rb)) {
> @@ -3113,32 +3123,40 @@ void helper_book3s_msgclr(CPUPPCState *env, target_ulong rb)
>       ppc_set_irq(env_archcpu(env), PPC_INTERRUPT_HDOORBELL, 0);
>   }
>   
> -static void book3s_msgsnd_common(int pir, int irq)
> +void helper_book3s_msgsnd(CPUPPCState *env, target_ulong rb)
>   {
> -    CPUState *cs;
> -
> -    qemu_mutex_lock_iothread();
> -    CPU_FOREACH(cs) {
> -        PowerPCCPU *cpu = POWERPC_CPU(cs);
> -        CPUPPCState *cenv = &cpu->env;
> +    int pir = rb & DBELL_PROCIDTAG_MASK;
> +    int brdcast = rb & DBELL_BRDCAST_MASK;
> +    CPUState *cs, *ccs;
> +    PowerPCCPU *cpu;
>   
> -        /* TODO: broadcast message to all threads of the same  processor */
> -        if (cenv->spr_cb[SPR_PIR].default_value == pir) {
> -            ppc_set_irq(cpu, irq, 1);
> -        }
> +    if (!dbell_type_server(rb)) {
> +        return;
>       }
> -    qemu_mutex_unlock_iothread();
> -}
>   
> -void helper_book3s_msgsnd(target_ulong rb)
> -{
> -    int pir = rb & DBELL_PROCIDTAG_MASK;
> +    cpu = ppc_get_vcpu_by_pir(pir);
> +    if (!cpu) {
> +        return;
> +    }
> +    cs = CPU(cpu);
>   
> -    if (!dbell_type_server(rb)) {
> +    if (cs->nr_threads == 1 || !brdcast) {
> +        ppc_set_irq(cpu, PPC_INTERRUPT_HDOORBELL, 1);
>           return;
>       }
>   
> -    book3s_msgsnd_common(pir, PPC_INTERRUPT_HDOORBELL);
> +    /* WHy does iothread need to be locked for walking CPU list? */
> +    /* Answer seems to be because ppc irq handling needs it, but it now takes
> +     * the lock itself if needed. Could remove this then.
> +     */
> +    qemu_mutex_lock_iothread();
> +    THREAD_SIBLING_FOREACH(cs, ccs) {
> +        PowerPCCPU *ccpu = POWERPC_CPU(ccs);
> +        if (cpu != ccpu) {
> +            ppc_set_irq(ccpu, PPC_INTERRUPT_HDOORBELL, 1);
> +        }
> +    }
> +    qemu_mutex_unlock_iothread();
>   }
>   
>   #if defined(TARGET_PPC64)
> @@ -3154,22 +3172,42 @@ void helper_book3s_msgclrp(CPUPPCState *env, target_ulong rb)
>   }
>   
>   /*
> - * sends a message to other threads that are on the same
> + * sends a message to another thread  on the same
>    * multi-threaded processor
>    */
>   void helper_book3s_msgsndp(CPUPPCState *env, target_ulong rb)
>   {
> -    int pir = env->spr_cb[SPR_PIR].default_value;
> +    CPUState *cs = env_cpu(env);
> +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> +    CPUState *ccs;
> +    uint32_t nr_threads = cs->nr_threads;
> +    int ttir = rb & PPC_BITMASK(57, 63);
>   
>       helper_hfscr_facility_check(env, HFSCR_MSGP, "msgsndp", HFSCR_IC_MSGP);
>   
> -    if (!dbell_type_server(rb)) {
> +    if (!dbell_type_server(rb) || ttir >= nr_threads) {

may be log bad ttir values ? even if the insn is a no-op in that case,
telling the user would be good since it should be a guest os issue

> +        return;
> +    }
> +
> +    if (nr_threads == 1) {
> +        ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, 1);
>           return;
>       }
>   
> -    /* TODO: TCG supports only one thread */
> +    /* WHy does iothread need to be locked for walking CPU list? */
> +    qemu_mutex_lock_iothread();
> +    THREAD_SIBLING_FOREACH(cs, ccs) {
> +        PowerPCCPU *ccpu = POWERPC_CPU(ccs);
> +        uint32_t thread_id = ppc_cpu_tir(ccpu);
> +
> +        if (ttir == thread_id) {
> +            ppc_set_irq(ccpu, PPC_INTERRUPT_DOORBELL, 1);
> +            qemu_mutex_unlock_iothread();
> +            return;
> +        }
> +    }
>   
> -    book3s_msgsnd_common(pir, PPC_INTERRUPT_DOORBELL);
> +    assert(0);
>   }
>   #endif /* TARGET_PPC64 */
>   
> diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
> index ca39efdc35..f0304e5bb6 100644
> --- a/target/ppc/gdbstub.c
> +++ b/target/ppc/gdbstub.c
> @@ -117,7 +117,7 @@ void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len)
>    * regs and PC, MSR, CR, and so forth.  We hack round this by giving
>    * the FP regs zero size when talking to a newer gdb.
>    */
> -
> +/* XXX: read/write dpdes correctly */
>   int ppc_cpu_gdb_read_register(CPUState *cs, GByteArray *buf, int n)
>   {
>       PowerPCCPU *cpu = POWERPC_CPU(cs);
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index fda40b8a60..5ce49c7ebc 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -690,7 +690,7 @@ DEF_HELPER_FLAGS_3(store_sr, TCG_CALL_NO_RWG, void, env, tl, tl)
>   
>   DEF_HELPER_1(msgsnd, void, tl)
>   DEF_HELPER_2(msgclr, void, env, tl)
> -DEF_HELPER_1(book3s_msgsnd, void, tl)
> +DEF_HELPER_2(book3s_msgsnd, void, env, tl)
>   DEF_HELPER_2(book3s_msgclr, void, env, tl)
>   #endif
>   
> diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
> index ffe54a4310..ca84f1b134 100644
> --- a/target/ppc/misc_helper.c
> +++ b/target/ppc/misc_helper.c
> @@ -192,14 +192,38 @@ void helper_store_pcr(CPUPPCState *env, target_ulong value)
>    */
>   target_ulong helper_load_dpdes(CPUPPCState *env)
>   {
> +    CPUState *cs = env_cpu(env);
> +    CPUState *ccs;
> +    uint32_t nr_threads = cs->nr_threads;
> +    uint32_t core_id = env->spr[SPR_PIR] & ~(nr_threads - 1);

you could add an helper for the above.

>       target_ulong dpdes = 0;
>   
> +    assert(core_id == env->spr[SPR_PIR] - env->spr[SPR_TIR]);
> +
>       helper_hfscr_facility_check(env, HFSCR_MSGP, "load DPDES", HFSCR_IC_MSGP);
>   
> -    /* TODO: TCG supports only one thread */
> -    if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
> -        dpdes = 1;
> +    if (nr_threads == 1) {
> +        if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
> +            dpdes = 1;
> +        }
> +        return dpdes;
> +    }
> +
> +    qemu_mutex_lock_iothread();
> +    CPU_FOREACH(ccs) {
> +        CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
> +        uint32_t ccore_id = cenv->spr[SPR_PIR] & ~(nr_threads - 1);
> +        uint32_t thread_id = cenv->spr[SPR_TIR];
> +
> +        assert(ccore_id == cenv->spr[SPR_PIR] - cenv->spr[SPR_TIR]);
> +
> +        if (ccore_id == core_id) {
> +            if (cenv->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
> +                dpdes |= (0x1 << thread_id);
> +            }
> +        }
>       }
> +    qemu_mutex_unlock_iothread();
>   
>       return dpdes;
>   }
> @@ -207,17 +231,41 @@ target_ulong helper_load_dpdes(CPUPPCState *env)
>   void helper_store_dpdes(CPUPPCState *env, target_ulong val)
>   {
>       PowerPCCPU *cpu = env_archcpu(env);
> +    CPUState *cs = env_cpu(env);
> +    CPUState *ccs;
> +    uint32_t nr_threads = cs->nr_threads;
> +    uint32_t core_id = env->spr[SPR_PIR] & ~(nr_threads - 1);
> +
> +    assert(core_id == env->spr[SPR_PIR] - env->spr[SPR_TIR]);
>   
>       helper_hfscr_facility_check(env, HFSCR_MSGP, "store DPDES", HFSCR_IC_MSGP);
>   
> -    /* TODO: TCG supports only one thread */
> -    if (val & ~0x1) {
> +    if (val & ~(nr_threads - 1)) {
>           qemu_log_mask(LOG_GUEST_ERROR, "Invalid DPDES register value "
>                         TARGET_FMT_lx"\n", val);
> +        val &= ~(nr_threads - 1);
> +        /* Ignore the invalid bits */
> +    }
> +
> +    if (nr_threads == 1) {
> +        /* XXX: don't need iothread lock? */
> +        ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, val & 0x1);
>           return;
>       }
>   
> -    ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, val & 0x1);
> +    qemu_mutex_lock_iothread();
> +    CPU_FOREACH(ccs) {
> +        CPUPPCState *cenv = &POWERPC_CPU(ccs)->env;
> +        uint32_t ccore_id = cenv->spr[SPR_PIR] & ~(nr_threads - 1);
> +        uint32_t thread_id = cenv->spr[SPR_TIR];
> +
> +        assert(ccore_id == cenv->spr[SPR_PIR] - cenv->spr[SPR_TIR]);
> +
> +        if (ccore_id == core_id) {
> +            ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, val & (0x1 << thread_id));
> +        }
> +    }
> +    qemu_mutex_unlock_iothread();
>   }
>   #endif /* defined(TARGET_PPC64) */
>   
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 31821f92f5..0aa49323d3 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -820,11 +820,19 @@ void spr_write_pcr(DisasContext *ctx, int sprn, int gprn)
>   /* DPDES */
>   void spr_read_dpdes(DisasContext *ctx, int gprn, int sprn)
>   {
> +    if (!gen_serialize_core(ctx)) {
> +        return;
> +    }
> +
>       gen_helper_load_dpdes(cpu_gpr[gprn], cpu_env);
>   }
>   
>   void spr_write_dpdes(DisasContext *ctx, int sprn, int gprn)
>   {
> +    if (!gen_serialize_core(ctx)) {
> +        return;
> +    }
> +
>       gen_helper_store_dpdes(cpu_env, cpu_gpr[gprn]);
>   }
>   #endif
> diff --git a/target/ppc/translate/processor-ctrl-impl.c.inc b/target/ppc/translate/processor-ctrl-impl.c.inc
> index cc7a50d579..7dfbcd781f 100644
> --- a/target/ppc/translate/processor-ctrl-impl.c.inc
> +++ b/target/ppc/translate/processor-ctrl-impl.c.inc
> @@ -59,7 +59,7 @@ static bool trans_MSGSND(DisasContext *ctx, arg_X_rb *a)
>   
>   #if !defined(CONFIG_USER_ONLY)
>       if (is_book3s_arch2x(ctx)) {
> -        gen_helper_book3s_msgsnd(cpu_gpr[a->rb]);
> +        gen_helper_book3s_msgsnd(cpu_env, cpu_gpr[a->rb]);
>       } else {
>           gen_helper_msgsnd(cpu_gpr[a->rb]);
>       }



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 5/5] spapr: Allow up to 8 threads SMT configuration
  2023-05-31  1:23 ` [RFC PATCH 5/5] spapr: Allow up to 8 threads SMT configuration Nicholas Piggin
@ 2023-06-01  7:20   ` Cédric Le Goater
  2023-06-02  6:59     ` Nicholas Piggin
  2023-06-05 10:29     ` Nicholas Piggin
  0 siblings, 2 replies; 19+ messages in thread
From: Cédric Le Goater @ 2023-06-01  7:20 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

On 5/31/23 03:23, Nicholas Piggin wrote:
> TCG now supports multi-threaded configuration at least enough for
> pseries to be functional enough to boot Linux.
> 
> This requires PIR and TIR be set, because that's how sibling thread
> matching is done.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   hw/ppc/spapr.c          | 4 ++--
>   hw/ppc/spapr_cpu_core.c | 7 +++++--
>   2 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index dcb7f1c70a..11074cefea 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2524,8 +2524,8 @@ static void spapr_set_vsmt_mode(SpaprMachineState *spapr, Error **errp)
>       int ret;
>       unsigned int smp_threads = ms->smp.threads;
>   
> -    if (!kvm_enabled() && (smp_threads > 1)) {
> -        error_setg(errp, "TCG cannot support more than 1 thread/core "
> +    if (!kvm_enabled() && (smp_threads > 8)) {
> +        error_setg(errp, "TCG cannot support more than 8 threads/core "
>                      "on a pseries machine");

I think we should add test on the CPU also.

>           return;
>       }
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 9b88dd549a..f35ee600f1 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -255,7 +255,7 @@ static void spapr_cpu_core_unrealize(DeviceState *dev)
>   }
>   
>   static bool spapr_realize_vcpu(PowerPCCPU *cpu, SpaprMachineState *spapr,
> -                               SpaprCpuCore *sc, Error **errp)
> +                               SpaprCpuCore *sc, int thread_nr, Error **errp)

thread_index may be ?

>   {
>       CPUPPCState *env = &cpu->env;
>       CPUState *cs = CPU(cpu);
> @@ -267,6 +267,9 @@ static bool spapr_realize_vcpu(PowerPCCPU *cpu, SpaprMachineState *spapr,
>       cpu_ppc_set_vhyp(cpu, PPC_VIRTUAL_HYPERVISOR(spapr));
>       kvmppc_set_papr(cpu);

so, spapr_create_vcpu() set cs->cpu_index :
     cs->cpu_index = cc->core_id + i;

and spapr_realize_vcpu :
    
> +    env->spr_cb[SPR_PIR].default_value = cs->cpu_index;
> +    env->spr_cb[SPR_TIR].default_value = thread_nr;
> +
it would be cleaner to do the SPR assignment in one place.


>       /* Set time-base frequency to 512 MHz. vhyp must be set first. */
>       cpu_ppc_tb_init(env, SPAPR_TIMEBASE_FREQ);
>   
> @@ -337,7 +340,7 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
>       for (i = 0; i < cc->nr_threads; i++) {
>           sc->threads[i] = spapr_create_vcpu(sc, i, errp);
>           if (!sc->threads[i] ||
> -            !spapr_realize_vcpu(sc->threads[i], spapr, sc, errp)) {
> +            !spapr_realize_vcpu(sc->threads[i], spapr, sc, i, errp)) {
>               spapr_cpu_core_unrealize(dev);
>               return;
>           }



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] target/ppc: initial SMT support in TCG
  2023-05-31  1:23 [RFC PATCH 0/5] target/ppc: initial SMT support in TCG Nicholas Piggin
                   ` (4 preceding siblings ...)
  2023-05-31  1:23 ` [RFC PATCH 5/5] spapr: Allow up to 8 threads SMT configuration Nicholas Piggin
@ 2023-06-01  7:56 ` Cédric Le Goater
  2023-06-02  7:01   ` Nicholas Piggin
  5 siblings, 1 reply; 19+ messages in thread
From: Cédric Le Goater @ 2023-06-01  7:56 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

Hello Nick,

On 5/31/23 03:23, Nicholas Piggin wrote:
> Hi,
> 
> I'm posting this now just to get some first thoughts. I wouldn't say
> it's ready but it does actually work with some basic tests including
> pseries booting a Linux distro. I have powernv booting too, it just
> requires some more SPRs converted, nothing fundamentally different so
> for the purpose of this RFC I leave it out.
> 
> A couple of things, I don't know the object model well enough to do
> something nice with topology. Iterating siblings I would have thought
> should be going to parent core then iterating its children CPUs. Should
> that be done with the object model, or is it better to add direct
> pointers in CPUs to core and core to CPUs? It is (semi) important for> performance so maybe that is better than object iterators. If we go that
> way, the PnvCore and SpaprCore have pointers to the SMT threads already,
> should those be abstracted go in the CPUCore?

You should be able to move the thread array into the CPUCore. If you do
that, please check that migration compat is not impacted by the state
change. However, I am not sure you can use the CPUCore model under the
insn modeling. Something to check.

Anyhow, the way you implemented the loop on the siblings is sufficiently
fast for a small numbers of CPU and safe, w.r.t to CPU hotplug. So
I would leave that part for now, if it runs decently with 4*4 vCPUs in
TCG it should be fine.

Thanks,

C.


  
> The other thing is the serialisation of access. It's using the atomic
> single stepping for this which... I guess should be sufficient? Is it
> the best way to do it though? Can a lock be used somehow instead?
> 
> Thanks,
> Nick
> 
> Nicholas Piggin (5):
>    target/ppc: gdbstub init spr gdb_id for all CPUs
>    target/ppc: Add initial flags and helpers for SMT support
>    target/ppc: Add support for SMT CTRL register
>    target/ppc: Add msgsnd/p and DPDES SMT support
>    spapr: Allow up to 8 threads SMT configuration
> 
>   hw/ppc/ppc.c                                  |  6 ++
>   hw/ppc/spapr.c                                |  4 +-
>   hw/ppc/spapr_cpu_core.c                       |  7 +-
>   include/hw/ppc/ppc.h                          |  1 +
>   target/ppc/cpu.h                              | 16 +++-
>   target/ppc/cpu_init.c                         |  5 +
>   target/ppc/excp_helper.c                      | 86 ++++++++++++-----
>   target/ppc/gdbstub.c                          | 32 ++++---
>   target/ppc/helper.h                           |  4 +-
>   target/ppc/misc_helper.c                      | 93 +++++++++++++++++--
>   target/ppc/translate.c                        | 46 ++++++++-
>   .../ppc/translate/processor-ctrl-impl.c.inc   |  2 +-
>   12 files changed, 252 insertions(+), 50 deletions(-)
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/5] target/ppc: Add initial flags and helpers for SMT support
  2023-05-31  7:25   ` Cédric Le Goater
@ 2023-06-02  6:54     ` Nicholas Piggin
  0 siblings, 0 replies; 19+ messages in thread
From: Nicholas Piggin @ 2023-06-02  6:54 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

On Wed May 31, 2023 at 5:25 PM AEST, Cédric Le Goater wrote:
> On 5/31/23 03:23, Nicholas Piggin wrote:
> > SMT TCG emulation needs to be able to iterate over siblings in a core,
> > and needs to serialise core access to shared SPRs and state.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >   target/ppc/cpu.h       |  9 +++++++++
> >   target/ppc/cpu_init.c  |  5 +++++
> >   target/ppc/translate.c | 20 ++++++++++++++++++++
> >   3 files changed, 34 insertions(+)
> > 
> > diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> > index 1f23b81e90..b594408a8d 100644
> > --- a/target/ppc/cpu.h
> > +++ b/target/ppc/cpu.h
> > @@ -672,6 +672,8 @@ enum {
> >       POWERPC_FLAG_TM       = 0x00100000,
> >       /* Has SCV (ISA 3.00)                                                    */
> >       POWERPC_FLAG_SCV      = 0x00200000,
> > +    /* Has >1 thread per core                                                */
> > +    POWERPC_FLAG_SMT      = 0x00400000,
> >   };
> >   
> >   /*
> > @@ -1266,6 +1268,13 @@ struct CPUArchState {
> >       uint64_t pmu_base_time;
> >   };
> >   
> > +#define _CORE_ID(cs)                                            \
> > +    (POWERPC_CPU(cs)->env.spr_cb[SPR_PIR].default_value & ~(cs->nr_threads - 1))
> > +
> > +#define THREAD_SIBLING_FOREACH(cs, cs_sibling)			\
> > +    CPU_FOREACH(cs_sibling)                                     \
> > +        if (_CORE_ID(cs) == _CORE_ID(cs_sibling))
> > +
>
>
> May be introduce these helpers when needed (and if needed).

Yeah that's a good idea, I tried to structure it so you could see
the main components first, but for a real patch it might indeed be
better add them as needed.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 4/5] target/ppc: Add msgsnd/p and DPDES SMT support
  2023-06-01  7:13   ` Cédric Le Goater
@ 2023-06-02  6:56     ` Nicholas Piggin
  0 siblings, 0 replies; 19+ messages in thread
From: Nicholas Piggin @ 2023-06-02  6:56 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

On Thu Jun 1, 2023 at 5:13 PM AEST, Cédric Le Goater wrote:
> On 5/31/23 03:23, Nicholas Piggin wrote:
> > Doorbells in SMT need to coordinate msgsnd/msgclr and DPDES access from
> > multiple threads that affect the same state.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >   hw/ppc/ppc.c                                  |  6 ++
> >   include/hw/ppc/ppc.h                          |  1 +
> >   target/ppc/cpu.h                              |  7 +-
> >   target/ppc/excp_helper.c                      | 86 +++++++++++++------
> >   target/ppc/gdbstub.c                          |  2 +-
> >   target/ppc/helper.h                           |  2 +-
> >   target/ppc/misc_helper.c                      | 60 +++++++++++--
> >   target/ppc/translate.c                        |  8 ++
> >   .../ppc/translate/processor-ctrl-impl.c.inc   |  2 +-
> >   9 files changed, 140 insertions(+), 34 deletions(-)
> > 
> > diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> > index 80b4706db2..e30853413b 100644
> > --- a/hw/ppc/ppc.c
> > +++ b/hw/ppc/ppc.c
> > @@ -1434,6 +1434,12 @@ int ppc_cpu_pir(PowerPCCPU *cpu)
> >       return env->spr_cb[SPR_PIR].default_value;
> >   }
> >   
> > +int ppc_cpu_tir(PowerPCCPU *cpu)
> > +{
> > +    CPUPPCState *env = &cpu->env;
> > +    return env->spr_cb[SPR_PIR].default_value;
>
> PIR or TIR ?

Good catch, I think I "tidied" that up before sending it :\

> > @@ -3154,22 +3172,42 @@ void helper_book3s_msgclrp(CPUPPCState *env, target_ulong rb)
> >   }
> >   
> >   /*
> > - * sends a message to other threads that are on the same
> > + * sends a message to another thread  on the same
> >    * multi-threaded processor
> >    */
> >   void helper_book3s_msgsndp(CPUPPCState *env, target_ulong rb)
> >   {
> > -    int pir = env->spr_cb[SPR_PIR].default_value;
> > +    CPUState *cs = env_cpu(env);
> > +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> > +    CPUState *ccs;
> > +    uint32_t nr_threads = cs->nr_threads;
> > +    int ttir = rb & PPC_BITMASK(57, 63);
> >   
> >       helper_hfscr_facility_check(env, HFSCR_MSGP, "msgsndp", HFSCR_IC_MSGP);
> >   
> > -    if (!dbell_type_server(rb)) {
> > +    if (!dbell_type_server(rb) || ttir >= nr_threads) {
>
> may be log bad ttir values ? even if the insn is a no-op in that case,
> telling the user would be good since it should be a guest os issue

Yeah. We don't seem to do that on a systematic basis in PPC but we
probably should. It's certainly helped me before when I've got
something wrong. Good idea.

> > @@ -192,14 +192,38 @@ void helper_store_pcr(CPUPPCState *env, target_ulong value)
> >    */
> >   target_ulong helper_load_dpdes(CPUPPCState *env)
> >   {
> > +    CPUState *cs = env_cpu(env);
> > +    CPUState *ccs;
> > +    uint32_t nr_threads = cs->nr_threads;
> > +    uint32_t core_id = env->spr[SPR_PIR] & ~(nr_threads - 1);
>
> you could add an helper for the above.

Yes.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 5/5] spapr: Allow up to 8 threads SMT configuration
  2023-06-01  7:20   ` Cédric Le Goater
@ 2023-06-02  6:59     ` Nicholas Piggin
  2023-06-02  7:04       ` Cédric Le Goater
  2023-06-05 10:29     ` Nicholas Piggin
  1 sibling, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2023-06-02  6:59 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

On Thu Jun 1, 2023 at 5:20 PM AEST, Cédric Le Goater wrote:
> On 5/31/23 03:23, Nicholas Piggin wrote:
> > TCG now supports multi-threaded configuration at least enough for
> > pseries to be functional enough to boot Linux.
> > 
> > This requires PIR and TIR be set, because that's how sibling thread
> > matching is done.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >   hw/ppc/spapr.c          | 4 ++--
> >   hw/ppc/spapr_cpu_core.c | 7 +++++--
> >   2 files changed, 7 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index dcb7f1c70a..11074cefea 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2524,8 +2524,8 @@ static void spapr_set_vsmt_mode(SpaprMachineState *spapr, Error **errp)
> >       int ret;
> >       unsigned int smp_threads = ms->smp.threads;
> >   
> > -    if (!kvm_enabled() && (smp_threads > 1)) {
> > -        error_setg(errp, "TCG cannot support more than 1 thread/core "
> > +    if (!kvm_enabled() && (smp_threads > 8)) {
> > +        error_setg(errp, "TCG cannot support more than 8 threads/core "
> >                      "on a pseries machine");
>
> I think we should add test on the CPU also.

On the CPU type, POWER7 can have 1/2/4, POWER8 can have 1/2/4/8?
POWER9 could also switch PVR between big and small core depending
on whether you select SMT8 I suppose.

> >           return;
> >       }
> > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> > index 9b88dd549a..f35ee600f1 100644
> > --- a/hw/ppc/spapr_cpu_core.c
> > +++ b/hw/ppc/spapr_cpu_core.c
> > @@ -255,7 +255,7 @@ static void spapr_cpu_core_unrealize(DeviceState *dev)
> >   }
> >   
> >   static bool spapr_realize_vcpu(PowerPCCPU *cpu, SpaprMachineState *spapr,
> > -                               SpaprCpuCore *sc, Error **errp)
> > +                               SpaprCpuCore *sc, int thread_nr, Error **errp)
>
> thread_index may be ?

Sure.

> >   {
> >       CPUPPCState *env = &cpu->env;
> >       CPUState *cs = CPU(cpu);
> > @@ -267,6 +267,9 @@ static bool spapr_realize_vcpu(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >       cpu_ppc_set_vhyp(cpu, PPC_VIRTUAL_HYPERVISOR(spapr));
> >       kvmppc_set_papr(cpu);
>
> so, spapr_create_vcpu() set cs->cpu_index :
>      cs->cpu_index = cc->core_id + i;
>
> and spapr_realize_vcpu :
>     
> > +    env->spr_cb[SPR_PIR].default_value = cs->cpu_index;
> > +    env->spr_cb[SPR_TIR].default_value = thread_nr;
> > +
> it would be cleaner to do the SPR assignment in one place.

I'll try that, it sounds good.

Thanks,
Nick



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] target/ppc: initial SMT support in TCG
  2023-06-01  7:56 ` [RFC PATCH 0/5] target/ppc: initial SMT support in TCG Cédric Le Goater
@ 2023-06-02  7:01   ` Nicholas Piggin
  2023-06-02  7:21     ` Cédric Le Goater
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas Piggin @ 2023-06-02  7:01 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

On Thu Jun 1, 2023 at 5:56 PM AEST, Cédric Le Goater wrote:
> Hello Nick,
>
> On 5/31/23 03:23, Nicholas Piggin wrote:
> > Hi,
> > 
> > I'm posting this now just to get some first thoughts. I wouldn't say
> > it's ready but it does actually work with some basic tests including
> > pseries booting a Linux distro. I have powernv booting too, it just
> > requires some more SPRs converted, nothing fundamentally different so
> > for the purpose of this RFC I leave it out.
> > 
> > A couple of things, I don't know the object model well enough to do
> > something nice with topology. Iterating siblings I would have thought
> > should be going to parent core then iterating its children CPUs. Should
> > that be done with the object model, or is it better to add direct
> > pointers in CPUs to core and core to CPUs? It is (semi) important for> performance so maybe that is better than object iterators. If we go that
> > way, the PnvCore and SpaprCore have pointers to the SMT threads already,
> > should those be abstracted go in the CPUCore?
>
> You should be able to move the thread array into the CPUCore. If you do
> that, please check that migration compat is not impacted by the state
> change. However, I am not sure you can use the CPUCore model under the
> insn modeling. Something to check.

Okay.

> Anyhow, the way you implemented the loop on the siblings is sufficiently
> fast for a small numbers of CPU and safe, w.r.t to CPU hotplug. So
> I would leave that part for now, if it runs decently with 4*4 vCPUs in
> TCG it should be fine.

Yeah you're right I'm overly paranoid about it but we don't do hundreds
of CPUs in TCG so it should be fine. Maybe I will defer it for now
then and just do the CPU iteration.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 5/5] spapr: Allow up to 8 threads SMT configuration
  2023-06-02  6:59     ` Nicholas Piggin
@ 2023-06-02  7:04       ` Cédric Le Goater
  0 siblings, 0 replies; 19+ messages in thread
From: Cédric Le Goater @ 2023-06-02  7:04 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

On 6/2/23 08:59, Nicholas Piggin wrote:
> On Thu Jun 1, 2023 at 5:20 PM AEST, Cédric Le Goater wrote:
>> On 5/31/23 03:23, Nicholas Piggin wrote:
>>> TCG now supports multi-threaded configuration at least enough for
>>> pseries to be functional enough to boot Linux.
>>>
>>> This requires PIR and TIR be set, because that's how sibling thread
>>> matching is done.
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>>    hw/ppc/spapr.c          | 4 ++--
>>>    hw/ppc/spapr_cpu_core.c | 7 +++++--
>>>    2 files changed, 7 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index dcb7f1c70a..11074cefea 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -2524,8 +2524,8 @@ static void spapr_set_vsmt_mode(SpaprMachineState *spapr, Error **errp)
>>>        int ret;
>>>        unsigned int smp_threads = ms->smp.threads;
>>>    
>>> -    if (!kvm_enabled() && (smp_threads > 1)) {
>>> -        error_setg(errp, "TCG cannot support more than 1 thread/core "
>>> +    if (!kvm_enabled() && (smp_threads > 8)) {
>>> +        error_setg(errp, "TCG cannot support more than 8 threads/core "
>>>                       "on a pseries machine");
>>
>> I think we should add test on the CPU also.
> 
> On the CPU type, POWER7 can have 1/2/4, POWER8 can have 1/2/4/8?
> POWER9 could also switch PVR between big and small core depending
> on whether you select SMT8 I suppose.

What I meant is to limit the support to the CPUs which will be most likely
used : POWER8-10. I don't think we care much about the others P7, P5+, 970.

Thanks,

C.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/5] target/ppc: initial SMT support in TCG
  2023-06-02  7:01   ` Nicholas Piggin
@ 2023-06-02  7:21     ` Cédric Le Goater
  0 siblings, 0 replies; 19+ messages in thread
From: Cédric Le Goater @ 2023-06-02  7:21 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

On 6/2/23 09:01, Nicholas Piggin wrote:
> On Thu Jun 1, 2023 at 5:56 PM AEST, Cédric Le Goater wrote:
>> Hello Nick,
>>
>> On 5/31/23 03:23, Nicholas Piggin wrote:
>>> Hi,
>>>
>>> I'm posting this now just to get some first thoughts. I wouldn't say
>>> it's ready but it does actually work with some basic tests including
>>> pseries booting a Linux distro. I have powernv booting too, it just
>>> requires some more SPRs converted, nothing fundamentally different so
>>> for the purpose of this RFC I leave it out.
>>>
>>> A couple of things, I don't know the object model well enough to do
>>> something nice with topology. Iterating siblings I would have thought
>>> should be going to parent core then iterating its children CPUs. Should
>>> that be done with the object model, or is it better to add direct
>>> pointers in CPUs to core and core to CPUs? It is (semi) important for> performance so maybe that is better than object iterators. If we go that
>>> way, the PnvCore and SpaprCore have pointers to the SMT threads already,
>>> should those be abstracted go in the CPUCore?
>>
>> You should be able to move the thread array into the CPUCore. If you do
>> that, please check that migration compat is not impacted by the state
>> change. However, I am not sure you can use the CPUCore model under the
>> insn modeling. Something to check.
> 
> Okay.
> 
>> Anyhow, the way you implemented the loop on the siblings is sufficiently
>> fast for a small numbers of CPU and safe, w.r.t to CPU hotplug. So
>> I would leave that part for now, if it runs decently with 4*4 vCPUs in
>> TCG it should be fine.
> 
> Yeah you're right I'm overly paranoid about it but we don't do hundreds
> of CPUs in TCG so it should be fine. 

The PowerNV did run with 64 CPUs at some point. Boot was slow bc of
contention in some areas when starting the secondaries. When stabilized,
perf was decent.

I think that a realistic goal for book3s is to support 2 sockets * 2 cores
* 4 threads on PowerNV and 16 vCPUs on pseries, this to exercise the various
ways to IPI on the different HV implementations.

> Maybe I will defer it for now then and just do the CPU iteration.

May be there is some value to store a CPU siblings under the PowerPC
CPU descriptor. IT could be useful for instructions that apply to the
current CPU but for others requiring a PIR value, you will have to
find a starting point in the CPU list, so it won't make much difference
I think.

Thanks,

C.



> 
> Thanks,
> Nick



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 5/5] spapr: Allow up to 8 threads SMT configuration
  2023-06-01  7:20   ` Cédric Le Goater
  2023-06-02  6:59     ` Nicholas Piggin
@ 2023-06-05 10:29     ` Nicholas Piggin
  1 sibling, 0 replies; 19+ messages in thread
From: Nicholas Piggin @ 2023-06-05 10:29 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

On Thu Jun 1, 2023 at 5:20 PM AEST, Cédric Le Goater wrote:
> On 5/31/23 03:23, Nicholas Piggin wrote:
> > @@ -267,6 +267,9 @@ static bool spapr_realize_vcpu(PowerPCCPU *cpu, SpaprMachineState *spapr,
> >       cpu_ppc_set_vhyp(cpu, PPC_VIRTUAL_HYPERVISOR(spapr));
> >       kvmppc_set_papr(cpu);
>
> so, spapr_create_vcpu() set cs->cpu_index :
>      cs->cpu_index = cc->core_id + i;
>
> and spapr_realize_vcpu :
>     
> > +    env->spr_cb[SPR_PIR].default_value = cs->cpu_index;
> > +    env->spr_cb[SPR_TIR].default_value = thread_nr;
> > +
> it would be cleaner to do the SPR assignment in one place.

Problem is we can't do it in create because the SPRs have not been
registered yet, and can't move cpu_index to realize because it's needed
earlier.

A nice way to do this might be to have a cpu_index and a thread_index
(or require that it is cpu_index % nr_threads), and use those values as
the default when registering PIR and TIR SPRs. But I haven't quite
looked into it far enough yet. pnv sets PIR in the realize function
already so maybe it's okay this way for now and it can be tidied up.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 1/5] target/ppc: gdbstub init spr gdb_id for all CPUs
  2023-05-31  1:23 ` [RFC PATCH 1/5] target/ppc: gdbstub init spr gdb_id for all CPUs Nicholas Piggin
  2023-05-31  5:49   ` Philippe Mathieu-Daudé
@ 2023-06-23  9:26   ` Cédric Le Goater
  1 sibling, 0 replies; 19+ messages in thread
From: Cédric Le Goater @ 2023-06-23  9:26 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Daniel Henrique Barboza

On 5/31/23 03:23, Nicholas Piggin wrote:
> Make sure each CPU gets its state set up for gdb, not just the ones
> before PowerPCCPUClass has had its gdb state set up.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied to ppc-next.

Thanks,

C.



> ---
>   target/ppc/gdbstub.c | 30 +++++++++++++++++++-----------
>   1 file changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
> index 63c9abe4f1..ca39efdc35 100644
> --- a/target/ppc/gdbstub.c
> +++ b/target/ppc/gdbstub.c
> @@ -327,6 +327,25 @@ void ppc_gdb_gen_spr_xml(PowerPCCPU *cpu)
>       unsigned int num_regs = 0;
>       int i;
>   
> +    for (i = 0; i < ARRAY_SIZE(env->spr_cb); i++) {
> +        ppc_spr_t *spr = &env->spr_cb[i];
> +
> +        if (!spr->name) {
> +            continue;
> +        }
> +
> +        /*
> +         * GDB identifies registers based on the order they are
> +         * presented in the XML. These ids will not match QEMU's
> +         * representation (which follows the PowerISA).
> +         *
> +         * Store the position of the current register description so
> +         * we can make the correspondence later.
> +         */
> +        spr->gdb_id = num_regs;
> +        num_regs++;
> +    }
> +
>       if (pcc->gdb_spr_xml) {
>           return;
>       }
> @@ -348,17 +367,6 @@ void ppc_gdb_gen_spr_xml(PowerPCCPU *cpu)
>   
>           g_string_append_printf(xml, " bitsize=\"%d\"", TARGET_LONG_BITS);
>           g_string_append(xml, " group=\"spr\"/>");
> -
> -        /*
> -         * GDB identifies registers based on the order they are
> -         * presented in the XML. These ids will not match QEMU's
> -         * representation (which follows the PowerISA).
> -         *
> -         * Store the position of the current register description so
> -         * we can make the correspondence later.
> -         */
> -        spr->gdb_id = num_regs;
> -        num_regs++;
>       }
>   
>       g_string_append(xml, "</feature>");



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-06-23  9:27 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-31  1:23 [RFC PATCH 0/5] target/ppc: initial SMT support in TCG Nicholas Piggin
2023-05-31  1:23 ` [RFC PATCH 1/5] target/ppc: gdbstub init spr gdb_id for all CPUs Nicholas Piggin
2023-05-31  5:49   ` Philippe Mathieu-Daudé
2023-06-23  9:26   ` Cédric Le Goater
2023-05-31  1:23 ` [RFC PATCH 2/5] target/ppc: Add initial flags and helpers for SMT support Nicholas Piggin
2023-05-31  7:25   ` Cédric Le Goater
2023-06-02  6:54     ` Nicholas Piggin
2023-05-31  1:23 ` [RFC PATCH 3/5] target/ppc: Add support for SMT CTRL register Nicholas Piggin
2023-05-31  1:23 ` [RFC PATCH 4/5] target/ppc: Add msgsnd/p and DPDES SMT support Nicholas Piggin
2023-06-01  7:13   ` Cédric Le Goater
2023-06-02  6:56     ` Nicholas Piggin
2023-05-31  1:23 ` [RFC PATCH 5/5] spapr: Allow up to 8 threads SMT configuration Nicholas Piggin
2023-06-01  7:20   ` Cédric Le Goater
2023-06-02  6:59     ` Nicholas Piggin
2023-06-02  7:04       ` Cédric Le Goater
2023-06-05 10:29     ` Nicholas Piggin
2023-06-01  7:56 ` [RFC PATCH 0/5] target/ppc: initial SMT support in TCG Cédric Le Goater
2023-06-02  7:01   ` Nicholas Piggin
2023-06-02  7:21     ` Cédric Le Goater

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.