* [RFC PATCH 1/4] target/ppc: Fix eieio memory ordering semantics
@ 2022-05-03 10:33 Nicholas Piggin
  2022-05-03 10:33 ` [RFC PATCH 2/4] tcg/ppc: ST_ST memory ordering is not provided with eieio Nicholas Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Nicholas Piggin @ 2022-05-03 10:33 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel

The generated eieio memory ordering semantics do not match the
instruction definition in the architecture. Add a big comment to
explain this strange instruction and correct the memory ordering
behaviour.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/translate.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index fa34f81c30..abb8807180 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3513,7 +3513,31 @@ static void gen_stswx(DisasContext *ctx)
 /* eieio */
 static void gen_eieio(DisasContext *ctx)
 {
-    TCGBar bar = TCG_MO_LD_ST;
+    TCGBar bar = TCG_MO_ALL;
+
+    /*
+     * eieio has complex semantics. It provides memory ordering between
+     * operations in the set:
+     * - loads from CI memory.
+     * - stores to CI memory.
+     * - stores to WT memory.
+     *
+     * It separately also orders memory for operations in the set:
+     * - stores to cacheable memory.
+     *
+     * It also serializes instructions:
+     * - dcbt and dcbst.
+     *
+     * It separately serializes:
+     * - tlbie and tlbsync.
+     *
+     * And separately serializes:
+     * - slbieg, slbiag, and slbsync.
+     *
+     * The end result is that CI memory ordering requires TCG_MO_ALL
+     * and it is not possible to special-case more relaxed ordering for
+     * cacheable accesses. TCG_BAR_SC is required to provide the serialization.
+     */
 
     /*
      * POWER9 has a eieio instruction variant using bit 6 as a hint to
-- 
2.35.1
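
The ordering sets described in the gen_eieio() comment above can be modelled
in a few lines of C. This is only a sketch of the comment's wording, not QEMU
code: the access_kind enum and the eieio_set() / eieio_orders() helpers are
hypothetical names introduced purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical access classes, named after the comment in gen_eieio(). */
enum access_kind {
    CI_LOAD,          /* load from caching-inhibited memory */
    CI_STORE,         /* store to caching-inhibited memory */
    WT_STORE,         /* store to write-through memory */
    CACHEABLE_LOAD,   /* load from cacheable memory: in no eieio set */
    CACHEABLE_STORE,  /* store to cacheable memory */
};

/* Return the ordering set an access belongs to, or 0 if eieio ignores it. */
static int eieio_set(enum access_kind k)
{
    switch (k) {
    case CI_LOAD:
    case CI_STORE:
    case WT_STORE:
        return 1;
    case CACHEABLE_STORE:
        return 2;
    default:
        return 0;
    }
}

/* In this model, eieio orders two accesses only if both are in the same set. */
static bool eieio_orders(enum access_kind before, enum access_kind after)
{
    int s = eieio_set(before);
    return s != 0 && s == eieio_set(after);
}

/* Illustrative checks of the model. */
static void eieio_model_check(void)
{
    assert(eieio_orders(CI_LOAD, CI_STORE));          /* a load-store pair */
    assert(eieio_orders(CI_STORE, CI_LOAD));          /* a store-load pair */
    assert(!eieio_orders(CI_STORE, CACHEABLE_STORE)); /* crosses the two sets */
}
```

Because the first set mixes loads and stores in both directions, an eieio on CI
memory has to cover every load/store pairing, which is consistent with the
TCG_MO_ALL chosen in the patch. The cross-set case is the reason eieio cannot
stand in for a general store-store barrier.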




* [RFC PATCH 2/4] tcg/ppc: ST_ST memory ordering is not provided with eieio
  2022-05-03 10:33 [RFC PATCH 1/4] target/ppc: Fix eieio memory ordering semantics Nicholas Piggin
@ 2022-05-03 10:33 ` Nicholas Piggin
  2022-05-03 15:01   ` Richard Henderson
  2022-05-03 10:33 ` [RFC PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync Nicholas Piggin
  2022-05-03 10:33 ` [RFC PATCH 4/4] target/ppc: Implement lwsync with weaker memory ordering Nicholas Piggin
  2 siblings, 1 reply; 6+ messages in thread
From: Nicholas Piggin @ 2022-05-03 10:33 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel

eieio does not provide ordering between stores to CI memory and stores
to cacheable memory so it can't be used as a general ST_ST barrier.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tcg/ppc/tcg-target.c.inc | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index cfcd121f9c..3ff845d063 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1836,8 +1836,6 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
     a0 &= TCG_MO_ALL;
     if (a0 == TCG_MO_LD_LD) {
         insn = LWSYNC;
-    } else if (a0 == TCG_MO_ST_ST) {
-        insn = EIEIO;
     }
     tcg_out32(s, insn);
 }
-- 
2.35.1




* [RFC PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync
  2022-05-03 10:33 [RFC PATCH 1/4] target/ppc: Fix eieio memory ordering semantics Nicholas Piggin
  2022-05-03 10:33 ` [RFC PATCH 2/4] tcg/ppc: ST_ST memory ordering is not provided with eieio Nicholas Piggin
@ 2022-05-03 10:33 ` Nicholas Piggin
  2022-05-03 14:53   ` Richard Henderson
  2022-05-03 10:33 ` [RFC PATCH 4/4] target/ppc: Implement lwsync with weaker memory ordering Nicholas Piggin
  2 siblings, 1 reply; 6+ messages in thread
From: Nicholas Piggin @ 2022-05-03 10:33 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel

lwsync orders more than just LD_LD; importantly, it matches the x86 and
s390 default memory ordering.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/cpu.h         | 2 ++
 tcg/ppc/tcg-target.c.inc | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index c2b6c987c0..0b0e9761cd 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -28,6 +28,8 @@
 
 #define TCG_GUEST_DEFAULT_MO 0
 
+#define PPC_LWSYNC_MO (TCG_MO_LD_LD | TCG_MO_LD_ST | TCG_MO_ST_ST)
+
 #define TARGET_PAGE_BITS_64K 16
 #define TARGET_PAGE_BITS_16M 24
 
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 3ff845d063..b87fc2383e 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1834,7 +1834,7 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
 {
     uint32_t insn = HWSYNC;
     a0 &= TCG_MO_ALL;
-    if (a0 == TCG_MO_LD_LD) {
+    if ((a0 & PPC_LWSYNC_MO) == a0) {
         insn = LWSYNC;
     }
     tcg_out32(s, insn);
-- 
2.35.1
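
To make the commit message concrete, the sketch below compares the barrier
selection before and after this patch for a mask that orders everything
except store-then-load, which is the ordering the commit message attributes to
x86 and s390. The MO_* values are assumed to mirror QEMU's TCGBar ordering
bits; select_before(), select_after() and TSO_MO are hypothetical names, not
QEMU symbols.

```c
#include <assert.h>

/* Assumed to mirror QEMU's TCG_MO_* ordering bits. */
enum {
    MO_LD_LD = 0x01,
    MO_ST_LD = 0x02,
    MO_LD_ST = 0x04,
    MO_ST_ST = 0x08,
    MO_ALL   = 0x0f,
};

#define LWSYNC_MO (MO_LD_LD | MO_LD_ST | MO_ST_ST)
#define TSO_MO    (MO_ALL & ~MO_ST_LD)   /* everything except store->load */

enum host_insn { HOST_HWSYNC, HOST_LWSYNC };

/* Selection as it stands after patch 2: only a pure LD_LD request is relaxed. */
static enum host_insn select_before(unsigned a0)
{
    a0 &= MO_ALL;
    return a0 == MO_LD_LD ? HOST_LWSYNC : HOST_HWSYNC;
}

/* Selection with this patch: any subset of LWSYNC_MO is relaxed. */
static enum host_insn select_after(unsigned a0)
{
    a0 &= MO_ALL;
    return (a0 & LWSYNC_MO) == a0 ? HOST_LWSYNC : HOST_HWSYNC;
}

/* Illustrative checks of the two selection rules. */
static void selection_check(void)
{
    assert(select_before(TSO_MO) == HOST_HWSYNC);
    assert(select_after(TSO_MO) == HOST_LWSYNC);
    assert(select_after(MO_ALL) == HOST_HWSYNC);  /* ST_LD still needs sync */
}
```

Under these assumptions, a barrier requested with the TSO-style mask moves
from the heavyweight sync to lwsync, while a full barrier is unaffected.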




* [RFC PATCH 4/4] target/ppc: Implement lwsync with weaker memory ordering
  2022-05-03 10:33 [RFC PATCH 1/4] target/ppc: Fix eieio memory ordering semantics Nicholas Piggin
  2022-05-03 10:33 ` [RFC PATCH 2/4] tcg/ppc: ST_ST memory ordering is not provided with eieio Nicholas Piggin
  2022-05-03 10:33 ` [RFC PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync Nicholas Piggin
@ 2022-05-03 10:33 ` Nicholas Piggin
  2 siblings, 0 replies; 6+ messages in thread
From: Nicholas Piggin @ 2022-05-03 10:33 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel

This allows an x86 host to treat lwsync as a no-op, and a ppc host to use
lwsync rather than sync.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/cpu.h       |  4 +++-
 target/ppc/cpu_init.c  | 13 +++++++------
 target/ppc/machine.c   |  3 ++-
 target/ppc/translate.c |  8 +++++++-
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 0b0e9761cd..bf5f226567 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2287,6 +2287,8 @@ enum {
     PPC2_ISA300        = 0x0000000000080000ULL,
     /* POWER ISA 3.1                                                         */
     PPC2_ISA310        = 0x0000000000100000ULL,
+    /*   lwsync instruction                                                  */
+    PPC2_MEM_LWSYNC    = 0x0000000000200000ULL,
 
 #define PPC_TCG_INSNS2 (PPC2_BOOKE206 | PPC2_VSX | PPC2_PRCNTL | PPC2_DBRX | \
                         PPC2_ISA205 | PPC2_VSX207 | PPC2_PERM_ISA206 | \
@@ -2295,7 +2297,7 @@ enum {
                         PPC2_BCTAR_ISA207 | PPC2_LSQ_ISA207 | \
                         PPC2_ALTIVEC_207 | PPC2_ISA207S | PPC2_DFP | \
                         PPC2_FP_CVT_S64 | PPC2_TM | PPC2_PM_ISA206 | \
-                        PPC2_ISA300 | PPC2_ISA310)
+                        PPC2_ISA300 | PPC2_ISA310 | PPC2_MEM_LWSYNC)
 };
 
 /*****************************************************************************/
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index d42e2ba8e0..26d9277ffb 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5769,7 +5769,7 @@ POWERPC_FAMILY(970)(ObjectClass *oc, void *data)
                        PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |
                        PPC_64B | PPC_ALTIVEC |
                        PPC_SEGMENT_64B | PPC_SLBI;
-    pcc->insns_flags2 = PPC2_FP_CVT_S64;
+    pcc->insns_flags2 = PPC2_FP_CVT_S64 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_VR) |
                     (1ull << MSR_POW) |
@@ -5846,7 +5846,7 @@ POWERPC_FAMILY(POWER5P)(ObjectClass *oc, void *data)
                        PPC_64B |
                        PPC_POPCNTB |
                        PPC_SEGMENT_64B | PPC_SLBI;
-    pcc->insns_flags2 = PPC2_FP_CVT_S64;
+    pcc->insns_flags2 = PPC2_FP_CVT_S64 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_VR) |
                     (1ull << MSR_POW) |
@@ -5984,7 +5984,7 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
                         PPC2_PERM_ISA206 | PPC2_DIVE_ISA206 |
                         PPC2_ATOMIC_ISA206 | PPC2_FP_CVT_ISA206 |
                         PPC2_FP_TST_ISA206 | PPC2_FP_CVT_S64 |
-                        PPC2_PM_ISA206;
+                        PPC2_PM_ISA206 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_VR) |
                     (1ull << MSR_VSX) |
@@ -6157,7 +6157,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
                         PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |
                         PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 |
                         PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
-                        PPC2_TM | PPC2_PM_ISA206;
+                        PPC2_TM | PPC2_PM_ISA206 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_HV) |
                     (1ull << MSR_TM) |
@@ -6375,7 +6375,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
                         PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |
                         PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 |
                         PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
-                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL;
+                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_HV) |
                     (1ull << MSR_TM) |
@@ -6590,7 +6590,8 @@ POWERPC_FAMILY(POWER10)(ObjectClass *oc, void *data)
                         PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |
                         PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 |
                         PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
-                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL | PPC2_ISA310;
+                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL | PPC2_ISA310 |
+                        PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_HV) |
                     (1ull << MSR_TM) |
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index e673944597..33b3d6cf30 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -157,7 +157,8 @@ static int cpu_pre_save(void *opaque)
         | PPC2_ATOMIC_ISA206 | PPC2_FP_CVT_ISA206
         | PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207
         | PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207
-        | PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 | PPC2_TM;
+        | PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 | PPC2_TM
+        | PPC2_MEM_LWSYNC;
 
     env->spr[SPR_LR] = env->lr;
     env->spr[SPR_CTR] = env->ctr;
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index abb8807180..76691cf082 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -4040,8 +4040,13 @@ static void gen_stqcx_(DisasContext *ctx)
 /* sync */
 static void gen_sync(DisasContext *ctx)
 {
+    TCGBar bar = TCG_MO_ALL;
     uint32_t l = (ctx->opcode >> 21) & 3;
 
+    if ((l == 1) && (ctx->insns_flags2 & PPC2_MEM_LWSYNC)) {
+        bar = PPC_LWSYNC_MO;
+    }
+
     /*
      * We may need to check for a pending TLB flush.
      *
@@ -4053,7 +4058,8 @@ static void gen_sync(DisasContext *ctx)
     if (((l == 2) || !(ctx->insns_flags & PPC_64B)) && !ctx->pr) {
         gen_check_tlb_flush(ctx, true);
     }
-    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
+
+    tcg_gen_mb(bar | TCG_BAR_SC);
 }
 
 /* wait */
-- 
2.35.1
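
The effect of the gen_sync() change can be sketched in isolation. The L-field
extraction is copied from the patch; everything else is a simplified model
under stated assumptions: the MO_* values are assumed to mirror QEMU's TCGBar
bits, guest_sync_mask() and tso_host_needs_fence() are hypothetical helpers,
and the host rule assumes a strongly ordered (x86-like) backend only has to
emit a fence when store-then-load ordering is requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror QEMU's TCG_MO_* ordering bits. */
enum {
    MO_LD_LD = 0x01,
    MO_ST_LD = 0x02,
    MO_LD_ST = 0x04,
    MO_ST_ST = 0x08,
    MO_ALL   = 0x0f,
};

#define LWSYNC_MO (MO_LD_LD | MO_LD_ST | MO_ST_ST)

/* PowerPC encodings of the sync family; they differ only in the L field. */
#define OPC_HWSYNC  0x7c0004acu   /* sync,    L = 0 */
#define OPC_LWSYNC  0x7c2004acu   /* lwsync,  L = 1 */
#define OPC_PTESYNC 0x7c4004acu   /* ptesync, L = 2 */

/* Extract the L field the same way gen_sync() does. */
static unsigned sync_l_field(uint32_t opcode)
{
    return (opcode >> 21) & 3;
}

/* Ordering mask for a guest sync-family instruction (model of the patch). */
static unsigned guest_sync_mask(uint32_t opcode, bool cpu_has_lwsync)
{
    if (sync_l_field(opcode) == 1 && cpu_has_lwsync) {
        return LWSYNC_MO;
    }
    return MO_ALL;
}

/* Assumption: a TSO-like host only needs a fence for store->load ordering. */
static bool tso_host_needs_fence(unsigned mask)
{
    return (mask & MO_ST_LD) != 0;
}

/* Illustrative checks of the model. */
static void lwsync_model_check(void)
{
    assert(sync_l_field(OPC_LWSYNC) == 1);
    assert(!tso_host_needs_fence(guest_sync_mask(OPC_LWSYNC, true)));
    assert(tso_host_needs_fence(guest_sync_mask(OPC_HWSYNC, true)));
}
```

In this model, a CPU without the PPC2_MEM_LWSYNC flag still gets the full
barrier for L = 1, matching the fallback to TCG_MO_ALL in the patch.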




* Re: [RFC PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync
  2022-05-03 10:33 ` [RFC PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync Nicholas Piggin
@ 2022-05-03 14:53   ` Richard Henderson
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2022-05-03 14:53 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel

On 5/3/22 03:33, Nicholas Piggin wrote:
> lwsync orders more than just LD_LD; importantly, it matches the x86 and
> s390 default memory ordering.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   target/ppc/cpu.h         | 2 ++
>   tcg/ppc/tcg-target.c.inc | 2 +-
>   2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index c2b6c987c0..0b0e9761cd 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -28,6 +28,8 @@
>   
>   #define TCG_GUEST_DEFAULT_MO 0
>   
> +#define PPC_LWSYNC_MO (TCG_MO_LD_LD | TCG_MO_LD_ST | TCG_MO_ST_ST)

You can't put this here...


> +
>   #define TARGET_PAGE_BITS_64K 16
>   #define TARGET_PAGE_BITS_16M 24
>   
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index 3ff845d063..b87fc2383e 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -1834,7 +1834,7 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
>   {
>       uint32_t insn = HWSYNC;
>       a0 &= TCG_MO_ALL;
> -    if (a0 == TCG_MO_LD_LD) {
> +    if ((a0 & PPC_LWSYNC_MO) == a0) {

... and have it used here.  You should have seen compilation failures for the missing 
symbol.  I can only assume you used a restricted --target-list in testing.

Anyway, it looks like a simpler test would be

     insn = (a0 & TCG_MO_ST_LD ? HWSYNC : LWSYNC);
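
For reference, the two tests can be checked against each other exhaustively,
since only four ordering bits are involved. The sketch below is a standalone
check, not QEMU code: the MO_* values are assumed to mirror the TCG_MO_*
bits, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror QEMU's TCG_MO_* ordering bits. */
enum {
    MO_LD_LD = 0x01,
    MO_ST_LD = 0x02,
    MO_LD_ST = 0x04,
    MO_ST_ST = 0x08,
    MO_ALL   = 0x0f,
};

#define LWSYNC_MO (MO_LD_LD | MO_LD_ST | MO_ST_ST)

/* Test from the patch: the request is a subset of what lwsync orders. */
static bool lwsync_ok_subset(unsigned a0)
{
    a0 &= MO_ALL;
    return (a0 & LWSYNC_MO) == a0;
}

/* Simpler test: lwsync suffices unless store->load ordering is requested. */
static bool lwsync_ok_simple(unsigned a0)
{
    return !(a0 & MO_ST_LD);
}

/* Count the ordering masks on which the two tests disagree. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (unsigned a0 = 0; a0 <= MO_ALL; a0++) {
        if (lwsync_ok_subset(a0) != lwsync_ok_simple(a0)) {
            mismatches++;
        }
    }
    return mismatches;
}

/* Illustrative check: the two forms agree on all 16 masks. */
static void equivalence_check(void)
{
    assert(count_mismatches() == 0);
}
```

The simpler form also avoids needing a PPC_LWSYNC_MO definition in the host
backend at all.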


r~



* Re: [RFC PATCH 2/4] tcg/ppc: ST_ST memory ordering is not provided with eieio
  2022-05-03 10:33 ` [RFC PATCH 2/4] tcg/ppc: ST_ST memory ordering is not provided with eieio Nicholas Piggin
@ 2022-05-03 15:01   ` Richard Henderson
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2022-05-03 15:01 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel

On 5/3/22 03:33, Nicholas Piggin wrote:
> eieio does not provide ordering between stores to CI memory and stores
> to cacheable memory so it can't be used as a general ST_ST barrier.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   tcg/ppc/tcg-target.c.inc | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index cfcd121f9c..3ff845d063 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -1836,8 +1836,6 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
>       a0 &= TCG_MO_ALL;
>       if (a0 == TCG_MO_LD_LD) {
>           insn = LWSYNC;
> -    } else if (a0 == TCG_MO_ST_ST) {
> -        insn = EIEIO;
>       }
>       tcg_out32(s, insn);
>   }

Certainly matches the comment from patch 1.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


