* [PATCH 0/4] ppc: improve some memory ordering issues
@ 2022-05-19 13:59 Nicholas Piggin
  2022-05-19 13:59 ` [PATCH 1/4] target/ppc: Fix eieio memory ordering semantics Nicholas Piggin
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Nicholas Piggin @ 2022-05-19 13:59 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Richard Henderson

Since the RFC[*], this series fixes a compile issue noticed by Richard
and has survived some basic stress testing with MTTCG.

Thanks,
Nick

[*] https://lists.nongnu.org/archive/html/qemu-ppc/2022-05/msg00046.html

Nicholas Piggin (4):
  target/ppc: Fix eieio memory ordering semantics
  tcg/ppc: ST_ST memory ordering is not provided with eieio
  tcg/ppc: Optimize memory ordering generation with lwsync
  target/ppc: Implement lwsync with weaker memory ordering

 target/ppc/cpu.h         |  4 +++-
 target/ppc/cpu_init.c    | 13 +++++++------
 target/ppc/machine.c     |  3 ++-
 target/ppc/translate.c   | 35 +++++++++++++++++++++++++++++++++--
 tcg/ppc/tcg-target.c.inc | 11 ++++++-----
 5 files changed, 51 insertions(+), 15 deletions(-)

-- 
2.35.1




* [PATCH 1/4] target/ppc: Fix eieio memory ordering semantics
  2022-05-19 13:59 [PATCH 0/4] ppc: improve some memory ordering issues Nicholas Piggin
@ 2022-05-19 13:59 ` Nicholas Piggin
  2022-05-19 15:30   ` Richard Henderson
  2022-05-19 13:59 ` [PATCH 2/4] tcg/ppc: ST_ST memory ordering is not provided with eieio Nicholas Piggin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Nicholas Piggin @ 2022-05-19 13:59 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Richard Henderson

The generated eieio memory ordering semantics do not match the
instruction definition in the architecture. Add a big comment to
explain this strange instruction and correct the memory ordering
behaviour.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/translate.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index fa34f81c30..eb42f7e459 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3513,7 +3513,32 @@ static void gen_stswx(DisasContext *ctx)
 /* eieio */
 static void gen_eieio(DisasContext *ctx)
 {
-    TCGBar bar = TCG_MO_LD_ST;
+    TCGBar bar = TCG_MO_ALL;
+
+    /*
+     * eieio has complex semantics. It provides memory ordering between
+     * operations in the set:
+     * - loads from CI memory.
+     * - stores to CI memory.
+     * - stores to WT memory.
+     *
+     * It separately also orders memory for operations in the set:
+     * - stores to cacheable memory.
+     *
+     * It also serializes instructions:
+     * - dcbt and dcbst.
+     *
+     * It separately serializes:
+     * - tlbie and tlbsync.
+     *
+     * And separately serializes:
+     * - slbieg, slbiag, and slbsync.
+     *
+     * The end result is that CI memory ordering requires TCG_MO_ALL
+     * and it is not possible to special-case more relaxed ordering for
+     * cacheable accesses. TCG_BAR_SC is required to provide this
+     * serialization.
+     */
 
     /*
      * POWER9 has a eieio instruction variant using bit 6 as a hint to
-- 
2.35.1
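
Editorial sketch, not part of the patch: the comment in gen_eieio() argues
that ordering among CI loads and CI stores needs every load/store pairing,
which the old TCG_MO_LD_ST mask did not cover. The bit values below are
assumed to mirror QEMU's TCGBar ordering bits (TCG_MO_A_B meaning "A before
B in program order"); the MO_* names and both helpers are hypothetical and
exist only for this illustration.

```c
#include <stdbool.h>

/* Assumed to mirror QEMU's TCGBar ordering bits; illustrative only. */
enum {
    MO_LD_LD = 0x01,  /* earlier load  ordered before later load  */
    MO_ST_LD = 0x02,  /* earlier store ordered before later load  */
    MO_LD_ST = 0x04,  /* earlier load  ordered before later store */
    MO_ST_ST = 0x08,  /* earlier store ordered before later store */
    MO_ALL   = 0x0F,
};

/* Hypothetical helper: does mask 'bar' provide every ordering in 'need'? */
static bool bar_orders(unsigned bar, unsigned need)
{
    return (bar & need) == need;
}

/*
 * CI loads and CI stores are in the same eieio ordering set, so any
 * pairing of the two can occur: all four bits are required.
 */
static unsigned eieio_ci_requirement(void)
{
    return MO_LD_LD | MO_ST_LD | MO_LD_ST | MO_ST_ST;
}
```

With these definitions, bar_orders(MO_LD_ST, eieio_ci_requirement()) is
false while bar_orders(MO_ALL, eieio_ci_requirement()) is true, which is
the gap the one-line change closes.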




* [PATCH 2/4] tcg/ppc: ST_ST memory ordering is not provided with eieio
  2022-05-19 13:59 [PATCH 0/4] ppc: improve some memory ordering issues Nicholas Piggin
  2022-05-19 13:59 ` [PATCH 1/4] target/ppc: Fix eieio memory ordering semantics Nicholas Piggin
@ 2022-05-19 13:59 ` Nicholas Piggin
  2022-05-19 13:59 ` [PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync Nicholas Piggin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Nicholas Piggin @ 2022-05-19 13:59 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Richard Henderson

eieio does not provide ordering between stores to CI memory and stores
to cacheable memory so it can't be used as a general ST_ST barrier.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tcg/ppc/tcg-target.c.inc | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index cfcd121f9c..3ff845d063 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1836,8 +1836,6 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
     a0 &= TCG_MO_ALL;
     if (a0 == TCG_MO_LD_LD) {
         insn = LWSYNC;
-    } else if (a0 == TCG_MO_ST_ST) {
-        insn = EIEIO;
     }
     tcg_out32(s, insn);
 }
-- 
2.35.1
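
Editorial sketch, not part of the patch: a toy model of the two separate
eieio ordering sets, following the summary in the gen_eieio() comment from
patch 1/4 rather than the full ISA text. The access kinds and the
eieio_orders() predicate are hypothetical names for this illustration. Two
accesses are ordered by eieio only when both belong to the same set, so a
CI store and a cacheable store are not ordered against each other.

```c
#include <stdbool.h>

/* Hypothetical classification of memory accesses, for illustration only. */
enum access_kind {
    CI_LOAD,          /* load from caching-inhibited memory  */
    CI_STORE,         /* store to caching-inhibited memory   */
    WT_STORE,         /* store to write-through memory       */
    CACHEABLE_STORE,  /* store to ordinary cacheable memory  */
    CACHEABLE_LOAD,   /* load from ordinary cacheable memory */
};

/* Set 1: CI loads, CI stores, WT stores. */
static bool in_set1(enum access_kind k)
{
    return k == CI_LOAD || k == CI_STORE || k == WT_STORE;
}

/* Set 2: stores to cacheable memory. */
static bool in_set2(enum access_kind k)
{
    return k == CACHEABLE_STORE;
}

/* eieio orders 'before' against 'after' only within one set. */
static bool eieio_orders(enum access_kind before, enum access_kind after)
{
    return (in_set1(before) && in_set1(after)) ||
           (in_set2(before) && in_set2(after));
}
```

In this model eieio_orders(CI_STORE, CACHEABLE_STORE) is false, which is
why a backend cannot emit eieio for a generic TCG_MO_ST_ST request: the
backend does not know which kind of memory each store targets.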




* [PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync
  2022-05-19 13:59 [PATCH 0/4] ppc: improve some memory ordering issues Nicholas Piggin
  2022-05-19 13:59 ` [PATCH 1/4] target/ppc: Fix eieio memory ordering semantics Nicholas Piggin
  2022-05-19 13:59 ` [PATCH 2/4] tcg/ppc: ST_ST memory ordering is not provided with eieio Nicholas Piggin
@ 2022-05-19 13:59 ` Nicholas Piggin
  2022-05-19 15:30   ` Richard Henderson
  2022-05-19 13:59 ` [PATCH 4/4] target/ppc: Implement lwsync with weaker memory ordering Nicholas Piggin
  2022-05-23 19:24 ` [PATCH 0/4] ppc: improve some memory ordering issues Daniel Henrique Barboza
  4 siblings, 1 reply; 9+ messages in thread
From: Nicholas Piggin @ 2022-05-19 13:59 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Richard Henderson

lwsync orders more than just LD_LD; importantly, it matches the default
memory ordering of x86 and s390.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tcg/ppc/tcg-target.c.inc | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 3ff845d063..c0a5bca34f 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1832,11 +1832,14 @@ static void tcg_out_brcond2 (TCGContext *s, const TCGArg *args,
 
 static void tcg_out_mb(TCGContext *s, TCGArg a0)
 {
-    uint32_t insn = HWSYNC;
-    a0 &= TCG_MO_ALL;
-    if (a0 == TCG_MO_LD_LD) {
+    uint32_t insn;
+
+    if (a0 & TCG_MO_ST_LD) {
+        insn = HWSYNC;
+    } else {
         insn = LWSYNC;
     }
+
     tcg_out32(s, insn);
 }
 
-- 
2.35.1
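
Editorial sketch, not part of the patch: a standalone comparison of the
barrier selection before and after this change. The MO_* values are assumed
to mirror QEMU's TCG_MO_* bits, and pick_before(), pick_after() and
count_lwsync() are hypothetical helpers; only the two selection rules are
taken from the diff above.

```c
#include <stdint.h>

/* Assumed to mirror QEMU's TCG_MO_* bits; illustrative only. */
enum {
    MO_LD_LD = 0x01,
    MO_ST_LD = 0x02,
    MO_LD_ST = 0x04,
    MO_ST_ST = 0x08,
    MO_ALL   = 0x0F,
};

/* PowerPC instruction encodings of the two barriers. */
#define HWSYNC 0x7C0004ACu  /* sync,   L = 0 */
#define LWSYNC 0x7C2004ACu  /* lwsync, L = 1 */

/* Old rule: lwsync only for an exact LD_LD request. */
static uint32_t pick_before(unsigned a0)
{
    a0 &= MO_ALL;
    return a0 == MO_LD_LD ? LWSYNC : HWSYNC;
}

/* New rule: sync only when store-before-load ordering is requested. */
static uint32_t pick_after(unsigned a0)
{
    return (a0 & MO_ST_LD) ? HWSYNC : LWSYNC;
}

/* Count how many non-empty ordering masks map to lwsync under 'pick'. */
static int count_lwsync(uint32_t (*pick)(unsigned))
{
    int n = 0;
    for (unsigned mo = 1; mo <= MO_ALL; mo++) {
        n += (pick(mo) == LWSYNC);
    }
    return n;
}
```

Of the 15 non-empty masks, the old rule maps one to lwsync and the new
rule maps seven (every combination of LD_LD, LD_ST and ST_ST).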




* [PATCH 4/4] target/ppc: Implement lwsync with weaker memory ordering
  2022-05-19 13:59 [PATCH 0/4] ppc: improve some memory ordering issues Nicholas Piggin
                   ` (2 preceding siblings ...)
  2022-05-19 13:59 ` [PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync Nicholas Piggin
@ 2022-05-19 13:59 ` Nicholas Piggin
  2022-05-19 15:34   ` Richard Henderson
  2022-05-23 19:24 ` [PATCH 0/4] ppc: improve some memory ordering issues Daniel Henrique Barboza
  4 siblings, 1 reply; 9+ messages in thread
From: Nicholas Piggin @ 2022-05-19 13:59 UTC (permalink / raw)
  To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel, Richard Henderson

This allows an x86 host to no-op lwsyncs, and a ppc host to use lwsync
rather than sync.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/cpu.h       |  4 +++-
 target/ppc/cpu_init.c  | 13 +++++++------
 target/ppc/machine.c   |  3 ++-
 target/ppc/translate.c |  8 +++++++-
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 48596cfb25..b9b2536394 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2271,6 +2271,8 @@ enum {
     PPC2_ISA300        = 0x0000000000080000ULL,
     /* POWER ISA 3.1                                                         */
     PPC2_ISA310        = 0x0000000000100000ULL,
+    /*   lwsync instruction                                                  */
+    PPC2_MEM_LWSYNC    = 0x0000000000200000ULL,
 
 #define PPC_TCG_INSNS2 (PPC2_BOOKE206 | PPC2_VSX | PPC2_PRCNTL | PPC2_DBRX | \
                         PPC2_ISA205 | PPC2_VSX207 | PPC2_PERM_ISA206 | \
@@ -2279,7 +2281,7 @@ enum {
                         PPC2_BCTAR_ISA207 | PPC2_LSQ_ISA207 | \
                         PPC2_ALTIVEC_207 | PPC2_ISA207S | PPC2_DFP | \
                         PPC2_FP_CVT_S64 | PPC2_TM | PPC2_PM_ISA206 | \
-                        PPC2_ISA300 | PPC2_ISA310)
+                        PPC2_ISA300 | PPC2_ISA310 | PPC2_MEM_LWSYNC)
 };
 
 /*****************************************************************************/
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 527ad40fcb..0f891afa04 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5769,7 +5769,7 @@ POWERPC_FAMILY(970)(ObjectClass *oc, void *data)
                        PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |
                        PPC_64B | PPC_ALTIVEC |
                        PPC_SEGMENT_64B | PPC_SLBI;
-    pcc->insns_flags2 = PPC2_FP_CVT_S64;
+    pcc->insns_flags2 = PPC2_FP_CVT_S64 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_VR) |
                     (1ull << MSR_POW) |
@@ -5846,7 +5846,7 @@ POWERPC_FAMILY(POWER5P)(ObjectClass *oc, void *data)
                        PPC_64B |
                        PPC_POPCNTB |
                        PPC_SEGMENT_64B | PPC_SLBI;
-    pcc->insns_flags2 = PPC2_FP_CVT_S64;
+    pcc->insns_flags2 = PPC2_FP_CVT_S64 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_VR) |
                     (1ull << MSR_POW) |
@@ -5985,7 +5985,7 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
                         PPC2_PERM_ISA206 | PPC2_DIVE_ISA206 |
                         PPC2_ATOMIC_ISA206 | PPC2_FP_CVT_ISA206 |
                         PPC2_FP_TST_ISA206 | PPC2_FP_CVT_S64 |
-                        PPC2_PM_ISA206;
+                        PPC2_PM_ISA206 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_VR) |
                     (1ull << MSR_VSX) |
@@ -6159,7 +6159,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
                         PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |
                         PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 |
                         PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
-                        PPC2_TM | PPC2_PM_ISA206;
+                        PPC2_TM | PPC2_PM_ISA206 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_HV) |
                     (1ull << MSR_TM) |
@@ -6379,7 +6379,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
                         PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |
                         PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 |
                         PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
-                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL;
+                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_HV) |
                     (1ull << MSR_TM) |
@@ -6596,7 +6596,8 @@ POWERPC_FAMILY(POWER10)(ObjectClass *oc, void *data)
                         PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |
                         PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 |
                         PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
-                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL | PPC2_ISA310;
+                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL | PPC2_ISA310 |
+                        PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_HV) |
                     (1ull << MSR_TM) |
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index 7104a5c67e..a7d9036c09 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -157,7 +157,8 @@ static int cpu_pre_save(void *opaque)
         | PPC2_ATOMIC_ISA206 | PPC2_FP_CVT_ISA206
         | PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207
         | PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207
-        | PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 | PPC2_TM;
+        | PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 | PPC2_TM
+        | PPC2_MEM_LWSYNC;
 
     env->spr[SPR_LR] = env->lr;
     env->spr[SPR_CTR] = env->ctr;
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index eb42f7e459..1d6daa4608 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -4041,8 +4041,13 @@ static void gen_stqcx_(DisasContext *ctx)
 /* sync */
 static void gen_sync(DisasContext *ctx)
 {
+    TCGBar bar = TCG_MO_ALL;
     uint32_t l = (ctx->opcode >> 21) & 3;
 
+    if ((l == 1) && (ctx->insns_flags2 & PPC2_MEM_LWSYNC)) {
+        bar = TCG_MO_LD_LD | TCG_MO_LD_ST | TCG_MO_ST_ST;
+    }
+
     /*
      * We may need to check for a pending TLB flush.
      *
@@ -4054,7 +4059,8 @@ static void gen_sync(DisasContext *ctx)
     if (((l == 2) || !(ctx->insns_flags & PPC_64B)) && !ctx->pr) {
         gen_check_tlb_flush(ctx, true);
     }
-    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
+
+    tcg_gen_mb(bar | TCG_BAR_SC);
 }
 
 /* wait */
-- 
2.35.1
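
Editorial sketch, not part of the patch: how the L field of the sync
opcode selects the barrier mask in the new gen_sync(), plus a deliberately
simplified model of a TSO-like host. The MO_* values are assumed to mirror
QEMU's TCG_MO_* bits, FLAG_MEM_LWSYNC copies the PPC2_MEM_LWSYNC value from
the cpu.h hunk, and sync_barrier_mask() and tso_host_needs_fence() are
hypothetical helpers. The host model assumes only store-before-load
ordering needs an explicit fence, which is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror QEMU's TCG_MO_* bits; illustrative only. */
enum {
    MO_LD_LD = 0x01,
    MO_ST_LD = 0x02,
    MO_LD_ST = 0x04,
    MO_ST_ST = 0x08,
    MO_ALL   = 0x0F,
};

/* Same value as PPC2_MEM_LWSYNC in the cpu.h hunk. */
#define FLAG_MEM_LWSYNC 0x0000000000200000ULL

/* Barrier mask chosen for a sync-family opcode, following the diff. */
static unsigned sync_barrier_mask(uint32_t opcode, uint64_t insns_flags2)
{
    uint32_t l = (opcode >> 21) & 3;   /* L field: 0 sync, 1 lwsync */
    unsigned bar = MO_ALL;

    if (l == 1 && (insns_flags2 & FLAG_MEM_LWSYNC)) {
        /* lwsync orders everything except store-before-load. */
        bar = MO_LD_LD | MO_LD_ST | MO_ST_ST;
    }
    return bar;
}

/* Simplified TSO-like host: only store-before-load needs a fence. */
static bool tso_host_needs_fence(unsigned bar)
{
    return (bar & MO_ST_LD) != 0;
}
```

For the lwsync encoding 0x7C2004AC on a CPU with the flag set, the mask is
0x0D and the TSO-like host needs no fence; for sync (0x7C0004AC), or for a
CPU without the flag, the mask stays 0x0F and a fence is still required.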




* Re: [PATCH 1/4] target/ppc: Fix eieio memory ordering semantics
  2022-05-19 13:59 ` [PATCH 1/4] target/ppc: Fix eieio memory ordering semantics Nicholas Piggin
@ 2022-05-19 15:30   ` Richard Henderson
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2022-05-19 15:30 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel

On 5/19/22 06:59, Nicholas Piggin wrote:
> The generated eieio memory ordering semantics do not match the
> instruction definition in the architecture. Add a big comment to
> explain this strange instruction and correct the memory ordering
> behaviour.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   target/ppc/translate.c | 27 ++++++++++++++++++++++++++-
>   1 file changed, 26 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync
  2022-05-19 13:59 ` [PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync Nicholas Piggin
@ 2022-05-19 15:30   ` Richard Henderson
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2022-05-19 15:30 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel

On 5/19/22 06:59, Nicholas Piggin wrote:
> lwsync orders more than just LD_LD; importantly, it matches the default
> memory ordering of x86 and s390.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   tcg/ppc/tcg-target.c.inc | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 4/4] target/ppc: Implement lwsync with weaker memory ordering
  2022-05-19 13:59 ` [PATCH 4/4] target/ppc: Implement lwsync with weaker memory ordering Nicholas Piggin
@ 2022-05-19 15:34   ` Richard Henderson
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2022-05-19 15:34 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel

On 5/19/22 06:59, Nicholas Piggin wrote:
> This allows an x86 host to no-op lwsyncs, and a ppc host to use lwsync
> rather than sync.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   target/ppc/cpu.h       |  4 +++-
>   target/ppc/cpu_init.c  | 13 +++++++------
>   target/ppc/machine.c   |  3 ++-
>   target/ppc/translate.c |  8 +++++++-
>   4 files changed, 19 insertions(+), 9 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

to the translate part, and I'll trust you on the set of cpus adjusted.


r~



* Re: [PATCH 0/4] ppc: improve some memory ordering issues
  2022-05-19 13:59 [PATCH 0/4] ppc: improve some memory ordering issues Nicholas Piggin
                   ` (3 preceding siblings ...)
  2022-05-19 13:59 ` [PATCH 4/4] target/ppc: Implement lwsync with weaker memory ordering Nicholas Piggin
@ 2022-05-23 19:24 ` Daniel Henrique Barboza
  4 siblings, 0 replies; 9+ messages in thread
From: Daniel Henrique Barboza @ 2022-05-23 19:24 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel, Richard Henderson

Queued in gitlab.com/danielhb/qemu/tree/ppc-next. Thanks,


Daniel

On 5/19/22 10:59, Nicholas Piggin wrote:
> Since the RFC[*], this series fixes a compile issue noticed by Richard
> and has survived some basic stress testing with MTTCG.
> 
> Thanks,
> Nick
> 
> [*] https://lists.nongnu.org/archive/html/qemu-ppc/2022-05/msg00046.html
> 
> Nicholas Piggin (4):
>    target/ppc: Fix eieio memory ordering semantics
>    tcg/ppc: ST_ST memory ordering is not provided with eieio
>    tcg/ppc: Optimize memory ordering generation with lwsync
>    target/ppc: Implement lwsync with weaker memory ordering
> 
>   target/ppc/cpu.h         |  4 +++-
>   target/ppc/cpu_init.c    | 13 +++++++------
>   target/ppc/machine.c     |  3 ++-
>   target/ppc/translate.c   | 35 +++++++++++++++++++++++++++++++++--
>   tcg/ppc/tcg-target.c.inc | 11 ++++++-----
>   5 files changed, 51 insertions(+), 15 deletions(-)
> 

