* [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations
@ 2018-06-26 16:19 Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook Richard Henderson
                   ` (13 more replies)
  0 siblings, 14 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

While reviewing another patch set this week, I noticed that the old
linux-user do_store_exclusive code was still present.  I had thought
it was dead code that simply hadn't been removed, but it turned out
that we had not completed the transition to tcg atomics for linux-user.

In the process, I discovered that we weren't using atomic operations
for the 128-bit lq, lqarx, and stqcx insns.  These would have simply
produced incorrect results for -smp in system mode.

I tidy the code a bit by making use of MO_ALIGN, which means that
we don't need a separate explicit alignment check.

I use the new min/max atomic operations I added recently for
ARMv8.2-Atomics and RISC-V.

Finally, Power9 has some *really* odd atomic operations in its
l[wd]at and st[wd]at instructions.  We had been raising an illegal
instruction exception for these.  I implement them for the serial
context and force the parallel context to grab the exclusive lock
and try again.
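
The common translate-time shape is roughly the following (a sketch,
not the exact code of any one patch; see the individual patches):

    if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
        /* No suitable host primitive: restart this insn with the
           exclusive lock held, via cpu_exec_step_atomic().  */
        gen_helper_exit_atomic(cpu_env);
        ctx->base.is_jmp = DISAS_NORETURN;
    } else {
        /* Serial context: a plain, non-atomic implementation
           is sufficient.  */
    }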

Except for the trivial linux-user ll/sc case, I do not have any
code that exercises these instructions.  Perhaps the IBM folk
have something that can test the others?


r~


Richard Henderson (13):
  target/ppc: Add do_unaligned_access hook
  target/ppc: Use atomic load for LQ and LQARX
  target/ppc: Use atomic store for STQ
  target/ppc: Use atomic cmpxchg for STQCX
  target/ppc: Remove POWERPC_EXCP_STCX
  target/ppc: Tidy gen_conditional_store
  target/ppc: Split out gen_load_locked
  target/ppc: Split out gen_ld_atomic
  target/ppc: Split out gen_st_atomic
  target/ppc: Use MO_ALIGN for ECIWX and ECOWX
  target/ppc: Use atomic min/max helpers
  target/ppc: Implement the rest of gen_ld_atomic
  target/ppc: Implement the rest of gen_st_atomic

 target/ppc/cpu.h                |   8 +-
 target/ppc/helper.h             |  11 +
 target/ppc/internal.h           |   5 +
 linux-user/ppc/cpu_loop.c       | 123 ++----
 target/ppc/excp_helper.c        |  18 +-
 target/ppc/mem_helper.c         |  72 +++-
 target/ppc/translate.c          | 648 ++++++++++++++++++++------------
 target/ppc/translate_init.inc.c |   1 +
 8 files changed, 539 insertions(+), 347 deletions(-)

-- 
2.17.1

* [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-27  9:09   ` David Gibson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX Richard Henderson
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

This allows faults from MO_ALIGN to have the same effect
as from gen_check_align.
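
For reference, the translator then requests the check in the memop
itself, as the later patches in this series do (variable names here
are illustrative):

    tcg_gen_qemu_ld_tl(gpr, EA, ctx->mem_idx, memop | MO_ALIGN);

and a misaligned access reaches this new hook from the softmmu slow
path, rather than via an explicit compare-and-branch sequence emitted
by gen_check_align.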

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/internal.h           |  5 +++++
 target/ppc/excp_helper.c        | 18 +++++++++++++++++-
 target/ppc/translate_init.inc.c |  1 +
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 1f441c6483..a9bcadff42 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -252,4 +252,9 @@ static inline void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
 void helper_compute_fprf_float16(CPUPPCState *env, float16 arg);
 void helper_compute_fprf_float32(CPUPPCState *env, float32 arg);
 void helper_compute_fprf_float128(CPUPPCState *env, float128 arg);
+
+/* Raise a data fault alignment exception for the specified virtual address */
+void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
+                                 MMUAccessType access_type,
+                                 int mmu_idx, uintptr_t retaddr);
 #endif /* PPC_INTERNAL_H */
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index c092fbead0..d6e97a90e0 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -22,7 +22,7 @@
 #include "exec/helper-proto.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
-
+#include "internal.h"
 #include "helper_regs.h"
 
 //#define DEBUG_OP
@@ -1198,3 +1198,19 @@ void helper_book3s_msgsnd(target_ulong rb)
     qemu_mutex_unlock_iothread();
 }
 #endif
+
+void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr vaddr,
+                                 MMUAccessType access_type,
+                                 int mmu_idx, uintptr_t retaddr)
+{
+    CPUPPCState *env = cs->env_ptr;
+    uint32_t insn;
+
+    /* Restore state and reload the insn we executed, for filling in DSISR.  */
+    cpu_restore_state(cs, retaddr, true);
+    insn = cpu_ldl_code(env, env->nip);
+
+    cs->exception_index = POWERPC_EXCP_ALIGN;
+    env->error_code = insn & 0x03FF0000;
+    cpu_loop_exit(cs);
+}
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 76d6f3fd5e..7813b1b004 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -10457,6 +10457,7 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
     cc->set_pc = ppc_cpu_set_pc;
     cc->gdb_read_register = ppc_cpu_gdb_read_register;
     cc->gdb_write_register = ppc_cpu_gdb_write_register;
+    cc->do_unaligned_access = ppc_cpu_do_unaligned_access;
 #ifdef CONFIG_USER_ONLY
     cc->handle_mmu_fault = ppc_cpu_handle_mmu_fault;
 #else
-- 
2.17.1

* [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-28  3:49   ` David Gibson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ Richard Henderson
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

Section 1.4 of the Power ISA v3.0B states that both of these
instructions are single-copy atomic.  As we cannot (yet) issue
128-bit loads within TCG, use the generic helpers provided.

Since TCG cannot (yet) return a 128-bit value, add a slot within
CPUPPCState for returning the high half of a 128-bit return value.
This solution is preferred to the helper assigning to architectural
registers directly, as it avoids clobbering all TCG live values.
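
Concretely, a caller retrieves the two halves in two steps -- the
helper's return value and then the retxh slot (sketch; lo and hi are
the target GPRs):

    gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
    tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));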

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/cpu.h        |  3 ++
 target/ppc/helper.h     |  5 +++
 target/ppc/mem_helper.c | 20 ++++++++-
 target/ppc/translate.c  | 93 ++++++++++++++++++++++++++++++-----------
 4 files changed, 95 insertions(+), 26 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index c7f3fb6b73..973cf44cda 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1015,6 +1015,9 @@ struct CPUPPCState {
     /* Next instruction pointer */
     target_ulong nip;
 
+    /* High part of 128-bit helper return.  */
+    uint64_t retxh;
+
     int access_type; /* when a memory exception occurs, the access
                         type is stored here */
 
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index d751f0e219..3f451a5d7e 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -799,3 +799,8 @@ DEF_HELPER_4(dscliq, void, env, fprp, fprp, i32)
 
 DEF_HELPER_1(tbegin, void, env)
 DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
+
+#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
+DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
+DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
+#endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index a34e604db3..44a8f3445a 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -21,9 +21,9 @@
 #include "exec/exec-all.h"
 #include "qemu/host-utils.h"
 #include "exec/helper-proto.h"
-
 #include "helper_regs.h"
 #include "exec/cpu_ldst.h"
+#include "tcg.h"
 #include "internal.h"
 
 //#define DEBUG_OP
@@ -215,6 +215,24 @@ target_ulong helper_lscbx(CPUPPCState *env, target_ulong addr, uint32_t reg,
     return i;
 }
 
+#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
+uint64_t helper_lq_le_parallel(CPUPPCState *env, target_ulong addr,
+                               uint32_t opidx)
+{
+    Int128 ret = helper_atomic_ldo_le_mmu(env, addr, opidx, GETPC());
+    env->retxh = int128_gethi(ret);
+    return int128_getlo(ret);
+}
+
+uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
+                               uint32_t opidx)
+{
+    Int128 ret = helper_atomic_ldo_be_mmu(env, addr, opidx, GETPC());
+    env->retxh = int128_gethi(ret);
+    return int128_getlo(ret);
+}
+#endif
+
 /*****************************************************************************/
 /* Altivec extension helpers */
 #if defined(HOST_WORDS_BIGENDIAN)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 3a215a1dc6..0923cc24e3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -2607,7 +2607,7 @@ static void gen_ld(DisasContext *ctx)
 static void gen_lq(DisasContext *ctx)
 {
     int ra, rd;
-    TCGv EA;
+    TCGv EA, hi, lo;
 
     /* lq is a legal user mode instruction starting in ISA 2.07 */
     bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
@@ -2633,16 +2633,35 @@ static void gen_lq(DisasContext *ctx)
     EA = tcg_temp_new();
     gen_addr_imm_index(ctx, EA, 0x0F);
 
-    /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
-       necessary 64-bit byteswap already. */
-    if (unlikely(ctx->le_mode)) {
-        gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
+    /* Note that the low part is always in RD+1, even in LE mode.  */
+    lo = cpu_gpr[rd + 1];
+    hi = cpu_gpr[rd];
+
+    if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC128
+        TCGv_i32 oi = tcg_temp_new_i32();
+        if (ctx->le_mode) {
+            tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
+            gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
+        } else {
+            tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
+            gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
+        }
+        tcg_temp_free_i32(oi);
+        tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
+#else
+        /* Restart with exclusive lock.  */
+        gen_helper_exit_atomic(cpu_env);
+        ctx->base.is_jmp = DISAS_NORETURN;
+#endif
+    } else if (ctx->le_mode) {
+        tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ);
         gen_addr_add(ctx, EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
+        tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
     } else {
-        gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
+        tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ);
         gen_addr_add(ctx, EA, EA, 8);
-        gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
+        tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
     }
     tcg_temp_free(EA);
 }
@@ -3236,9 +3255,8 @@ STCX(stdcx_, DEF_MEMOP(MO_Q))
 /* lqarx */
 static void gen_lqarx(DisasContext *ctx)
 {
-    TCGv EA;
     int rd = rD(ctx->opcode);
-    TCGv gpr1, gpr2;
+    TCGv EA, hi, lo;
 
     if (unlikely((rd & 1) || (rd == rA(ctx->opcode)) ||
                  (rd == rB(ctx->opcode)))) {
@@ -3247,24 +3265,49 @@ static void gen_lqarx(DisasContext *ctx)
     }
 
     gen_set_access_type(ctx, ACCESS_RES);
-    EA = tcg_temp_local_new();
+    EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_check_align(ctx, EA, 15);
-    if (unlikely(ctx->le_mode)) {
-        gpr1 = cpu_gpr[rd+1];
-        gpr2 = cpu_gpr[rd];
-    } else {
-        gpr1 = cpu_gpr[rd];
-        gpr2 = cpu_gpr[rd+1];
-    }
-    tcg_gen_qemu_ld_i64(gpr1, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
-    tcg_gen_mov_tl(cpu_reserve, EA);
-    gen_addr_add(ctx, EA, EA, 8);
-    tcg_gen_qemu_ld_i64(gpr2, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
 
-    tcg_gen_st_tl(gpr1, cpu_env, offsetof(CPUPPCState, reserve_val));
-    tcg_gen_st_tl(gpr2, cpu_env, offsetof(CPUPPCState, reserve_val2));
+    /* Note that the low part is always in RD+1, even in LE mode.  */
+    lo = cpu_gpr[rd + 1];
+    hi = cpu_gpr[rd];
+
+    if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC128
+        TCGv_i32 oi = tcg_temp_new_i32();
+        if (ctx->le_mode) {
+            tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ | MO_ALIGN_16,
+                                                ctx->mem_idx));
+            gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
+        } else {
+            tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ | MO_ALIGN_16,
+                                                ctx->mem_idx));
+            gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
+        }
+        tcg_temp_free_i32(oi);
+        tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
+#else
+        /* Restart with exclusive lock.  */
+        gen_helper_exit_atomic(cpu_env);
+        ctx->base.is_jmp = DISAS_NORETURN;
+        tcg_temp_free(EA);
+        return;
+#endif
+    } else if (ctx->le_mode) {
+        tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ | MO_ALIGN_16);
+        tcg_gen_mov_tl(cpu_reserve, EA);
+        gen_addr_add(ctx, EA, EA, 8);
+        tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
+    } else {
+        tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ | MO_ALIGN_16);
+        tcg_gen_mov_tl(cpu_reserve, EA);
+        gen_addr_add(ctx, EA, EA, 8);
+        tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
+    }
     tcg_temp_free(EA);
+
+    tcg_gen_st_tl(hi, cpu_env, offsetof(CPUPPCState, reserve_val));
+    tcg_gen_st_tl(lo, cpu_env, offsetof(CPUPPCState, reserve_val2));
 }
 
 /* stqcx. */
-- 
2.17.1

* [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-28  3:51   ` David Gibson
  2018-06-29  3:33   ` David Gibson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 04/13] target/ppc: Use atomic cmpxchg for STQCX Richard Henderson
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

Section 1.4 of the Power ISA v3.0B states that this insn is
single-copy atomic.  As we cannot (yet) issue 128-bit stores
within TCG, use the generic helpers provided.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h     |  4 ++++
 target/ppc/mem_helper.c | 14 ++++++++++++++
 target/ppc/translate.c  | 35 +++++++++++++++++++++++++++--------
 3 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 3f451a5d7e..cbc1228570 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -803,4 +803,8 @@ DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
 #if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
 DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
 DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
+DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
+                   void, env, tl, i64, i64, i32)
+DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
+                   void, env, tl, i64, i64, i32)
 #endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 44a8f3445a..57e301edc3 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -231,6 +231,20 @@ uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
     env->retxh = int128_gethi(ret);
     return int128_getlo(ret);
 }
+
+void helper_stq_le_parallel(CPUPPCState *env, target_ulong addr,
+                            uint64_t lo, uint64_t hi, uint32_t opidx)
+{
+    Int128 val = int128_make128(lo, hi);
+    helper_atomic_sto_le_mmu(env, addr, val, opidx, GETPC());
+}
+
+void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
+                            uint64_t lo, uint64_t hi, uint32_t opidx)
+{
+    Int128 val = int128_make128(lo, hi);
+    helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
+}
 #endif
 
 /*****************************************************************************/
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 0923cc24e3..3d63a62269 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -2760,6 +2760,7 @@ static void gen_std(DisasContext *ctx)
     if ((ctx->opcode & 0x3) == 0x2) { /* stq */
         bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
         bool le_is_supported = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
+        TCGv hi, lo;
 
         if (!(ctx->insns_flags & PPC_64BX)) {
             gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
@@ -2783,20 +2784,38 @@ static void gen_std(DisasContext *ctx)
         EA = tcg_temp_new();
         gen_addr_imm_index(ctx, EA, 0x03);
 
-        /* We only need to swap high and low halves. gen_qemu_st64_i64 does
-           necessary 64-bit byteswap already. */
-        if (unlikely(ctx->le_mode)) {
-            gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
+        /* Note that the low part is always in RS+1, even in LE mode.  */
+        lo = cpu_gpr[rs + 1];
+        hi = cpu_gpr[rs];
+
+        if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC128
+            TCGv_i32 oi = tcg_temp_new_i32();
+            if (ctx->le_mode) {
+                tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
+                gen_helper_stq_le_parallel(cpu_env, EA, lo, hi, oi);
+            } else {
+                tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
+                gen_helper_stq_be_parallel(cpu_env, EA, lo, hi, oi);
+            }
+            tcg_temp_free_i32(oi);
+#else
+            /* Restart with exclusive lock.  */
+            gen_helper_exit_atomic(cpu_env);
+            ctx->base.is_jmp = DISAS_NORETURN;
+#endif
+        } else if (ctx->le_mode) {
+            tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_LEQ);
             gen_addr_add(ctx, EA, EA, 8);
-            gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
+            tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_LEQ);
         } else {
-            gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
+            tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_BEQ);
             gen_addr_add(ctx, EA, EA, 8);
-            gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
+            tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_BEQ);
         }
         tcg_temp_free(EA);
     } else {
-        /* std / stdu*/
+        /* std / stdu */
         if (Rc(ctx->opcode)) {
             if (unlikely(rA(ctx->opcode) == 0)) {
                 gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
-- 
2.17.1

* [Qemu-devel] [PATCH 04/13] target/ppc: Use atomic cmpxchg for STQCX
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
                   ` (2 preceding siblings ...)
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 05/13] target/ppc: Remove POWERPC_EXCP_STCX Richard Henderson
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

When running in a parallel context, we must use a helper to perform
the 128-bit atomic operation.  When running in a serial context, we
can do the compare before the store.
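
In pseudo-C, the serial path below behaves like this sketch (the
order of the two reserve_val comparisons is swapped in LE mode):

    if (EA != reserve_addr
        || ld64(EA) != reserve_val
        || ld64(EA + 8) != reserve_val2) {
        cr0 = so;                        /* failure: EQ clear */
    } else {
        st64(EA, hi);                    /* halves swapped for LE */
        st64(EA + 8, lo);
        cr0 = so | CRF_EQ;               /* success */
    }
    reserve_addr = -1;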

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h     |  2 +
 target/ppc/mem_helper.c | 38 +++++++++++++++++
 target/ppc/translate.c  | 95 ++++++++++++++++++++++++++---------------
 3 files changed, 101 insertions(+), 34 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index cbc1228570..5706c2497f 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -807,4 +807,6 @@ DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
                    void, env, tl, i64, i64, i32)
 DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
                    void, env, tl, i64, i64, i32)
+DEF_HELPER_5(stqcx_le_parallel, i32, env, tl, i64, i64, i32)
+DEF_HELPER_5(stqcx_be_parallel, i32, env, tl, i64, i64, i32)
 #endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 57e301edc3..8f0d86d104 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -245,6 +245,44 @@ void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
     Int128 val = int128_make128(lo, hi);
     helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
 }
+
+uint32_t helper_stqcx_le_parallel(CPUPPCState *env, target_ulong addr,
+                                  uint64_t new_lo, uint64_t new_hi,
+                                  uint32_t opidx)
+{
+    bool success = false;
+
+    if (likely(addr == env->reserve_addr)) {
+        Int128 oldv, cmpv, newv;
+
+        cmpv = int128_make128(env->reserve_val2, env->reserve_val);
+        newv = int128_make128(new_lo, new_hi);
+        oldv = helper_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv,
+                                             opidx, GETPC());
+        success = int128_eq(oldv, cmpv);
+    }
+    env->reserve_addr = -1;
+    return env->so + success * CRF_EQ;
+}
+
+uint32_t helper_stqcx_be_parallel(CPUPPCState *env, target_ulong addr,
+                                  uint64_t new_lo, uint64_t new_hi,
+                                  uint32_t opidx)
+{
+    bool success = false;
+
+    if (likely(addr == env->reserve_addr)) {
+        Int128 oldv, cmpv, newv;
+
+        cmpv = int128_make128(env->reserve_val2, env->reserve_val);
+        newv = int128_make128(new_lo, new_hi);
+        oldv = helper_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv,
+                                             opidx, GETPC());
+        success = int128_eq(oldv, cmpv);
+    }
+    env->reserve_addr = -1;
+    return env->so + success * CRF_EQ;
+}
 #endif
 
 /*****************************************************************************/
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 3d63a62269..c7b9d226eb 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3332,50 +3332,77 @@ static void gen_lqarx(DisasContext *ctx)
 /* stqcx. */
 static void gen_stqcx_(DisasContext *ctx)
 {
-    TCGv EA;
-    int reg = rS(ctx->opcode);
-    int len = 16;
-#if !defined(CONFIG_USER_ONLY)
-    TCGLabel *l1;
-    TCGv gpr1, gpr2;
-#endif
+    int rs = rS(ctx->opcode);
+    TCGv EA, hi, lo;
 
-    if (unlikely((rD(ctx->opcode) & 1))) {
+    if (unlikely(rs & 1)) {
         gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
         return;
     }
+
     gen_set_access_type(ctx, ACCESS_RES);
-    EA = tcg_temp_local_new();
+    EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    if (len > 1) {
-        gen_check_align(ctx, EA, (len) - 1);
-    }
 
-#if defined(CONFIG_USER_ONLY)
-    gen_conditional_store(ctx, EA, reg, 16);
+    /* Note that the low part is always in RS+1, even in LE mode.  */
+    lo = cpu_gpr[rs + 1];
+    hi = cpu_gpr[rs];
+
+    if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+        TCGv_i32 oi = tcg_const_i32(make_memop_idx(DEF_MEMOP(MO_Q) | MO_ALIGN_16, ctx->mem_idx));
+#ifdef CONFIG_ATOMIC128
+        if (ctx->le_mode) {
+            gen_helper_stqcx_le_parallel(cpu_crf[0], cpu_env, EA, lo, hi, oi);
+        } else {
+            gen_helper_stqcx_be_parallel(cpu_crf[0], cpu_env, EA, lo, hi, oi);
+        }
 #else
-    tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
-    l1 = gen_new_label();
-    tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
-    tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
-
-    if (unlikely(ctx->le_mode)) {
-        gpr1 = cpu_gpr[reg + 1];
-        gpr2 = cpu_gpr[reg];
-    } else {
-        gpr1 = cpu_gpr[reg];
-        gpr2 = cpu_gpr[reg + 1];
-    }
-    tcg_gen_qemu_st_tl(gpr1, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
-    gen_addr_add(ctx, EA, EA, 8);
-    tcg_gen_qemu_st_tl(gpr2, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
-
-    gen_set_label(l1);
-    tcg_gen_movi_tl(cpu_reserve, -1);
+        /* Restart with exclusive lock.  */
+        gen_helper_exit_atomic(cpu_env);
+        ctx->base.is_jmp = DISAS_NORETURN;
 #endif
-    tcg_temp_free(EA);
-}
+        tcg_temp_free(EA);
+        tcg_temp_free_i32(oi);
+    } else {
+        TCGLabel *lab_fail = gen_new_label();
+        TCGLabel *lab_over = gen_new_label();
+        TCGv_i64 t0 = tcg_temp_new_i64();
+        TCGv_i64 t1 = tcg_temp_new_i64();
 
+        tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, lab_fail);
+        tcg_temp_free(EA);
+
+        gen_qemu_ld64_i64(ctx, t0, cpu_reserve);
+        tcg_gen_ld_i64(t1, cpu_env, (ctx->le_mode
+                                     ? offsetof(CPUPPCState, reserve_val2)
+                                     : offsetof(CPUPPCState, reserve_val)));
+        tcg_gen_brcond_i64(TCG_COND_NE, t0, t1, lab_fail);
+
+        tcg_gen_addi_i64(t0, cpu_reserve, 8);
+        gen_qemu_ld64_i64(ctx, t0, t0);
+        tcg_gen_ld_i64(t1, cpu_env, (ctx->le_mode
+                                     ? offsetof(CPUPPCState, reserve_val)
+                                     : offsetof(CPUPPCState, reserve_val2)));
+        tcg_gen_brcond_i64(TCG_COND_NE, t0, t1, lab_fail);
+
+        /* Success */
+        gen_qemu_st64_i64(ctx, ctx->le_mode ? lo : hi, cpu_reserve);
+        tcg_gen_addi_i64(t0, cpu_reserve, 8);
+        gen_qemu_st64_i64(ctx, ctx->le_mode ? hi : lo, t0);
+
+        tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
+        tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
+        tcg_gen_br(lab_over);
+
+        gen_set_label(lab_fail);
+        tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
+
+        gen_set_label(lab_over);
+        tcg_gen_movi_tl(cpu_reserve, -1);
+        tcg_temp_free_i64(t0);
+        tcg_temp_free_i64(t1);
+    }
+}
 #endif /* defined(TARGET_PPC64) */
 
 /* sync */
-- 
2.17.1

* [Qemu-devel] [PATCH 05/13] target/ppc: Remove POWERPC_EXCP_STCX
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
                   ` (3 preceding siblings ...)
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 04/13] target/ppc: Use atomic cmpxchg for STQCX Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 06/13] target/ppc: Tidy gen_conditional_store Richard Henderson
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

Always use the gen_conditional_store implementation that uses
atomic_cmpxchg.  Make sure to clear reserve_addr across most
interrupts that cross the cpu_loop.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/cpu.h          |   5 --
 linux-user/ppc/cpu_loop.c | 123 +++++++-------------------------------
 target/ppc/translate.c    |  14 -----
 3 files changed, 23 insertions(+), 119 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 973cf44cda..4edcf62cf7 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -196,7 +196,6 @@ enum {
     /* QEMU exceptions: special cases we want to stop translation            */
     POWERPC_EXCP_SYNC         = 0x202, /* context synchronizing instruction  */
     POWERPC_EXCP_SYSCALL_USER = 0x203, /* System call in user mode only      */
-    POWERPC_EXCP_STCX         = 0x204 /* Conditional stores in user mode     */
 };
 
 /* Exceptions error codes                                                    */
@@ -994,10 +993,6 @@ struct CPUPPCState {
     /* Reservation value */
     target_ulong reserve_val;
     target_ulong reserve_val2;
-    /* Reservation store address */
-    target_ulong reserve_ea;
-    /* Reserved store source register and size */
-    target_ulong reserve_info;
 
     /* Those ones are used in supervisor mode only */
     /* machine state register */
diff --git a/linux-user/ppc/cpu_loop.c b/linux-user/ppc/cpu_loop.c
index 2fb516cb00..133a87f349 100644
--- a/linux-user/ppc/cpu_loop.c
+++ b/linux-user/ppc/cpu_loop.c
@@ -65,99 +65,23 @@ int ppc_dcr_write (ppc_dcr_t *dcr_env, int dcrn, uint32_t val)
     return -1;
 }
 
-static int do_store_exclusive(CPUPPCState *env)
-{
-    target_ulong addr;
-    target_ulong page_addr;
-    target_ulong val, val2 __attribute__((unused)) = 0;
-    int flags;
-    int segv = 0;
-
-    addr = env->reserve_ea;
-    page_addr = addr & TARGET_PAGE_MASK;
-    start_exclusive();
-    mmap_lock();
-    flags = page_get_flags(page_addr);
-    if ((flags & PAGE_READ) == 0) {
-        segv = 1;
-    } else {
-        int reg = env->reserve_info & 0x1f;
-        int size = env->reserve_info >> 5;
-        int stored = 0;
-
-        if (addr == env->reserve_addr) {
-            switch (size) {
-            case 1: segv = get_user_u8(val, addr); break;
-            case 2: segv = get_user_u16(val, addr); break;
-            case 4: segv = get_user_u32(val, addr); break;
-#if defined(TARGET_PPC64)
-            case 8: segv = get_user_u64(val, addr); break;
-            case 16: {
-                segv = get_user_u64(val, addr);
-                if (!segv) {
-                    segv = get_user_u64(val2, addr + 8);
-                }
-                break;
-            }
-#endif
-            default: abort();
-            }
-            if (!segv && val == env->reserve_val) {
-                val = env->gpr[reg];
-                switch (size) {
-                case 1: segv = put_user_u8(val, addr); break;
-                case 2: segv = put_user_u16(val, addr); break;
-                case 4: segv = put_user_u32(val, addr); break;
-#if defined(TARGET_PPC64)
-                case 8: segv = put_user_u64(val, addr); break;
-                case 16: {
-                    if (val2 == env->reserve_val2) {
-                        if (msr_le) {
-                            val2 = val;
-                            val = env->gpr[reg+1];
-                        } else {
-                            val2 = env->gpr[reg+1];
-                        }
-                        segv = put_user_u64(val, addr);
-                        if (!segv) {
-                            segv = put_user_u64(val2, addr + 8);
-                        }
-                    }
-                    break;
-                }
-#endif
-                default: abort();
-                }
-                if (!segv) {
-                    stored = 1;
-                }
-            }
-        }
-        env->crf[0] = (stored << 1) | xer_so;
-        env->reserve_addr = (target_ulong)-1;
-    }
-    if (!segv) {
-        env->nip += 4;
-    }
-    mmap_unlock();
-    end_exclusive();
-    return segv;
-}
-
 void cpu_loop(CPUPPCState *env)
 {
     CPUState *cs = CPU(ppc_env_get_cpu(env));
     target_siginfo_t info;
-    int trapnr;
+    int trapnr, sig;
     target_ulong ret;
 
     for(;;) {
+        bool arch_interrupt;
+
         cpu_exec_start(cs);
         trapnr = cpu_exec(cs);
         cpu_exec_end(cs);
         process_queued_cpu_work(cs);
 
-        switch(trapnr) {
+        arch_interrupt = true;
+        switch (trapnr) {
         case POWERPC_EXCP_NONE:
             /* Just go on */
             break;
@@ -524,26 +448,15 @@ void cpu_loop(CPUPPCState *env)
             }
             env->gpr[3] = ret;
             break;
-        case POWERPC_EXCP_STCX:
-            if (do_store_exclusive(env)) {
-                info.si_signo = TARGET_SIGSEGV;
-                info.si_errno = 0;
-                info.si_code = TARGET_SEGV_MAPERR;
-                info._sifields._sigfault._addr = env->nip;
-                queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
-            }
-            break;
         case EXCP_DEBUG:
-            {
-                int sig;
-
-                sig = gdb_handlesig(cs, TARGET_SIGTRAP);
-                if (sig) {
-                    info.si_signo = sig;
-                    info.si_errno = 0;
-                    info.si_code = TARGET_TRAP_BRKPT;
-                    queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
-                  }
+            sig = gdb_handlesig(cs, TARGET_SIGTRAP);
+            if (sig) {
+                info.si_signo = sig;
+                info.si_errno = 0;
+                info.si_code = TARGET_TRAP_BRKPT;
+                queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
+            } else {
+                arch_interrupt = false;
             }
             break;
         case EXCP_INTERRUPT:
@@ -551,12 +464,22 @@ void cpu_loop(CPUPPCState *env)
             break;
         case EXCP_ATOMIC:
             cpu_exec_step_atomic(cs);
+            arch_interrupt = false;
             break;
         default:
             cpu_abort(cs, "Unknown exception 0x%x. Aborting\n", trapnr);
             break;
         }
         process_pending_signals(env);
+
+        /* Most of the traps imply a transition through kernel mode,
+         * which implies a return from interrupt has been executed.
+         * That means the reservation (reserve_addr) should be lost.
+         * But there are a few exceptions for traps internal to QEMU.
+         */
+        if (arch_interrupt) {
+            env->reserve_addr = -1;
+        }
     }
 }
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c7b9d226eb..03e8c5df03 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3201,19 +3201,6 @@ ST_ATOMIC(stwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32)
 ST_ATOMIC(stdat, DEF_MEMOP(MO_Q), i64, mov_i64)
 #endif
 
-#if defined(CONFIG_USER_ONLY)
-static void gen_conditional_store(DisasContext *ctx, TCGv EA,
-                                  int reg, int memop)
-{
-    TCGv t0 = tcg_temp_new();
-
-    tcg_gen_st_tl(EA, cpu_env, offsetof(CPUPPCState, reserve_ea));
-    tcg_gen_movi_tl(t0, (MEMOP_GET_SIZE(memop) << 5) | reg);
-    tcg_gen_st_tl(t0, cpu_env, offsetof(CPUPPCState, reserve_info));
-    tcg_temp_free(t0);
-    gen_exception_err(ctx, POWERPC_EXCP_STCX, 0);
-}
-#else
 static void gen_conditional_store(DisasContext *ctx, TCGv EA,
                                   int reg, int memop)
 {
@@ -3244,7 +3231,6 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
     gen_set_label(l2);
     tcg_gen_movi_tl(cpu_reserve, -1);
 }
-#endif
 
 #define STCX(name, memop)                                   \
 static void gen_##name(DisasContext *ctx)                   \
-- 
2.17.1

* [Qemu-devel] [PATCH 06/13] target/ppc: Tidy gen_conditional_store
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
                   ` (4 preceding siblings ...)
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 05/13] target/ppc: Remove POWERPC_EXCP_STCX Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 07/13] target/ppc: Split out gen_load_locked Richard Henderson
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

Leave only the minimal amount of code within the STCX macro,
moving the rest of the code into gen_conditional_store.
Remove the explicit call to gen_check_align; the matching LARX will
have already checked alignment, and we verify the same address.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 03e8c5df03..e751072404 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3201,14 +3201,17 @@ ST_ATOMIC(stwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32)
 ST_ATOMIC(stdat, DEF_MEMOP(MO_Q), i64, mov_i64)
 #endif
 
-static void gen_conditional_store(DisasContext *ctx, TCGv EA,
-                                  int reg, int memop)
+static void gen_conditional_store(DisasContext *ctx, TCGMemOp memop)
 {
     TCGLabel *l1 = gen_new_label();
     TCGLabel *l2 = gen_new_label();
-    TCGv t0;
+    TCGv t0 = tcg_temp_new();
+    int reg = rS(ctx->opcode);
 
-    tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
+    gen_set_access_type(ctx, ACCESS_RES);
+    gen_addr_reg_index(ctx, t0);
+    tcg_gen_brcond_tl(TCG_COND_NE, t0, cpu_reserve, l1);
+    tcg_temp_free(t0);
 
     t0 = tcg_temp_new();
     tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val,
@@ -3232,19 +3235,10 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
     tcg_gen_movi_tl(cpu_reserve, -1);
 }
 
-#define STCX(name, memop)                                   \
-static void gen_##name(DisasContext *ctx)                   \
-{                                                           \
-    TCGv t0;                                                \
-    int len = MEMOP_GET_SIZE(memop);                        \
-    gen_set_access_type(ctx, ACCESS_RES);                   \
-    t0 = tcg_temp_local_new();                              \
-    gen_addr_reg_index(ctx, t0);                            \
-    if (len > 1) {                                          \
-        gen_check_align(ctx, t0, (len) - 1);                \
-    }                                                       \
-    gen_conditional_store(ctx, t0, rS(ctx->opcode), memop); \
-    tcg_temp_free(t0);                                      \
+#define STCX(name, memop)                  \
+static void gen_##name(DisasContext *ctx)  \
+{                                          \
+    gen_conditional_store(ctx, memop);     \
 }
 
 STCX(stbcx_, DEF_MEMOP(MO_UB))
-- 
2.17.1

* [Qemu-devel] [PATCH 07/13] target/ppc: Split out gen_load_locked
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
                   ` (5 preceding siblings ...)
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 06/13] target/ppc: Tidy gen_conditional_store Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 08/13] target/ppc: Split out gen_ld_atomic Richard Henderson
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

Leave only the minimal amount of code within the LDAR macro,
moving the rest of the code into gen_load_locked.  Use MO_ALIGN
and remove the explicit call to gen_check_align.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index e751072404..f48fcbeefb 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3070,23 +3070,24 @@ static void gen_isync(DisasContext *ctx)
 
 #define MEMOP_GET_SIZE(x)  (1 << ((x) & MO_SIZE))
 
-#define LARX(name, memop)                                            \
-static void gen_##name(DisasContext *ctx)                            \
-{                                                                    \
-    TCGv t0;                                                         \
-    TCGv gpr = cpu_gpr[rD(ctx->opcode)];                             \
-    int len = MEMOP_GET_SIZE(memop);                                 \
-    gen_set_access_type(ctx, ACCESS_RES);                            \
-    t0 = tcg_temp_local_new();                                       \
-    gen_addr_reg_index(ctx, t0);                                     \
-    if ((len) > 1) {                                                 \
-        gen_check_align(ctx, t0, (len)-1);                           \
-    }                                                                \
-    tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);                \
-    tcg_gen_mov_tl(cpu_reserve, t0);                                 \
-    tcg_gen_mov_tl(cpu_reserve_val, gpr);                            \
-    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);                           \
-    tcg_temp_free(t0);                                               \
+static void gen_load_locked(DisasContext *ctx, TCGMemOp memop)
+{
+    TCGv gpr = cpu_gpr[rD(ctx->opcode)];
+    TCGv t0 = tcg_temp_new();
+
+    gen_set_access_type(ctx, ACCESS_RES);
+    gen_addr_reg_index(ctx, t0);
+    tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop | MO_ALIGN);
+    tcg_gen_mov_tl(cpu_reserve, t0);
+    tcg_gen_mov_tl(cpu_reserve_val, gpr);
+    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
+    tcg_temp_free(t0);
+}
+
+#define LARX(name, memop)                  \
+static void gen_##name(DisasContext *ctx)  \
+{                                          \
+    gen_load_locked(ctx, memop);           \
 }
 
 /* lwarx */
-- 
2.17.1

* [Qemu-devel] [PATCH 08/13] target/ppc: Split out gen_ld_atomic
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
                   ` (6 preceding siblings ...)
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 07/13] target/ppc: Split out gen_load_locked Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 09/13] target/ppc: Split out gen_st_atomic Richard Henderson
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

Move the guts of LD_ATOMIC to a function.  Use foo_tl for the operations
instead of foo_i32 or foo_i64 specifically.  Use MO_ALIGN instead of an
explicit call to gen_check_align.
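
(The foo_tl forms are the usual target_long-width aliases from
tcg-op.h, roughly:

    #if TARGET_LONG_BITS == 64
    #define tcg_gen_atomic_fetch_add_tl  tcg_gen_atomic_fetch_add_i64
    #else
    #define tcg_gen_atomic_fetch_add_tl  tcg_gen_atomic_fetch_add_i32
    #endif

so one gen_ld_atomic body serves both lwat and ldat.)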

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c | 105 ++++++++++++++++++++---------------------
 1 file changed, 52 insertions(+), 53 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index f48fcbeefb..361b178db8 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3095,61 +3095,60 @@ LARX(lbarx, DEF_MEMOP(MO_UB))
 LARX(lharx, DEF_MEMOP(MO_UW))
 LARX(lwarx, DEF_MEMOP(MO_UL))
 
-#define LD_ATOMIC(name, memop, tp, op, eop)                             \
-static void gen_##name(DisasContext *ctx)                               \
-{                                                                       \
-    int len = MEMOP_GET_SIZE(memop);                                    \
-    uint32_t gpr_FC = FC(ctx->opcode);                                  \
-    TCGv EA = tcg_temp_local_new();                                     \
-    TCGv_##tp t0, t1;                                                   \
-                                                                        \
-    gen_addr_register(ctx, EA);                                         \
-    if (len > 1) {                                                      \
-        gen_check_align(ctx, EA, len - 1);                              \
-    }                                                                   \
-    t0 = tcg_temp_new_##tp();                                           \
-    t1 = tcg_temp_new_##tp();                                           \
-    tcg_gen_##op(t0, cpu_gpr[rD(ctx->opcode) + 1]);                     \
-                                                                        \
-    switch (gpr_FC) {                                                   \
-    case 0: /* Fetch and add */                                         \
-        tcg_gen_atomic_fetch_add_##tp(t1, EA, t0, ctx->mem_idx, memop); \
-        break;                                                          \
-    case 1: /* Fetch and xor */                                         \
-        tcg_gen_atomic_fetch_xor_##tp(t1, EA, t0, ctx->mem_idx, memop); \
-        break;                                                          \
-    case 2: /* Fetch and or */                                          \
-        tcg_gen_atomic_fetch_or_##tp(t1, EA, t0, ctx->mem_idx, memop);  \
-        break;                                                          \
-    case 3: /* Fetch and 'and' */                                       \
-        tcg_gen_atomic_fetch_and_##tp(t1, EA, t0, ctx->mem_idx, memop); \
-        break;                                                          \
-    case 8: /* Swap */                                                  \
-        tcg_gen_atomic_xchg_##tp(t1, EA, t0, ctx->mem_idx, memop);      \
-        break;                                                          \
-    case 4:  /* Fetch and max unsigned */                               \
-    case 5:  /* Fetch and max signed */                                 \
-    case 6:  /* Fetch and min unsigned */                               \
-    case 7:  /* Fetch and min signed */                                 \
-    case 16: /* compare and swap not equal */                           \
-    case 24: /* Fetch and increment bounded */                          \
-    case 25: /* Fetch and increment equal */                            \
-    case 28: /* Fetch and decrement bounded */                          \
-        gen_invalid(ctx);                                               \
-        break;                                                          \
-    default:                                                            \
-        /* invoke data storage error handler */                         \
-        gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL);   \
-    }                                                                   \
-    tcg_gen_##eop(cpu_gpr[rD(ctx->opcode)], t1);                        \
-    tcg_temp_free_##tp(t0);                                             \
-    tcg_temp_free_##tp(t1);                                             \
-    tcg_temp_free(EA);                                                  \
+static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
+{
+    uint32_t gpr_FC = FC(ctx->opcode);
+    TCGv EA = tcg_temp_new();
+    TCGv src, dst;
+
+    gen_addr_register(ctx, EA);
+    dst = cpu_gpr[rD(ctx->opcode)];
+    src = cpu_gpr[rD(ctx->opcode) + 1];
+
+    memop |= MO_ALIGN;
+    switch (gpr_FC) {
+    case 0: /* Fetch and add */
+        tcg_gen_atomic_fetch_add_tl(dst, EA, src, ctx->mem_idx, memop);
+        break;
+    case 1: /* Fetch and xor */
+        tcg_gen_atomic_fetch_xor_tl(dst, EA, src, ctx->mem_idx, memop);
+        break;
+    case 2: /* Fetch and or */
+        tcg_gen_atomic_fetch_or_tl(dst, EA, src, ctx->mem_idx, memop);
+        break;
+    case 3: /* Fetch and 'and' */
+        tcg_gen_atomic_fetch_and_tl(dst, EA, src, ctx->mem_idx, memop);
+        break;
+    case 8: /* Swap */
+        tcg_gen_atomic_xchg_tl(dst, EA, src, ctx->mem_idx, memop);
+        break;
+    case 4:  /* Fetch and max unsigned */
+    case 5:  /* Fetch and max signed */
+    case 6:  /* Fetch and min unsigned */
+    case 7:  /* Fetch and min signed */
+    case 16: /* compare and swap not equal */
+    case 24: /* Fetch and increment bounded */
+    case 25: /* Fetch and increment equal */
+    case 28: /* Fetch and decrement bounded */
+        gen_invalid(ctx);
+        break;
+    default:
+        /* invoke data storage error handler */
+        gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL);
+    }
+    tcg_temp_free(EA);
 }
 
-LD_ATOMIC(lwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32, extu_i32_tl)
-#if defined(TARGET_PPC64)
-LD_ATOMIC(ldat, DEF_MEMOP(MO_Q), i64, mov_i64, mov_i64)
+static void gen_lwat(DisasContext *ctx)
+{
+    gen_ld_atomic(ctx, DEF_MEMOP(MO_UL));
+}
+
+#ifdef TARGET_PPC64
+static void gen_ldat(DisasContext *ctx)
+{
+    gen_ld_atomic(ctx, DEF_MEMOP(MO_Q));
+}
 #endif
 
 #define ST_ATOMIC(name, memop, tp, op)                                  \
-- 
2.17.1

* [Qemu-devel] [PATCH 09/13] target/ppc: Split out gen_st_atomic
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
                   ` (7 preceding siblings ...)
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 08/13] target/ppc: Split out gen_ld_atomic Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 10/13] target/ppc: Use MO_ALIGN for ECIWX and ECOWX Richard Henderson
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

Move the guts of ST_ATOMIC to a function.  Use foo_tl for the operations
instead of foo_i32 or foo_i64 specifically.  Use MO_ALIGN instead of an
explicit call to gen_check_align.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c | 93 +++++++++++++++++++++---------------------
 1 file changed, 47 insertions(+), 46 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 361b178db8..53ca8f0114 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3151,54 +3151,55 @@ static void gen_ldat(DisasContext *ctx)
 }
 #endif
 
-#define ST_ATOMIC(name, memop, tp, op)                                  \
-static void gen_##name(DisasContext *ctx)                               \
-{                                                                       \
-    int len = MEMOP_GET_SIZE(memop);                                    \
-    uint32_t gpr_FC = FC(ctx->opcode);                                  \
-    TCGv EA = tcg_temp_local_new();                                     \
-    TCGv_##tp t0, t1;                                                   \
-                                                                        \
-    gen_addr_register(ctx, EA);                                         \
-    if (len > 1) {                                                      \
-        gen_check_align(ctx, EA, len - 1);                              \
-    }                                                                   \
-    t0 = tcg_temp_new_##tp();                                           \
-    t1 = tcg_temp_new_##tp();                                           \
-    tcg_gen_##op(t0, cpu_gpr[rD(ctx->opcode) + 1]);                     \
-                                                                        \
-    switch (gpr_FC) {                                                   \
-    case 0: /* add and Store */                                         \
-        tcg_gen_atomic_add_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop); \
-        break;                                                          \
-    case 1: /* xor and Store */                                         \
-        tcg_gen_atomic_xor_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop); \
-        break;                                                          \
-    case 2: /* Or and Store */                                          \
-        tcg_gen_atomic_or_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop);  \
-        break;                                                          \
-    case 3: /* 'and' and Store */                                       \
-        tcg_gen_atomic_and_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop); \
-        break;                                                          \
-    case 4:  /* Store max unsigned */                                   \
-    case 5:  /* Store max signed */                                     \
-    case 6:  /* Store min unsigned */                                   \
-    case 7:  /* Store min signed */                                     \
-    case 24: /* Store twin  */                                          \
-        gen_invalid(ctx);                                               \
-        break;                                                          \
-    default:                                                            \
-        /* invoke data storage error handler */                         \
-        gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL);   \
-    }                                                                   \
-    tcg_temp_free_##tp(t0);                                             \
-    tcg_temp_free_##tp(t1);                                             \
-    tcg_temp_free(EA);                                                  \
+static void gen_st_atomic(DisasContext *ctx, TCGMemOp memop)
+{
+    uint32_t gpr_FC = FC(ctx->opcode);
+    TCGv EA = tcg_temp_new();
+    TCGv src, discard;
+
+    gen_addr_register(ctx, EA);
+    src = cpu_gpr[rD(ctx->opcode)];
+    discard = tcg_temp_new();
+
+    memop |= MO_ALIGN;
+    switch (gpr_FC) {
+    case 0: /* add and Store */
+        tcg_gen_atomic_add_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+        break;
+    case 1: /* xor and Store */
+        tcg_gen_atomic_xor_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+        break;
+    case 2: /* Or and Store */
+        tcg_gen_atomic_or_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+        break;
+    case 3: /* 'and' and Store */
+        tcg_gen_atomic_and_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+        break;
+    case 4:  /* Store max unsigned */
+    case 5:  /* Store max signed */
+    case 6:  /* Store min unsigned */
+    case 7:  /* Store min signed */
+    case 24: /* Store twin  */
+        gen_invalid(ctx);
+        break;
+    default:
+        /* invoke data storage error handler */
+        gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL);
+    }
+    tcg_temp_free(discard);
+    tcg_temp_free(EA);
 }
 
-ST_ATOMIC(stwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32)
-#if defined(TARGET_PPC64)
-ST_ATOMIC(stdat, DEF_MEMOP(MO_Q), i64, mov_i64)
+static void gen_stwat(DisasContext *ctx)
+{
+    gen_st_atomic(ctx, DEF_MEMOP(MO_UL));
+}
+
+#ifdef TARGET_PPC64
+static void gen_stdat(DisasContext *ctx)
+{
+    gen_st_atomic(ctx, DEF_MEMOP(MO_Q));
+}
 #endif
 
 static void gen_conditional_store(DisasContext *ctx, TCGMemOp memop)
-- 
2.17.1

* [Qemu-devel] [PATCH 10/13] target/ppc: Use MO_ALIGN for ECIWX and ECOWX
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
                   ` (8 preceding siblings ...)
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 09/13] target/ppc: Split out gen_st_atomic Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 11/13] target/ppc: Use atomic min/max helpers Richard Henderson
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

This avoids the need for gen_check_align entirely.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 53ca8f0114..c2a28be6d7 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -2388,23 +2388,6 @@ static inline void gen_addr_add(DisasContext *ctx, TCGv ret, TCGv arg1,
     }
 }
 
-static inline void gen_check_align(DisasContext *ctx, TCGv EA, int mask)
-{
-    TCGLabel *l1 = gen_new_label();
-    TCGv t0 = tcg_temp_new();
-    TCGv_i32 t1, t2;
-    tcg_gen_andi_tl(t0, EA, mask);
-    tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, l1);
-    t1 = tcg_const_i32(POWERPC_EXCP_ALIGN);
-    t2 = tcg_const_i32(ctx->opcode & 0x03FF0000);
-    gen_update_nip(ctx, ctx->base.pc_next - 4);
-    gen_helper_raise_exception_err(cpu_env, t1, t2);
-    tcg_temp_free_i32(t1);
-    tcg_temp_free_i32(t2);
-    gen_set_label(l1);
-    tcg_temp_free(t0);
-}
-
 static inline void gen_align_no_le(DisasContext *ctx)
 {
     gen_exception_err(ctx, POWERPC_EXCP_ALIGN,
@@ -4706,8 +4689,8 @@ static void gen_eciwx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_EXT);
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_check_align(ctx, t0, 0x03);
-    gen_qemu_ld32u(ctx, cpu_gpr[rD(ctx->opcode)], t0);
+    tcg_gen_qemu_ld_tl(cpu_gpr[rD(ctx->opcode)], t0, ctx->mem_idx,
+                       DEF_MEMOP(MO_UL | MO_ALIGN));
     tcg_temp_free(t0);
 }
 
@@ -4719,8 +4702,8 @@ static void gen_ecowx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_EXT);
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_check_align(ctx, t0, 0x03);
-    gen_qemu_st32(ctx, cpu_gpr[rD(ctx->opcode)], t0);
+    tcg_gen_qemu_st_tl(cpu_gpr[rD(ctx->opcode)], t0, ctx->mem_idx,
+                       DEF_MEMOP(MO_UL | MO_ALIGN));
     tcg_temp_free(t0);
 }
 
-- 
2.17.1

* [Qemu-devel] [PATCH 11/13] target/ppc: Use atomic min/max helpers
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
                   ` (9 preceding siblings ...)
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 10/13] target/ppc: Use MO_ALIGN for EXIWX and ECOWX Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 12/13] target/ppc: Implement the rest of gen_ld_atomic Richard Henderson
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

These operations were previously unimplemented for ppc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c2a28be6d7..79285b6698 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3102,13 +3102,21 @@ static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
     case 3: /* Fetch and 'and' */
         tcg_gen_atomic_fetch_and_tl(dst, EA, src, ctx->mem_idx, memop);
         break;
+    case 4:  /* Fetch and max unsigned */
+        tcg_gen_atomic_fetch_umax_tl(dst, EA, src, ctx->mem_idx, memop);
+        break;
+    case 5:  /* Fetch and max signed */
+        tcg_gen_atomic_fetch_smax_tl(dst, EA, src, ctx->mem_idx, memop);
+        break;
+    case 6:  /* Fetch and min unsigned */
+        tcg_gen_atomic_fetch_umin_tl(dst, EA, src, ctx->mem_idx, memop);
+        break;
+    case 7:  /* Fetch and min signed */
+        tcg_gen_atomic_fetch_smin_tl(dst, EA, src, ctx->mem_idx, memop);
+        break;
     case 8: /* Swap */
         tcg_gen_atomic_xchg_tl(dst, EA, src, ctx->mem_idx, memop);
         break;
-    case 4:  /* Fetch and max unsigned */
-    case 5:  /* Fetch and max signed */
-    case 6:  /* Fetch and min unsigned */
-    case 7:  /* Fetch and min signed */
     case 16: /* compare and swap not equal */
     case 24: /* Fetch and increment bounded */
     case 25: /* Fetch and increment equal */
@@ -3159,9 +3167,17 @@ static void gen_st_atomic(DisasContext *ctx, TCGMemOp memop)
         tcg_gen_atomic_and_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
         break;
     case 4:  /* Store max unsigned */
+        tcg_gen_atomic_umax_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+        break;
     case 5:  /* Store max signed */
+        tcg_gen_atomic_smax_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+        break;
     case 6:  /* Store min unsigned */
+        tcg_gen_atomic_umin_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+        break;
     case 7:  /* Store min signed */
+        tcg_gen_atomic_smin_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+        break;
     case 24: /* Store twin  */
         gen_invalid(ctx);
         break;
-- 
2.17.1

* [Qemu-devel] [PATCH 12/13] target/ppc: Implement the rest of gen_ld_atomic
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
                   ` (10 preceding siblings ...)
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 11/13] target/ppc: Use atomic min/max helpers Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 13/13] target/ppc: Implement the rest of gen_st_atomic Richard Henderson
  2018-06-29  4:15 ` [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations David Gibson
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

These cases were stubbed out.  For now, implement them only within
a serial context, forcing parallel execution to synchronize.  It
would be possible to implement these with cmpxchg loops, if we care.
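
For reference, a minimal sketch of what such a cmpxchg loop could look
like for case 24 (fetch and increment bounded), written with C11
atomics rather than QEMU's helpers; the names are illustrative only,
and even this version leaves the load of the bound word at EA+4
outside the atomic read-modify-write:

    #include <stdatomic.h>
    #include <stdint.h>

    /* Hypothetical sketch, not QEMU API: increment mem(EA) unless it
     * already equals the bound word at EA+4.  */
    static uint32_t fetch_inc_bounded(_Atomic uint32_t *ea)
    {
        uint32_t old = atomic_load(ea);

        for (;;) {
            uint32_t bound = atomic_load(ea + 1);   /* word at EA+4 */

            if (old == bound) {
                /* Bound reached: no store; RT gets 1 << 31.  */
                return UINT32_C(1) << 31;
            }
            /* Publish old+1 iff mem(EA) is still 'old'; on failure
             * the current value is reloaded into 'old' and we retry.  */
            if (atomic_compare_exchange_weak(ea, &old, old + 1)) {
                return old;
            }
        }
    }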

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c | 89 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 82 insertions(+), 7 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 79285b6698..597a37d3ec 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3078,16 +3078,45 @@ LARX(lbarx, DEF_MEMOP(MO_UB))
 LARX(lharx, DEF_MEMOP(MO_UW))
 LARX(lwarx, DEF_MEMOP(MO_UL))
 
+static void gen_fetch_inc_conditional(DisasContext *ctx, TCGMemOp memop,
+                                      TCGv EA, TCGCond cond, int addend)
+{
+    TCGv t = tcg_temp_new();
+    TCGv t2 = tcg_temp_new();
+    TCGv u = tcg_temp_new();
+
+    tcg_gen_qemu_ld_tl(t, EA, ctx->mem_idx, memop);
+    tcg_gen_addi_tl(t2, EA, MEMOP_GET_SIZE(memop));
+    tcg_gen_qemu_ld_tl(t2, t2, ctx->mem_idx, memop);
+    tcg_gen_addi_tl(u, t, addend);
+
+    /* E.g. for fetch and increment bounded... */
+    /* mem(EA,s) = (t != t2 ? u = t + 1 : t) */
+    tcg_gen_movcond_tl(cond, u, t, t2, u, t);
+    tcg_gen_qemu_st_tl(u, EA, ctx->mem_idx, memop);
+
+    /* RT = (t != t2 ? t : u = 1<<(s*8-1)) */
+    tcg_gen_movi_tl(u, 1 << (MEMOP_GET_SIZE(memop) * 8 - 1));
+    tcg_gen_movcond_tl(cond, cpu_gpr[rD(ctx->opcode)], t, t2, t, u);
+
+    tcg_temp_free(t);
+    tcg_temp_free(t2);
+    tcg_temp_free(u);
+}
+
 static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
 {
     uint32_t gpr_FC = FC(ctx->opcode);
     TCGv EA = tcg_temp_new();
+    int rt = rD(ctx->opcode);
+    bool need_serial;
     TCGv src, dst;
 
     gen_addr_register(ctx, EA);
-    dst = cpu_gpr[rD(ctx->opcode)];
-    src = cpu_gpr[rD(ctx->opcode) + 1];
+    dst = cpu_gpr[rt];
+    src = cpu_gpr[(rt + 1) & 31];
 
+    need_serial = false;
     memop |= MO_ALIGN;
     switch (gpr_FC) {
     case 0: /* Fetch and add */
@@ -3117,17 +3146,63 @@ static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
     case 8: /* Swap */
         tcg_gen_atomic_xchg_tl(dst, EA, src, ctx->mem_idx, memop);
         break;
-    case 16: /* compare and swap not equal */
-    case 24: /* Fetch and increment bounded */
-    case 25: /* Fetch and increment equal */
-    case 28: /* Fetch and decrement bounded */
-        gen_invalid(ctx);
+
+    case 16: /* Compare and swap not equal */
+        if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+            need_serial = true;
+        } else {
+            TCGv t0 = tcg_temp_new();
+            TCGv t1 = tcg_temp_new();
+
+            tcg_gen_qemu_ld_tl(t0, EA, ctx->mem_idx, memop);
+            if ((memop & MO_SIZE) == MO_64 || TARGET_LONG_BITS == 32) {
+                tcg_gen_mov_tl(t1, src);
+            } else {
+                tcg_gen_ext32u_tl(t1, src);
+            }
+            tcg_gen_movcond_tl(TCG_COND_NE, t1, t0, t1,
+                               cpu_gpr[(rt + 2) & 31], t0);
+            tcg_gen_qemu_st_tl(t1, EA, ctx->mem_idx, memop);
+            tcg_gen_mov_tl(dst, t0);
+
+            tcg_temp_free(t0);
+            tcg_temp_free(t1);
+        }
         break;
+
+    case 24: /* Fetch and increment bounded */
+        if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+            need_serial = true;
+        } else {
+            gen_fetch_inc_conditional(ctx, memop, EA, TCG_COND_NE, 1);
+        }
+        break;
+    case 25: /* Fetch and increment equal */
+        if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+            need_serial = true;
+        } else {
+            gen_fetch_inc_conditional(ctx, memop, EA, TCG_COND_EQ, 1);
+        }
+        break;
+    case 28: /* Fetch and decrement bounded */
+        if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+            need_serial = true;
+        } else {
+            gen_fetch_inc_conditional(ctx, memop, EA, TCG_COND_NE, -1);
+        }
+        break;
+
     default:
         /* invoke data storage error handler */
         gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL);
     }
     tcg_temp_free(EA);
+
+    if (need_serial) {
+        /* Restart with exclusive lock.  */
+        gen_helper_exit_atomic(cpu_env);
+        ctx->base.is_jmp = DISAS_NORETURN;
+    }
 }
 
 static void gen_lwat(DisasContext *ctx)
-- 
2.17.1

* [Qemu-devel] [PATCH 13/13] target/ppc: Implement the rest of gen_st_atomic
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
                   ` (11 preceding siblings ...)
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 12/13] target/ppc: Implement the rest of gen_ld_atomic Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
  2018-06-29  4:15 ` [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations David Gibson
  13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, david

The store twin case was stubbed out.  For now, implement it only within
a serial context, forcing parallel execution to synchronize.  It would
be possible to implement with a cmpxchg loop, if we care, but the loose
alignment requirement (the pair must merely not cross a 32-byte boundary) might send
us back to the serial context anyway.
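
To make the serial semantics concrete, a plain-C sketch (illustrative
only, not QEMU code) of what the movcond sequence below computes:

    #include <stdint.h>

    /* Hypothetical sketch of "store twin": both words are overwritten
     * with src only when they are currently equal; otherwise both are
     * stored back unchanged.  */
    static void store_twin_serial(uint32_t *ea, uint32_t src)
    {
        uint32_t t  = ea[0];
        uint32_t t2 = ea[1];            /* word at EA + 4 */

        ea[0] = (t == t2 ? src : t);    /* matches the movcond pair */
        ea[1] = (t == t2 ? src : t2);
    }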

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 597a37d3ec..e120f2ed0b 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3254,7 +3254,31 @@ static void gen_st_atomic(DisasContext *ctx, TCGMemOp memop)
         tcg_gen_atomic_smin_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
         break;
     case 24: /* Store twin  */
-        gen_invalid(ctx);
+        if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+            /* Restart with exclusive lock.  */
+            gen_helper_exit_atomic(cpu_env);
+            ctx->base.is_jmp = DISAS_NORETURN;
+        } else {
+            TCGv t = tcg_temp_new();
+            TCGv t2 = tcg_temp_new();
+            TCGv s = tcg_temp_new();
+            TCGv s2 = tcg_temp_new();
+            TCGv ea_plus_s = tcg_temp_new();
+
+            tcg_gen_qemu_ld_tl(t, EA, ctx->mem_idx, memop);
+            tcg_gen_addi_tl(ea_plus_s, EA, MEMOP_GET_SIZE(memop));
+            tcg_gen_qemu_ld_tl(t2, ea_plus_s, ctx->mem_idx, memop);
+            tcg_gen_movcond_tl(TCG_COND_EQ, s, t, t2, src, t);
+            tcg_gen_movcond_tl(TCG_COND_EQ, s2, t, t2, src, t2);
+            tcg_gen_qemu_st_tl(s, EA, ctx->mem_idx, memop);
+            tcg_gen_qemu_st_tl(s2, ea_plus_s, ctx->mem_idx, memop);
+
+            tcg_temp_free(ea_plus_s);
+            tcg_temp_free(s2);
+            tcg_temp_free(s);
+            tcg_temp_free(t2);
+            tcg_temp_free(t);
+        }
         break;
     default:
         /* invoke data storage error handler */
-- 
2.17.1

* Re: [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook Richard Henderson
@ 2018-06-27  9:09   ` David Gibson
  2018-06-27 13:52     ` Richard Henderson
  0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2018-06-27  9:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-ppc

On Tue, Jun 26, 2018 at 09:19:09AM -0700, Richard Henderson wrote:
> This allows faults from MO_ALIGN to have the same effect
> as from gen_check_align.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

So, most powerpc cpus can handle most unaligned accesses without an
exception.  I'm assuming this series won't preclude that?

> ---
>  target/ppc/internal.h           |  5 +++++
>  target/ppc/excp_helper.c        | 18 +++++++++++++++++-
>  target/ppc/translate_init.inc.c |  1 +
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/target/ppc/internal.h b/target/ppc/internal.h
> index 1f441c6483..a9bcadff42 100644
> --- a/target/ppc/internal.h
> +++ b/target/ppc/internal.h
> @@ -252,4 +252,9 @@ static inline void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
>  void helper_compute_fprf_float16(CPUPPCState *env, float16 arg);
>  void helper_compute_fprf_float32(CPUPPCState *env, float32 arg);
>  void helper_compute_fprf_float128(CPUPPCState *env, float128 arg);
> +
> +/* Raise a data fault alignment exception for the specified virtual address */
> +void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
> +                                 MMUAccessType access_type,
> +                                 int mmu_idx, uintptr_t retaddr);
>  #endif /* PPC_INTERNAL_H */
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index c092fbead0..d6e97a90e0 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -22,7 +22,7 @@
>  #include "exec/helper-proto.h"
>  #include "exec/exec-all.h"
>  #include "exec/cpu_ldst.h"
> -
> +#include "internal.h"
>  #include "helper_regs.h"
>  
>  //#define DEBUG_OP
> @@ -1198,3 +1198,19 @@ void helper_book3s_msgsnd(target_ulong rb)
>      qemu_mutex_unlock_iothread();
>  }
>  #endif
> +
> +void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr vaddr,
> +                                 MMUAccessType access_type,
> +                                 int mmu_idx, uintptr_t retaddr)
> +{
> +    CPUPPCState *env = cs->env_ptr;
> +    uint32_t insn;
> +
> +    /* Restore state and reload the insn we executed, for filling in DSISR.  */
> +    cpu_restore_state(cs, retaddr, true);
> +    insn = cpu_ldl_code(env, env->nip);
> +
> +    cs->exception_index = POWERPC_EXCP_ALIGN;
> +    env->error_code = insn & 0x03FF0000;
> +    cpu_loop_exit(cs);
> +}
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index 76d6f3fd5e..7813b1b004 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -10457,6 +10457,7 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
>      cc->set_pc = ppc_cpu_set_pc;
>      cc->gdb_read_register = ppc_cpu_gdb_read_register;
>      cc->gdb_write_register = ppc_cpu_gdb_write_register;
> +    cc->do_unaligned_access = ppc_cpu_do_unaligned_access;
>  #ifdef CONFIG_USER_ONLY
>      cc->handle_mmu_fault = ppc_cpu_handle_mmu_fault;
>  #else

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
  2018-06-27  9:09   ` David Gibson
@ 2018-06-27 13:52     ` Richard Henderson
  2018-06-28  3:46       ` David Gibson
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2018-06-27 13:52 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, qemu-ppc

On 06/27/2018 02:09 AM, David Gibson wrote:
> On Tue, Jun 26, 2018 at 09:19:09AM -0700, Richard Henderson wrote:
>> This allows faults from MO_ALIGN to have the same effect
>> as from gen_check_align.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> 
> So, most powerpc cpus can handle most unaligned accesses without an
> exception.  I'm assuming this series won't preclude that?

Correct.  This hook will only fire when using MO_ALIGN to
request an alignment check.
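
As a minimal illustration (same shape as the change in patch 10;
surrounding setup elided), the check rides along on the memop itself,
so an unaligned EA traps through ppc_cpu_do_unaligned_access with no
inline compare-and-branch emitted:

    /* Sketch: faults via the hook when EA & 3 != 0.  */
    tcg_gen_qemu_ld_tl(cpu_gpr[rD(ctx->opcode)], EA, ctx->mem_idx,
                       DEF_MEMOP(MO_UL | MO_ALIGN));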


r~

* Re: [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
  2018-06-27 13:52     ` Richard Henderson
@ 2018-06-28  3:46       ` David Gibson
  0 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2018-06-28  3:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-ppc

On Wed, Jun 27, 2018 at 06:52:49AM -0700, Richard Henderson wrote:
> On 06/27/2018 02:09 AM, David Gibson wrote:
> > On Tue, Jun 26, 2018 at 09:19:09AM -0700, Richard Henderson wrote:
> >> This allows faults from MO_ALIGN to have the same effect
> >> as from gen_check_align.
> >>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> > 
> > So, most powerpc cpus can handle most unaligned accesses without an
> > exception.  I'm assuming this series won't preclude that?
> 
> Correct.  This hook will only fire when using MO_ALIGN to
> request an alignment check.

Thanks for the confirmation, first patch applied to ppc-for-3.0.
Continuing to look at the rest.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX Richard Henderson
@ 2018-06-28  3:49   ` David Gibson
  2018-06-28 15:22     ` Richard Henderson
  0 siblings, 1 reply; 23+ messages in thread
From: David Gibson @ 2018-06-28  3:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-ppc

On Tue, Jun 26, 2018 at 09:19:10AM -0700, Richard Henderson wrote:
> Section 1.4 of the Power ISA v3.0B states that both of these
> instructions are single-copy atomic.  As we cannot (yet) issue
> 128-bit loads within TCG, use the generic helpers provided.
> 
> Since TCG cannot (yet) return a 128-bit value, add a slot within
> CPUPPCState for returning the high half of a 128-bit return value.
> This solution is preferred to the helper assigning to architectural
> registers directly, as it avoids clobbering all TCG live values.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/ppc/cpu.h        |  3 ++
>  target/ppc/helper.h     |  5 +++
>  target/ppc/mem_helper.c | 20 ++++++++-
>  target/ppc/translate.c  | 93 ++++++++++++++++++++++++++++++-----------
>  4 files changed, 95 insertions(+), 26 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index c7f3fb6b73..973cf44cda 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1015,6 +1015,9 @@ struct CPUPPCState {
>      /* Next instruction pointer */
>      target_ulong nip;
>  
> +    /* High part of 128-bit helper return.  */
> +    uint64_t retxh;
> +

Adding a temporary here is kind of gross.  I guess the helper
interface doesn't allow for 128-bit returns, but couldn't you pass a
register number into the helper and have it update the right GPR
without going through a temp?

>      int access_type; /* when a memory exception occurs, the access
>                          type is stored here */
>  
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index d751f0e219..3f451a5d7e 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -799,3 +799,8 @@ DEF_HELPER_4(dscliq, void, env, fprp, fprp, i32)
>  
>  DEF_HELPER_1(tbegin, void, env)
>  DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
> +
> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> +DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +#endif
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index a34e604db3..44a8f3445a 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -21,9 +21,9 @@
>  #include "exec/exec-all.h"
>  #include "qemu/host-utils.h"
>  #include "exec/helper-proto.h"
> -
>  #include "helper_regs.h"
>  #include "exec/cpu_ldst.h"
> +#include "tcg.h"
>  #include "internal.h"
>  
>  //#define DEBUG_OP
> @@ -215,6 +215,24 @@ target_ulong helper_lscbx(CPUPPCState *env, target_ulong addr, uint32_t reg,
>      return i;
>  }
>  
> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> +uint64_t helper_lq_le_parallel(CPUPPCState *env, target_ulong addr,
> +                               uint32_t opidx)
> +{
> +    Int128 ret = helper_atomic_ldo_le_mmu(env, addr, opidx, GETPC());
> +    env->retxh = int128_gethi(ret);
> +    return int128_getlo(ret);
> +}
> +
> +uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
> +                               uint32_t opidx)
> +{
> +    Int128 ret = helper_atomic_ldo_be_mmu(env, addr, opidx, GETPC());
> +    env->retxh = int128_gethi(ret);
> +    return int128_getlo(ret);
> +}
> +#endif
> +
>  /*****************************************************************************/
>  /* Altivec extension helpers */
>  #if defined(HOST_WORDS_BIGENDIAN)
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 3a215a1dc6..0923cc24e3 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -2607,7 +2607,7 @@ static void gen_ld(DisasContext *ctx)
>  static void gen_lq(DisasContext *ctx)
>  {
>      int ra, rd;
> -    TCGv EA;
> +    TCGv EA, hi, lo;
>  
>      /* lq is a legal user mode instruction starting in ISA 2.07 */
>      bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> @@ -2633,16 +2633,35 @@ static void gen_lq(DisasContext *ctx)
>      EA = tcg_temp_new();
>      gen_addr_imm_index(ctx, EA, 0x0F);
>  
> -    /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
> -       necessary 64-bit byteswap already. */
> -    if (unlikely(ctx->le_mode)) {
> -        gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
> +    /* Note that the low part is always in RD+1, even in LE mode.  */
> +    lo = cpu_gpr[rd + 1];
> +    hi = cpu_gpr[rd];
> +
> +    if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> +        TCGv_i32 oi = tcg_temp_new_i32();
> +        if (ctx->le_mode) {
> +            tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
> +            gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
> +        } else {
> +            tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
> +            gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
> +        }
> +        tcg_temp_free_i32(oi);
> +        tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
> +#else
> +        /* Restart with exclusive lock.  */
> +        gen_helper_exit_atomic(cpu_env);
> +        ctx->base.is_jmp = DISAS_NORETURN;
> +#endif
> +    } else if (ctx->le_mode) {
> +        tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ);
>          gen_addr_add(ctx, EA, EA, 8);
> -        gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
> +        tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
>      } else {
> -        gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
> +        tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ);
>          gen_addr_add(ctx, EA, EA, 8);
> -        gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
> +        tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
>      }
>      tcg_temp_free(EA);
>  }
> @@ -3236,9 +3255,8 @@ STCX(stdcx_, DEF_MEMOP(MO_Q))
>  /* lqarx */
>  static void gen_lqarx(DisasContext *ctx)
>  {
> -    TCGv EA;
>      int rd = rD(ctx->opcode);
> -    TCGv gpr1, gpr2;
> +    TCGv EA, hi, lo;
>  
>      if (unlikely((rd & 1) || (rd == rA(ctx->opcode)) ||
>                   (rd == rB(ctx->opcode)))) {
> @@ -3247,24 +3265,49 @@ static void gen_lqarx(DisasContext *ctx)
>      }
>  
>      gen_set_access_type(ctx, ACCESS_RES);
> -    EA = tcg_temp_local_new();
> +    EA = tcg_temp_new();
>      gen_addr_reg_index(ctx, EA);
> -    gen_check_align(ctx, EA, 15);
> -    if (unlikely(ctx->le_mode)) {
> -        gpr1 = cpu_gpr[rd+1];
> -        gpr2 = cpu_gpr[rd];
> -    } else {
> -        gpr1 = cpu_gpr[rd];
> -        gpr2 = cpu_gpr[rd+1];
> -    }
> -    tcg_gen_qemu_ld_i64(gpr1, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
> -    tcg_gen_mov_tl(cpu_reserve, EA);
> -    gen_addr_add(ctx, EA, EA, 8);
> -    tcg_gen_qemu_ld_i64(gpr2, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
>  
> -    tcg_gen_st_tl(gpr1, cpu_env, offsetof(CPUPPCState, reserve_val));
> -    tcg_gen_st_tl(gpr2, cpu_env, offsetof(CPUPPCState, reserve_val2));
> +    /* Note that the low part is always in RD+1, even in LE mode.  */
> +    lo = cpu_gpr[rd + 1];
> +    hi = cpu_gpr[rd];
> +
> +    if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> +        TCGv_i32 oi = tcg_temp_new_i32();
> +        if (ctx->le_mode) {
> +            tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ | MO_ALIGN_16,
> +                                                ctx->mem_idx));
> +            gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
> +        } else {
> +            tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ | MO_ALIGN_16,
> +                                                ctx->mem_idx));
> +            gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
> +        }
> +        tcg_temp_free_i32(oi);
> +        tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
> +#else
> +        /* Restart with exclusive lock.  */
> +        gen_helper_exit_atomic(cpu_env);
> +        ctx->base.is_jmp = DISAS_NORETURN;
> +        tcg_temp_free(EA);
> +        return;
> +#endif
> +    } else if (ctx->le_mode) {
> +        tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ | MO_ALIGN_16);
> +        tcg_gen_mov_tl(cpu_reserve, EA);
> +        gen_addr_add(ctx, EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
> +    } else {
> +        tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ | MO_ALIGN_16);
> +        tcg_gen_mov_tl(cpu_reserve, EA);
> +        gen_addr_add(ctx, EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
> +    }
>      tcg_temp_free(EA);
> +
> +    tcg_gen_st_tl(hi, cpu_env, offsetof(CPUPPCState, reserve_val));
> +    tcg_gen_st_tl(lo, cpu_env, offsetof(CPUPPCState, reserve_val2));
>  }
>  
>  /* stqcx. */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ Richard Henderson
@ 2018-06-28  3:51   ` David Gibson
  2018-06-29  3:33   ` David Gibson
  1 sibling, 0 replies; 23+ messages in thread
From: David Gibson @ 2018-06-28  3:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-ppc

On Tue, Jun 26, 2018 at 09:19:11AM -0700, Richard Henderson wrote:
> Section 1.4 of the Power ISA v3.0B states that this insn is
> single-copy atomic.  As we cannot (yet) issue 128-bit loads

nit: s/loads/stores/

> within TCG, use the generic helpers provided.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/ppc/helper.h     |  4 ++++
>  target/ppc/mem_helper.c | 14 ++++++++++++++
>  target/ppc/translate.c  | 35 +++++++++++++++++++++++++++--------
>  3 files changed, 45 insertions(+), 8 deletions(-)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 3f451a5d7e..cbc1228570 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -803,4 +803,8 @@ DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
>  #if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
>  DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
>  DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
> +                   void, env, tl, i64, i64, i32)
> +DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
> +                   void, env, tl, i64, i64, i32)
>  #endif
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index 44a8f3445a..57e301edc3 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -231,6 +231,20 @@ uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
>      env->retxh = int128_gethi(ret);
>      return int128_getlo(ret);
>  }
> +
> +void helper_stq_le_parallel(CPUPPCState *env, target_ulong addr,
> +                            uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> +    Int128 val = int128_make128(lo, hi);
> +    helper_atomic_sto_le_mmu(env, addr, val, opidx, GETPC());
> +}
> +
> +void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
> +                            uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> +    Int128 val = int128_make128(lo, hi);
> +    helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
> +}
>  #endif
>  
>  /*****************************************************************************/
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 0923cc24e3..3d63a62269 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -2760,6 +2760,7 @@ static void gen_std(DisasContext *ctx)
>      if ((ctx->opcode & 0x3) == 0x2) { /* stq */
>          bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
>          bool le_is_supported = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> +        TCGv hi, lo;
>  
>          if (!(ctx->insns_flags & PPC_64BX)) {
>              gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> @@ -2783,20 +2784,38 @@ static void gen_std(DisasContext *ctx)
>          EA = tcg_temp_new();
>          gen_addr_imm_index(ctx, EA, 0x03);
>  
> -        /* We only need to swap high and low halves. gen_qemu_st64_i64 does
> -           necessary 64-bit byteswap already. */
> -        if (unlikely(ctx->le_mode)) {
> -            gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> +        /* Note that the low part is always in RS+1, even in LE mode.  */
> +        lo = cpu_gpr[rs + 1];
> +        hi = cpu_gpr[rs];
> +
> +        if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> +            TCGv_i32 oi = tcg_temp_new_i32();
> +            if (ctx->le_mode) {
> +                tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
> +                gen_helper_stq_le_parallel(cpu_env, EA, lo, hi, oi);
> +            } else {
> +                tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
> +                gen_helper_stq_be_parallel(cpu_env, EA, lo, hi, oi);
> +            }
> +            tcg_temp_free_i32(oi);
> +#else
> +            /* Restart with exclusive lock.  */
> +            gen_helper_exit_atomic(cpu_env);
> +            ctx->base.is_jmp = DISAS_NORETURN;
> +#endif
> +        } else if (ctx->le_mode) {
> +            tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_LEQ);
>              gen_addr_add(ctx, EA, EA, 8);
> -            gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> +            tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_LEQ);
>          } else {
> -            gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> +            tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_BEQ);
>              gen_addr_add(ctx, EA, EA, 8);
> -            gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> +            tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_BEQ);
>          }
>          tcg_temp_free(EA);
>      } else {
> -        /* std / stdu*/
> +        /* std / stdu */
>          if (Rc(ctx->opcode)) {
>              if (unlikely(rA(ctx->opcode) == 0)) {
>                  gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
  2018-06-28  3:49   ` David Gibson
@ 2018-06-28 15:22     ` Richard Henderson
  2018-06-29  3:33       ` David Gibson
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2018-06-28 15:22 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, qemu-ppc

On 06/27/2018 08:49 PM, David Gibson wrote:
>> +    /* High part of 128-bit helper return.  */
>> +    uint64_t retxh;
>> +
> 
> Adding a temporary here is kind of gross.  I guess the helper
> interface doesn't allow for 128-bit returns, but couldn't you pass a
> register number into the helper and have it update the right GPR
> without going through a temp?

I could pass a pointer, but that would cause ...

>> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
>> +DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
>> +DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)

... the helper definitions to lose TCG_CALL_NO_WG, because they *would* write
to a global register.  Which would cause TCG to discard all of the global guest
registers cached within host registers.

I've used this secondary memory return before, in target/s390,
and to me it seems cleaner than pointers.
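
For contrast, a sketch of the rejected alternative (hypothetical
helper, not in the series): passing the register number and writing
env->gpr[] directly would fold both halves into one call, but the
helper then writes TCG globals and must drop TCG_CALL_NO_WG:

    /* Hypothetical: one call fills both GPRs, but writes TCG globals.  */
    void helper_lq_le_parallel_alt(CPUPPCState *env, target_ulong addr,
                                   uint32_t rd, uint32_t opidx)
    {
        Int128 ret = helper_atomic_ldo_le_mmu(env, addr, opidx, GETPC());
        env->gpr[rd] = int128_gethi(ret);        /* high part in RD */
        env->gpr[rd + 1] = int128_getlo(ret);    /* low part in RD+1 */
    }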


r~


* Re: [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
  2018-06-28 15:22     ` Richard Henderson
@ 2018-06-29  3:33       ` David Gibson
  0 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2018-06-29  3:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-ppc

On Thu, Jun 28, 2018 at 08:22:38AM -0700, Richard Henderson wrote:
> On 06/27/2018 08:49 PM, David Gibson wrote:
> >> +    /* High part of 128-bit helper return.  */
> >> +    uint64_t retxh;
> >> +
> > 
> > Adding a temporary here is kind of gross.  I guess the helper
> > interface doesn't allow for 128-bit returns, but couldn't you pass a
> > register number into the helper and have it update the right GPR
> > without going through a temp?
> 
> I could pass a pointer, but that would cause ...
> 
> >> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> >> +DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> >> +DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> 
> ... the helper definitions to lose TCG_CALL_NO_WG, because they *would* write
> to a global register.  Which would cause TCG to discard all of the global guest
> registers cached within host registers.
> 
> I've used this secondary memory return before, in target/s390,
> and to me it seems cleaner than pointers.

Ok, sounds reasonable, applied to ppc-for-3.0.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ Richard Henderson
  2018-06-28  3:51   ` David Gibson
@ 2018-06-29  3:33   ` David Gibson
  1 sibling, 0 replies; 23+ messages in thread
From: David Gibson @ 2018-06-29  3:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-ppc

On Tue, Jun 26, 2018 at 09:19:11AM -0700, Richard Henderson wrote:
> Section 1.4 of the Power ISA v3.0B states that this insn is
> single-copy atomic.  As we cannot (yet) issue 128-bit loads
> within TCG, use the generic helpers provided.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Applied to ppc-for-3.0, thanks.

> ---
>  target/ppc/helper.h     |  4 ++++
>  target/ppc/mem_helper.c | 14 ++++++++++++++
>  target/ppc/translate.c  | 35 +++++++++++++++++++++++++++--------
>  3 files changed, 45 insertions(+), 8 deletions(-)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 3f451a5d7e..cbc1228570 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -803,4 +803,8 @@ DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
>  #if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
>  DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
>  DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
> +                   void, env, tl, i64, i64, i32)
> +DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
> +                   void, env, tl, i64, i64, i32)
>  #endif
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index 44a8f3445a..57e301edc3 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -231,6 +231,20 @@ uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
>      env->retxh = int128_gethi(ret);
>      return int128_getlo(ret);
>  }
> +
> +void helper_stq_le_parallel(CPUPPCState *env, target_ulong addr,
> +                            uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> +    Int128 val = int128_make128(lo, hi);
> +    helper_atomic_sto_le_mmu(env, addr, val, opidx, GETPC());
> +}
> +
> +void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
> +                            uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> +    Int128 val = int128_make128(lo, hi);
> +    helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
> +}
>  #endif
>  
>  /*****************************************************************************/
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 0923cc24e3..3d63a62269 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -2760,6 +2760,7 @@ static void gen_std(DisasContext *ctx)
>      if ((ctx->opcode & 0x3) == 0x2) { /* stq */
>          bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
>          bool le_is_supported = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> +        TCGv hi, lo;
>  
>          if (!(ctx->insns_flags & PPC_64BX)) {
>              gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> @@ -2783,20 +2784,38 @@ static void gen_std(DisasContext *ctx)
>          EA = tcg_temp_new();
>          gen_addr_imm_index(ctx, EA, 0x03);
>  
> -        /* We only need to swap high and low halves. gen_qemu_st64_i64 does
> -           necessary 64-bit byteswap already. */
> -        if (unlikely(ctx->le_mode)) {
> -            gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> +        /* Note that the low part is always in RS+1, even in LE mode.  */
> +        lo = cpu_gpr[rs + 1];
> +        hi = cpu_gpr[rs];
> +
> +        if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> +            TCGv_i32 oi = tcg_temp_new_i32();
> +            if (ctx->le_mode) {
> +                tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
> +                gen_helper_stq_le_parallel(cpu_env, EA, lo, hi, oi);
> +            } else {
> +                tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
> +                gen_helper_stq_be_parallel(cpu_env, EA, lo, hi, oi);
> +            }
> +            tcg_temp_free_i32(oi);
> +#else
> +            /* Restart with exclusive lock.  */
> +            gen_helper_exit_atomic(cpu_env);
> +            ctx->base.is_jmp = DISAS_NORETURN;
> +#endif
> +        } else if (ctx->le_mode) {
> +            tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_LEQ);
>              gen_addr_add(ctx, EA, EA, 8);
> -            gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> +            tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_LEQ);
>          } else {
> -            gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> +            tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_BEQ);
>              gen_addr_add(ctx, EA, EA, 8);
> -            gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> +            tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_BEQ);
>          }
>          tcg_temp_free(EA);
>      } else {
> -        /* std / stdu*/
> +        /* std / stdu */
>          if (Rc(ctx->opcode)) {
>              if (unlikely(rA(ctx->opcode) == 0)) {
>                  gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations
  2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
                   ` (12 preceding siblings ...)
  2018-06-26 16:19 ` [Qemu-devel] [PATCH 13/13] target/ppc: Implement the rest of gen_st_atomic Richard Henderson
@ 2018-06-29  4:15 ` David Gibson
  13 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2018-06-29  4:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-ppc

On Tue, Jun 26, 2018 at 09:19:08AM -0700, Richard Henderson wrote:
> In another patch set this week, I had noticed the old linux-user
> do_store_exclusive code was still present.  I had thought that was
> dead code that simply hadn't been removed, but it turned out that
> we had not completed the transition to tcg atomics for linux-user.
> 
> In the process, I discovered that we weren't using atomic operations
> for the 128-bit lq, lqarx, and stqcx insns.  These would have simply
> produced incorrect results for -smp in system mode.
> 
> I tidy the code a bit by making use of MO_ALIGN, which means that
> we don't need a separate explicit alignment check.
> 
> I use the new min/max atomic operations I added recently for
> ARMv8.2-Atomics and RISC-V.
> 
> Finally, Power9 has some *really* odd atomic operations in its
> l[wd]at and st[wd]at instructions.  We were generating illegal
> instruction for these.  I implement them for serial context and
> force parallel context to grab the exclusive lock and try again.
> 
> Except for the trivial linux-user ll/sc case, I do not have any
> code that exercises these instructions.  Perhaps the IBM folk
> have something that can test the others?

I've now applied the whole series to ppc-for-3.0.

> 
> 
> r~
> 
> 
> Richard Henderson (13):
>   target/ppc: Add do_unaligned_access hook
>   target/ppc: Use atomic load for LQ and LQARX
>   target/ppc: Use atomic store for STQ
>   target/ppc: Use atomic cmpxchg for STQCX
>   target/ppc: Remove POWERPC_EXCP_STCX
>   target/ppc: Tidy gen_conditional_store
>   target/ppc: Split out gen_load_locked
>   target/ppc: Split out gen_ld_atomic
>   target/ppc: Split out gen_st_atomic
>   target/ppc: Use MO_ALIGN for EXIWX and ECOWX
>   target/ppc: Use atomic min/max helpers
>   target/ppc: Implement the rest of gen_ld_atomic
>   target/ppc: Implement the rest of gen_st_atomic
> 
>  target/ppc/cpu.h                |   8 +-
>  target/ppc/helper.h             |  11 +
>  target/ppc/internal.h           |   5 +
>  linux-user/ppc/cpu_loop.c       | 123 ++----
>  target/ppc/excp_helper.c        |  18 +-
>  target/ppc/mem_helper.c         |  72 +++-
>  target/ppc/translate.c          | 648 ++++++++++++++++++++------------
>  target/ppc/translate_init.inc.c |   1 +
>  8 files changed, 539 insertions(+), 347 deletions(-)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
