All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4
@ 2016-09-12  6:41 Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 01/17] target-ppc: consolidate load operations Nikunj A Dadhania
                   ` (18 more replies)
  0 siblings, 19 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

1) Consolidate Load/Store operations using tcg_gen_qemu_ld/st functions
2) This series contains 10 new instructions for POWER9 ISA3.0
   Use newer qemu load/store tcg helpers and optimize stxvw4x and lxvw4x.

Patches:
 01-09:  Cleanup load/store operations in ppc translator 
    10:  xxspltib: VSX Vector Splat Immediate Byte
    11:  darn: Deliver a random number
    12:  lxsibzx - Load VSX Scalar as Integer Byte & Zero Indexed
         lxsihzx - Load VSX Scalar as Integer Halfword & Zero Indexed
    13:  stxsibx - Store VSX Scalar as Integer Byte Indexed
         stxsihx - Store VSX Scalar as Integer Halfword Indexed
    14:  lxvw4x - improve implementation
    15:  lxvb16x: Load VSX Vector Byte*16
         lxvh8x:  Load VSX Vector Halfword*8
    16:  stxv4x - improve implementation
    17:  stxvb16x: Store VSX Vector Byte*16
         stxvh8x:  Store VSX Vector Halfword*8

Series also available here: https://github.com/nikunjad/qemu/tree/p9-tcg

Changelog:
v1: 
* More load/store cleanups in byte reverse routines
* ld64/st64 converted to newer macro and updated call sites
* Cleanup load with reservation and store conditional
* Return invalid random for darn instruction

v0:
* darn - read /dev/random to get the random number
* xxspltib - make is PPC64 only
* Consolidate load/store operations and use macros to generate qemu_st/ld
* Simplify load/store vsx endian manipulation

Nikunj A Dadhania (16):
  target-ppc: consolidate load operations
  target-ppc: convert ld64 to use new macro
  target-ppc: convert ld[16,32,64]ur to use new macro
  target-ppc: consolidate store operations
  target-ppc: convert st64 to use new macro
  target-ppc: convert st[16,32,64]r to use new macro
  target-ppc: consolidate load with reservation
  target-ppc: move out stqcx impementation
  target-ppc: consolidate store conditional
  target-ppc: add xxspltib instruction
  target-ppc: add lxsi[bw]zx instruction
  target-ppc: add stxsi[bh]x instruction
  target-ppc: improve lxvw4x implementation
  target-ppc: add lxvb16x and lxvh8x
  target-ppc: improve stxvw4x implementation
  target-ppc: add stxvb16x and stxvh8x

Ravi Bangoria (1):
  target-ppc: implement darn instruction

 target-ppc/helper.h                 |   4 +
 target-ppc/int_helper.c             |  16 ++
 target-ppc/mem_helper.c             |  11 ++
 target-ppc/translate.c              | 379 +++++++++++++++++-------------------
 target-ppc/translate/fp-impl.inc.c  |  84 ++++----
 target-ppc/translate/fp-ops.inc.c   |   2 +-
 target-ppc/translate/spe-impl.inc.c |   4 +-
 target-ppc/translate/vmx-impl.inc.c |  24 +--
 target-ppc/translate/vsx-impl.inc.c | 208 ++++++++++++++++----
 target-ppc/translate/vsx-ops.inc.c  |  13 ++
 10 files changed, 460 insertions(+), 285 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 01/17] target-ppc: consolidate load operations
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-15  0:46   ` David Gibson
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 02/17] target-ppc: convert ld64 to use new macro Nikunj A Dadhania
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Implement macro to consolidate store operations using newer
tcg_gen_qemu_ld functions.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c | 58 +++++++++++++++++---------------------------------
 1 file changed, 20 insertions(+), 38 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index a27f455..6606969 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2462,50 +2462,32 @@ static inline void gen_align_no_le(DisasContext *ctx)
 }
 
 /***                             Integer load                              ***/
-static inline void gen_qemu_ld8u(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    tcg_gen_qemu_ld8u(arg1, arg2, ctx->mem_idx);
-}
-
-static inline void gen_qemu_ld16u(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_UW | ctx->default_tcg_memop_mask;
-    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, op);
-}
-
-static inline void gen_qemu_ld16s(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_SW | ctx->default_tcg_memop_mask;
-    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, op);
-}
+#define DEF_MEMOP(op) ((op) | ctx->default_tcg_memop_mask)
 
-static inline void gen_qemu_ld32u(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_UL | ctx->default_tcg_memop_mask;
-    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, op);
+#define GEN_QEMU_LOAD_TL(ldop, op)                                      \
+static void glue(gen_qemu_, ldop)(DisasContext *ctx,                    \
+                                  TCGv val,                             \
+                                  TCGv addr)                            \
+{                                                                       \
+    tcg_gen_qemu_ld_tl(val, addr, ctx->mem_idx, op);                    \
 }
 
-static void gen_qemu_ld32u_i64(DisasContext *ctx, TCGv_i64 val, TCGv addr)
-{
-    TCGv tmp = tcg_temp_new();
-    gen_qemu_ld32u(ctx, tmp, addr);
-    tcg_gen_extu_tl_i64(val, tmp);
-    tcg_temp_free(tmp);
-}
+GEN_QEMU_LOAD_TL(ld8u,  DEF_MEMOP(MO_UB))
+GEN_QEMU_LOAD_TL(ld16u, DEF_MEMOP(MO_UW))
+GEN_QEMU_LOAD_TL(ld16s, DEF_MEMOP(MO_SW))
+GEN_QEMU_LOAD_TL(ld32u, DEF_MEMOP(MO_UL))
+GEN_QEMU_LOAD_TL(ld32s, DEF_MEMOP(MO_SL))
 
-static inline void gen_qemu_ld32s(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_SL | ctx->default_tcg_memop_mask;
-    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, op);
+#define GEN_QEMU_LOAD_64(ldop, op)                                  \
+static void glue(gen_qemu_, glue(ldop, _i64))(DisasContext *ctx,    \
+                                             TCGv_i64 val,          \
+                                             TCGv addr)             \
+{                                                                   \
+    tcg_gen_qemu_ld_i64(val, addr, ctx->mem_idx, op);               \
 }
 
-static void gen_qemu_ld32s_i64(DisasContext *ctx, TCGv_i64 val, TCGv addr)
-{
-    TCGv tmp = tcg_temp_new();
-    gen_qemu_ld32s(ctx, tmp, addr);
-    tcg_gen_ext_tl_i64(val, tmp);
-    tcg_temp_free(tmp);
-}
+GEN_QEMU_LOAD_64(ld32u, DEF_MEMOP(MO_UL))
+GEN_QEMU_LOAD_64(ld32s, DEF_MEMOP(MO_SL))
 
 static inline void gen_qemu_ld64(DisasContext *ctx, TCGv_i64 arg1, TCGv arg2)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 02/17] target-ppc: convert ld64 to use new macro
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 01/17] target-ppc: consolidate load operations Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 03/17] target-ppc: convert ld[16, 32, 64]ur " Nikunj A Dadhania
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Use macro for ld64 as well, this changes the function signature from
gen_qemu_ld64 => gen_qemu_ld64_i64. Replace this at all the call sites.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c              | 39 +++++++++++++++-------------------
 target-ppc/translate/fp-impl.inc.c  | 42 ++++++++++++++++++-------------------
 target-ppc/translate/spe-impl.inc.c |  2 +-
 target-ppc/translate/vmx-impl.inc.c | 12 +++++------
 target-ppc/translate/vsx-impl.inc.c |  8 +++----
 5 files changed, 49 insertions(+), 54 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 6606969..d36d45e 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2488,12 +2488,7 @@ static void glue(gen_qemu_, glue(ldop, _i64))(DisasContext *ctx,    \
 
 GEN_QEMU_LOAD_64(ld32u, DEF_MEMOP(MO_UL))
 GEN_QEMU_LOAD_64(ld32s, DEF_MEMOP(MO_SL))
-
-static inline void gen_qemu_ld64(DisasContext *ctx, TCGv_i64 arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_Q | ctx->default_tcg_memop_mask;
-    tcg_gen_qemu_ld_i64(arg1, arg2, ctx->mem_idx, op);
-}
+GEN_QEMU_LOAD_64(ld64,  DEF_MEMOP(MO_Q))
 
 static inline void gen_qemu_st8(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
@@ -2612,12 +2607,12 @@ GEN_LDUX(lwa, ld32s, 0x15, 0x0B, PPC_64B);
 /* lwax */
 GEN_LDX(lwa, ld32s, 0x15, 0x0A, PPC_64B);
 /* ldux */
-GEN_LDUX(ld, ld64, 0x15, 0x01, PPC_64B);
+GEN_LDUX(ld, ld64_i64, 0x15, 0x01, PPC_64B);
 /* ldx */
-GEN_LDX(ld, ld64, 0x15, 0x00, PPC_64B);
+GEN_LDX(ld, ld64_i64, 0x15, 0x00, PPC_64B);
 
 /* CI load/store variants */
-GEN_LDX_HVRM(ldcix, ld64, 0x15, 0x1b, PPC_CILDST)
+GEN_LDX_HVRM(ldcix, ld64_i64, 0x15, 0x1b, PPC_CILDST)
 GEN_LDX_HVRM(lwzcix, ld32u, 0x15, 0x15, PPC_CILDST)
 GEN_LDX_HVRM(lhzcix, ld16u, 0x15, 0x19, PPC_CILDST)
 GEN_LDX_HVRM(lbzcix, ld8u, 0x15, 0x1a, PPC_CILDST)
@@ -2640,7 +2635,7 @@ static void gen_ld(DisasContext *ctx)
         gen_qemu_ld32s(ctx, cpu_gpr[rD(ctx->opcode)], EA);
     } else {
         /* ld - ldu */
-        gen_qemu_ld64(ctx, cpu_gpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, cpu_gpr[rD(ctx->opcode)], EA);
     }
     if (Rc(ctx->opcode))
         tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);
@@ -2677,16 +2672,16 @@ static void gen_lq(DisasContext *ctx)
     EA = tcg_temp_new();
     gen_addr_imm_index(ctx, EA, 0x0F);
 
-    /* We only need to swap high and low halves. gen_qemu_ld64 does necessary
-       64-bit byteswap already. */
+    /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
+       necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_ld64(ctx, cpu_gpr[rd+1], EA);
+        gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
         gen_addr_add(ctx, EA, EA, 8);
-        gen_qemu_ld64(ctx, cpu_gpr[rd], EA);
+        gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
     } else {
-        gen_qemu_ld64(ctx, cpu_gpr[rd], EA);
+        gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
         gen_addr_add(ctx, EA, EA, 8);
-        gen_qemu_ld64(ctx, cpu_gpr[rd+1], EA);
+        gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
     }
     tcg_temp_free(EA);
 }
@@ -3184,7 +3179,7 @@ STCX(stwcx_, 4);
 
 #if defined(TARGET_PPC64)
 /* ldarx */
-LARX(ldarx, 8, ld64);
+LARX(ldarx, 8, ld64_i64);
 
 /* lqarx */
 static void gen_lqarx(DisasContext *ctx)
@@ -3210,11 +3205,11 @@ static void gen_lqarx(DisasContext *ctx)
         gpr1 = cpu_gpr[rd];
         gpr2 = cpu_gpr[rd+1];
     }
-    gen_qemu_ld64(ctx, gpr1, EA);
+    gen_qemu_ld64_i64(ctx, gpr1, EA);
     tcg_gen_mov_tl(cpu_reserve, EA);
 
     gen_addr_add(ctx, EA, EA, 8);
-    gen_qemu_ld64(ctx, gpr2, EA);
+    gen_qemu_ld64_i64(ctx, gpr2, EA);
 
     tcg_gen_st_tl(gpr1, cpu_env, offsetof(CPUPPCState, reserve_val));
     tcg_gen_st_tl(gpr2, cpu_env, offsetof(CPUPPCState, reserve_val2));
@@ -6601,12 +6596,12 @@ GEN_LDS(lwz, ld32u, 0x00, PPC_INTEGER)
 #if defined(TARGET_PPC64)
 GEN_LDUX(lwa, ld32s, 0x15, 0x0B, PPC_64B)
 GEN_LDX(lwa, ld32s, 0x15, 0x0A, PPC_64B)
-GEN_LDUX(ld, ld64, 0x15, 0x01, PPC_64B)
-GEN_LDX(ld, ld64, 0x15, 0x00, PPC_64B)
+GEN_LDUX(ld, ld64_i64, 0x15, 0x01, PPC_64B)
+GEN_LDX(ld, ld64_i64, 0x15, 0x00, PPC_64B)
 GEN_LDX_E(ldbr, ld64ur, 0x14, 0x10, PPC_NONE, PPC2_DBRX, CHK_NONE)
 
 /* HV/P7 and later only */
-GEN_LDX_HVRM(ldcix, ld64, 0x15, 0x1b, PPC_CILDST)
+GEN_LDX_HVRM(ldcix, ld64_i64, 0x15, 0x1b, PPC_CILDST)
 GEN_LDX_HVRM(lwzcix, ld32u, 0x15, 0x18, PPC_CILDST)
 GEN_LDX_HVRM(lhzcix, ld16u, 0x15, 0x19, PPC_CILDST)
 GEN_LDX_HVRM(lbzcix, ld8u, 0x15, 0x1a, PPC_CILDST)
diff --git a/target-ppc/translate/fp-impl.inc.c b/target-ppc/translate/fp-impl.inc.c
index 9ba9289..53b7fc7 100644
--- a/target-ppc/translate/fp-impl.inc.c
+++ b/target-ppc/translate/fp-impl.inc.c
@@ -672,7 +672,7 @@ static inline void gen_qemu_ld32fs(DisasContext *ctx, TCGv_i64 arg1, TCGv arg2)
 }
 
  /* lfd lfdu lfdux lfdx */
-GEN_LDFS(lfd, ld64, 0x12, PPC_FLOAT);
+GEN_LDFS(lfd, ld64_i64, 0x12, PPC_FLOAT);
  /* lfs lfsu lfsux lfsx */
 GEN_LDFS(lfs, ld32fs, 0x10, PPC_FLOAT);
 
@@ -687,16 +687,16 @@ static void gen_lfdp(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_imm_index(ctx, EA, 0);
-    /* We only need to swap high and low halves. gen_qemu_ld64 does necessary
-       64-bit byteswap already. */
+    /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
+       necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
     } else {
-        gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
     }
     tcg_temp_free(EA);
 }
@@ -712,16 +712,16 @@ static void gen_lfdpx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    /* We only need to swap high and low halves. gen_qemu_ld64 does necessary
-       64-bit byteswap already. */
+    /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
+       necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
     } else {
-        gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_ld64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
     }
     tcg_temp_free(EA);
 }
@@ -924,9 +924,9 @@ static void gen_lfq(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_ld64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_ld64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
     tcg_temp_free(t0);
 }
 
@@ -940,9 +940,9 @@ static void gen_lfqu(DisasContext *ctx)
     t0 = tcg_temp_new();
     t1 = tcg_temp_new();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_ld64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_ld64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
     if (ra != 0)
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
     tcg_temp_free(t0);
@@ -958,10 +958,10 @@ static void gen_lfqux(DisasContext *ctx)
     TCGv t0, t1;
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_ld64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_ld64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
     tcg_temp_free(t1);
     if (ra != 0)
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
@@ -976,9 +976,9 @@ static void gen_lfqx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_ld64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_ld64_i64(ctx, cpu_fpr[rd], t0);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_ld64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    gen_qemu_ld64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
     tcg_temp_free(t0);
 }
 
diff --git a/target-ppc/translate/spe-impl.inc.c b/target-ppc/translate/spe-impl.inc.c
index 0ce403a..a969927 100644
--- a/target-ppc/translate/spe-impl.inc.c
+++ b/target-ppc/translate/spe-impl.inc.c
@@ -617,7 +617,7 @@ static inline void gen_addr_spe_imm_index(DisasContext *ctx, TCGv EA, int sh)
 static inline void gen_op_evldd(DisasContext *ctx, TCGv addr)
 {
     TCGv_i64 t0 = tcg_temp_new_i64();
-    gen_qemu_ld64(ctx, t0, addr);
+    gen_qemu_ld64_i64(ctx, t0, addr);
     gen_store_gpr64(rD(ctx->opcode), t0);
     tcg_temp_free_i64(t0);
 }
diff --git a/target-ppc/translate/vmx-impl.inc.c b/target-ppc/translate/vmx-impl.inc.c
index 024faf7..eb3164b 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -26,16 +26,16 @@ static void glue(gen_, name)(DisasContext *ctx)
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
     tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
-    /* We only need to swap high and low halves. gen_qemu_ld64 does necessary \
-       64-bit byteswap already. */                                            \
+    /* We only need to swap high and low halves. gen_qemu_ld64_i64 does       \
+       necessary 64-bit byteswap already. */                                  \
     if (ctx->le_mode) {                                                       \
-        gen_qemu_ld64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                    \
+        gen_qemu_ld64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_ld64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                    \
+        gen_qemu_ld64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
     } else {                                                                  \
-        gen_qemu_ld64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                    \
+        gen_qemu_ld64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_ld64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                    \
+        gen_qemu_ld64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
     }                                                                         \
     tcg_temp_free(EA);                                                        \
 }
diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index 9f77b06..5725794 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -34,7 +34,7 @@ static void gen_##name(DisasContext *ctx)                     \
     tcg_temp_free(EA);                                        \
 }
 
-VSX_LOAD_SCALAR(lxsdx, ld64)
+VSX_LOAD_SCALAR(lxsdx, ld64_i64)
 VSX_LOAD_SCALAR(lxsiwax, ld32s_i64)
 VSX_LOAD_SCALAR(lxsiwzx, ld32u_i64)
 VSX_LOAD_SCALAR(lxsspx, ld32fs)
@@ -49,9 +49,9 @@ static void gen_lxvd2x(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
+    gen_qemu_ld64_i64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
     tcg_gen_addi_tl(EA, EA, 8);
-    gen_qemu_ld64(ctx, cpu_vsrl(xT(ctx->opcode)), EA);
+    gen_qemu_ld64_i64(ctx, cpu_vsrl(xT(ctx->opcode)), EA);
     tcg_temp_free(EA);
 }
 
@@ -65,7 +65,7 @@ static void gen_lxvdsx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
+    gen_qemu_ld64_i64(ctx, cpu_vsrh(xT(ctx->opcode)), EA);
     tcg_gen_mov_i64(cpu_vsrl(xT(ctx->opcode)), cpu_vsrh(xT(ctx->opcode)));
     tcg_temp_free(EA);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND  v2 03/17] target-ppc: convert ld[16, 32, 64]ur to use new macro
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 01/17] target-ppc: consolidate load operations Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 02/17] target-ppc: convert ld64 to use new macro Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 04/17] target-ppc: consolidate store operations Nikunj A Dadhania
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Make byte-swap routines use the common GEN_QEMU_LOAD macro

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index d36d45e..0d27067 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2463,6 +2463,7 @@ static inline void gen_align_no_le(DisasContext *ctx)
 
 /***                             Integer load                              ***/
 #define DEF_MEMOP(op) ((op) | ctx->default_tcg_memop_mask)
+#define BSWAP_MEMOP(op) ((op) | (ctx->default_tcg_memop_mask ^ MO_BSWAP))
 
 #define GEN_QEMU_LOAD_TL(ldop, op)                                      \
 static void glue(gen_qemu_, ldop)(DisasContext *ctx,                    \
@@ -2478,6 +2479,9 @@ GEN_QEMU_LOAD_TL(ld16s, DEF_MEMOP(MO_SW))
 GEN_QEMU_LOAD_TL(ld32u, DEF_MEMOP(MO_UL))
 GEN_QEMU_LOAD_TL(ld32s, DEF_MEMOP(MO_SL))
 
+GEN_QEMU_LOAD_TL(ld16ur, BSWAP_MEMOP(MO_UW))
+GEN_QEMU_LOAD_TL(ld32ur, BSWAP_MEMOP(MO_UL))
+
 #define GEN_QEMU_LOAD_64(ldop, op)                                  \
 static void glue(gen_qemu_, glue(ldop, _i64))(DisasContext *ctx,    \
                                              TCGv_i64 val,          \
@@ -2490,6 +2494,10 @@ GEN_QEMU_LOAD_64(ld32u, DEF_MEMOP(MO_UL))
 GEN_QEMU_LOAD_64(ld32s, DEF_MEMOP(MO_SL))
 GEN_QEMU_LOAD_64(ld64,  DEF_MEMOP(MO_Q))
 
+#if defined(TARGET_PPC64)
+GEN_QEMU_LOAD_64(ld64ur, BSWAP_MEMOP(MO_Q))
+#endif
+
 static inline void gen_qemu_st8(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
     tcg_gen_qemu_st8(arg1, arg2, ctx->mem_idx);
@@ -2836,29 +2844,14 @@ static void gen_std(DisasContext *ctx)
 /***                Integer load and store with byte reverse               ***/
 
 /* lhbrx */
-static inline void gen_qemu_ld16ur(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_UW | (ctx->default_tcg_memop_mask ^ MO_BSWAP);
-    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, op);
-}
 GEN_LDX(lhbr, ld16ur, 0x16, 0x18, PPC_INTEGER);
 
 /* lwbrx */
-static inline void gen_qemu_ld32ur(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_UL | (ctx->default_tcg_memop_mask ^ MO_BSWAP);
-    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, op);
-}
 GEN_LDX(lwbr, ld32ur, 0x16, 0x10, PPC_INTEGER);
 
 #if defined(TARGET_PPC64)
 /* ldbrx */
-static inline void gen_qemu_ld64ur(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_Q | (ctx->default_tcg_memop_mask ^ MO_BSWAP);
-    tcg_gen_qemu_ld_i64(arg1, arg2, ctx->mem_idx, op);
-}
-GEN_LDX_E(ldbr, ld64ur, 0x14, 0x10, PPC_NONE, PPC2_DBRX, CHK_NONE);
+GEN_LDX_E(ldbr, ld64ur_i64, 0x14, 0x10, PPC_NONE, PPC2_DBRX, CHK_NONE);
 #endif  /* TARGET_PPC64 */
 
 /* sthbrx */
@@ -6598,7 +6591,7 @@ GEN_LDUX(lwa, ld32s, 0x15, 0x0B, PPC_64B)
 GEN_LDX(lwa, ld32s, 0x15, 0x0A, PPC_64B)
 GEN_LDUX(ld, ld64_i64, 0x15, 0x01, PPC_64B)
 GEN_LDX(ld, ld64_i64, 0x15, 0x00, PPC_64B)
-GEN_LDX_E(ldbr, ld64ur, 0x14, 0x10, PPC_NONE, PPC2_DBRX, CHK_NONE)
+GEN_LDX_E(ldbr, ld64ur_i64, 0x14, 0x10, PPC_NONE, PPC2_DBRX, CHK_NONE)
 
 /* HV/P7 and later only */
 GEN_LDX_HVRM(ldcix, ld64_i64, 0x15, 0x1b, PPC_CILDST)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 04/17] target-ppc: consolidate store operations
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (2 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 03/17] target-ppc: convert ld[16, 32, 64]ur " Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 05/17] target-ppc: convert st64 to use new macro Nikunj A Dadhania
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Implement macro to consolidate store operations using newer
tcg_gen_qemu_st function.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 0d27067..f228ae0 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2498,30 +2498,27 @@ GEN_QEMU_LOAD_64(ld64,  DEF_MEMOP(MO_Q))
 GEN_QEMU_LOAD_64(ld64ur, BSWAP_MEMOP(MO_Q))
 #endif
 
-static inline void gen_qemu_st8(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    tcg_gen_qemu_st8(arg1, arg2, ctx->mem_idx);
+#define GEN_QEMU_STORE_TL(stop, op)                                     \
+static void glue(gen_qemu_, stop)(DisasContext *ctx,                    \
+                                  TCGv val,                             \
+                                  TCGv addr)                            \
+{                                                                       \
+    tcg_gen_qemu_st_tl(val, addr, ctx->mem_idx, op);                    \
 }
 
-static inline void gen_qemu_st16(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_UW | ctx->default_tcg_memop_mask;
-    tcg_gen_qemu_st_tl(arg1, arg2, ctx->mem_idx, op);
-}
+GEN_QEMU_STORE_TL(st8,  DEF_MEMOP(MO_UB))
+GEN_QEMU_STORE_TL(st16, DEF_MEMOP(MO_UW))
+GEN_QEMU_STORE_TL(st32, DEF_MEMOP(MO_UL))
 
-static inline void gen_qemu_st32(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_UL | ctx->default_tcg_memop_mask;
-    tcg_gen_qemu_st_tl(arg1, arg2, ctx->mem_idx, op);
+#define GEN_QEMU_STORE_64(stop, op)                               \
+static void glue(gen_qemu_, glue(stop, _i64))(DisasContext *ctx,  \
+                                              TCGv_i64 val,       \
+                                              TCGv addr)          \
+{                                                                 \
+    tcg_gen_qemu_st_i64(val, addr, ctx->mem_idx, op);             \
 }
 
-static void gen_qemu_st32_i64(DisasContext *ctx, TCGv_i64 val, TCGv addr)
-{
-    TCGv tmp = tcg_temp_new();
-    tcg_gen_trunc_i64_tl(tmp, val);
-    gen_qemu_st32(ctx, tmp, addr);
-    tcg_temp_free(tmp);
-}
+GEN_QEMU_STORE_64(st32, DEF_MEMOP(MO_UL))
 
 static inline void gen_qemu_st64(DisasContext *ctx, TCGv_i64 arg1, TCGv arg2)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 05/17] target-ppc: convert st64 to use new macro
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (3 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 04/17] target-ppc: consolidate store operations Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 06/17] target-ppc: convert st[16, 32, 64]r " Nikunj A Dadhania
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Use macro for ld64 as well, this changes the function signature from
gen_qemu_st64 => gen_qemu_st64_i64. Replace this at all the call sites.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c              | 37 ++++++++++++++------------------
 target-ppc/translate/fp-impl.inc.c  | 42 ++++++++++++++++++-------------------
 target-ppc/translate/fp-ops.inc.c   |  2 +-
 target-ppc/translate/spe-impl.inc.c |  2 +-
 target-ppc/translate/vmx-impl.inc.c | 12 +++++------
 target-ppc/translate/vsx-impl.inc.c |  6 +++---
 6 files changed, 48 insertions(+), 53 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index f228ae0..254ad40 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2519,12 +2519,7 @@ static void glue(gen_qemu_, glue(stop, _i64))(DisasContext *ctx,  \
 }
 
 GEN_QEMU_STORE_64(st32, DEF_MEMOP(MO_UL))
-
-static inline void gen_qemu_st64(DisasContext *ctx, TCGv_i64 arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_Q | ctx->default_tcg_memop_mask;
-    tcg_gen_qemu_st_i64(arg1, arg2, ctx->mem_idx, op);
-}
+GEN_QEMU_STORE_64(st64, DEF_MEMOP(MO_Q))
 
 #define GEN_LD(name, ldop, opc, type)                                         \
 static void glue(gen_, name)(DisasContext *ctx)                                       \
@@ -2769,9 +2764,9 @@ GEN_STS(sth, st16, 0x0C, PPC_INTEGER);
 /* stw stwu stwux stwx */
 GEN_STS(stw, st32, 0x04, PPC_INTEGER);
 #if defined(TARGET_PPC64)
-GEN_STUX(std, st64, 0x15, 0x05, PPC_64B);
-GEN_STX(std, st64, 0x15, 0x04, PPC_64B);
-GEN_STX_HVRM(stdcix, st64, 0x15, 0x1f, PPC_CILDST)
+GEN_STUX(std, st64_i64, 0x15, 0x05, PPC_64B);
+GEN_STX(std, st64_i64, 0x15, 0x04, PPC_64B);
+GEN_STX_HVRM(stdcix, st64_i64, 0x15, 0x1f, PPC_CILDST)
 GEN_STX_HVRM(stwcix, st32, 0x15, 0x1c, PPC_CILDST)
 GEN_STX_HVRM(sthcix, st16, 0x15, 0x1d, PPC_CILDST)
 GEN_STX_HVRM(stbcix, st8, 0x15, 0x1e, PPC_CILDST)
@@ -2808,16 +2803,16 @@ static void gen_std(DisasContext *ctx)
         EA = tcg_temp_new();
         gen_addr_imm_index(ctx, EA, 0x03);
 
-        /* We only need to swap high and low halves. gen_qemu_st64 does
+        /* We only need to swap high and low halves. gen_qemu_st64_i64 does
            necessary 64-bit byteswap already. */
         if (unlikely(ctx->le_mode)) {
-            gen_qemu_st64(ctx, cpu_gpr[rs+1], EA);
+            gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
             gen_addr_add(ctx, EA, EA, 8);
-            gen_qemu_st64(ctx, cpu_gpr[rs], EA);
+            gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
         } else {
-            gen_qemu_st64(ctx, cpu_gpr[rs], EA);
+            gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
             gen_addr_add(ctx, EA, EA, 8);
-            gen_qemu_st64(ctx, cpu_gpr[rs+1], EA);
+            gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
         }
         tcg_temp_free(EA);
     } else {
@@ -2831,7 +2826,7 @@ static void gen_std(DisasContext *ctx)
         gen_set_access_type(ctx, ACCESS_INT);
         EA = tcg_temp_new();
         gen_addr_imm_index(ctx, EA, 0x03);
-        gen_qemu_st64(ctx, cpu_gpr[rs], EA);
+        gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
         if (Rc(ctx->opcode))
             tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], EA);
         tcg_temp_free(EA);
@@ -3113,7 +3108,7 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
     tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 1 << CRF_EQ);
 #if defined(TARGET_PPC64)
     if (size == 8) {
-        gen_qemu_st64(ctx, cpu_gpr[reg], EA);
+        gen_qemu_st64_i64(ctx, cpu_gpr[reg], EA);
     } else
 #endif
     if (size == 4) {
@@ -3130,10 +3125,10 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
             gpr1 = cpu_gpr[reg];
             gpr2 = cpu_gpr[reg+1];
         }
-        gen_qemu_st64(ctx, gpr1, EA);
+        gen_qemu_st64_i64(ctx, gpr1, EA);
         EA8 = tcg_temp_local_new();
         gen_addr_add(ctx, EA8, EA, 8);
-        gen_qemu_st64(ctx, gpr2, EA8);
+        gen_qemu_st64_i64(ctx, gpr2, EA8);
         tcg_temp_free(EA8);
 #endif
     } else {
@@ -6622,10 +6617,10 @@ GEN_STS(stb, st8, 0x06, PPC_INTEGER)
 GEN_STS(sth, st16, 0x0C, PPC_INTEGER)
 GEN_STS(stw, st32, 0x04, PPC_INTEGER)
 #if defined(TARGET_PPC64)
-GEN_STUX(std, st64, 0x15, 0x05, PPC_64B)
-GEN_STX(std, st64, 0x15, 0x04, PPC_64B)
+GEN_STUX(std, st64_i64, 0x15, 0x05, PPC_64B)
+GEN_STX(std, st64_i64, 0x15, 0x04, PPC_64B)
 GEN_STX_E(stdbr, st64r, 0x14, 0x14, PPC_NONE, PPC2_DBRX, CHK_NONE)
-GEN_STX_HVRM(stdcix, st64, 0x15, 0x1f, PPC_CILDST)
+GEN_STX_HVRM(stdcix, st64_i64, 0x15, 0x1f, PPC_CILDST)
 GEN_STX_HVRM(stwcix, st32, 0x15, 0x1c, PPC_CILDST)
 GEN_STX_HVRM(sthcix, st16, 0x15, 0x1d, PPC_CILDST)
 GEN_STX_HVRM(stbcix, st8, 0x15, 0x1e, PPC_CILDST)
diff --git a/target-ppc/translate/fp-impl.inc.c b/target-ppc/translate/fp-impl.inc.c
index 53b7fc7..872af7b 100644
--- a/target-ppc/translate/fp-impl.inc.c
+++ b/target-ppc/translate/fp-impl.inc.c
@@ -848,7 +848,7 @@ static inline void gen_qemu_st32fs(DisasContext *ctx, TCGv_i64 arg1, TCGv arg2)
 }
 
 /* stfd stfdu stfdux stfdx */
-GEN_STFS(stfd, st64, 0x16, PPC_FLOAT);
+GEN_STFS(stfd, st64_i64, 0x16, PPC_FLOAT);
 /* stfs stfsu stfsux stfsx */
 GEN_STFS(stfs, st32fs, 0x14, PPC_FLOAT);
 
@@ -863,16 +863,16 @@ static void gen_stfdp(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_imm_index(ctx, EA, 0);
-    /* We only need to swap high and low halves. gen_qemu_st64 does necessary
-       64-bit byteswap already. */
+    /* We only need to swap high and low halves. gen_qemu_st64_i64 does
+       necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
     } else {
-        gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
     }
     tcg_temp_free(EA);
 }
@@ -888,16 +888,16 @@ static void gen_stfdpx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    /* We only need to swap high and low halves. gen_qemu_st64 does necessary
-       64-bit byteswap already. */
+    /* We only need to swap high and low halves. gen_qemu_st64_i64 does
+       necessary 64-bit byteswap already. */
     if (unlikely(ctx->le_mode)) {
-        gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
     } else {
-        gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
+        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
         tcg_gen_addi_tl(EA, EA, 8);
-        gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
+        gen_qemu_st64_i64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
     }
     tcg_temp_free(EA);
 }
@@ -990,9 +990,9 @@ static void gen_stfq(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_st64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_st64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
     tcg_temp_free(t0);
 }
 
@@ -1005,10 +1005,10 @@ static void gen_stfqu(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
     gen_addr_imm_index(ctx, t0, 0);
-    gen_qemu_st64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_st64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
     tcg_temp_free(t1);
     if (ra != 0)
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
@@ -1024,10 +1024,10 @@ static void gen_stfqux(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_st64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
     t1 = tcg_temp_new();
     gen_addr_add(ctx, t1, t0, 8);
-    gen_qemu_st64(ctx, cpu_fpr[(rd + 1) % 32], t1);
+    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t1);
     tcg_temp_free(t1);
     if (ra != 0)
         tcg_gen_mov_tl(cpu_gpr[ra], t0);
@@ -1042,9 +1042,9 @@ static void gen_stfqx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     t0 = tcg_temp_new();
     gen_addr_reg_index(ctx, t0);
-    gen_qemu_st64(ctx, cpu_fpr[rd], t0);
+    gen_qemu_st64_i64(ctx, cpu_fpr[rd], t0);
     gen_addr_add(ctx, t0, t0, 8);
-    gen_qemu_st64(ctx, cpu_fpr[(rd + 1) % 32], t0);
+    gen_qemu_st64_i64(ctx, cpu_fpr[(rd + 1) % 32], t0);
     tcg_temp_free(t0);
 }
 
diff --git a/target-ppc/translate/fp-ops.inc.c b/target-ppc/translate/fp-ops.inc.c
index 291a1e6..d36ab4e 100644
--- a/target-ppc/translate/fp-ops.inc.c
+++ b/target-ppc/translate/fp-ops.inc.c
@@ -85,7 +85,7 @@ GEN_STUF(name, stop, op | 0x21, type)                                         \
 GEN_STUXF(name, stop, op | 0x01, type)                                        \
 GEN_STXF(name, stop, 0x17, op | 0x00, type)
 
-GEN_STFS(stfd, st64, 0x16, PPC_FLOAT)
+GEN_STFS(stfd, st64_i64, 0x16, PPC_FLOAT)
 GEN_STFS(stfs, st32fs, 0x14, PPC_FLOAT)
 GEN_STXF(stfiw, st32fiw, 0x17, 0x1E, PPC_FLOAT_STFIWX)
 GEN_HANDLER_E(stfdp, 0x3D, 0xFF, 0xFF, 0x00200003, PPC_NONE, PPC2_ISA205),
diff --git a/target-ppc/translate/spe-impl.inc.c b/target-ppc/translate/spe-impl.inc.c
index a969927..8c1c16c 100644
--- a/target-ppc/translate/spe-impl.inc.c
+++ b/target-ppc/translate/spe-impl.inc.c
@@ -725,7 +725,7 @@ static inline void gen_op_evstdd(DisasContext *ctx, TCGv addr)
 {
     TCGv_i64 t0 = tcg_temp_new_i64();
     gen_load_gpr64(t0, rS(ctx->opcode));
-    gen_qemu_st64(ctx, t0, addr);
+    gen_qemu_st64_i64(ctx, t0, addr);
     tcg_temp_free_i64(t0);
 }
 
diff --git a/target-ppc/translate/vmx-impl.inc.c b/target-ppc/translate/vmx-impl.inc.c
index eb3164b..3ce374d 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -52,16 +52,16 @@ static void gen_st##name(DisasContext *ctx)                                   \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
     tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
-    /* We only need to swap high and low halves. gen_qemu_st64 does necessary \
-       64-bit byteswap already. */                                            \
+    /* We only need to swap high and low halves. gen_qemu_st64_i64 does       \
+       necessary 64-bit byteswap already. */                                  \
     if (ctx->le_mode) {                                                       \
-        gen_qemu_st64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                    \
+        gen_qemu_st64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_st64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                    \
+        gen_qemu_st64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
     } else {                                                                  \
-        gen_qemu_st64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                    \
+        gen_qemu_st64_i64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
-        gen_qemu_st64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                    \
+        gen_qemu_st64_i64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                \
     }                                                                         \
     tcg_temp_free(EA);                                                        \
 }
diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index 5725794..99cabb2 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -115,7 +115,7 @@ static void gen_##name(DisasContext *ctx)                     \
     tcg_temp_free(EA);                                        \
 }
 
-VSX_STORE_SCALAR(stxsdx, st64)
+VSX_STORE_SCALAR(stxsdx, st64_i64)
 VSX_STORE_SCALAR(stxsiwx, st32_i64)
 VSX_STORE_SCALAR(stxsspx, st32fs)
 
@@ -129,9 +129,9 @@ static void gen_stxvd2x(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_st64(ctx, cpu_vsrh(xS(ctx->opcode)), EA);
+    gen_qemu_st64_i64(ctx, cpu_vsrh(xS(ctx->opcode)), EA);
     tcg_gen_addi_tl(EA, EA, 8);
-    gen_qemu_st64(ctx, cpu_vsrl(xS(ctx->opcode)), EA);
+    gen_qemu_st64_i64(ctx, cpu_vsrl(xS(ctx->opcode)), EA);
     tcg_temp_free(EA);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND  v2 06/17] target-ppc: convert st[16, 32, 64]r to use new macro
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (4 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 05/17] target-ppc: convert st64 to use new macro Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-15  0:48   ` David Gibson
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 07/17] target-ppc: consolidate load with reservation Nikunj A Dadhania
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Make byte-swap routines use the common GEN_QEMU_LOAD macro

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c | 32 ++++++++++----------------------
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 254ad40..60668c2 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2510,6 +2510,9 @@ GEN_QEMU_STORE_TL(st8,  DEF_MEMOP(MO_UB))
 GEN_QEMU_STORE_TL(st16, DEF_MEMOP(MO_UW))
 GEN_QEMU_STORE_TL(st32, DEF_MEMOP(MO_UL))
 
+GEN_QEMU_STORE_TL(st16r, BSWAP_MEMOP(MO_UW))
+GEN_QEMU_STORE_TL(st32r, BSWAP_MEMOP(MO_UL))
+
 #define GEN_QEMU_STORE_64(stop, op)                               \
 static void glue(gen_qemu_, glue(stop, _i64))(DisasContext *ctx,  \
                                               TCGv_i64 val,       \
@@ -2521,6 +2524,10 @@ static void glue(gen_qemu_, glue(stop, _i64))(DisasContext *ctx,  \
 GEN_QEMU_STORE_64(st32, DEF_MEMOP(MO_UL))
 GEN_QEMU_STORE_64(st64, DEF_MEMOP(MO_Q))
 
+#if defined(TARGET_PPC64)
+GEN_QEMU_STORE_64(st64r, BSWAP_MEMOP(MO_Q))
+#endif
+
 #define GEN_LD(name, ldop, opc, type)                                         \
 static void glue(gen_, name)(DisasContext *ctx)                                       \
 {                                                                             \
@@ -2844,34 +2851,15 @@ GEN_LDX(lwbr, ld32ur, 0x16, 0x10, PPC_INTEGER);
 #if defined(TARGET_PPC64)
 /* ldbrx */
 GEN_LDX_E(ldbr, ld64ur_i64, 0x14, 0x10, PPC_NONE, PPC2_DBRX, CHK_NONE);
+/* stdbrx */
+GEN_STX_E(stdbr, st64r_i64, 0x14, 0x14, PPC_NONE, PPC2_DBRX, CHK_NONE);
 #endif  /* TARGET_PPC64 */
 
 /* sthbrx */
-static inline void gen_qemu_st16r(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_UW | (ctx->default_tcg_memop_mask ^ MO_BSWAP);
-    tcg_gen_qemu_st_tl(arg1, arg2, ctx->mem_idx, op);
-}
 GEN_STX(sthbr, st16r, 0x16, 0x1C, PPC_INTEGER);
-
 /* stwbrx */
-static inline void gen_qemu_st32r(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_UL | (ctx->default_tcg_memop_mask ^ MO_BSWAP);
-    tcg_gen_qemu_st_tl(arg1, arg2, ctx->mem_idx, op);
-}
 GEN_STX(stwbr, st32r, 0x16, 0x14, PPC_INTEGER);
 
-#if defined(TARGET_PPC64)
-/* stdbrx */
-static inline void gen_qemu_st64r(DisasContext *ctx, TCGv arg1, TCGv arg2)
-{
-    TCGMemOp op = MO_Q | (ctx->default_tcg_memop_mask ^ MO_BSWAP);
-    tcg_gen_qemu_st_i64(arg1, arg2, ctx->mem_idx, op);
-}
-GEN_STX_E(stdbr, st64r, 0x14, 0x14, PPC_NONE, PPC2_DBRX, CHK_NONE);
-#endif  /* TARGET_PPC64 */
-
 /***                    Integer load and store multiple                    ***/
 
 /* lmw */
@@ -6619,7 +6607,7 @@ GEN_STS(stw, st32, 0x04, PPC_INTEGER)
 #if defined(TARGET_PPC64)
 GEN_STUX(std, st64_i64, 0x15, 0x05, PPC_64B)
 GEN_STX(std, st64_i64, 0x15, 0x04, PPC_64B)
-GEN_STX_E(stdbr, st64r, 0x14, 0x14, PPC_NONE, PPC2_DBRX, CHK_NONE)
+GEN_STX_E(stdbr, st64r_i64, 0x14, 0x14, PPC_NONE, PPC2_DBRX, CHK_NONE)
 GEN_STX_HVRM(stdcix, st64_i64, 0x15, 0x1f, PPC_CILDST)
 GEN_STX_HVRM(stwcix, st32, 0x15, 0x1c, PPC_CILDST)
 GEN_STX_HVRM(sthcix, st16, 0x15, 0x1d, PPC_CILDST)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 07/17] target-ppc: consolidate load with reservation
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (5 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 06/17] target-ppc: convert st[16, 32, 64]r " Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 08/17] target-ppc: move out stqcx impementation Nikunj A Dadhania
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Use tcg_gen_qemu_ld in the load with reservation instructions.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 60668c2..72e78ff 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -3049,28 +3049,30 @@ static void gen_isync(DisasContext *ctx)
     gen_stop_exception(ctx);
 }
 
-#define LARX(name, len, loadop)                                      \
+#define MEMOP_GET_SIZE(x)  (1 << ((x) & MO_SIZE))
+
+#define LARX(name, memop)                                            \
 static void gen_##name(DisasContext *ctx)                            \
 {                                                                    \
     TCGv t0;                                                         \
     TCGv gpr = cpu_gpr[rD(ctx->opcode)];                             \
+    int len = MEMOP_GET_SIZE(memop);                                 \
     gen_set_access_type(ctx, ACCESS_RES);                            \
     t0 = tcg_temp_local_new();                                       \
     gen_addr_reg_index(ctx, t0);                                     \
     if ((len) > 1) {                                                 \
         gen_check_align(ctx, t0, (len)-1);                           \
     }                                                                \
-    gen_qemu_##loadop(ctx, gpr, t0);                                 \
+    tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);                \
     tcg_gen_mov_tl(cpu_reserve, t0);                                 \
     tcg_gen_st_tl(gpr, cpu_env, offsetof(CPUPPCState, reserve_val)); \
     tcg_temp_free(t0);                                               \
 }
 
 /* lwarx */
-LARX(lbarx, 1, ld8u);
-LARX(lharx, 2, ld16u);
-LARX(lwarx, 4, ld32u);
-
+LARX(lbarx, DEF_MEMOP(MO_UB))
+LARX(lharx, DEF_MEMOP(MO_UW))
+LARX(lwarx, DEF_MEMOP(MO_UL))
 
 #if defined(CONFIG_USER_ONLY)
 static void gen_conditional_store(DisasContext *ctx, TCGv EA,
@@ -3152,7 +3154,7 @@ STCX(stwcx_, 4);
 
 #if defined(TARGET_PPC64)
 /* ldarx */
-LARX(ldarx, 8, ld64_i64);
+LARX(ldarx, DEF_MEMOP(MO_Q))
 
 /* lqarx */
 static void gen_lqarx(DisasContext *ctx)
@@ -3178,15 +3180,13 @@ static void gen_lqarx(DisasContext *ctx)
         gpr1 = cpu_gpr[rd];
         gpr2 = cpu_gpr[rd+1];
     }
-    gen_qemu_ld64_i64(ctx, gpr1, EA);
+    tcg_gen_qemu_ld_i64(gpr1, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
     tcg_gen_mov_tl(cpu_reserve, EA);
-
     gen_addr_add(ctx, EA, EA, 8);
-    gen_qemu_ld64_i64(ctx, gpr2, EA);
+    tcg_gen_qemu_ld_i64(gpr2, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
 
     tcg_gen_st_tl(gpr1, cpu_env, offsetof(CPUPPCState, reserve_val));
     tcg_gen_st_tl(gpr2, cpu_env, offsetof(CPUPPCState, reserve_val2));
-
     tcg_temp_free(EA);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 08/17] target-ppc: move out stqcx impementation
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (6 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 07/17] target-ppc: consolidate load with reservation Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 09/17] target-ppc: consolidate store conditional Nikunj A Dadhania
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Being a 16byte operation, qemu_ld/st still does not support this. Move
this out so other store operation can use qemu_ld/st in the following
patch. Also, convert it to two MO_Q operations for stqcx.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c | 69 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 47 insertions(+), 22 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 72e78ff..618fe43 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -3105,22 +3105,6 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
         gen_qemu_st32(ctx, cpu_gpr[reg], EA);
     } else if (size == 2) {
         gen_qemu_st16(ctx, cpu_gpr[reg], EA);
-#if defined(TARGET_PPC64)
-    } else if (size == 16) {
-        TCGv gpr1, gpr2 , EA8;
-        if (unlikely(ctx->le_mode)) {
-            gpr1 = cpu_gpr[reg+1];
-            gpr2 = cpu_gpr[reg];
-        } else {
-            gpr1 = cpu_gpr[reg];
-            gpr2 = cpu_gpr[reg+1];
-        }
-        gen_qemu_st64_i64(ctx, gpr1, EA);
-        EA8 = tcg_temp_local_new();
-        gen_addr_add(ctx, EA8, EA, 8);
-        gen_qemu_st64_i64(ctx, gpr2, EA8);
-        tcg_temp_free(EA8);
-#endif
     } else {
         gen_qemu_st8(ctx, cpu_gpr[reg], EA);
     }
@@ -3133,11 +3117,6 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
 static void gen_##name(DisasContext *ctx)                 \
 {                                                         \
     TCGv t0;                                              \
-    if (unlikely((len == 16) && (rD(ctx->opcode) & 1))) { \
-        gen_inval_exception(ctx,                          \
-                            POWERPC_EXCP_INVAL_INVAL);    \
-        return;                                           \
-    }                                                     \
     gen_set_access_type(ctx, ACCESS_RES);                 \
     t0 = tcg_temp_local_new();                            \
     gen_addr_reg_index(ctx, t0);                          \
@@ -3190,9 +3169,55 @@ static void gen_lqarx(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
+/* stqcx. */
+static void gen_stqcx_(DisasContext *ctx)
+{
+    TCGv EA;
+    int reg = rS(ctx->opcode);
+    int len = 16;
+#if !defined(CONFIG_USER_ONLY)
+    TCGLabel *l1;
+    TCGv gpr1, gpr2;
+#endif
+
+    if (unlikely((rD(ctx->opcode) & 1))) {
+        gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_RES);
+    EA = tcg_temp_local_new();
+    gen_addr_reg_index(ctx, EA);
+    if (len > 1) {
+        gen_check_align(ctx, EA, (len) - 1);
+    }
+
+#if defined(CONFIG_USER_ONLY)
+    gen_conditional_store(ctx, EA, reg, 16);
+#else
+    tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
+    l1 = gen_new_label();
+    tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
+    tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 1 << CRF_EQ);
+
+    if (unlikely(ctx->le_mode)) {
+        gpr1 = cpu_gpr[reg + 1];
+        gpr2 = cpu_gpr[reg];
+    } else {
+        gpr1 = cpu_gpr[reg];
+        gpr2 = cpu_gpr[reg + 1];
+    }
+    tcg_gen_qemu_st_tl(gpr1, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
+    gen_addr_add(ctx, EA, EA, 8);
+    tcg_gen_qemu_st_tl(gpr2, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
+
+    gen_set_label(l1);
+    tcg_gen_movi_tl(cpu_reserve, -1);
+#endif
+    tcg_temp_free(EA);
+}
+
 /* stdcx. */
 STCX(stdcx_, 8);
-STCX(stqcx_, 16);
 #endif /* defined(TARGET_PPC64) */
 
 /* sync */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 09/17] target-ppc: consolidate store conditional
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (7 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 08/17] target-ppc: move out stqcx impementation Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 10/17] target-ppc: add xxspltib instruction Nikunj A Dadhania
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Use tcg_gen_qemu_st store conditional instructions.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c | 58 +++++++++++++++++++++-----------------------------
 1 file changed, 24 insertions(+), 34 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 618fe43..ecc1674 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -3076,19 +3076,19 @@ LARX(lwarx, DEF_MEMOP(MO_UL))
 
 #if defined(CONFIG_USER_ONLY)
 static void gen_conditional_store(DisasContext *ctx, TCGv EA,
-                                  int reg, int size)
+                                  int reg, int memop)
 {
     TCGv t0 = tcg_temp_new();
 
     tcg_gen_st_tl(EA, cpu_env, offsetof(CPUPPCState, reserve_ea));
-    tcg_gen_movi_tl(t0, (size << 5) | reg);
+    tcg_gen_movi_tl(t0, (MEMOP_GET_SIZE(memop) << 5) | reg);
     tcg_gen_st_tl(t0, cpu_env, offsetof(CPUPPCState, reserve_info));
     tcg_temp_free(t0);
     gen_exception_err(ctx, POWERPC_EXCP_STCX, 0);
 }
 #else
 static void gen_conditional_store(DisasContext *ctx, TCGv EA,
-                                  int reg, int size)
+                                  int reg, int memop)
 {
     TCGLabel *l1;
 
@@ -3096,44 +3096,36 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
     l1 = gen_new_label();
     tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
     tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 1 << CRF_EQ);
-#if defined(TARGET_PPC64)
-    if (size == 8) {
-        gen_qemu_st64_i64(ctx, cpu_gpr[reg], EA);
-    } else
-#endif
-    if (size == 4) {
-        gen_qemu_st32(ctx, cpu_gpr[reg], EA);
-    } else if (size == 2) {
-        gen_qemu_st16(ctx, cpu_gpr[reg], EA);
-    } else {
-        gen_qemu_st8(ctx, cpu_gpr[reg], EA);
-    }
+    tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop);
     gen_set_label(l1);
     tcg_gen_movi_tl(cpu_reserve, -1);
 }
 #endif
 
-#define STCX(name, len)                                   \
-static void gen_##name(DisasContext *ctx)                 \
-{                                                         \
-    TCGv t0;                                              \
-    gen_set_access_type(ctx, ACCESS_RES);                 \
-    t0 = tcg_temp_local_new();                            \
-    gen_addr_reg_index(ctx, t0);                          \
-    if (len > 1) {                                        \
-        gen_check_align(ctx, t0, (len)-1);                \
-    }                                                     \
-    gen_conditional_store(ctx, t0, rS(ctx->opcode), len); \
-    tcg_temp_free(t0);                                    \
-}
-
-STCX(stbcx_, 1);
-STCX(sthcx_, 2);
-STCX(stwcx_, 4);
+#define STCX(name, memop)                                   \
+static void gen_##name(DisasContext *ctx)                   \
+{                                                           \
+    TCGv t0;                                                \
+    int len = MEMOP_GET_SIZE(memop);                        \
+    gen_set_access_type(ctx, ACCESS_RES);                   \
+    t0 = tcg_temp_local_new();                              \
+    gen_addr_reg_index(ctx, t0);                            \
+    if (len > 1) {                                          \
+        gen_check_align(ctx, t0, (len) - 1);                \
+    }                                                       \
+    gen_conditional_store(ctx, t0, rS(ctx->opcode), memop); \
+    tcg_temp_free(t0);                                      \
+}
+
+STCX(stbcx_, DEF_MEMOP(MO_UB))
+STCX(sthcx_, DEF_MEMOP(MO_UW))
+STCX(stwcx_, DEF_MEMOP(MO_UL))
 
 #if defined(TARGET_PPC64)
 /* ldarx */
 LARX(ldarx, DEF_MEMOP(MO_Q))
+/* stdcx. */
+STCX(stdcx_, DEF_MEMOP(MO_Q))
 
 /* lqarx */
 static void gen_lqarx(DisasContext *ctx)
@@ -3216,8 +3208,6 @@ static void gen_stqcx_(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
-/* stdcx. */
-STCX(stdcx_, 8);
 #endif /* defined(TARGET_PPC64) */
 
 /* sync */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 10/17] target-ppc: add xxspltib instruction
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (8 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 09/17] target-ppc: consolidate store conditional Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 11/17] target-ppc: implement darn instruction Nikunj A Dadhania
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

xxspltib: VSX Vector Splat Immediate Byte

Copy the immediate byte in each byte of target VSR

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c              |  2 ++
 target-ppc/translate/vsx-impl.inc.c | 20 ++++++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  5 +++++
 3 files changed, 27 insertions(+)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index ecc1674..133c531 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -591,6 +591,8 @@ EXTRACT_HELPER(DM, 8, 2);
 EXTRACT_HELPER(UIM, 16, 2);
 EXTRACT_HELPER(SHW, 8, 2);
 EXTRACT_HELPER(SP, 19, 2);
+EXTRACT_HELPER(IMM8, 11, 8);
+
 /*****************************************************************************/
 /* PowerPC instructions table                                                */
 
diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index 99cabb2..67f5621 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -647,6 +647,26 @@ static void gen_xxspltw(DisasContext *ctx)
     tcg_temp_free_i64(b2);
 }
 
+#define pattern(x) (((x) & 0xff) * (~(uint64_t)0 / 0xff))
+
+static void gen_xxspltib(DisasContext *ctx)
+{
+    unsigned char uim8 = IMM8(ctx->opcode);
+    if (xS(ctx->opcode) < 32) {
+        if (unlikely(!ctx->altivec_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VPU);
+            return;
+        }
+    } else {
+        if (unlikely(!ctx->vsx_enabled)) {
+            gen_exception(ctx, POWERPC_EXCP_VSXU);
+            return;
+        }
+    }
+    tcg_gen_movi_i64(cpu_vsrh(xT(ctx->opcode)), pattern(uim8));
+    tcg_gen_movi_i64(cpu_vsrl(xT(ctx->opcode)), pattern(uim8));
+}
+
 static void gen_xxsldwi(DisasContext *ctx)
 {
     TCGv_i64 xth, xtl;
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 8b9da65..62a6251 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -20,6 +20,10 @@ GEN_HANDLER_E(mfvsrd, 0x1F, 0x13, 0x01, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mtvsrd, 0x1F, 0x13, 0x05, 0x0000F800, PPC_NONE, PPC2_VSX207),
 #endif
 
+#define GEN_XX1FORM(name, opc2, opc3, fl2)                              \
+GEN_HANDLER2_E(name, #name, 0x3C, opc2 | 0, opc3, 0, PPC_NONE, fl2), \
+GEN_HANDLER2_E(name, #name, 0x3C, opc2 | 1, opc3, 0, PPC_NONE, fl2)
+
 #define GEN_XX2FORM(name, opc2, opc3, fl2)                           \
 GEN_HANDLER2_E(name, #name, 0x3C, opc2 | 0, opc3, 0, PPC_NONE, fl2), \
 GEN_HANDLER2_E(name, #name, 0x3C, opc2 | 1, opc3, 0, PPC_NONE, fl2)
@@ -222,6 +226,7 @@ VSX_LOGICAL(xxlorc, 0x8, 0x15, PPC2_VSX207),
 GEN_XX3FORM(xxmrghw, 0x08, 0x02, PPC2_VSX),
 GEN_XX3FORM(xxmrglw, 0x08, 0x06, PPC2_VSX),
 GEN_XX2FORM(xxspltw, 0x08, 0x0A, PPC2_VSX),
+GEN_XX1FORM(xxspltib, 0x08, 0x0B, PPC2_ISA300),
 GEN_XX3FORM_DM(xxsldwi, 0x08, 0x00),
 
 #define GEN_XXSEL_ROW(opc3) \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 11/17] target-ppc: implement darn instruction
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (9 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 10/17] target-ppc: add xxspltib instruction Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-15  1:07   ` David Gibson
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 12/17] target-ppc: add lxsi[bw]zx instruction Nikunj A Dadhania
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh, Ravi Bangoria

From: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>

darn: Deliver A Random Number

Currently return invalid random number for all the case. This needs
proper algorithm to provide cryptographically suitable random data.
Reading from /dev/random can block and that is not an expected behaviour
while the cpu instruction is getting executed. Moreover, /dev/random
would only work for linux-user

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/helper.h     |  2 ++
 target-ppc/int_helper.c | 16 ++++++++++++++++
 target-ppc/translate.c  | 18 ++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index e75d070..966f2ce 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -50,6 +50,8 @@ DEF_HELPER_FLAGS_1(cnttzd, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_3(srad, tl, env, tl, tl)
+DEF_HELPER_0(darn32, tl)
+DEF_HELPER_0(darn64, tl)
 #endif
 
 DEF_HELPER_FLAGS_1(cntlsw32, TCG_CALL_NO_RWG_SE, i32, i32)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 291fba0..51a9ac5 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -182,6 +182,22 @@ target_ulong helper_cnttzd(target_ulong t)
 {
     return ctz64(t);
 }
+
+/* Return invalid random number.
+ *
+ * FIXME: Add rng backend or other mechanism to get cryptographically suitable
+ * random number
+ */
+target_ulong helper_darn32(void)
+{
+    return -1;
+}
+
+target_ulong helper_darn64(void)
+{
+    return -1;
+}
+
 #endif
 
 #if defined(TARGET_PPC64)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 133c531..e9dad3f 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -528,6 +528,8 @@ EXTRACT_HELPER(FPW, 16, 1);
 
 /* addpcis */
 EXTRACT_HELPER_DXFORM(DX, 10, 6, 6, 5, 16, 1, 1, 0, 0)
+/* darn */
+EXTRACT_HELPER(L, 16, 2);
 
 /***                            Jump target decoding                       ***/
 /* Immediate address */
@@ -1895,6 +1897,21 @@ static void gen_cnttzd(DisasContext *ctx)
         gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
     }
 }
+
+/* darn */
+static void gen_darn(DisasContext *ctx)
+{
+    int l = L(ctx->opcode);
+
+    if (l == 0) {
+        gen_helper_darn32(cpu_gpr[rD(ctx->opcode)]);
+    } else if (l <= 2) {
+        /* Return 64-bit random for both CRN and RRN */
+        gen_helper_darn64(cpu_gpr[rD(ctx->opcode)]);
+    } else {
+        tcg_gen_movi_i64(cpu_gpr[rD(ctx->opcode)], -1);
+    }
+}
 #endif
 
 /***                             Integer rotate                            ***/
@@ -6212,6 +6229,7 @@ GEN_HANDLER_E(prtyw, 0x1F, 0x1A, 0x04, 0x0000F801, PPC_NONE, PPC2_ISA205),
 GEN_HANDLER(popcntd, 0x1F, 0x1A, 0x0F, 0x0000F801, PPC_POPCNTWD),
 GEN_HANDLER(cntlzd, 0x1F, 0x1A, 0x01, 0x00000000, PPC_64B),
 GEN_HANDLER_E(cnttzd, 0x1F, 0x1A, 0x11, 0x00000000, PPC_NONE, PPC2_ISA300),
+GEN_HANDLER_E(darn, 0x1F, 0x13, 0x17, 0x001CF801, PPC_NONE, PPC2_ISA300),
 GEN_HANDLER_E(prtyd, 0x1F, 0x1A, 0x05, 0x0000F801, PPC_NONE, PPC2_ISA205),
 GEN_HANDLER_E(bpermd, 0x1F, 0x1C, 0x07, 0x00000001, PPC_NONE, PPC2_PERM_ISA206),
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 12/17] target-ppc: add lxsi[bw]zx instruction
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (10 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 11/17] target-ppc: implement darn instruction Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 13/17] target-ppc: add stxsi[bh]x instruction Nikunj A Dadhania
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

lxsibzx - Load VSX Scalar as Integer Byte & Zero Indexed
lxsihzx - Load VSX Scalar as Integer Halfword & Zero Indexed

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c              | 2 ++
 target-ppc/translate/vsx-impl.inc.c | 2 ++
 target-ppc/translate/vsx-ops.inc.c  | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index e9dad3f..7a4b488 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2509,6 +2509,8 @@ static void glue(gen_qemu_, glue(ldop, _i64))(DisasContext *ctx,    \
     tcg_gen_qemu_ld_i64(val, addr, ctx->mem_idx, op);               \
 }
 
+GEN_QEMU_LOAD_64(ld8u,  DEF_MEMOP(MO_UB))
+GEN_QEMU_LOAD_64(ld16u, DEF_MEMOP(MO_UW))
 GEN_QEMU_LOAD_64(ld32u, DEF_MEMOP(MO_UL))
 GEN_QEMU_LOAD_64(ld32s, DEF_MEMOP(MO_SL))
 GEN_QEMU_LOAD_64(ld64,  DEF_MEMOP(MO_Q))
diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index 67f5621..888f2e4 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -36,6 +36,8 @@ static void gen_##name(DisasContext *ctx)                     \
 
 VSX_LOAD_SCALAR(lxsdx, ld64_i64)
 VSX_LOAD_SCALAR(lxsiwax, ld32s_i64)
+VSX_LOAD_SCALAR(lxsibzx, ld8u_i64)
+VSX_LOAD_SCALAR(lxsihzx, ld16u_i64)
 VSX_LOAD_SCALAR(lxsiwzx, ld32u_i64)
 VSX_LOAD_SCALAR(lxsspx, ld32fs)
 
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 62a6251..4cd742c 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -1,6 +1,8 @@
 GEN_HANDLER_E(lxsdx, 0x1F, 0x0C, 0x12, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxsiwax, 0x1F, 0x0C, 0x02, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(lxsiwzx, 0x1F, 0x0C, 0x00, 0, PPC_NONE, PPC2_VSX207),
+GEN_HANDLER_E(lxsibzx, 0x1F, 0x0D, 0x18, 0, PPC_NONE, PPC2_ISA300),
+GEN_HANDLER_E(lxsihzx, 0x1F, 0x0D, 0x19, 0, PPC_NONE, PPC2_ISA300),
 GEN_HANDLER_E(lxsspx, 0x1F, 0x0C, 0x10, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(lxvd2x, 0x1F, 0x0C, 0x1A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 13/17] target-ppc: add stxsi[bh]x instruction
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (11 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 12/17] target-ppc: add lxsi[bw]zx instruction Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 14/17] target-ppc: improve lxvw4x implementation Nikunj A Dadhania
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

stxsibx - Store VSX Scalar as Integer Byte Indexed
stxsihx - Store VSX Scalar as Integer Halfword Indexed

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate.c              | 2 ++
 target-ppc/translate/vsx-impl.inc.c | 3 +++
 target-ppc/translate/vsx-ops.inc.c  | 2 ++
 3 files changed, 7 insertions(+)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 7a4b488..eb681de 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2542,6 +2542,8 @@ static void glue(gen_qemu_, glue(stop, _i64))(DisasContext *ctx,  \
     tcg_gen_qemu_st_i64(val, addr, ctx->mem_idx, op);             \
 }
 
+GEN_QEMU_STORE_64(st8,  DEF_MEMOP(MO_UB))
+GEN_QEMU_STORE_64(st16, DEF_MEMOP(MO_UW))
 GEN_QEMU_STORE_64(st32, DEF_MEMOP(MO_UL))
 GEN_QEMU_STORE_64(st64, DEF_MEMOP(MO_Q))
 
diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index 888f2e4..eee6052 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -118,6 +118,9 @@ static void gen_##name(DisasContext *ctx)                     \
 }
 
 VSX_STORE_SCALAR(stxsdx, st64_i64)
+
+VSX_STORE_SCALAR(stxsibx, st8_i64)
+VSX_STORE_SCALAR(stxsihx, st16_i64)
 VSX_STORE_SCALAR(stxsiwx, st32_i64)
 VSX_STORE_SCALAR(stxsspx, st32fs)
 
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 4cd742c..414b73b 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -9,6 +9,8 @@ GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvw4x, 0x1F, 0x0C, 0x18, 0, PPC_NONE, PPC2_VSX),
 
 GEN_HANDLER_E(stxsdx, 0x1F, 0xC, 0x16, 0, PPC_NONE, PPC2_VSX),
+GEN_HANDLER_E(stxsibx, 0x1F, 0xD, 0x1C, 0, PPC_NONE, PPC2_ISA300),
+GEN_HANDLER_E(stxsihx, 0x1F, 0xD, 0x1D, 0, PPC_NONE, PPC2_ISA300),
 GEN_HANDLER_E(stxsiwx, 0x1F, 0xC, 0x04, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(stxsspx, 0x1F, 0xC, 0x14, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(stxvd2x, 0x1F, 0xC, 0x1E, 0, PPC_NONE, PPC2_VSX),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 14/17] target-ppc: improve lxvw4x implementation
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (12 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 13/17] target-ppc: add stxsi[bh]x instruction Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-15  1:20   ` David Gibson
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 15/17] target-ppc: add lxvb16x and lxvh8x Nikunj A Dadhania
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Load 8byte at a time and manipulate.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/helper.h                 |  1 +
 target-ppc/mem_helper.c             |  5 +++++
 target-ppc/translate/vsx-impl.inc.c | 34 ++++++++++++++++++++--------------
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 966f2ce..1bbeac4 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -297,6 +297,7 @@ DEF_HELPER_2(mtvscr, void, env, avr)
 DEF_HELPER_3(lvebx, void, env, avr, tl)
 DEF_HELPER_3(lvehx, void, env, avr, tl)
 DEF_HELPER_3(lvewx, void, env, avr, tl)
+DEF_HELPER_1(bswap32x2, i64, i64)
 DEF_HELPER_3(stvebx, void, env, avr, tl)
 DEF_HELPER_3(stvehx, void, env, avr, tl)
 DEF_HELPER_3(stvewx, void, env, avr, tl)
diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
index 6548715..a56051a 100644
--- a/target-ppc/mem_helper.c
+++ b/target-ppc/mem_helper.c
@@ -285,6 +285,11 @@ STVE(stvewx, cpu_stl_data_ra, bswap32, u32)
 #undef I
 #undef LVE
 
+uint64_t helper_bswap32x2(uint64_t x)
+{
+    return deposit64((x >> 32), 32, 32, (x));
+}
+
 #undef HI_IDX
 #undef LO_IDX
 
diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index eee6052..e3374df 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -75,7 +75,7 @@ static void gen_lxvdsx(DisasContext *ctx)
 static void gen_lxvw4x(DisasContext *ctx)
 {
     TCGv EA;
-    TCGv_i64 tmp;
+    TCGv_i64 t0, t1;
     TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
     TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
     if (unlikely(!ctx->vsx_enabled)) {
@@ -84,22 +84,28 @@ static void gen_lxvw4x(DisasContext *ctx)
     }
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
-    tmp = tcg_temp_new_i64();
 
     gen_addr_reg_index(ctx, EA);
-    gen_qemu_ld32u_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_ld32u_i64(ctx, xth, EA);
-    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
-
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_ld32u_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_ld32u_i64(ctx, xtl, EA);
-    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
-
+    if (ctx->le_mode) {
+        t0 = tcg_temp_new_i64();
+        t1 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
+        tcg_gen_shri_i64(t1, t0, 32);
+        tcg_gen_deposit_i64(xth, t1, t0, 32, 32);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
+        tcg_gen_shri_i64(t1, t0, 32);
+        tcg_gen_deposit_i64(xtl, t1, t0, 32, 32);
+        tcg_temp_free_i64(t0);
+        tcg_temp_free_i64(t1);
+    } else {
+        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
+        gen_helper_bswap32x2(xth, xth);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
+        gen_helper_bswap32x2(xtl, xtl);
+    }
     tcg_temp_free(EA);
-    tcg_temp_free_i64(tmp);
 }
 
 #define VSX_STORE_SCALAR(name, operation)                     \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 15/17] target-ppc: add lxvb16x and lxvh8x
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (13 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 14/17] target-ppc: improve lxvw4x implementation Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-15  1:41   ` David Gibson
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 16/17] target-ppc: improve stxvw4x implementation Nikunj A Dadhania
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

lxvb16x: Load VSX Vector Byte*16
lxvh8x:  Load VSX Vector Halfword*8

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/helper.h                 |  1 +
 target-ppc/mem_helper.c             |  6 ++++
 target-ppc/translate/vsx-impl.inc.c | 57 +++++++++++++++++++++++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  2 ++
 4 files changed, 66 insertions(+)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 1bbeac4..6de0db7 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -298,6 +298,7 @@ DEF_HELPER_3(lvebx, void, env, avr, tl)
 DEF_HELPER_3(lvehx, void, env, avr, tl)
 DEF_HELPER_3(lvewx, void, env, avr, tl)
 DEF_HELPER_1(bswap32x2, i64, i64)
+DEF_HELPER_1(bswap16x4, i64, i64)
 DEF_HELPER_3(stvebx, void, env, avr, tl)
 DEF_HELPER_3(stvehx, void, env, avr, tl)
 DEF_HELPER_3(stvewx, void, env, avr, tl)
diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
index a56051a..608803f 100644
--- a/target-ppc/mem_helper.c
+++ b/target-ppc/mem_helper.c
@@ -290,6 +290,12 @@ uint64_t helper_bswap32x2(uint64_t x)
     return deposit64((x >> 32), 32, 32, (x));
 }
 
+uint64_t helper_bswap16x4(uint64_t x)
+{
+    uint64_t m = 0x00ff00ff00ff00ffull;
+    return ((x & m) << 8) | ((x >> 8) & m);
+}
+
 #undef HI_IDX
 #undef LO_IDX
 
diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index e3374df..caa6660 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -108,6 +108,63 @@ static void gen_lxvw4x(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
+static void gen_lxvb16x(DisasContext *ctx)
+{
+    TCGv EA;
+    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
+    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+    if (ctx->le_mode) {
+        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
+    } else {
+        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
+        gen_helper_bswap32x2(xth, xth);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
+        gen_helper_bswap32x2(xtl, xtl);
+    }
+    tcg_temp_free(EA);
+}
+
+static void gen_lxvh8x(DisasContext *ctx)
+{
+    TCGv EA;
+    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
+    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+
+    if (ctx->le_mode) {
+        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
+        gen_helper_bswap16x4(xth, xth);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
+        gen_helper_bswap16x4(xtl, xtl);
+    } else {
+        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
+        gen_helper_bswap32x2(xth, xth);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
+        gen_helper_bswap32x2(xtl, xtl);
+    }
+    tcg_temp_free(EA);
+}
+
 #define VSX_STORE_SCALAR(name, operation)                     \
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 414b73b..598b349 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -7,6 +7,8 @@ GEN_HANDLER_E(lxsspx, 0x1F, 0x0C, 0x10, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(lxvd2x, 0x1F, 0x0C, 0x1A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(lxvw4x, 0x1F, 0x0C, 0x18, 0, PPC_NONE, PPC2_VSX),
+GEN_HANDLER_E(lxvh8x, 0x1F, 0x0C, 0x19, 0, PPC_NONE,  PPC2_ISA300),
+GEN_HANDLER_E(lxvb16x, 0x1F, 0x0C, 0x1B, 0, PPC_NONE, PPC2_ISA300),
 
 GEN_HANDLER_E(stxsdx, 0x1F, 0xC, 0x16, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxsibx, 0x1F, 0xD, 0x1C, 0, PPC_NONE, PPC2_ISA300),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 16/17] target-ppc: improve stxvw4x implementation
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (14 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 15/17] target-ppc: add lxvb16x and lxvh8x Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-15  1:44   ` David Gibson
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 17/17] target-ppc: add stxvb16x and stxvh8x Nikunj A Dadhania
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

Manipulate data and store 8bytes instead of 4bytes.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate/vsx-impl.inc.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index caa6660..f2fc5f9 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -205,7 +205,8 @@ static void gen_stxvd2x(DisasContext *ctx)
 
 static void gen_stxvw4x(DisasContext *ctx)
 {
-    TCGv_i64 tmp;
+    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
+    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
     TCGv EA;
     if (unlikely(!ctx->vsx_enabled)) {
         gen_exception(ctx, POWERPC_EXCP_VSXU);
@@ -214,21 +215,19 @@ static void gen_stxvw4x(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_INT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    tmp = tcg_temp_new_i64();
-
-    tcg_gen_shri_i64(tmp, cpu_vsrh(xS(ctx->opcode)), 32);
-    gen_qemu_st32_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_st32_i64(ctx, cpu_vsrh(xS(ctx->opcode)), EA);
-
-    tcg_gen_shri_i64(tmp, cpu_vsrl(xS(ctx->opcode)), 32);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_st32_i64(ctx, tmp, EA);
-    tcg_gen_addi_tl(EA, EA, 4);
-    gen_qemu_st32_i64(ctx, cpu_vsrl(xS(ctx->opcode)), EA);
 
+    if (ctx->le_mode) {
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
+    } else {
+        gen_helper_bswap32x2(xsh, xsh);
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_LEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        gen_helper_bswap32x2(xsl, xsl);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_LEQ);
+    }
     tcg_temp_free(EA);
-    tcg_temp_free_i64(tmp);
 }
 
 #define MV_VSRW(name, tcgop1, tcgop2, target, source)           \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [PATCH RESEND v2 17/17] target-ppc: add stxvb16x and stxvh8x
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (15 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 16/17] target-ppc: improve stxvw4x implementation Nikunj A Dadhania
@ 2016-09-12  6:41 ` Nikunj A Dadhania
  2016-09-15  1:46   ` David Gibson
  2016-09-12  7:19 ` [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 no-reply
  2016-09-15  0:56 ` David Gibson
  18 siblings, 1 reply; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-12  6:41 UTC (permalink / raw)
  To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, benh

stxvb16x: Store VSX Vector Byte*16
stxvh8x:  Store VSX Vector Halfword*8

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target-ppc/translate/vsx-impl.inc.c | 55 +++++++++++++++++++++++++++++++++++++
 target-ppc/translate/vsx-ops.inc.c  |  2 ++
 2 files changed, 57 insertions(+)

diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index f2fc5f9..20afe3b 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -165,6 +165,61 @@ static void gen_lxvh8x(DisasContext *ctx)
     tcg_temp_free(EA);
 }
 
+static void gen_stxvb16x(DisasContext *ctx)
+{
+    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
+    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
+    TCGv EA;
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+
+    if (ctx->le_mode) {
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
+    } else {
+        gen_helper_bswap32x2(xsh, xsh);
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_LEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        gen_helper_bswap32x2(xsl, xsl);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_LEQ);
+    }
+    tcg_temp_free(EA);
+}
+
+static void gen_stxvh8x(DisasContext *ctx)
+{
+    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
+    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
+    TCGv EA;
+
+    if (unlikely(!ctx->vsx_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VSXU);
+        return;
+    }
+    gen_set_access_type(ctx, ACCESS_INT);
+    EA = tcg_temp_new();
+    gen_addr_reg_index(ctx, EA);
+    if (ctx->le_mode) {
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
+    } else {
+        gen_helper_bswap32x2(xsh, xsh);
+        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_LEQ);
+        tcg_gen_addi_tl(EA, EA, 8);
+        gen_helper_bswap32x2(xsl, xsl);
+        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_LEQ);
+    }
+    tcg_temp_free(EA);
+}
+
 #define VSX_STORE_SCALAR(name, operation)                     \
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index 598b349..f5afa0f 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -17,6 +17,8 @@ GEN_HANDLER_E(stxsiwx, 0x1F, 0xC, 0x04, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(stxsspx, 0x1F, 0xC, 0x14, 0, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(stxvd2x, 0x1F, 0xC, 0x1E, 0, PPC_NONE, PPC2_VSX),
 GEN_HANDLER_E(stxvw4x, 0x1F, 0xC, 0x1C, 0, PPC_NONE, PPC2_VSX),
+GEN_HANDLER_E(stxvh8x, 0x1F, 0x0C, 0x1D, 0, PPC_NONE,  PPC2_ISA300),
+GEN_HANDLER_E(stxvb16x, 0x1F, 0x0C, 0x1F, 0, PPC_NONE, PPC2_ISA300),
 
 GEN_HANDLER_E(mfvsrwz, 0x1F, 0x13, 0x03, 0x0000F800, PPC_NONE, PPC2_VSX207),
 GEN_HANDLER_E(mtvsrwa, 0x1F, 0x13, 0x06, 0x0000F800, PPC_NONE, PPC2_VSX207),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (16 preceding siblings ...)
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 17/17] target-ppc: add stxvb16x and stxvh8x Nikunj A Dadhania
@ 2016-09-12  7:19 ` no-reply
  2016-09-15  0:56 ` David Gibson
  18 siblings, 0 replies; 32+ messages in thread
From: no-reply @ 2016-09-12  7:19 UTC (permalink / raw)
  To: nikunj; +Cc: famz, qemu-ppc, david, rth, qemu-devel

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1473662506-27441-1-git-send-email-nikunj@linux.vnet.ibm.com
Subject: [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
bd71c6f target-ppc: add stxvb16x and stxvh8x
bb4334f target-ppc: improve stxvw4x implementation
dcd12d2 target-ppc: add lxvb16x and lxvh8x
623b945 target-ppc: improve lxvw4x implementation
165f015 target-ppc: add stxsi[bh]x instruction
2c19405 target-ppc: add lxsi[bw]zx instruction
b83a0fa target-ppc: implement darn instruction
2983841 target-ppc: add xxspltib instruction
45d1faf target-ppc: consolidate store conditional
4d12515 target-ppc: move out stqcx impementation
d937588 target-ppc: consolidate load with reservation
da405fb target-ppc: convert st[16, 32, 64]r to use new macro
6df6f91 target-ppc: convert st64 to use new macro
0033af2 target-ppc: consolidate store operations
c74871d target-ppc: convert ld[16, 32, 64]ur to use new macro
cafa0ef target-ppc: convert ld64 to use new macro
cd34c15 target-ppc: consolidate load operations

=== OUTPUT BEGIN ===
Checking PATCH 1/17: target-ppc: consolidate load operations...
Checking PATCH 2/17: target-ppc: convert ld64 to use new macro...
Checking PATCH 3/17: target-ppc: convert ld[16, 32, 64]ur to use new macro...
Checking PATCH 4/17: target-ppc: consolidate store operations...
Checking PATCH 5/17: target-ppc: convert st64 to use new macro...
Checking PATCH 6/17: target-ppc: convert st[16, 32, 64]r to use new macro...
Checking PATCH 7/17: target-ppc: consolidate load with reservation...
Checking PATCH 8/17: target-ppc: move out stqcx impementation...
Checking PATCH 9/17: target-ppc: consolidate store conditional...
Checking PATCH 10/17: target-ppc: add xxspltib instruction...
ERROR: Macros with complex values should be enclosed in parenthesis
#65: FILE: target-ppc/translate/vsx-ops.inc.c:23:
+#define GEN_XX1FORM(name, opc2, opc3, fl2)                              \
+GEN_HANDLER2_E(name, #name, 0x3C, opc2 | 0, opc3, 0, PPC_NONE, fl2), \
+GEN_HANDLER2_E(name, #name, 0x3C, opc2 | 1, opc3, 0, PPC_NONE, fl2)

total: 1 errors, 0 warnings, 51 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 11/17: target-ppc: implement darn instruction...
Checking PATCH 12/17: target-ppc: add lxsi[bw]zx instruction...
Checking PATCH 13/17: target-ppc: add stxsi[bh]x instruction...
Checking PATCH 14/17: target-ppc: improve lxvw4x implementation...
Checking PATCH 15/17: target-ppc: add lxvb16x and lxvh8x...
Checking PATCH 16/17: target-ppc: improve stxvw4x implementation...
Checking PATCH 17/17: target-ppc: add stxvb16x and stxvh8x...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 01/17] target-ppc: consolidate load operations
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 01/17] target-ppc: consolidate load operations Nikunj A Dadhania
@ 2016-09-15  0:46   ` David Gibson
  0 siblings, 0 replies; 32+ messages in thread
From: David Gibson @ 2016-09-15  0:46 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 3890 bytes --]

On Mon, Sep 12, 2016 at 12:11:30PM +0530, Nikunj A Dadhania wrote:
> Implement macro to consolidate store operations using newer
> tcg_gen_qemu_ld functions.

s/store/load/, but I can fix that as I apply if I don't find anything
else in the series which requires a respin.

> 
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/translate.c | 58 +++++++++++++++++---------------------------------
>  1 file changed, 20 insertions(+), 38 deletions(-)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index a27f455..6606969 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -2462,50 +2462,32 @@ static inline void gen_align_no_le(DisasContext *ctx)
>  }
>  
>  /***                             Integer load                              ***/
> -static inline void gen_qemu_ld8u(DisasContext *ctx, TCGv arg1, TCGv arg2)
> -{
> -    tcg_gen_qemu_ld8u(arg1, arg2, ctx->mem_idx);
> -}
> -
> -static inline void gen_qemu_ld16u(DisasContext *ctx, TCGv arg1, TCGv arg2)
> -{
> -    TCGMemOp op = MO_UW | ctx->default_tcg_memop_mask;
> -    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, op);
> -}
> -
> -static inline void gen_qemu_ld16s(DisasContext *ctx, TCGv arg1, TCGv arg2)
> -{
> -    TCGMemOp op = MO_SW | ctx->default_tcg_memop_mask;
> -    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, op);
> -}
> +#define DEF_MEMOP(op) ((op) | ctx->default_tcg_memop_mask)
>  
> -static inline void gen_qemu_ld32u(DisasContext *ctx, TCGv arg1, TCGv arg2)
> -{
> -    TCGMemOp op = MO_UL | ctx->default_tcg_memop_mask;
> -    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, op);
> +#define GEN_QEMU_LOAD_TL(ldop, op)                                      \
> +static void glue(gen_qemu_, ldop)(DisasContext *ctx,                    \
> +                                  TCGv val,                             \
> +                                  TCGv addr)                            \
> +{                                                                       \
> +    tcg_gen_qemu_ld_tl(val, addr, ctx->mem_idx, op);                    \
>  }
>  
> -static void gen_qemu_ld32u_i64(DisasContext *ctx, TCGv_i64 val, TCGv addr)
> -{
> -    TCGv tmp = tcg_temp_new();
> -    gen_qemu_ld32u(ctx, tmp, addr);
> -    tcg_gen_extu_tl_i64(val, tmp);
> -    tcg_temp_free(tmp);
> -}
> +GEN_QEMU_LOAD_TL(ld8u,  DEF_MEMOP(MO_UB))
> +GEN_QEMU_LOAD_TL(ld16u, DEF_MEMOP(MO_UW))
> +GEN_QEMU_LOAD_TL(ld16s, DEF_MEMOP(MO_SW))
> +GEN_QEMU_LOAD_TL(ld32u, DEF_MEMOP(MO_UL))
> +GEN_QEMU_LOAD_TL(ld32s, DEF_MEMOP(MO_SL))
>  
> -static inline void gen_qemu_ld32s(DisasContext *ctx, TCGv arg1, TCGv arg2)
> -{
> -    TCGMemOp op = MO_SL | ctx->default_tcg_memop_mask;
> -    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, op);
> +#define GEN_QEMU_LOAD_64(ldop, op)                                  \
> +static void glue(gen_qemu_, glue(ldop, _i64))(DisasContext *ctx,    \
> +                                             TCGv_i64 val,          \
> +                                             TCGv addr)             \
> +{                                                                   \
> +    tcg_gen_qemu_ld_i64(val, addr, ctx->mem_idx, op);               \
>  }
>  
> -static void gen_qemu_ld32s_i64(DisasContext *ctx, TCGv_i64 val, TCGv addr)
> -{
> -    TCGv tmp = tcg_temp_new();
> -    gen_qemu_ld32s(ctx, tmp, addr);
> -    tcg_gen_ext_tl_i64(val, tmp);
> -    tcg_temp_free(tmp);
> -}
> +GEN_QEMU_LOAD_64(ld32u, DEF_MEMOP(MO_UL))
> +GEN_QEMU_LOAD_64(ld32s, DEF_MEMOP(MO_SL))
>  
>  static inline void gen_qemu_ld64(DisasContext *ctx, TCGv_i64 arg1, TCGv arg2)
>  {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 06/17] target-ppc: convert st[16, 32, 64]r to use new macro
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 06/17] target-ppc: convert st[16, 32, 64]r " Nikunj A Dadhania
@ 2016-09-15  0:48   ` David Gibson
  0 siblings, 0 replies; 32+ messages in thread
From: David Gibson @ 2016-09-15  0:48 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 3645 bytes --]

On Mon, Sep 12, 2016 at 12:11:35PM +0530, Nikunj A Dadhania wrote:
> Make byte-swap routines use the common GEN_QEMU_LOAD macro

s/GEN_QEMU_LOAD/GEN_QEMU_STORE/

> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/translate.c | 32 ++++++++++----------------------
>  1 file changed, 10 insertions(+), 22 deletions(-)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 254ad40..60668c2 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -2510,6 +2510,9 @@ GEN_QEMU_STORE_TL(st8,  DEF_MEMOP(MO_UB))
>  GEN_QEMU_STORE_TL(st16, DEF_MEMOP(MO_UW))
>  GEN_QEMU_STORE_TL(st32, DEF_MEMOP(MO_UL))
>  
> +GEN_QEMU_STORE_TL(st16r, BSWAP_MEMOP(MO_UW))
> +GEN_QEMU_STORE_TL(st32r, BSWAP_MEMOP(MO_UL))
> +
>  #define GEN_QEMU_STORE_64(stop, op)                               \
>  static void glue(gen_qemu_, glue(stop, _i64))(DisasContext *ctx,  \
>                                                TCGv_i64 val,       \
> @@ -2521,6 +2524,10 @@ static void glue(gen_qemu_, glue(stop, _i64))(DisasContext *ctx,  \
>  GEN_QEMU_STORE_64(st32, DEF_MEMOP(MO_UL))
>  GEN_QEMU_STORE_64(st64, DEF_MEMOP(MO_Q))
>  
> +#if defined(TARGET_PPC64)
> +GEN_QEMU_STORE_64(st64r, BSWAP_MEMOP(MO_Q))
> +#endif
> +
>  #define GEN_LD(name, ldop, opc, type)                                         \
>  static void glue(gen_, name)(DisasContext *ctx)                                       \
>  {                                                                             \
> @@ -2844,34 +2851,15 @@ GEN_LDX(lwbr, ld32ur, 0x16, 0x10, PPC_INTEGER);
>  #if defined(TARGET_PPC64)
>  /* ldbrx */
>  GEN_LDX_E(ldbr, ld64ur_i64, 0x14, 0x10, PPC_NONE, PPC2_DBRX, CHK_NONE);
> +/* stdbrx */
> +GEN_STX_E(stdbr, st64r_i64, 0x14, 0x14, PPC_NONE, PPC2_DBRX, CHK_NONE);
>  #endif  /* TARGET_PPC64 */
>  
>  /* sthbrx */
> -static inline void gen_qemu_st16r(DisasContext *ctx, TCGv arg1, TCGv arg2)
> -{
> -    TCGMemOp op = MO_UW | (ctx->default_tcg_memop_mask ^ MO_BSWAP);
> -    tcg_gen_qemu_st_tl(arg1, arg2, ctx->mem_idx, op);
> -}
>  GEN_STX(sthbr, st16r, 0x16, 0x1C, PPC_INTEGER);
> -
>  /* stwbrx */
> -static inline void gen_qemu_st32r(DisasContext *ctx, TCGv arg1, TCGv arg2)
> -{
> -    TCGMemOp op = MO_UL | (ctx->default_tcg_memop_mask ^ MO_BSWAP);
> -    tcg_gen_qemu_st_tl(arg1, arg2, ctx->mem_idx, op);
> -}
>  GEN_STX(stwbr, st32r, 0x16, 0x14, PPC_INTEGER);
>  
> -#if defined(TARGET_PPC64)
> -/* stdbrx */
> -static inline void gen_qemu_st64r(DisasContext *ctx, TCGv arg1, TCGv arg2)
> -{
> -    TCGMemOp op = MO_Q | (ctx->default_tcg_memop_mask ^ MO_BSWAP);
> -    tcg_gen_qemu_st_i64(arg1, arg2, ctx->mem_idx, op);
> -}
> -GEN_STX_E(stdbr, st64r, 0x14, 0x14, PPC_NONE, PPC2_DBRX, CHK_NONE);
> -#endif  /* TARGET_PPC64 */
> -
>  /***                    Integer load and store multiple                    ***/
>  
>  /* lmw */
> @@ -6619,7 +6607,7 @@ GEN_STS(stw, st32, 0x04, PPC_INTEGER)
>  #if defined(TARGET_PPC64)
>  GEN_STUX(std, st64_i64, 0x15, 0x05, PPC_64B)
>  GEN_STX(std, st64_i64, 0x15, 0x04, PPC_64B)
> -GEN_STX_E(stdbr, st64r, 0x14, 0x14, PPC_NONE, PPC2_DBRX, CHK_NONE)
> +GEN_STX_E(stdbr, st64r_i64, 0x14, 0x14, PPC_NONE, PPC2_DBRX, CHK_NONE)
>  GEN_STX_HVRM(stdcix, st64_i64, 0x15, 0x1f, PPC_CILDST)
>  GEN_STX_HVRM(stwcix, st32, 0x15, 0x1c, PPC_CILDST)
>  GEN_STX_HVRM(sthcix, st16, 0x15, 0x1d, PPC_CILDST)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4
  2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
                   ` (17 preceding siblings ...)
  2016-09-12  7:19 ` [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 no-reply
@ 2016-09-15  0:56 ` David Gibson
  2016-09-15  1:49   ` David Gibson
  18 siblings, 1 reply; 32+ messages in thread
From: David Gibson @ 2016-09-15  0:56 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 3312 bytes --]

On Mon, Sep 12, 2016 at 12:11:29PM +0530, Nikunj A Dadhania wrote:
> 1) Consolidate Load/Store operations using tcg_gen_qemu_ld/st functions
> 2) This series contains 10 new instructions for POWER9 ISA3.0
>    Use newer qemu load/store tcg helpers and optimize stxvw4x and lxvw4x.
> 
> Patches:
>  01-09:  Cleanup load/store operations in ppc translator

I've applied 1-9 to ppc-for-2.8.  Still need to review the remainder
of the series.

>     10:  xxspltib: VSX Vector Splat Immediate Byte
>     11:  darn: Deliver a random number
>     12:  lxsibzx - Load VSX Scalar as Integer Byte & Zero Indexed
>          lxsihzx - Load VSX Scalar as Integer Halfword & Zero Indexed
>     13:  stxsibx - Store VSX Scalar as Integer Byte Indexed
>          stxsihx - Store VSX Scalar as Integer Halfword Indexed
>     14:  lxvw4x - improve implementation
>     15:  lxvb16x: Load VSX Vector Byte*16
>          lxvh8x:  Load VSX Vector Halfword*8
>     16:  stxv4x - improve implementation
>     17:  stxvb16x: Store VSX Vector Byte*16
>          stxvh8x:  Store VSX Vector Halfword*8
> 
> Series also available here: https://github.com/nikunjad/qemu/tree/p9-tcg
> 
> Changelog:
> v1: 
> * More load/store cleanups in byte reverse routines
> * ld64/st64 converted to newer macro and updated call sites
> * Cleanup load with reservation and store conditional
> * Return invalid random for darn instruction
> 
> v0:
> * darn - read /dev/random to get the random number
> * xxspltib - make is PPC64 only
> * Consolidate load/store operations and use macros to generate qemu_st/ld
> * Simplify load/store vsx endian manipulation
> 
> Nikunj A Dadhania (16):
>   target-ppc: consolidate load operations
>   target-ppc: convert ld64 to use new macro
>   target-ppc: convert ld[16,32,64]ur to use new macro
>   target-ppc: consolidate store operations
>   target-ppc: convert st64 to use new macro
>   target-ppc: convert st[16,32,64]r to use new macro
>   target-ppc: consolidate load with reservation
>   target-ppc: move out stqcx impementation
>   target-ppc: consolidate store conditional
>   target-ppc: add xxspltib instruction
>   target-ppc: add lxsi[bw]zx instruction
>   target-ppc: add stxsi[bh]x instruction
>   target-ppc: improve lxvw4x implementation
>   target-ppc: add lxvb16x and lxvh8x
>   target-ppc: improve stxvw4x implementation
>   target-ppc: add stxvb16x and stxvh8x
> 
> Ravi Bangoria (1):
>   target-ppc: implement darn instruction
> 
>  target-ppc/helper.h                 |   4 +
>  target-ppc/int_helper.c             |  16 ++
>  target-ppc/mem_helper.c             |  11 ++
>  target-ppc/translate.c              | 379 +++++++++++++++++-------------------
>  target-ppc/translate/fp-impl.inc.c  |  84 ++++----
>  target-ppc/translate/fp-ops.inc.c   |   2 +-
>  target-ppc/translate/spe-impl.inc.c |   4 +-
>  target-ppc/translate/vmx-impl.inc.c |  24 +--
>  target-ppc/translate/vsx-impl.inc.c | 208 ++++++++++++++++----
>  target-ppc/translate/vsx-ops.inc.c  |  13 ++
>  10 files changed, 460 insertions(+), 285 deletions(-)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 11/17] target-ppc: implement darn instruction
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 11/17] target-ppc: implement darn instruction Nikunj A Dadhania
@ 2016-09-15  1:07   ` David Gibson
  2016-09-15  6:40     ` Nikunj A Dadhania
  0 siblings, 1 reply; 32+ messages in thread
From: David Gibson @ 2016-09-15  1:07 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh, Ravi Bangoria

[-- Attachment #1: Type: text/plain, Size: 4071 bytes --]

On Mon, Sep 12, 2016 at 12:11:40PM +0530, Nikunj A Dadhania wrote:
> From: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> 
> darn: Deliver A Random Number
> 
> Currently return invalid random number for all the case. This needs
> proper algorithm to provide cryptographically suitable random data.
> Reading from /dev/random can block and that is not an expected behaviour
> while the cpu instruction is getting executed. Moreover, /dev/random
> would only work for linux-user
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/helper.h     |  2 ++
>  target-ppc/int_helper.c | 16 ++++++++++++++++
>  target-ppc/translate.c  | 18 ++++++++++++++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index e75d070..966f2ce 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -50,6 +50,8 @@ DEF_HELPER_FLAGS_1(cnttzd, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>  DEF_HELPER_3(srad, tl, env, tl, tl)
> +DEF_HELPER_0(darn32, tl)
> +DEF_HELPER_0(darn64, tl)
>  #endif
>  
>  DEF_HELPER_FLAGS_1(cntlsw32, TCG_CALL_NO_RWG_SE, i32, i32)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 291fba0..51a9ac5 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -182,6 +182,22 @@ target_ulong helper_cnttzd(target_ulong t)
>  {
>      return ctz64(t);
>  }
> +
> +/* Return invalid random number.
> + *
> + * FIXME: Add rng backend or other mechanism to get cryptographically suitable
> + * random number
> + */
> +target_ulong helper_darn32(void)
> +{
> +    return -1;
> +}
> +
> +target_ulong helper_darn64(void)
> +{
> +    return -1;
> +}
> +

TBH, I think you're going to want a single helper for both 32-bit and
64-bit cases.

>  #endif
>  
>  #if defined(TARGET_PPC64)
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 133c531..e9dad3f 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -528,6 +528,8 @@ EXTRACT_HELPER(FPW, 16, 1);
>  
>  /* addpcis */
>  EXTRACT_HELPER_DXFORM(DX, 10, 6, 6, 5, 16, 1, 1, 0, 0)
> +/* darn */
> +EXTRACT_HELPER(L, 16, 2);
>  
>  /***                            Jump target decoding                       ***/
>  /* Immediate address */
> @@ -1895,6 +1897,21 @@ static void gen_cnttzd(DisasContext *ctx)
>          gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
>      }
>  }
> +
> +/* darn */
> +static void gen_darn(DisasContext *ctx)
> +{
> +    int l = L(ctx->opcode);
> +
> +    if (l == 0) {
> +        gen_helper_darn32(cpu_gpr[rD(ctx->opcode)]);
> +    } else if (l <= 2) {
> +        /* Return 64-bit random for both CRN and RRN */
> +        gen_helper_darn64(cpu_gpr[rD(ctx->opcode)]);

So it might be simpler to just leave out the helper stubs for now, and
always return the invalid value from the generated code.

> +    } else {
> +        tcg_gen_movi_i64(cpu_gpr[rD(ctx->opcode)], -1);
> +    }
> +}
>  #endif
>  
>  /***                             Integer rotate                            ***/
> @@ -6212,6 +6229,7 @@ GEN_HANDLER_E(prtyw, 0x1F, 0x1A, 0x04, 0x0000F801, PPC_NONE, PPC2_ISA205),
>  GEN_HANDLER(popcntd, 0x1F, 0x1A, 0x0F, 0x0000F801, PPC_POPCNTWD),
>  GEN_HANDLER(cntlzd, 0x1F, 0x1A, 0x01, 0x00000000, PPC_64B),
>  GEN_HANDLER_E(cnttzd, 0x1F, 0x1A, 0x11, 0x00000000, PPC_NONE, PPC2_ISA300),
> +GEN_HANDLER_E(darn, 0x1F, 0x13, 0x17, 0x001CF801, PPC_NONE, PPC2_ISA300),
>  GEN_HANDLER_E(prtyd, 0x1F, 0x1A, 0x05, 0x0000F801, PPC_NONE, PPC2_ISA205),
>  GEN_HANDLER_E(bpermd, 0x1F, 0x1C, 0x07, 0x00000001, PPC_NONE, PPC2_PERM_ISA206),
>  #endif

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 14/17] target-ppc: improve lxvw4x implementation
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 14/17] target-ppc: improve lxvw4x implementation Nikunj A Dadhania
@ 2016-09-15  1:20   ` David Gibson
  2016-09-15  9:57     ` Nikunj A Dadhania
  0 siblings, 1 reply; 32+ messages in thread
From: David Gibson @ 2016-09-15  1:20 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 3870 bytes --]

On Mon, Sep 12, 2016 at 12:11:43PM +0530, Nikunj A Dadhania wrote:
> Load 8byte at a time and manipulate.
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/helper.h                 |  1 +
>  target-ppc/mem_helper.c             |  5 +++++
>  target-ppc/translate/vsx-impl.inc.c | 34 ++++++++++++++++++++--------------
>  3 files changed, 26 insertions(+), 14 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 966f2ce..1bbeac4 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -297,6 +297,7 @@ DEF_HELPER_2(mtvscr, void, env, avr)
>  DEF_HELPER_3(lvebx, void, env, avr, tl)
>  DEF_HELPER_3(lvehx, void, env, avr, tl)
>  DEF_HELPER_3(lvewx, void, env, avr, tl)
> +DEF_HELPER_1(bswap32x2, i64, i64)
>  DEF_HELPER_3(stvebx, void, env, avr, tl)
>  DEF_HELPER_3(stvehx, void, env, avr, tl)
>  DEF_HELPER_3(stvewx, void, env, avr, tl)
> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
> index 6548715..a56051a 100644
> --- a/target-ppc/mem_helper.c
> +++ b/target-ppc/mem_helper.c
> @@ -285,6 +285,11 @@ STVE(stvewx, cpu_stl_data_ra, bswap32, u32)
>  #undef I
>  #undef LVE
>  
> +uint64_t helper_bswap32x2(uint64_t x)
> +{
> +    return deposit64((x >> 32), 32, 32, (x));
> +}
> +
>  #undef HI_IDX
>  #undef LO_IDX
>  
> diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> index eee6052..e3374df 100644
> --- a/target-ppc/translate/vsx-impl.inc.c
> +++ b/target-ppc/translate/vsx-impl.inc.c
> @@ -75,7 +75,7 @@ static void gen_lxvdsx(DisasContext *ctx)
>  static void gen_lxvw4x(DisasContext *ctx)
>  {
>      TCGv EA;
> -    TCGv_i64 tmp;
> +    TCGv_i64 t0, t1;
>      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>      if (unlikely(!ctx->vsx_enabled)) {
> @@ -84,22 +84,28 @@ static void gen_lxvw4x(DisasContext *ctx)
>      }
>      gen_set_access_type(ctx, ACCESS_INT);
>      EA = tcg_temp_new();
> -    tmp = tcg_temp_new_i64();
>  
>      gen_addr_reg_index(ctx, EA);
> -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_ld32u_i64(ctx, xth, EA);
> -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
> -
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_ld32u_i64(ctx, xtl, EA);
> -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
> -
> +    if (ctx->le_mode) {
> +        t0 = tcg_temp_new_i64();
> +        t1 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
> +        tcg_gen_shri_i64(t1, t0, 32);
> +        tcg_gen_deposit_i64(xth, t1, t0, 32, 32);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
> +        tcg_gen_shri_i64(t1, t0, 32);
> +        tcg_gen_deposit_i64(xtl, t1, t0, 32, 32);
> +        tcg_temp_free_i64(t0);
> +        tcg_temp_free_i64(t1);
> +    } else {
> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> +        gen_helper_bswap32x2(xth, xth);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> +        gen_helper_bswap32x2(xtl, xtl);
> +    }

So.. for starters using a helper for just one endianness is kind of
ugly.  But for going on with, it looks like the two paths are doing
the same thing, just one is doing it with TCG ops and the other via a
helper.

>      tcg_temp_free(EA);
> -    tcg_temp_free_i64(tmp);
>  }
>  
>  #define VSX_STORE_SCALAR(name, operation)                     \

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 15/17] target-ppc: add lxvb16x and lxvh8x
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 15/17] target-ppc: add lxvb16x and lxvh8x Nikunj A Dadhania
@ 2016-09-15  1:41   ` David Gibson
  2016-09-16  8:26     ` Nikunj A Dadhania
  0 siblings, 1 reply; 32+ messages in thread
From: David Gibson @ 2016-09-15  1:41 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 5778 bytes --]

On Mon, Sep 12, 2016 at 12:11:44PM +0530, Nikunj A Dadhania wrote:
> lxvb16x: Load VSX Vector Byte*16
> lxvh8x:  Load VSX Vector Halfword*8
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/helper.h                 |  1 +
>  target-ppc/mem_helper.c             |  6 ++++
>  target-ppc/translate/vsx-impl.inc.c | 57 +++++++++++++++++++++++++++++++++++++
>  target-ppc/translate/vsx-ops.inc.c  |  2 ++
>  4 files changed, 66 insertions(+)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index 1bbeac4..6de0db7 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -298,6 +298,7 @@ DEF_HELPER_3(lvebx, void, env, avr, tl)
>  DEF_HELPER_3(lvehx, void, env, avr, tl)
>  DEF_HELPER_3(lvewx, void, env, avr, tl)
>  DEF_HELPER_1(bswap32x2, i64, i64)
> +DEF_HELPER_1(bswap16x4, i64, i64)
>  DEF_HELPER_3(stvebx, void, env, avr, tl)
>  DEF_HELPER_3(stvehx, void, env, avr, tl)
>  DEF_HELPER_3(stvewx, void, env, avr, tl)
> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
> index a56051a..608803f 100644
> --- a/target-ppc/mem_helper.c
> +++ b/target-ppc/mem_helper.c
> @@ -290,6 +290,12 @@ uint64_t helper_bswap32x2(uint64_t x)
>      return deposit64((x >> 32), 32, 32, (x));
>  }
>  
> +uint64_t helper_bswap16x4(uint64_t x)
> +{
> +    uint64_t m = 0x00ff00ff00ff00ffull;
> +    return ((x & m) << 8) | ((x >> 8) & m);
> +}

This doesn't seem to match the bswap32x2 function above.  bswap32x2
just swaps the two 32-bit words in the 64-bit word.  This one swaps
the bytes in each individual 16 bit work in the 64-bit word.

I suspect the bswap32x2 is wrong, which would explain why the previous
patch didn't seem to make sense.

> +
>  #undef HI_IDX
>  #undef LO_IDX
>  
> diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> index e3374df..caa6660 100644
> --- a/target-ppc/translate/vsx-impl.inc.c
> +++ b/target-ppc/translate/vsx-impl.inc.c
> @@ -108,6 +108,63 @@ static void gen_lxvw4x(DisasContext *ctx)
>      tcg_temp_free(EA);
>  }
>  
> +static void gen_lxvb16x(DisasContext *ctx)
> +{
> +    TCGv EA;
> +    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> +    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> +
> +    if (unlikely(!ctx->vsx_enabled)) {
> +        gen_exception(ctx, POWERPC_EXCP_VSXU);
> +        return;
> +    }
> +    gen_set_access_type(ctx, ACCESS_INT);
> +    EA = tcg_temp_new();
> +    gen_addr_reg_index(ctx, EA);
> +    if (ctx->le_mode) {
> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
> +    } else {
> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> +        gen_helper_bswap32x2(xth, xth);

I really don't understand how a bswap32x2 helps here, either as it's
defined now, or as I suspect it should be defined by analogy with
bswap16x4.

> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> +        gen_helper_bswap32x2(xtl, xtl);

Also.. if I'm understanding the ISA correctly, this instruction loads
subsequent higher-address bytes into subsequent (i.e. less signficant,
since IBM uses BE bit/byte numbering) bytes in the vector.  Doesn't
that mean you want a BE load in all cases, not just the LE guest case?

> +    }
> +    tcg_temp_free(EA);
> +}
> +
> +static void gen_lxvh8x(DisasContext *ctx)
> +{
> +    TCGv EA;
> +    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> +    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> +
> +    if (unlikely(!ctx->vsx_enabled)) {
> +        gen_exception(ctx, POWERPC_EXCP_VSXU);
> +        return;
> +    }
> +    gen_set_access_type(ctx, ACCESS_INT);
> +    EA = tcg_temp_new();
> +    gen_addr_reg_index(ctx, EA);
> +
> +    if (ctx->le_mode) {
> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
> +        gen_helper_bswap16x4(xth, xth);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
> +        gen_helper_bswap16x4(xtl, xtl);
> +    } else {
> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> +        gen_helper_bswap32x2(xth, xth);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> +        gen_helper_bswap32x2(xtl, xtl);

Again, I think you want a BE load in both cases, and the bswap32x2
makes no sense to me.

> +    }
> +    tcg_temp_free(EA);
> +}
> +
>  #define VSX_STORE_SCALAR(name, operation)                     \
>  static void gen_##name(DisasContext *ctx)                     \
>  {                                                             \
> diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
> index 414b73b..598b349 100644
> --- a/target-ppc/translate/vsx-ops.inc.c
> +++ b/target-ppc/translate/vsx-ops.inc.c
> @@ -7,6 +7,8 @@ GEN_HANDLER_E(lxsspx, 0x1F, 0x0C, 0x10, 0, PPC_NONE, PPC2_VSX207),
>  GEN_HANDLER_E(lxvd2x, 0x1F, 0x0C, 0x1A, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(lxvdsx, 0x1F, 0x0C, 0x0A, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(lxvw4x, 0x1F, 0x0C, 0x18, 0, PPC_NONE, PPC2_VSX),
> +GEN_HANDLER_E(lxvh8x, 0x1F, 0x0C, 0x19, 0, PPC_NONE,  PPC2_ISA300),
> +GEN_HANDLER_E(lxvb16x, 0x1F, 0x0C, 0x1B, 0, PPC_NONE, PPC2_ISA300),
>  
>  GEN_HANDLER_E(stxsdx, 0x1F, 0xC, 0x16, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(stxsibx, 0x1F, 0xD, 0x1C, 0, PPC_NONE, PPC2_ISA300),

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 16/17] target-ppc: improve stxvw4x implementation
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 16/17] target-ppc: improve stxvw4x implementation Nikunj A Dadhania
@ 2016-09-15  1:44   ` David Gibson
  0 siblings, 0 replies; 32+ messages in thread
From: David Gibson @ 2016-09-15  1:44 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 2713 bytes --]

On Mon, Sep 12, 2016 at 12:11:45PM +0530, Nikunj A Dadhania wrote:
> Manipulate data and store 8bytes instead of 4bytes.
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/translate/vsx-impl.inc.c | 27 +++++++++++++--------------
>  1 file changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> index caa6660..f2fc5f9 100644
> --- a/target-ppc/translate/vsx-impl.inc.c
> +++ b/target-ppc/translate/vsx-impl.inc.c
> @@ -205,7 +205,8 @@ static void gen_stxvd2x(DisasContext *ctx)
>  
>  static void gen_stxvw4x(DisasContext *ctx)
>  {
> -    TCGv_i64 tmp;
> +    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
> +    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
>      TCGv EA;
>      if (unlikely(!ctx->vsx_enabled)) {
>          gen_exception(ctx, POWERPC_EXCP_VSXU);
> @@ -214,21 +215,19 @@ static void gen_stxvw4x(DisasContext *ctx)
>      gen_set_access_type(ctx, ACCESS_INT);
>      EA = tcg_temp_new();
>      gen_addr_reg_index(ctx, EA);
> -    tmp = tcg_temp_new_i64();
> -
> -    tcg_gen_shri_i64(tmp, cpu_vsrh(xS(ctx->opcode)), 32);
> -    gen_qemu_st32_i64(ctx, tmp, EA);
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_st32_i64(ctx, cpu_vsrh(xS(ctx->opcode)), EA);
> -
> -    tcg_gen_shri_i64(tmp, cpu_vsrl(xS(ctx->opcode)), 32);
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_st32_i64(ctx, tmp, EA);
> -    tcg_gen_addi_tl(EA, EA, 4);
> -    gen_qemu_st32_i64(ctx, cpu_vsrl(xS(ctx->opcode)), EA);
>  
> +    if (ctx->le_mode) {
> +        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);

This looks wrong again.  The BE store will storethe two 32-bit halves
in the right order, but nothing swaps the bytes within those halves
back to LE.

> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
> +    } else {
> +        gen_helper_bswap32x2(xsh, xsh);
> +        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_LEQ);

Whereas the LE store here will also get the bytes within each 32-bit
word in the wrong order for a BE guest. (bswap32x2 possibly should be
fixing that, but doesn't).

> +        tcg_gen_addi_tl(EA, EA, 8);
> +        gen_helper_bswap32x2(xsl, xsl);
> +        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_LEQ);
> +    }
>      tcg_temp_free(EA);
> -    tcg_temp_free_i64(tmp);
>  }
>  
>  #define MV_VSRW(name, tcgop1, tcgop2, target, source)           \

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 17/17] target-ppc: add stxvb16x and stxvh8x
  2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 17/17] target-ppc: add stxvb16x and stxvh8x Nikunj A Dadhania
@ 2016-09-15  1:46   ` David Gibson
  2016-09-16  8:28     ` Nikunj A Dadhania
  0 siblings, 1 reply; 32+ messages in thread
From: David Gibson @ 2016-09-15  1:46 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 4026 bytes --]

On Mon, Sep 12, 2016 at 12:11:46PM +0530, Nikunj A Dadhania wrote:
> stxvb16x: Store VSX Vector Byte*16
> stxvh8x:  Store VSX Vector Halfword*8
> 
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>

Basically the same comments as on the load side - this looks bogus to
me.

I think it would make sense to fold together the corresponding load
and store patches - makes it easier to review that they're doing
matching things.

> ---
>  target-ppc/translate/vsx-impl.inc.c | 55 +++++++++++++++++++++++++++++++++++++
>  target-ppc/translate/vsx-ops.inc.c  |  2 ++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> index f2fc5f9..20afe3b 100644
> --- a/target-ppc/translate/vsx-impl.inc.c
> +++ b/target-ppc/translate/vsx-impl.inc.c
> @@ -165,6 +165,61 @@ static void gen_lxvh8x(DisasContext *ctx)
>      tcg_temp_free(EA);
>  }
>  
> +static void gen_stxvb16x(DisasContext *ctx)
> +{
> +    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
> +    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
> +    TCGv EA;
> +
> +    if (unlikely(!ctx->vsx_enabled)) {
> +        gen_exception(ctx, POWERPC_EXCP_VSXU);
> +        return;
> +    }
> +    gen_set_access_type(ctx, ACCESS_INT);
> +    EA = tcg_temp_new();
> +    gen_addr_reg_index(ctx, EA);
> +
> +    if (ctx->le_mode) {
> +        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
> +    } else {
> +        gen_helper_bswap32x2(xsh, xsh);
> +        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_LEQ);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        gen_helper_bswap32x2(xsl, xsl);
> +        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_LEQ);
> +    }
> +    tcg_temp_free(EA);
> +}
> +
> +static void gen_stxvh8x(DisasContext *ctx)
> +{
> +    TCGv_i64 xsh = cpu_vsrh(xS(ctx->opcode));
> +    TCGv_i64 xsl = cpu_vsrl(xS(ctx->opcode));
> +    TCGv EA;
> +
> +    if (unlikely(!ctx->vsx_enabled)) {
> +        gen_exception(ctx, POWERPC_EXCP_VSXU);
> +        return;
> +    }
> +    gen_set_access_type(ctx, ACCESS_INT);
> +    EA = tcg_temp_new();
> +    gen_addr_reg_index(ctx, EA);
> +    if (ctx->le_mode) {
> +        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEQ);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEQ);
> +    } else {
> +        gen_helper_bswap32x2(xsh, xsh);
> +        tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_LEQ);
> +        tcg_gen_addi_tl(EA, EA, 8);
> +        gen_helper_bswap32x2(xsl, xsl);
> +        tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_LEQ);
> +    }
> +    tcg_temp_free(EA);
> +}
> +
>  #define VSX_STORE_SCALAR(name, operation)                     \
>  static void gen_##name(DisasContext *ctx)                     \
>  {                                                             \
> diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
> index 598b349..f5afa0f 100644
> --- a/target-ppc/translate/vsx-ops.inc.c
> +++ b/target-ppc/translate/vsx-ops.inc.c
> @@ -17,6 +17,8 @@ GEN_HANDLER_E(stxsiwx, 0x1F, 0xC, 0x04, 0, PPC_NONE, PPC2_VSX207),
>  GEN_HANDLER_E(stxsspx, 0x1F, 0xC, 0x14, 0, PPC_NONE, PPC2_VSX207),
>  GEN_HANDLER_E(stxvd2x, 0x1F, 0xC, 0x1E, 0, PPC_NONE, PPC2_VSX),
>  GEN_HANDLER_E(stxvw4x, 0x1F, 0xC, 0x1C, 0, PPC_NONE, PPC2_VSX),
> +GEN_HANDLER_E(stxvh8x, 0x1F, 0x0C, 0x1D, 0, PPC_NONE,  PPC2_ISA300),
> +GEN_HANDLER_E(stxvb16x, 0x1F, 0x0C, 0x1F, 0, PPC_NONE, PPC2_ISA300),
>  
>  GEN_HANDLER_E(mfvsrwz, 0x1F, 0x13, 0x03, 0x0000F800, PPC_NONE, PPC2_VSX207),
>  GEN_HANDLER_E(mtvsrwa, 0x1F, 0x13, 0x06, 0x0000F800, PPC_NONE, PPC2_VSX207),

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4
  2016-09-15  0:56 ` David Gibson
@ 2016-09-15  1:49   ` David Gibson
  0 siblings, 0 replies; 32+ messages in thread
From: David Gibson @ 2016-09-15  1:49 UTC (permalink / raw)
  To: Nikunj A Dadhania; +Cc: qemu-ppc, rth, qemu-devel, benh

[-- Attachment #1: Type: text/plain, Size: 3621 bytes --]

On Thu, Sep 15, 2016 at 10:56:56AM +1000, David Gibson wrote:
> On Mon, Sep 12, 2016 at 12:11:29PM +0530, Nikunj A Dadhania wrote:
> > 1) Consolidate Load/Store operations using tcg_gen_qemu_ld/st functions
> > 2) This series contains 10 new instructions for POWER9 ISA3.0
> >    Use newer qemu load/store tcg helpers and optimize stxvw4x and lxvw4x.
> > 
> > Patches:
> >  01-09:  Cleanup load/store operations in ppc translator
> 
> I've applied 1-9 to ppc-for-2.8.  Still need to review the remainder
> of the series.

I've also applied 10, 12 and 13.  The remainder I still have things
I'd like addressed.

> 
> >     10:  xxspltib: VSX Vector Splat Immediate Byte
> >     11:  darn: Deliver a random number
> >     12:  lxsibzx - Load VSX Scalar as Integer Byte & Zero Indexed
> >          lxsihzx - Load VSX Scalar as Integer Halfword & Zero Indexed
> >     13:  stxsibx - Store VSX Scalar as Integer Byte Indexed
> >          stxsihx - Store VSX Scalar as Integer Halfword Indexed
> >     14:  lxvw4x - improve implementation
> >     15:  lxvb16x: Load VSX Vector Byte*16
> >          lxvh8x:  Load VSX Vector Halfword*8
> >     16:  stxv4x - improve implementation
> >     17:  stxvb16x: Store VSX Vector Byte*16
> >          stxvh8x:  Store VSX Vector Halfword*8
> > 
> > Series also available here: https://github.com/nikunjad/qemu/tree/p9-tcg
> > 
> > Changelog:
> > v1: 
> > * More load/store cleanups in byte reverse routines
> > * ld64/st64 converted to newer macro and updated call sites
> > * Cleanup load with reservation and store conditional
> > * Return invalid random for darn instruction
> > 
> > v0:
> > * darn - read /dev/random to get the random number
> > * xxspltib - make is PPC64 only
> > * Consolidate load/store operations and use macros to generate qemu_st/ld
> > * Simplify load/store vsx endian manipulation
> > 
> > Nikunj A Dadhania (16):
> >   target-ppc: consolidate load operations
> >   target-ppc: convert ld64 to use new macro
> >   target-ppc: convert ld[16,32,64]ur to use new macro
> >   target-ppc: consolidate store operations
> >   target-ppc: convert st64 to use new macro
> >   target-ppc: convert st[16,32,64]r to use new macro
> >   target-ppc: consolidate load with reservation
> >   target-ppc: move out stqcx impementation
> >   target-ppc: consolidate store conditional
> >   target-ppc: add xxspltib instruction
> >   target-ppc: add lxsi[bw]zx instruction
> >   target-ppc: add stxsi[bh]x instruction
> >   target-ppc: improve lxvw4x implementation
> >   target-ppc: add lxvb16x and lxvh8x
> >   target-ppc: improve stxvw4x implementation
> >   target-ppc: add stxvb16x and stxvh8x
> > 
> > Ravi Bangoria (1):
> >   target-ppc: implement darn instruction
> > 
> >  target-ppc/helper.h                 |   4 +
> >  target-ppc/int_helper.c             |  16 ++
> >  target-ppc/mem_helper.c             |  11 ++
> >  target-ppc/translate.c              | 379 +++++++++++++++++-------------------
> >  target-ppc/translate/fp-impl.inc.c  |  84 ++++----
> >  target-ppc/translate/fp-ops.inc.c   |   2 +-
> >  target-ppc/translate/spe-impl.inc.c |   4 +-
> >  target-ppc/translate/vmx-impl.inc.c |  24 +--
> >  target-ppc/translate/vsx-impl.inc.c | 208 ++++++++++++++++----
> >  target-ppc/translate/vsx-ops.inc.c  |  13 ++
> >  10 files changed, 460 insertions(+), 285 deletions(-)
> > 
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 11/17] target-ppc: implement darn instruction
  2016-09-15  1:07   ` David Gibson
@ 2016-09-15  6:40     ` Nikunj A Dadhania
  0 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-15  6:40 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, benh, Ravi Bangoria

David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Mon, Sep 12, 2016 at 12:11:40PM +0530, Nikunj A Dadhania wrote:
>> From: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
>> 
>> darn: Deliver A Random Number
>> 
>> Currently return invalid random number for all the case. This needs
>> proper algorithm to provide cryptographically suitable random data.
>> Reading from /dev/random can block and that is not an expected behaviour
>> while the cpu instruction is getting executed. Moreover, /dev/random
>> would only work for linux-user
>> 
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> ---
>>  target-ppc/helper.h     |  2 ++
>>  target-ppc/int_helper.c | 16 ++++++++++++++++
>>  target-ppc/translate.c  | 18 ++++++++++++++++++
>>  3 files changed, 36 insertions(+)
>> 
>> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
>> index e75d070..966f2ce 100644
>> --- a/target-ppc/helper.h
>> +++ b/target-ppc/helper.h
>> @@ -50,6 +50,8 @@ DEF_HELPER_FLAGS_1(cnttzd, TCG_CALL_NO_RWG_SE, tl, tl)
>>  DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_NO_RWG_SE, tl, tl)
>>  DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>>  DEF_HELPER_3(srad, tl, env, tl, tl)
>> +DEF_HELPER_0(darn32, tl)
>> +DEF_HELPER_0(darn64, tl)
>>  #endif
>>  
>>  DEF_HELPER_FLAGS_1(cntlsw32, TCG_CALL_NO_RWG_SE, i32, i32)
>> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
>> index 291fba0..51a9ac5 100644
>> --- a/target-ppc/int_helper.c
>> +++ b/target-ppc/int_helper.c
>> @@ -182,6 +182,22 @@ target_ulong helper_cnttzd(target_ulong t)
>>  {
>>      return ctz64(t);
>>  }
>> +
>> +/* Return invalid random number.
>> + *
>> + * FIXME: Add rng backend or other mechanism to get cryptographically suitable
>> + * random number
>> + */
>> +target_ulong helper_darn32(void)
>> +{
>> +    return -1;
>> +}
>> +
>> +target_ulong helper_darn64(void)
>> +{
>> +    return -1;
>> +}
>> +
>
> TBH, I think you're going to want a single helper for both 32-bit and
> 64-bit cases.

Initially, we used to split 32/64-bit in helper, Richard suggested to
make that decision in translate.c

>
>>  #endif
>>  
>>  #if defined(TARGET_PPC64)
>> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
>> index 133c531..e9dad3f 100644
>> --- a/target-ppc/translate.c
>> +++ b/target-ppc/translate.c
>> @@ -528,6 +528,8 @@ EXTRACT_HELPER(FPW, 16, 1);
>>  
>>  /* addpcis */
>>  EXTRACT_HELPER_DXFORM(DX, 10, 6, 6, 5, 16, 1, 1, 0, 0)
>> +/* darn */
>> +EXTRACT_HELPER(L, 16, 2);
>>  
>>  /***                            Jump target decoding                       ***/
>>  /* Immediate address */
>> @@ -1895,6 +1897,21 @@ static void gen_cnttzd(DisasContext *ctx)
>>          gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
>>      }
>>  }
>> +
>> +/* darn */
>> +static void gen_darn(DisasContext *ctx)
>> +{
>> +    int l = L(ctx->opcode);
>> +
>> +    if (l == 0) {
>> +        gen_helper_darn32(cpu_gpr[rD(ctx->opcode)]);
>> +    } else if (l <= 2) {
>> +        /* Return 64-bit random for both CRN and RRN */
>> +        gen_helper_darn64(cpu_gpr[rD(ctx->opcode)]);
>
> So it might be simpler to just leave out the helper stubs for now, and
> always return the invalid value from the generated code.

The idea was to retain all the translation work that we did as stubs.
And fill it whenever we have suitable implementation. Wouldn't hurt to
leave it like this. I am fine both ways.

Regards,
Nikunj

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 14/17] target-ppc: improve lxvw4x implementation
  2016-09-15  1:20   ` David Gibson
@ 2016-09-15  9:57     ` Nikunj A Dadhania
  0 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-15  9:57 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, benh

David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Mon, Sep 12, 2016 at 12:11:43PM +0530, Nikunj A Dadhania wrote:
>> Load 8byte at a time and manipulate.
>> 
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> ---
>>  target-ppc/helper.h                 |  1 +
>>  target-ppc/mem_helper.c             |  5 +++++
>>  target-ppc/translate/vsx-impl.inc.c | 34 ++++++++++++++++++++--------------
>>  3 files changed, 26 insertions(+), 14 deletions(-)
>> 
>> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
>> index 966f2ce..1bbeac4 100644
>> --- a/target-ppc/helper.h
>> +++ b/target-ppc/helper.h
>> @@ -297,6 +297,7 @@ DEF_HELPER_2(mtvscr, void, env, avr)
>>  DEF_HELPER_3(lvebx, void, env, avr, tl)
>>  DEF_HELPER_3(lvehx, void, env, avr, tl)
>>  DEF_HELPER_3(lvewx, void, env, avr, tl)
>> +DEF_HELPER_1(bswap32x2, i64, i64)
>>  DEF_HELPER_3(stvebx, void, env, avr, tl)
>>  DEF_HELPER_3(stvehx, void, env, avr, tl)
>>  DEF_HELPER_3(stvewx, void, env, avr, tl)
>> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
>> index 6548715..a56051a 100644
>> --- a/target-ppc/mem_helper.c
>> +++ b/target-ppc/mem_helper.c
>> @@ -285,6 +285,11 @@ STVE(stvewx, cpu_stl_data_ra, bswap32, u32)
>>  #undef I
>>  #undef LVE
>>  
>> +uint64_t helper_bswap32x2(uint64_t x)
>> +{
>> +    return deposit64((x >> 32), 32, 32, (x));
>> +}
>> +
>>  #undef HI_IDX
>>  #undef LO_IDX
>>  
>> diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
>> index eee6052..e3374df 100644
>> --- a/target-ppc/translate/vsx-impl.inc.c
>> +++ b/target-ppc/translate/vsx-impl.inc.c
>> @@ -75,7 +75,7 @@ static void gen_lxvdsx(DisasContext *ctx)
>>  static void gen_lxvw4x(DisasContext *ctx)
>>  {
>>      TCGv EA;
>> -    TCGv_i64 tmp;
>> +    TCGv_i64 t0, t1;
>>      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>>      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>>      if (unlikely(!ctx->vsx_enabled)) {
>> @@ -84,22 +84,28 @@ static void gen_lxvw4x(DisasContext *ctx)
>>      }
>>      gen_set_access_type(ctx, ACCESS_INT);
>>      EA = tcg_temp_new();
>> -    tmp = tcg_temp_new_i64();
>>  
>>      gen_addr_reg_index(ctx, EA);
>> -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> -    tcg_gen_addi_tl(EA, EA, 4);
>> -    gen_qemu_ld32u_i64(ctx, xth, EA);
>> -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
>> -
>> -    tcg_gen_addi_tl(EA, EA, 4);
>> -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> -    tcg_gen_addi_tl(EA, EA, 4);
>> -    gen_qemu_ld32u_i64(ctx, xtl, EA);
>> -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
>> -
>> +    if (ctx->le_mode) {
>> +        t0 = tcg_temp_new_i64();
>> +        t1 = tcg_temp_new_i64();
>> +        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
>> +        tcg_gen_shri_i64(t1, t0, 32);
>> +        tcg_gen_deposit_i64(xth, t1, t0, 32, 32);
>> +        tcg_gen_addi_tl(EA, EA, 8);
>> +        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
>> +        tcg_gen_shri_i64(t1, t0, 32);
>> +        tcg_gen_deposit_i64(xtl, t1, t0, 32, 32);
>> +        tcg_temp_free_i64(t0);
>> +        tcg_temp_free_i64(t1);
>> +    } else {
>> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
>> +        gen_helper_bswap32x2(xth, xth);
>> +        tcg_gen_addi_tl(EA, EA, 8);
>> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
>> +        gen_helper_bswap32x2(xtl, xtl);
>> +    }
>
> So.. for starters using a helper for just one endianness is kind of
> ugly.  But for going on with, it looks like the two paths are doing
> the same thing, just one is doing it with TCG ops and the other via a
> helper.

Right, while iterating and optimizing logic for BE, I got it same as LE ;-)
We can just use BE. Following is good enough for both BE/LE.

+    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
+    gen_helper_bswap32x2(xth, xth);
+    tcg_gen_addi_tl(EA, EA, 8);
+    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
+    gen_helper_bswap32x2(xtl, xtl);

Regards,
Nikunj

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 15/17] target-ppc: add lxvb16x and lxvh8x
  2016-09-15  1:41   ` David Gibson
@ 2016-09-16  8:26     ` Nikunj A Dadhania
  0 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-16  8:26 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, benh

David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Mon, Sep 12, 2016 at 12:11:44PM +0530, Nikunj A Dadhania wrote:
>> lxvb16x: Load VSX Vector Byte*16
>> lxvh8x:  Load VSX Vector Halfword*8
>> 
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>> ---
>>  target-ppc/helper.h                 |  1 +
>>  target-ppc/mem_helper.c             |  6 ++++
>>  target-ppc/translate/vsx-impl.inc.c | 57 +++++++++++++++++++++++++++++++++++++
>>  target-ppc/translate/vsx-ops.inc.c  |  2 ++
>>  4 files changed, 66 insertions(+)
>> 
>> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
>> index 1bbeac4..6de0db7 100644
>> --- a/target-ppc/helper.h
>> +++ b/target-ppc/helper.h
>> @@ -298,6 +298,7 @@ DEF_HELPER_3(lvebx, void, env, avr, tl)
>>  DEF_HELPER_3(lvehx, void, env, avr, tl)
>>  DEF_HELPER_3(lvewx, void, env, avr, tl)
>>  DEF_HELPER_1(bswap32x2, i64, i64)
>> +DEF_HELPER_1(bswap16x4, i64, i64)
>>  DEF_HELPER_3(stvebx, void, env, avr, tl)
>>  DEF_HELPER_3(stvehx, void, env, avr, tl)
>>  DEF_HELPER_3(stvewx, void, env, avr, tl)
>> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
>> index a56051a..608803f 100644
>> --- a/target-ppc/mem_helper.c
>> +++ b/target-ppc/mem_helper.c
>> @@ -290,6 +290,12 @@ uint64_t helper_bswap32x2(uint64_t x)
>>      return deposit64((x >> 32), 32, 32, (x));
>>  }
>>  
>> +uint64_t helper_bswap16x4(uint64_t x)
>> +{
>> +    uint64_t m = 0x00ff00ff00ff00ffull;
>> +    return ((x & m) << 8) | ((x >> 8) & m);
>> +}
>
> This doesn't seem to match the bswap32x2 function above.  bswap32x2
> just swaps the two 32-bit words in the 64-bit word.  This one swaps
> the bytes in each individual 16 bit work in the 64-bit word.
>
> I suspect the bswap32x2 is wrong, which would explain why the previous
> patch didn't seem to make sense.

The confusion is because of the name(bswap32x2), I will rename it as
deposit32x2. And let bswap16x4 as is, as that is a required operation in
certain cases to get the right order expected by the instruction.

>
>> +
>>  #undef HI_IDX
>>  #undef LO_IDX
>>  
>> diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
>> index e3374df..caa6660 100644
>> --- a/target-ppc/translate/vsx-impl.inc.c
>> +++ b/target-ppc/translate/vsx-impl.inc.c
>> @@ -108,6 +108,63 @@ static void gen_lxvw4x(DisasContext *ctx)
>>      tcg_temp_free(EA);
>>  }
>>  
>> +static void gen_lxvb16x(DisasContext *ctx)
>> +{
>> +    TCGv EA;
>> +    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>> +    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>> +
>> +    if (unlikely(!ctx->vsx_enabled)) {
>> +        gen_exception(ctx, POWERPC_EXCP_VSXU);
>> +        return;
>> +    }
>> +    gen_set_access_type(ctx, ACCESS_INT);
>> +    EA = tcg_temp_new();
>> +    gen_addr_reg_index(ctx, EA);
>> +    if (ctx->le_mode) {
>> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
>> +        tcg_gen_addi_tl(EA, EA, 8);
>> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
>> +    } else {
>> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
>> +        gen_helper_bswap32x2(xth, xth);
>
> I really don't understand how a bswap32x2 helps here, either as it's
> defined now, or as I suspect it should be defined by analogy with
> bswap16x4.
>
>> +        tcg_gen_addi_tl(EA, EA, 8);
>> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
>> +        gen_helper_bswap32x2(xtl, xtl);
>
> Also.. if I'm understanding the ISA correctly, this instruction loads
> subsequent higher-address bytes into subsequent (i.e. less signficant,
> since IBM uses BE bit/byte numbering) bytes in the vector.  Doesn't
> that mean you want a BE load in all cases, not just the LE guest case?
>
>> +    }
>> +    tcg_temp_free(EA);
>> +}
>> +
>> +static void gen_lxvh8x(DisasContext *ctx)
>> +{
>> +    TCGv EA;
>> +    TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>> +    TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>> +
>> +    if (unlikely(!ctx->vsx_enabled)) {
>> +        gen_exception(ctx, POWERPC_EXCP_VSXU);
>> +        return;
>> +    }
>> +    gen_set_access_type(ctx, ACCESS_INT);
>> +    EA = tcg_temp_new();
>> +    gen_addr_reg_index(ctx, EA);
>> +
>> +    if (ctx->le_mode) {
>> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
>> +        gen_helper_bswap16x4(xth, xth);
>> +        tcg_gen_addi_tl(EA, EA, 8);
>> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
>> +        gen_helper_bswap16x4(xtl, xtl);
>> +    } else {
>> +        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
>> +        gen_helper_bswap32x2(xth, xth);
>> +        tcg_gen_addi_tl(EA, EA, 8);
>> +        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
>> +        gen_helper_bswap32x2(xtl, xtl);
>
> Again, I think you want a BE load in both cases, and the bswap32x2
> makes no sense to me.

BTW, MO_BEQ(big-endian load) and MO_LEQ(little-endian) also does magic
during loads. Now when I have 8bytes loaded, i do the manipulation
within to get the right order.

For an input of:
uint16_t rb16[8] = {0x0001, 0x1011, 0x2021, 0x3031, 0x4041, 0x5051, 0x6061, 0x7071};

For LE I should get this result:
7071 6061 5051 4041 3031 2021 1011 0001

For BE it should be:
0001 1011 2021 3031 4041 5051 6061 7071

Thats what the tcg operations achieves. I guess renaming bswap32x2 will
clear the confusion.

Regards
Nikunj

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH RESEND v2 17/17] target-ppc: add stxvb16x and stxvh8x
  2016-09-15  1:46   ` David Gibson
@ 2016-09-16  8:28     ` Nikunj A Dadhania
  0 siblings, 0 replies; 32+ messages in thread
From: Nikunj A Dadhania @ 2016-09-16  8:28 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, rth, qemu-devel, benh

David Gibson <david@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Mon, Sep 12, 2016 at 12:11:46PM +0530, Nikunj A Dadhania wrote:
>> stxvb16x: Store VSX Vector Byte*16
>> stxvh8x:  Store VSX Vector Halfword*8
>> 
>> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
>
> Basically the same comments as on the load side - this looks bogus to
> me.
>
> I think it would make sense to fold together the corresponding load
> and store patches - makes it easier to review that they're doing
> matching things.

Sure, I will fold store/load together.

Regards
Nikunj

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-09-16  8:28 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-12  6:41 [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 01/17] target-ppc: consolidate load operations Nikunj A Dadhania
2016-09-15  0:46   ` David Gibson
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 02/17] target-ppc: convert ld64 to use new macro Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 03/17] target-ppc: convert ld[16, 32, 64]ur " Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 04/17] target-ppc: consolidate store operations Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 05/17] target-ppc: convert st64 to use new macro Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 06/17] target-ppc: convert st[16, 32, 64]r " Nikunj A Dadhania
2016-09-15  0:48   ` David Gibson
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 07/17] target-ppc: consolidate load with reservation Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 08/17] target-ppc: move out stqcx impementation Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 09/17] target-ppc: consolidate store conditional Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 10/17] target-ppc: add xxspltib instruction Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 11/17] target-ppc: implement darn instruction Nikunj A Dadhania
2016-09-15  1:07   ` David Gibson
2016-09-15  6:40     ` Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 12/17] target-ppc: add lxsi[bw]zx instruction Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 13/17] target-ppc: add stxsi[bh]x instruction Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 14/17] target-ppc: improve lxvw4x implementation Nikunj A Dadhania
2016-09-15  1:20   ` David Gibson
2016-09-15  9:57     ` Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 15/17] target-ppc: add lxvb16x and lxvh8x Nikunj A Dadhania
2016-09-15  1:41   ` David Gibson
2016-09-16  8:26     ` Nikunj A Dadhania
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 16/17] target-ppc: improve stxvw4x implementation Nikunj A Dadhania
2016-09-15  1:44   ` David Gibson
2016-09-12  6:41 ` [Qemu-devel] [PATCH RESEND v2 17/17] target-ppc: add stxvb16x and stxvh8x Nikunj A Dadhania
2016-09-15  1:46   ` David Gibson
2016-09-16  8:28     ` Nikunj A Dadhania
2016-09-12  7:19 ` [Qemu-devel] [PATCH RESEND v2 00/17] POWER9 TCG enablements - part4 no-reply
2016-09-15  0:56 ` David Gibson
2016-09-15  1:49   ` David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.