* [Qemu-devel] [PATCH 0/4] POWER9 TCG enablements - part7
@ 2016-10-24  9:14 Nikunj A Dadhania
  2016-10-24  9:14 ` [Qemu-devel] [PATCH 1/4] target-ppc: add xscmp[eq, gt, ge, ne]dp instructions Nikunj A Dadhania
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Nikunj A Dadhania @ 2016-10-24  9:14 UTC (permalink / raw)
  To: qemu-ppc, david, rth
  Cc: qemu-devel, nikunj, bharata, hegdevasant, sandipandas1990, ego

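This series adds 12 new POWER9 (ISA 3.0) instructions to TCG:
    VSX Scalar Compare Double-Precision
    Vector Multiply-by-10 Unsigned Quadword
    Vector Rotate Left Doubleword (then Mask Insert / then AND with Mask)
    Vector Rotate Left Word (then Mask Insert / then AND with Mask)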

Patches:
01: 
    xscmpeqdp: VSX Scalar Compare Equal Double-Precision
    xscmpgedp: VSX Scalar Compare Greater Than or Equal Double-Precision
    xscmpgtdp: VSX Scalar Compare Greater Than Double-Precision
    xscmpnedp: VSX Scalar Compare Not Equal Double-Precision
02: 
    vmul10uq  : Vector Multiply-by-10 Unsigned Quadword VX-form
    vmul10euq : Vector Multiply-by-10 Extended Unsigned Quadword VX-form
    vmul10cuq : Vector Multiply-by-10 & write Carry Unsigned Quadword VX-form
    vmul10ecuq: Vector Multiply-by-10 Extended & write Carry Unsigned Quadword VX-form
03: 
    vrldmi: Vector Rotate Left Doubleword then Mask Insert
    vrlwmi: Vector Rotate Left Word then Mask Insert
04: 
    vrldnm: Vector Rotate Left Doubleword then AND with Mask
    vrlwnm: Vector Rotate Left Word then AND with Mask

Bharata B Rao (1):
  target-ppc: add vrldnm and vrlwnm instructions

Gautham R. Shenoy (1):
  target-ppc: add vrldmi and vrlwmi instructions

Sandipan Das (1):
  target-ppc: add xscmp[eq,gt,ge,ne]dp instructions

Vasant Hegde (1):
  target-ppc: add vmul10[u,eu,cu,ecu]q instructions

 disas/ppc.c                         |   4 ++
 target-ppc/fpu_helper.c             |  52 ++++++++++++++++++
 target-ppc/helper.h                 |   8 +++
 target-ppc/int_helper.c             | 106 ++++++++++++++++++++++++++++++++++++
 target-ppc/translate/vmx-impl.inc.c |  84 ++++++++++++++++++++++++++++
 target-ppc/translate/vmx-ops.inc.c  |  16 +++---
 target-ppc/translate/vsx-impl.inc.c |   4 ++
 target-ppc/translate/vsx-ops.inc.c  |   4 ++
 8 files changed, 270 insertions(+), 8 deletions(-)

-- 
2.7.4


* [Qemu-devel] [PATCH 1/4] target-ppc: add xscmp[eq, gt, ge, ne]dp instructions
  2016-10-24  9:14 [Qemu-devel] [PATCH 0/4] POWER9 TCG enablements - part7 Nikunj A Dadhania
@ 2016-10-24  9:14 ` Nikunj A Dadhania
  2016-10-24  9:14 ` [Qemu-devel] [PATCH 2/4] target-ppc: add vmul10[u, eu, cu, ecu]q instructions Nikunj A Dadhania
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Nikunj A Dadhania @ 2016-10-24  9:14 UTC (permalink / raw)
  To: qemu-ppc, david, rth
  Cc: qemu-devel, nikunj, bharata, hegdevasant, sandipandas1990, ego

From: Sandipan Das <sandipandas1990@gmail.com>

xscmpeqdp: VSX Scalar Compare Equal Double-Precision
xscmpgedp: VSX Scalar Compare Greater Than or Equal Double-Precision
xscmpgtdp: VSX Scalar Compare Greater Than Double-Precision
xscmpnedp: VSX Scalar Compare Not Equal Double-Precision

Signed-off-by: Sandipan Das <sandipandas1990@gmail.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
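A minimal C sketch of the result semantics of the four compares, for
readers who want to check expected values by hand. It is not the QEMU
helper: the model_* names and cmp_mask() are hypothetical, host `double`
stands in for float64, and all FPSCR handling (VXSNAN, VXVC, VE) is
left out. Only doubleword 0 of the target is modelled; the helper in
this patch also clears doubleword 1.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: all-ones mask when the condition holds, else 0. */
static inline uint64_t cmp_mask(int cond)
{
    return cond ? UINT64_MAX : 0;
}

/* Doubleword 0 of XT for each instruction; exceptions are not modelled. */
static inline uint64_t model_xscmpeqdp(double a, double b)
{
    return cmp_mask(a == b);
}

static inline uint64_t model_xscmpgtdp(double a, double b)
{
    return cmp_mask(a > b);
}

static inline uint64_t model_xscmpgedp(double a, double b)
{
    return cmp_mask(a >= b);
}

/* Mirrors the patch's "float64_eq(...) == 0": true when unordered. */
static inline uint64_t model_xscmpnedp(double a, double b)
{
    return cmp_mask(!(a == b));
}

/* Usage: ordered compares fail on NaN, "not equal" succeeds. */
static inline void model_xscmp_selfcheck(void)
{
    assert(model_xscmpeqdp(2.0, 2.0) == UINT64_MAX);
    assert(model_xscmpgtdp(NAN, 1.0) == 0);
    assert(model_xscmpnedp(NAN, 1.0) == UINT64_MAX);
}
```

The macro in this patch expresses ge/gt through float64_le/float64_lt
with the operands swapped (xb, xa), which is equivalent to the direct
`a >= b` / `a > b` used in the sketch.
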
 target-ppc/fpu_helper.c             | 52 +++++++++++++++++++++++++++++++++++++
 target-ppc/helper.h                 |  4 +++
 target-ppc/translate/vsx-impl.inc.c |  4 +++
 target-ppc/translate/vsx-ops.inc.c  |  4 +++
 4 files changed, 64 insertions(+)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index b0760f0..4906372 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2362,6 +2362,58 @@ VSX_MADD(xvnmaddmsp, 4, float32, VsrW(i), NMADD_FLGS, 0, 0, 0)
 VSX_MADD(xvnmsubasp, 4, float32, VsrW(i), NMSUB_FLGS, 1, 0, 0)
 VSX_MADD(xvnmsubmsp, 4, float32, VsrW(i), NMSUB_FLGS, 0, 0, 0)
 
+/* VSX_SCALAR_CMP_DP - VSX scalar floating point compare double precision
+ *   op    - instruction mnemonic
+ *   cmp   - comparison operation
+ *   exp   - expected result of comparison
+ *   svxvc - set VXVC bit
+ */
+#define VSX_SCALAR_CMP_DP(op, cmp, exp, svxvc)                                \
+void helper_##op(CPUPPCState *env, uint32_t opcode)                           \
+{                                                                             \
+    ppc_vsr_t xt, xa, xb;                                                     \
+    bool vxsnan_flag = false, vxvc_flag = false, vex_flag = false;            \
+                                                                              \
+    getVSR(xA(opcode), &xa, env);                                             \
+    getVSR(xB(opcode), &xb, env);                                             \
+    getVSR(xT(opcode), &xt, env);                                             \
+                                                                              \
+    if (float64_is_signaling_nan(xa.VsrD(0), &env->fp_status) ||              \
+        float64_is_signaling_nan(xb.VsrD(0), &env->fp_status)) {              \
+        vxsnan_flag = true;                                                   \
+        if (fpscr_ve == 0 && svxvc) {                                         \
+            vxvc_flag = true;                                                 \
+        }                                                                     \
+    } else if (svxvc) {                                                       \
+        vxvc_flag = float64_is_quiet_nan(xa.VsrD(0), &env->fp_status) ||      \
+            float64_is_quiet_nan(xb.VsrD(0), &env->fp_status);                \
+    }                                                                         \
+    if (vxsnan_flag) {                                                        \
+        float_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, 0);                \
+    }                                                                         \
+    if (vxvc_flag) {                                                          \
+        float_invalid_op_excp(env, POWERPC_EXCP_FP_VXVC, 0);                  \
+    }                                                                         \
+    vex_flag = fpscr_ve && (vxvc_flag || vxsnan_flag);                        \
+                                                                              \
+    if (!vex_flag) {                                                          \
+        if (float64_##cmp(xb.VsrD(0), xa.VsrD(0), &env->fp_status) == exp) {  \
+            xt.VsrD(0) = -1;                                                  \
+            xt.VsrD(1) = 0;                                                   \
+        } else {                                                              \
+            xt.VsrD(0) = 0;                                                   \
+            xt.VsrD(1) = 0;                                                   \
+        }                                                                     \
+    }                                                                         \
+    putVSR(xT(opcode), &xt, env);                                             \
+    helper_float_check_status(env);                                           \
+}
+
+VSX_SCALAR_CMP_DP(xscmpeqdp, eq, 1, 0)
+VSX_SCALAR_CMP_DP(xscmpgedp, le, 1, 1)
+VSX_SCALAR_CMP_DP(xscmpgtdp, lt, 1, 1)
+VSX_SCALAR_CMP_DP(xscmpnedp, eq, 0, 0)
+
 #define VSX_SCALAR_CMP(op, ordered)                                      \
 void helper_##op(CPUPPCState *env, uint32_t opcode)                      \
 {                                                                        \
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 5fcc546..0337292 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -389,6 +389,10 @@ DEF_HELPER_2(xsnmaddadp, void, env, i32)
 DEF_HELPER_2(xsnmaddmdp, void, env, i32)
 DEF_HELPER_2(xsnmsubadp, void, env, i32)
 DEF_HELPER_2(xsnmsubmdp, void, env, i32)
+DEF_HELPER_2(xscmpeqdp, void, env, i32)
+DEF_HELPER_2(xscmpgtdp, void, env, i32)
+DEF_HELPER_2(xscmpgedp, void, env, i32)
+DEF_HELPER_2(xscmpnedp, void, env, i32)
 DEF_HELPER_2(xscmpodp, void, env, i32)
 DEF_HELPER_2(xscmpudp, void, env, i32)
 DEF_HELPER_2(xsmaxdp, void, env, i32)
diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
index 1508bd1..bf167d0 100644
--- a/target-ppc/translate/vsx-impl.inc.c
+++ b/target-ppc/translate/vsx-impl.inc.c
@@ -620,6 +620,10 @@ GEN_VSX_HELPER_2(xsnmaddadp, 0x04, 0x14, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xsnmaddmdp, 0x04, 0x15, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xsnmsubadp, 0x04, 0x16, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xsnmsubmdp, 0x04, 0x17, 0, PPC2_VSX)
+GEN_VSX_HELPER_2(xscmpeqdp, 0x0C, 0x00, 0, PPC2_ISA300)
+GEN_VSX_HELPER_2(xscmpgtdp, 0x0C, 0x01, 0, PPC2_ISA300)
+GEN_VSX_HELPER_2(xscmpgedp, 0x0C, 0x02, 0, PPC2_ISA300)
+GEN_VSX_HELPER_2(xscmpnedp, 0x0C, 0x03, 0, PPC2_ISA300)
 GEN_VSX_HELPER_2(xscmpodp, 0x0C, 0x05, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xscmpudp, 0x0C, 0x04, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xsmaxdp, 0x00, 0x14, 0, PPC2_VSX)
diff --git a/target-ppc/translate/vsx-ops.inc.c b/target-ppc/translate/vsx-ops.inc.c
index af0d27e..202c557 100644
--- a/target-ppc/translate/vsx-ops.inc.c
+++ b/target-ppc/translate/vsx-ops.inc.c
@@ -114,6 +114,10 @@ GEN_XX3FORM(xsnmaddadp, 0x04, 0x14, PPC2_VSX),
 GEN_XX3FORM(xsnmaddmdp, 0x04, 0x15, PPC2_VSX),
 GEN_XX3FORM(xsnmsubadp, 0x04, 0x16, PPC2_VSX),
 GEN_XX3FORM(xsnmsubmdp, 0x04, 0x17, PPC2_VSX),
+GEN_XX3FORM(xscmpeqdp, 0x0C, 0x00, PPC2_ISA300),
+GEN_XX3FORM(xscmpgtdp, 0x0C, 0x01, PPC2_ISA300),
+GEN_XX3FORM(xscmpgedp, 0x0C, 0x02, PPC2_ISA300),
+GEN_XX3FORM(xscmpnedp, 0x0C, 0x03, PPC2_ISA300),
 GEN_XX2IFORM(xscmpodp,  0x0C, 0x05, PPC2_VSX),
 GEN_XX2IFORM(xscmpudp,  0x0C, 0x04, PPC2_VSX),
 GEN_XX3FORM(xsmaxdp, 0x00, 0x14, PPC2_VSX),
-- 
2.7.4


* [Qemu-devel] [PATCH 2/4] target-ppc: add vmul10[u, eu, cu, ecu]q instructions
  2016-10-24  9:14 [Qemu-devel] [PATCH 0/4] POWER9 TCG enablements - part7 Nikunj A Dadhania
  2016-10-24  9:14 ` [Qemu-devel] [PATCH 1/4] target-ppc: add xscmp[eq, gt, ge, ne]dp instructions Nikunj A Dadhania
@ 2016-10-24  9:14 ` Nikunj A Dadhania
  2016-10-24 16:04   ` Richard Henderson
  2016-10-24  9:14 ` [Qemu-devel] [PATCH 3/4] target-ppc: add vrldmi and vrlwmi instructions Nikunj A Dadhania
  2016-10-24  9:15 ` [Qemu-devel] [PATCH 4/4] target-ppc: add vrldnm and vrlwnm instructions Nikunj A Dadhania
  3 siblings, 1 reply; 14+ messages in thread
From: Nikunj A Dadhania @ 2016-10-24  9:14 UTC (permalink / raw)
  To: qemu-ppc, david, rth
  Cc: qemu-devel, nikunj, bharata, hegdevasant, sandipandas1990, ego

From: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>

vmul10uq  : Vector Multiply-by-10 Unsigned Quadword VX-form
vmul10euq : Vector Multiply-by-10 Extended Unsigned Quadword VX-form
vmul10cuq : Vector Multiply-by-10 & write Carry Unsigned Quadword VX-form
vmul10ecuq: Vector Multiply-by-10 Extended & write Carry Unsigned Quadword VX-form

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[ Add GEN_VXFORM_DUAL_EXT with invalid bit mask ]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
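A portable C sketch of the arithmetic behind the four vmul10* variants,
assuming the 128-bit value is held as two 64-bit halves. The u128 type
and the mul10_64()/model_vmul10() names are hypothetical and exist only
for illustration. The carry-in is passed as a small integer; this patch
obtains it by masking the low doubleword of vB with 0xF.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit container: hi is the most significant half. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128;

/* Multiply x by 10; return the low 64 bits, store the overflow in *high. */
static inline uint64_t mul10_64(uint64_t x, uint64_t *high)
{
    uint64_t p0 = (x & 0xFFFFFFFFu) * 10;        /* low 32 bits times 10 */
    uint64_t p1 = (x >> 32) * 10 + (p0 >> 32);   /* high 32 bits times 10 */

    *high = p1 >> 32;
    return (p1 << 32) | (p0 & 0xFFFFFFFFu);
}

/*
 * Compute a * 10 + cin. The returned value is the low 128 bits
 * (the vmul10uq / vmul10euq result); *carry receives the bits above
 * bit 127 (the vmul10cuq / vmul10ecuq result), at most 9 for cin <= 9.
 */
static inline u128 model_vmul10(u128 a, unsigned cin, uint64_t *carry)
{
    uint64_t lo_over, hi_over;
    uint64_t lo = mul10_64(a.lo, &lo_over);
    uint64_t hi = mul10_64(a.hi, &hi_over);
    u128 r;

    r.lo = lo + cin;
    lo_over += (r.lo < lo);      /* carry out of the low half */
    r.hi = hi + lo_over;
    hi_over += (r.hi < hi);      /* carry out of the high half */
    *carry = hi_over;
    return r;
}

/* Usage: (2^64 - 1) * 10 spills 9 into the high half, no carry-out. */
static inline void model_vmul10_selfcheck(void)
{
    uint64_t c;
    u128 a = { 0, UINT64_MAX };
    u128 r = model_vmul10(a, 0, &c);

    assert(r.hi == 9 && c == 0);
}
```

The non-extended forms correspond to cin == 0, and the "write Carry"
forms return *carry instead of the product.
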
 target-ppc/translate/vmx-impl.inc.c | 72 +++++++++++++++++++++++++++++++++++++
 target-ppc/translate/vmx-ops.inc.c  |  8 ++---
 2 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/target-ppc/translate/vmx-impl.inc.c b/target-ppc/translate/vmx-impl.inc.c
index 563f101..fc612d9 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -182,6 +182,52 @@ static void gen_mtvscr(DisasContext *ctx)
     tcg_temp_free_ptr(p);
 }
 
+#define GEN_VX_VMUL10(name, add_cin, ret_carry)                         \
+static void glue(gen_, name)(DisasContext *ctx)                         \
+{                                                                       \
+    TCGv_i64 t0 = tcg_temp_new_i64();                                   \
+    TCGv_i64 t1 = tcg_temp_new_i64();                                   \
+    TCGv_i64 t2 = tcg_temp_new_i64();                                   \
+    TCGv_i64 ten, z;                                                    \
+                                                                        \
+    if (unlikely(!ctx->altivec_enabled)) {                              \
+        gen_exception(ctx, POWERPC_EXCP_VPU);                           \
+        return;                                                         \
+    }                                                                   \
+                                                                        \
+    ten = tcg_const_i64(10);                                            \
+    z = tcg_const_i64(0);                                               \
+                                                                        \
+    if (add_cin) {                                                      \
+        tcg_gen_mulu2_i64(t0, t1, cpu_avrl[rA(ctx->opcode)], ten);      \
+        tcg_gen_andi_i64(t2, cpu_avrl[rB(ctx->opcode)], 0xF);           \
+        tcg_gen_add2_i64(cpu_avrl[rD(ctx->opcode)], t2, t0, t1, t2, z); \
+    } else {                                                            \
+        tcg_gen_mulu2_i64(cpu_avrl[rD(ctx->opcode)], t2,                \
+                          cpu_avrl[rA(ctx->opcode)], ten);              \
+    }                                                                   \
+                                                                        \
+    if (ret_carry) {                                                    \
+        tcg_gen_mulu2_i64(t0, t1, cpu_avrh[rA(ctx->opcode)], ten);      \
+        tcg_gen_add2_i64(t0, cpu_avrl[rD(ctx->opcode)], t0, t1, t2, z); \
+        tcg_gen_movi_i64(cpu_avrh[rD(ctx->opcode)], 0);                 \
+    } else {                                                            \
+        tcg_gen_mul_i64(t0, cpu_avrh[rA(ctx->opcode)], ten);            \
+        tcg_gen_add_i64(cpu_avrh[rD(ctx->opcode)], t0, t2);             \
+    }                                                                   \
+                                                                        \
+    tcg_temp_free_i64(t0);                                              \
+    tcg_temp_free_i64(t1);                                              \
+    tcg_temp_free_i64(t2);                                              \
+    tcg_temp_free_i64(ten);                                             \
+    tcg_temp_free_i64(z);                                               \
+}                                                                       \
+
+GEN_VX_VMUL10(vmul10uq, 0, 0);
+GEN_VX_VMUL10(vmul10euq, 1, 0);
+GEN_VX_VMUL10(vmul10cuq, 0, 1);
+GEN_VX_VMUL10(vmul10ecuq, 1, 1);
+
 /* Logical operations */
 #define GEN_VX_LOGICAL(name, tcg_op, opc2, opc3)                        \
 static void glue(gen_, name)(DisasContext *ctx)                                 \
@@ -276,8 +322,30 @@ static void glue(gen_, name0##_##name1)(DisasContext *ctx)             \
     }                                                                  \
 }
 
+/* Adds support to provide invalid mask */
+#define GEN_VXFORM_DUAL_EXT(name0, flg0, flg2_0, inval0,                \
+                            name1, flg1, flg2_1, inval1)                \
+static void glue(gen_, name0##_##name1)(DisasContext *ctx)              \
+{                                                                       \
+    if ((Rc(ctx->opcode) == 0) &&                                       \
+        ((ctx->insns_flags & flg0) || (ctx->insns_flags2 & flg2_0)) &&  \
+        !(ctx->opcode & inval0)) {                                      \
+        gen_##name0(ctx);                                               \
+    } else if ((Rc(ctx->opcode) == 1) &&                                \
+               ((ctx->insns_flags & flg1) || (ctx->insns_flags2 & flg2_1)) && \
+               !(ctx->opcode & inval1)) {                               \
+        gen_##name1(ctx);                                               \
+    } else {                                                            \
+        gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);             \
+    }                                                                   \
+}
+
 GEN_VXFORM(vaddubm, 0, 0);
+GEN_VXFORM_DUAL_EXT(vaddubm, PPC_ALTIVEC, PPC_NONE, 0,       \
+                    vmul10cuq, PPC_NONE, PPC2_ISA300, 0x0000F800)
 GEN_VXFORM(vadduhm, 0, 1);
+GEN_VXFORM_DUAL(vadduhm, PPC_ALTIVEC, PPC_NONE,  \
+                vmul10ecuq, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM(vadduwm, 0, 2);
 GEN_VXFORM(vaddudm, 0, 3);
 GEN_VXFORM(vsububm, 0, 16);
@@ -390,7 +458,11 @@ GEN_VXFORM(vsro, 6, 17);
 GEN_VXFORM(vaddcuw, 0, 6);
 GEN_VXFORM(vsubcuw, 0, 22);
 GEN_VXFORM_ENV(vaddubs, 0, 8);
+GEN_VXFORM_DUAL_EXT(vaddubs, PPC_ALTIVEC, PPC_NONE, 0,       \
+                    vmul10uq, PPC_NONE, PPC2_ISA300, 0x0000F800)
 GEN_VXFORM_ENV(vadduhs, 0, 9);
+GEN_VXFORM_DUAL(vadduhs, PPC_ALTIVEC, PPC_NONE, \
+                vmul10euq, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM_ENV(vadduws, 0, 10);
 GEN_VXFORM_ENV(vaddsbs, 0, 12);
 GEN_VXFORM_ENV(vaddshs, 0, 13);
diff --git a/target-ppc/translate/vmx-ops.inc.c b/target-ppc/translate/vmx-ops.inc.c
index ab64ab2..cc7ed7e 100644
--- a/target-ppc/translate/vmx-ops.inc.c
+++ b/target-ppc/translate/vmx-ops.inc.c
@@ -55,8 +55,8 @@ GEN_HANDLER_E(name0##_##name1, 0x4, opc2, opc3, 0x00000000, type0, type1)
 GEN_HANDLER_E(name0##_##name1, 0x4, opc2, opc3, 0x00000000, tp0, tp1), \
 GEN_HANDLER_E(name0##_##name1, 0x4, opc2, (opc3 | 0x10), 0x00000000, tp0, tp1),
 
-GEN_VXFORM(vaddubm, 0, 0),
-GEN_VXFORM(vadduhm, 0, 1),
+GEN_VXFORM_DUAL(vaddubm, vmul10cuq, 0, 0, PPC_ALTIVEC, PPC_NONE),
+GEN_VXFORM_DUAL(vadduhm, vmul10ecuq, 0, 1, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM(vadduwm, 0, 2),
 GEN_VXFORM_207(vaddudm, 0, 3),
 GEN_VXFORM_DUAL(vsububm, bcdadd, 0, 16, PPC_ALTIVEC, PPC_NONE),
@@ -123,8 +123,8 @@ GEN_VXFORM(vslo, 6, 16),
 GEN_VXFORM(vsro, 6, 17),
 GEN_VXFORM(vaddcuw, 0, 6),
 GEN_VXFORM(vsubcuw, 0, 22),
-GEN_VXFORM(vaddubs, 0, 8),
-GEN_VXFORM(vadduhs, 0, 9),
+GEN_VXFORM_DUAL(vaddubs, vmul10uq, 0, 8, PPC_ALTIVEC, PPC_NONE),
+GEN_VXFORM_DUAL(vadduhs, vmul10euq, 0, 9, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM(vadduws, 0, 10),
 GEN_VXFORM(vaddsbs, 0, 12),
 GEN_VXFORM(vaddshs, 0, 13),
-- 
2.7.4


* [Qemu-devel] [PATCH 3/4] target-ppc: add vrldmi and vrlwmi instructions
  2016-10-24  9:14 [Qemu-devel] [PATCH 0/4] POWER9 TCG enablements - part7 Nikunj A Dadhania
  2016-10-24  9:14 ` [Qemu-devel] [PATCH 1/4] target-ppc: add xscmp[eq, gt, ge, ne]dp instructions Nikunj A Dadhania
  2016-10-24  9:14 ` [Qemu-devel] [PATCH 2/4] target-ppc: add vmul10[u, eu, cu, ecu]q instructions Nikunj A Dadhania
@ 2016-10-24  9:14 ` Nikunj A Dadhania
  2016-10-24 16:16   ` Richard Henderson
  2016-10-24  9:15 ` [Qemu-devel] [PATCH 4/4] target-ppc: add vrldnm and vrlwnm instructions Nikunj A Dadhania
  3 siblings, 1 reply; 14+ messages in thread
From: Nikunj A Dadhania @ 2016-10-24  9:14 UTC (permalink / raw)
  To: qemu-ppc, david, rth
  Cc: qemu-devel, nikunj, bharata, hegdevasant, sandipandas1990, ego

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

vrldmi: Vector Rotate Left Doubleword then Mask Insert
vrlwmi: Vector Rotate Left Word then Mask Insert

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
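A per-element C sketch of vrlwmi, written with plain shifts instead of
the macro-generated helpers below, in case it helps when checking the
field positions. The names rol32_model(), mask32_model() and
model_vrlwmi() are hypothetical. Bit numbers in the comments use the
ISA convention (bit 0 is the most significant bit), matching the
11:15 / 19:23 / 27:31 ranges passed to VRLMI in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n (0..31); the n == 0 case avoids a shift by 32. */
static inline uint32_t rol32_model(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/*
 * Ones from ISA bit mb through ISA bit me, wrapping around when
 * mb > me. The two-step shift avoids shifting by 32 when me == 31.
 */
static inline uint32_t mask32_model(unsigned mb, unsigned me)
{
    uint32_t m = (UINT32_MAX >> mb) ^ ((UINT32_MAX >> me) >> 1);

    return mb > me ? ~m : m;
}

/* One word element: vt is the old target, va the source, vb the control. */
static inline uint32_t model_vrlwmi(uint32_t vt, uint32_t va, uint32_t vb)
{
    unsigned begin = (vb >> 16) & 0x1F;   /* ISA bits 11:15 */
    unsigned end   = (vb >> 8) & 0x1F;    /* ISA bits 19:23 */
    unsigned shift = vb & 0x1F;           /* ISA bits 27:31 */
    uint32_t mask  = mask32_model(begin, end);

    /* Rotated bits under the mask, old target bits elsewhere. */
    return (rol32_model(va, shift) & mask) | (vt & ~mask);
}

/* Usage: rotate by 4, insert ISA bits 8..15 into the old target. */
static inline void model_vrlwmi_selfcheck(void)
{
    assert(model_vrlwmi(0xAAAAAAAAu, 0x12345678u, 0x00080F04u)
           == 0xAA45AAAAu);
}
```

vrldmi follows the same pattern on 64-bit elements with 6-bit fields.
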
 disas/ppc.c                         |  2 +
 target-ppc/helper.h                 |  2 +
 target-ppc/int_helper.c             | 88 +++++++++++++++++++++++++++++++++++++
 target-ppc/translate/vmx-impl.inc.c |  6 +++
 target-ppc/translate/vmx-ops.inc.c  |  4 +-
 5 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/disas/ppc.c b/disas/ppc.c
index 052cebe..32f0d8d 100644
--- a/disas/ppc.c
+++ b/disas/ppc.c
@@ -2286,6 +2286,8 @@ const struct powerpc_opcode powerpc_opcodes[] = {
 { "vrlh",      VX(4,   68), VX_MASK,	PPCVEC,		{ VD, VA, VB } },
 { "vrlw",      VX(4,  132), VX_MASK,	PPCVEC,		{ VD, VA, VB } },
 { "vrsqrtefp", VX(4,  330), VX_MASK,	PPCVEC,		{ VD, VB } },
+{ "vrldmi",    VX(4,  197), VX_MASK,    PPCVEC,         { VD, VA, VB } },
+{ "vrlwmi",    VX(4,  133), VX_MASK,    PPCVEC,         { VD, VA, VB} },
 { "vsel",      VXA(4,  42), VXA_MASK,	PPCVEC,		{ VD, VA, VB, VC } },
 { "vsl",       VX(4,  452), VX_MASK,	PPCVEC,		{ VD, VA, VB } },
 { "vslb",      VX(4,  260), VX_MASK,	PPCVEC,		{ VD, VA, VB } },
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 0337292..9fb8f0d 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -325,6 +325,8 @@ DEF_HELPER_4(vmaxfp, void, env, avr, avr, avr)
 DEF_HELPER_4(vminfp, void, env, avr, avr, avr)
 DEF_HELPER_3(vrefp, void, env, avr, avr)
 DEF_HELPER_3(vrsqrtefp, void, env, avr, avr)
+DEF_HELPER_3(vrlwmi, void, avr, avr, avr)
+DEF_HELPER_3(vrldmi, void, avr, avr, avr)
 DEF_HELPER_5(vmaddfp, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vnmsubfp, void, env, avr, avr, avr, avr)
 DEF_HELPER_3(vexptefp, void, env, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index dca4798..2273872 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1717,6 +1717,94 @@ void helper_vrsqrtefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
     }
 }
 
+#define EXTRACT_BITS(size)                                              \
+static inline uint##size##_t extract_bits_u##size(uint##size##_t reg,   \
+                                                  uint##size##_t start, \
+                                                  uint##size##_t end)   \
+{                                                                       \
+    uint##size##_t nr_mask_bits = end - start + 1;                      \
+    uint##size##_t val = 1;                                             \
+    uint##size##_t mask = (val << nr_mask_bits) - 1;                    \
+    uint##size##_t shifted_reg = reg  >> ((size - 1)  - end);           \
+    return shifted_reg & mask;                                          \
+}
+
+EXTRACT_BITS(64);
+EXTRACT_BITS(32);
+
+#define MASK(size, max_val)                                     \
+static inline uint##size##_t mask_u##size(uint##size##_t start, \
+                                uint##size##_t end)             \
+{                                                               \
+    uint##size##_t ret, max_bit = size - 1;                     \
+                                                                \
+    if (likely(start == 0)) {                                   \
+        ret = max_val << (max_bit - end);                       \
+    } else if (likely(end == max_bit)) {                        \
+        ret = max_val >> start;                                 \
+    } else {                                                    \
+        ret = (((uint##size##_t)(-1ULL)) >> (start)) ^          \
+            (((uint##size##_t)(-1ULL) >> (end)) >> 1);          \
+        if (unlikely(start > end)) {                            \
+            return ~ret;                                        \
+        }                                                       \
+    }                                                           \
+                                                                \
+    return ret;                                                 \
+}
+
+MASK(32, UINT32_MAX);
+MASK(64, UINT64_MAX);
+
+#define LEFT_ROTATE(size)                                            \
+static inline uint##size##_t left_rotate_u##size(uint##size##_t val, \
+                                              uint##size##_t shift)  \
+{                                                                    \
+    if (!shift) {                                                    \
+        return val;                                                  \
+    }                                                                \
+                                                                     \
+    uint##size##_t left_val = extract_bits_u##size(val, 0, shift - 1); \
+    uint##size##_t right_val = val & mask_u##size(shift, size - 1);    \
+                                                                     \
+    return right_val << shift | left_val;                            \
+}
+
+LEFT_ROTATE(32);
+LEFT_ROTATE(64);
+
+#define VRLMI(name, size, element,                                  \
+                     begin_first, begin_last,                       \
+                     end_first, end_last,                           \
+                     shift_first, shift_last)                       \
+void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)        \
+{                                                                   \
+    int i;                                                          \
+    for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
+        uint##size##_t src1 = a->element[i];                        \
+        uint##size##_t src2 = b->element[i];                        \
+        uint##size##_t src3 = r->element[i];                        \
+        uint##size##_t begin, end, shift, mask, rot_val;            \
+                                                                    \
+        begin = extract_bits_u##size(src2, begin_first, begin_last);\
+        end = extract_bits_u##size(src2, end_first, end_last);      \
+        shift = extract_bits_u##size(src2, shift_first, shift_last);\
+        rot_val = left_rotate_u##size(src1, shift);                 \
+        mask = mask_u##size(begin, end);                            \
+        r->element[i] = (rot_val & mask) | (src3 & ~mask);          \
+    }                                                               \
+}
+
+VRLMI(vrldmi, 64, u64,
+             42, 47,  /* begin_first, begin_last */
+             50, 55,  /* end_first, end_last */
+             58, 63); /* shift_first, shift_last */
+
+VRLMI(vrlwmi, 32, u32,
+             11, 15,  /* begin_first, begin_last */
+             19, 23,  /* end_first, end_last */
+             27, 31); /* shift_first, shift_last */
+
 void helper_vsel(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
                  ppc_avr_t *c)
 {
diff --git a/target-ppc/translate/vmx-impl.inc.c b/target-ppc/translate/vmx-impl.inc.c
index fc612d9..fdfbd6a 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -488,7 +488,13 @@ GEN_VXFORM_DUAL(vsubeuqm, PPC_NONE, PPC2_ALTIVEC_207, \
 GEN_VXFORM(vrlb, 2, 0);
 GEN_VXFORM(vrlh, 2, 1);
 GEN_VXFORM(vrlw, 2, 2);
+GEN_VXFORM(vrlwmi, 2, 2);
+GEN_VXFORM_DUAL(vrlw, PPC_ALTIVEC, PPC_NONE, \
+                vrlwmi, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM(vrld, 2, 3);
+GEN_VXFORM(vrldmi, 2, 3);
+GEN_VXFORM_DUAL(vrld, PPC_NONE, PPC2_ALTIVEC_207, \
+                vrldmi, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM(vsl, 2, 7);
 GEN_VXFORM(vsr, 2, 11);
 GEN_VXFORM_ENV(vpkuhum, 7, 0);
diff --git a/target-ppc/translate/vmx-ops.inc.c b/target-ppc/translate/vmx-ops.inc.c
index cc7ed7e..76b3593 100644
--- a/target-ppc/translate/vmx-ops.inc.c
+++ b/target-ppc/translate/vmx-ops.inc.c
@@ -143,8 +143,8 @@ GEN_VXFORM_207(vsubcuq, 0, 21),
 GEN_VXFORM_DUAL(vsubeuqm, vsubecuq, 31, 0xFF, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM(vrlb, 2, 0),
 GEN_VXFORM(vrlh, 2, 1),
-GEN_VXFORM(vrlw, 2, 2),
-GEN_VXFORM_207(vrld, 2, 3),
+GEN_VXFORM_DUAL(vrlw, vrlwmi, 2, 2, PPC_ALTIVEC, PPC_NONE),
+GEN_VXFORM_DUAL(vrld, vrldmi, 2, 3, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM(vsl, 2, 7),
 GEN_VXFORM(vsr, 2, 11),
 GEN_VXFORM(vpkuhum, 7, 0),
-- 
2.7.4


* [Qemu-devel] [PATCH 4/4] target-ppc: add vrldnm and vrlwnm instructions
  2016-10-24  9:14 [Qemu-devel] [PATCH 0/4] POWER9 TCG enablements - part7 Nikunj A Dadhania
                   ` (2 preceding siblings ...)
  2016-10-24  9:14 ` [Qemu-devel] [PATCH 3/4] target-ppc: add vrldmi and vrlwmi instructions Nikunj A Dadhania
@ 2016-10-24  9:15 ` Nikunj A Dadhania
  3 siblings, 0 replies; 14+ messages in thread
From: Nikunj A Dadhania @ 2016-10-24  9:15 UTC (permalink / raw)
  To: qemu-ppc, david, rth
  Cc: qemu-devel, nikunj, bharata, hegdevasant, sandipandas1990, ego

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

vrldnm: Vector Rotate Left Doubleword then AND with Mask
vrlwnm: Vector Rotate Left Word then AND with Mask

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
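A per-element C sketch of vrldnm, to show how the "AND with Mask" forms
differ from the "Mask Insert" forms: the old target value is not read,
and bits outside the mask become zero. The names rol64_model(),
mask64_model() and model_vrldnm() are hypothetical. Bit numbers in the
comments use the ISA convention (bit 0 is the most significant bit),
matching the 42:47 / 50:55 / 58:63 ranges passed to VRLMI.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n (0..63); the n == 0 case avoids a shift by 64. */
static inline uint64_t rol64_model(uint64_t v, unsigned n)
{
    n &= 63;
    return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Ones from ISA bit mb through ISA bit me, wrapping when mb > me. */
static inline uint64_t mask64_model(unsigned mb, unsigned me)
{
    uint64_t m = (UINT64_MAX >> mb) ^ ((UINT64_MAX >> me) >> 1);

    return mb > me ? ~m : m;
}

/* One doubleword element: va is the source, vb the control word. */
static inline uint64_t model_vrldnm(uint64_t va, uint64_t vb)
{
    unsigned begin = (unsigned)(vb >> 16) & 0x3F;   /* ISA bits 42:47 */
    unsigned end   = (unsigned)(vb >> 8) & 0x3F;    /* ISA bits 50:55 */
    unsigned shift = (unsigned)vb & 0x3F;           /* ISA bits 58:63 */

    /* No insert: everything outside the mask is cleared. */
    return rol64_model(va, shift) & mask64_model(begin, end);
}

/* Usage: rotate by 8, keep only the upper 32 bits (ISA bits 0..31). */
static inline void model_vrldnm_selfcheck(void)
{
    assert(model_vrldnm(UINT64_C(0x0123456789ABCDEF), UINT64_C(0x1F08))
           == UINT64_C(0x2345678900000000));
}
```
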
 disas/ppc.c                         |  2 ++
 target-ppc/helper.h                 |  2 ++
 target-ppc/int_helper.c             | 26 ++++++++++++++++++++++----
 target-ppc/translate/vmx-impl.inc.c |  6 ++++++
 target-ppc/translate/vmx-ops.inc.c  |  4 ++--
 5 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/disas/ppc.c b/disas/ppc.c
index 32f0d8d..bd05623 100644
--- a/disas/ppc.c
+++ b/disas/ppc.c
@@ -2287,7 +2287,9 @@ const struct powerpc_opcode powerpc_opcodes[] = {
 { "vrlw",      VX(4,  132), VX_MASK,	PPCVEC,		{ VD, VA, VB } },
 { "vrsqrtefp", VX(4,  330), VX_MASK,	PPCVEC,		{ VD, VB } },
 { "vrldmi",    VX(4,  197), VX_MASK,    PPCVEC,         { VD, VA, VB } },
+{ "vrldnm",    VX(4,  453), VX_MASK,    PPCVEC,         { VD, VA, VB } },
 { "vrlwmi",    VX(4,  133), VX_MASK,    PPCVEC,         { VD, VA, VB} },
+{ "vrlwnm",    VX(4,  389), VX_MASK,    PPCVEC,         { VD, VA, VB } },
 { "vsel",      VXA(4,  42), VXA_MASK,	PPCVEC,		{ VD, VA, VB, VC } },
 { "vsl",       VX(4,  452), VX_MASK,	PPCVEC,		{ VD, VA, VB } },
 { "vslb",      VX(4,  260), VX_MASK,	PPCVEC,		{ VD, VA, VB } },
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 9fb8f0d..d6ee26e 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -327,6 +327,8 @@ DEF_HELPER_3(vrefp, void, env, avr, avr)
 DEF_HELPER_3(vrsqrtefp, void, env, avr, avr)
 DEF_HELPER_3(vrlwmi, void, avr, avr, avr)
 DEF_HELPER_3(vrldmi, void, avr, avr, avr)
+DEF_HELPER_3(vrldnm, void, avr, avr, avr)
+DEF_HELPER_3(vrlwnm, void, avr, avr, avr)
 DEF_HELPER_5(vmaddfp, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vnmsubfp, void, env, avr, avr, avr, avr)
 DEF_HELPER_3(vexptefp, void, env, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 2273872..29d7afc 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1776,7 +1776,7 @@ LEFT_ROTATE(64);
 #define VRLMI(name, size, element,                                  \
                      begin_first, begin_last,                       \
                      end_first, end_last,                           \
-                     shift_first, shift_last)                       \
+                     shift_first, shift_last, insert)               \
 void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)        \
 {                                                                   \
     int i;                                                          \
@@ -1791,19 +1791,37 @@ void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)        \
         shift = extract_bits_u##size(src2, shift_first, shift_last);\
         rot_val = left_rotate_u##size(src1, shift);                 \
         mask = mask_u##size(begin, end);                            \
-        r->element[i] = (rot_val & mask) | (src3 & ~mask);          \
+        if (insert) {                                               \
+            r->element[i] = (rot_val & mask) | (src3 & ~mask);      \
+        } else {                                                    \
+            r->element[i] = (rot_val & mask);                       \
+        }                                                           \
     }                                                               \
 }
 
 VRLMI(vrldmi, 64, u64,
              42, 47,  /* begin_first, begin_last */
              50, 55,  /* end_first, end_last */
-             58, 63); /* shift_first, shift_last */
+             58, 63,  /* shift_first, shift_last */
+             1);      /* mask and insert */
 
 VRLMI(vrlwmi, 32, u32,
              11, 15,  /* begin_first, begin_last */
              19, 23,  /* end_first, end_last */
-             27, 31); /* shift_first, shift_last */
+             27, 31,  /* shift_first, shift_last */
+             1);      /* mask and insert */
+
+VRLMI(vrldnm, 64, u64,
+             42, 47,  /* begin_first, begin_last */
+             50, 55,  /* end_first, end_last */
+             58, 63,  /* shift_first, shift_last */
+             0);      /* mask only */
+
+VRLMI(vrlwnm, 32, u32,
+             11, 15,  /* begin_first, begin_last */
+             19, 23,  /* end_first, end_last */
+             27, 31,  /* shift_first, shift_last */
+             0);      /* mask only */
 
 void helper_vsel(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
                  ppc_avr_t *c)
diff --git a/target-ppc/translate/vmx-impl.inc.c b/target-ppc/translate/vmx-impl.inc.c
index fdfbd6a..500c43f 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -442,6 +442,9 @@ GEN_VXFORM(vmulesw, 4, 14);
 GEN_VXFORM(vslb, 2, 4);
 GEN_VXFORM(vslh, 2, 5);
 GEN_VXFORM(vslw, 2, 6);
+GEN_VXFORM(vrlwnm, 2, 6);
+GEN_VXFORM_DUAL(vslw, PPC_ALTIVEC, PPC_NONE, \
+                vrlwnm, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM(vsld, 2, 23);
 GEN_VXFORM(vsrb, 2, 8);
 GEN_VXFORM(vsrh, 2, 9);
@@ -496,6 +499,9 @@ GEN_VXFORM(vrldmi, 2, 3);
 GEN_VXFORM_DUAL(vrld, PPC_NONE, PPC2_ALTIVEC_207, \
                 vrldmi, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM(vsl, 2, 7);
+GEN_VXFORM(vrldnm, 2, 7);
+GEN_VXFORM_DUAL(vsl, PPC_ALTIVEC, PPC_NONE, \
+                vrldnm, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM(vsr, 2, 11);
 GEN_VXFORM_ENV(vpkuhum, 7, 0);
 GEN_VXFORM_ENV(vpkuwum, 7, 1);
diff --git a/target-ppc/translate/vmx-ops.inc.c b/target-ppc/translate/vmx-ops.inc.c
index 76b3593..a5ad4d4 100644
--- a/target-ppc/translate/vmx-ops.inc.c
+++ b/target-ppc/translate/vmx-ops.inc.c
@@ -107,7 +107,7 @@ GEN_VXFORM(vmulesh, 4, 13),
 GEN_VXFORM_207(vmulesw, 4, 14),
 GEN_VXFORM(vslb, 2, 4),
 GEN_VXFORM(vslh, 2, 5),
-GEN_VXFORM(vslw, 2, 6),
+GEN_VXFORM_DUAL(vslw, vrlwnm, 2, 6, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM_207(vsld, 2, 23),
 GEN_VXFORM(vsrb, 2, 8),
 GEN_VXFORM(vsrh, 2, 9),
@@ -145,7 +145,7 @@ GEN_VXFORM(vrlb, 2, 0),
 GEN_VXFORM(vrlh, 2, 1),
 GEN_VXFORM_DUAL(vrlw, vrlwmi, 2, 2, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM_DUAL(vrld, vrldmi, 2, 3, PPC_NONE, PPC2_ALTIVEC_207),
-GEN_VXFORM(vsl, 2, 7),
+GEN_VXFORM_DUAL(vsl, vrldnm, 2, 7, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM(vsr, 2, 11),
 GEN_VXFORM(vpkuhum, 7, 0),
 GEN_VXFORM(vpkuwum, 7, 1),
-- 
2.7.4
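
For reference, the only behavioural difference the new `insert` argument introduces is how the rotated value is combined with the old target element. A minimal sketch for one 64-bit element follows; `vrl_combine_u64` is a hypothetical name used only for illustration and is not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): combine one 64-bit element.
 * rot_val is the already rotated source, mask the already built mask,
 * and old is the previous content of the target element.
 */
static inline uint64_t vrl_combine_u64(uint64_t rot_val, uint64_t mask,
                                       uint64_t old, int insert)
{
    if (insert) {
        /* vrldmi/vrlwmi style: target bits outside the mask are kept */
        return (rot_val & mask) | (old & ~mask);
    }
    /* vrldnm/vrlwnm style: bits outside the mask become zero */
    return rot_val & mask;
}
```

With rot_val all ones, mask 0xFF and old 0x1200, the insert form gives 0x12FF and the mask-only form gives 0xFF.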


* Re: [Qemu-devel] [PATCH 2/4] target-ppc: add vmul10[u, eu, cu, ecu]q instructions
  2016-10-24  9:14 ` [Qemu-devel] [PATCH 2/4] target-ppc: add vmul10[u, eu, cu, ecu]q instructions Nikunj A Dadhania
@ 2016-10-24 16:04   ` Richard Henderson
  2016-10-25  2:38     ` David Gibson
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2016-10-24 16:04 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david
  Cc: qemu-devel, bharata, hegdevasant, sandipandas1990, ego

On 10/24/2016 02:14 AM, Nikunj A Dadhania wrote:
> From: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> 
> vmul10uq  : Vector Multiply-by-10 Unsigned Quadword VX-form
> vmul10euq : Vector Multiply-by-10 Extended Unsigned Quadword VX-form
> vmul10cuq : Vector Multiply-by-10 & write Carry Unsigned Quadword VX-form
> vmul10ecuq: Vector Multiply-by-10 Extended & write Carry Unsigned Quadword VX-form
> 
> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> [ Add GEN_VXFORM_DUAL_EXT with invalid bit mask ]
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  target-ppc/translate/vmx-impl.inc.c | 72 +++++++++++++++++++++++++++++++++++++
>  target-ppc/translate/vmx-ops.inc.c  |  8 ++---
>  2 files changed, 76 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~


* Re: [Qemu-devel] [PATCH 3/4] target-ppc: add vrldnmi and vrlwmi instructions
  2016-10-24  9:14 ` [Qemu-devel] [PATCH 3/4] target-ppc: add vrldnmi and vrlwmi instructions Nikunj A Dadhania
@ 2016-10-24 16:16   ` Richard Henderson
  2016-10-25  4:08     ` Nikunj A Dadhania
  2016-10-25  6:02     ` Nikunj A Dadhania
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Henderson @ 2016-10-24 16:16 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david
  Cc: qemu-devel, bharata, hegdevasant, sandipandas1990, ego

On 10/24/2016 02:14 AM, Nikunj A Dadhania wrote:
> +#define EXTRACT_BITS(size)                                              \
> +static inline uint##size##_t extract_bits_u##size(uint##size##_t reg,   \
> +                                                  uint##size##_t start, \
> +                                                  uint##size##_t end)   \
> +{                                                                       \
> +    uint##size##_t nr_mask_bits = end - start + 1;                      \
> +    uint##size##_t val = 1;                                             \
> +    uint##size##_t mask = (val << nr_mask_bits) - 1;                    \
> +    uint##size##_t shifted_reg = reg  >> ((size - 1)  - end);           \
> +    return shifted_reg & mask;                                          \
> +}
> +
> +EXTRACT_BITS(64);
> +EXTRACT_BITS(32);

We already have extract32 and extract64, which you're (nearly) duplicating.

> +#define MASK(size, max_val)                                     \
> +static inline uint##size##_t mask_u##size(uint##size##_t start, \
> +                                uint##size##_t end)             \
> +{                                                               \
> +    uint##size##_t ret, max_bit = size - 1;                     \
> +                                                                \
> +    if (likely(start == 0)) {                                   \
> +        ret = max_val << (max_bit - end);                       \
> +    } else if (likely(end == max_bit)) {                        \
> +        ret = max_val >> start;                                 \
> +    } else {                                                    \
> +        ret = (((uint##size##_t)(-1ULL)) >> (start)) ^          \
> +            (((uint##size##_t)(-1ULL) >> (end)) >> 1);          \
> +        if (unlikely(start > end)) {                            \
> +            return ~ret;                                        \
> +        }                                                       \
> +    }                                                           \

Why the two likely cases?  Doesn't the third case cover them?

Also, (uint##size##_t)(-1ULL) should be just (uint##size##_t)-1.
Please remove all the other unnecessary parentheses too.

Hmph.  I see you've copied all this silliness from translate.c, so...
nevermind, I guess.  Let's leave this a near-exact copy.
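
The question about the two likely cases can be checked directly. A minimal sketch, assuming the 64-bit instantiation and big-endian bit numbering (bit 0 is the MSB): `mask_general_u64` is a hypothetical name that keeps only the third branch of the quoted macro, and for the inputs tried it yields the same masks the two fast paths would.

```c
#include <stdint.h>

/*
 * Mask with ones from big-endian bit `start` through `end` (bit 0 = MSB).
 * Only the general-case expression of the quoted macro is used here.
 */
static inline uint64_t mask_general_u64(unsigned start, unsigned end)
{
    uint64_t ret = (UINT64_MAX >> start) ^ ((UINT64_MAX >> end) >> 1);

    /* start > end means the mask wraps around, so invert it */
    return start > end ? ~ret : ret;
}
```

For start == 0 the result equals `max_val << (63 - end)`, and for end == 63 it equals `max_val >> start`, which suggests the fast paths are an optimisation rather than a correctness requirement.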

> +#define LEFT_ROTATE(size)                                            \
> +static inline uint##size##_t left_rotate_u##size(uint##size##_t val, \
> +                                              uint##size##_t shift)  \
> +{                                                                    \
> +    if (!shift) {                                                    \
> +        return val;                                                  \
> +    }                                                                \
> +                                                                     \
> +    uint##size##_t left_val = extract_bits_u##size(val, 0, shift - 1); \
> +    uint##size##_t right_val = val & mask_u##size(shift, size - 1);    \
> +                                                                     \
> +    return right_val << shift | left_val;                            \
> +}
> +
> +LEFT_ROTATE(32);
> +LEFT_ROTATE(64);

We already have rol32 and rol64.

Which I see are broken for shift == 0.  Let's please fix that, as a separate
patch, like so:

  return (word << shift) | (word >> ((32 - shift) & 31));
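
A minimal sketch of the suggested form for both widths, assuming the shift count is already in range (0..31 or 0..63). The `_safe` names are hypothetical and chosen so they do not clash with the existing rol32 and rol64.

```c
#include <stdint.h>

static inline uint32_t rol32_safe(uint32_t word, unsigned int shift)
{
    /* (32 - shift) & 31 is 0 when shift == 0, so no shift by 32 occurs */
    return (word << shift) | (word >> ((32 - shift) & 31));
}

static inline uint64_t rol64_safe(uint64_t word, unsigned int shift)
{
    /* same idea for 64 bits: the right shift count stays within 0..63 */
    return (word << shift) | (word >> ((64 - shift) & 63));
}
```

For shift == 0 both halves of the OR are the unshifted word, so the input is returned unchanged.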


r~


* Re: [Qemu-devel] [PATCH 2/4] target-ppc: add vmul10[u, eu, cu, ecu]q instructions
  2016-10-24 16:04   ` Richard Henderson
@ 2016-10-25  2:38     ` David Gibson
  0 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2016-10-25  2:38 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Nikunj A Dadhania, qemu-ppc, qemu-devel, bharata, hegdevasant,
	sandipandas1990, ego


On Mon, Oct 24, 2016 at 09:04:13AM -0700, Richard Henderson wrote:
> On 10/24/2016 02:14 AM, Nikunj A Dadhania wrote:
> > From: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> > 
> > vmul10uq  : Vector Multiply-by-10 Unsigned Quadword VX-form
> > vmul10euq : Vector Multiply-by-10 Extended Unsigned Quadword VX-form
> > vmul10cuq : Vector Multiply-by-10 & write Carry Unsigned Quadword VX-form
> > vmul10ecuq: Vector Multiply-by-10 Extended & write Carry Unsigned Quadword VX-form
> > 
> > Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> > [ Add GEN_VXFORM_DUAL_EXT with invalid bit mask ]
> > Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> > ---
> >  target-ppc/translate/vmx-impl.inc.c | 72 +++++++++++++++++++++++++++++++++++++
> >  target-ppc/translate/vmx-ops.inc.c  |  8 ++---
> >  2 files changed, 76 insertions(+), 4 deletions(-)
> 
> Reviewed-by: Richard Henderson <rth@twiddle.net>

Applied to ppc-for-2.8.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH 3/4] target-ppc: add vrldnmi and vrlwmi instructions
  2016-10-24 16:16   ` Richard Henderson
@ 2016-10-25  4:08     ` Nikunj A Dadhania
  2016-10-25  4:21       ` Richard Henderson
  2016-10-25  6:02     ` Nikunj A Dadhania
  1 sibling, 1 reply; 14+ messages in thread
From: Nikunj A Dadhania @ 2016-10-25  4:08 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc, david
  Cc: qemu-devel, bharata, hegdevasant, sandipandas1990, ego

Richard Henderson <rth@twiddle.net> writes:

> On 10/24/2016 02:14 AM, Nikunj A Dadhania wrote:
>> +#define EXTRACT_BITS(size)                                              \
>> +static inline uint##size##_t extract_bits_u##size(uint##size##_t reg,   \
>> +                                                  uint##size##_t start, \
>> +                                                  uint##size##_t end)   \
>> +{                                                                       \
>> +    uint##size##_t nr_mask_bits = end - start + 1;                      \
>> +    uint##size##_t val = 1;                                             \
>> +    uint##size##_t mask = (val << nr_mask_bits) - 1;                    \
>> +    uint##size##_t shifted_reg = reg  >> ((size - 1)  - end);           \
>> +    return shifted_reg & mask;                                          \
>> +}
>> +
>> +EXTRACT_BITS(64);
>> +EXTRACT_BITS(32);
>
> We already have extract32 and extract64, which you're (nearly) duplicating.

The bit position numbering is different, which is why I am using the
above routine: it assumes MSB=0 and LSB=63,

while the one below assumes MSB=63 and LSB=0:

static inline uint64_t extract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

Let me know if I am missing something here.
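
The two conventions can be related mechanically. A minimal sketch, assuming 64-bit values: `extract_bits_be64` is a local copy of the patch's routine (with a guard added for the full-width case), `extract64_le` is a local stand-in for extract64 without the assert, and `extract_be_via_le` is a hypothetical wrapper showing the conversion. A big-endian range [start, end] corresponds to little-endian start 63 - end and length end - start + 1.

```c
#include <stdint.h>

/* Big-endian numbering (bit 0 = MSB), modelled on extract_bits_u64 */
static inline uint64_t extract_bits_be64(uint64_t reg, unsigned start,
                                         unsigned end)
{
    unsigned nr_mask_bits = end - start + 1;
    /* guard added in this sketch: avoid shifting 1 by 64 */
    uint64_t mask = nr_mask_bits == 64 ? UINT64_MAX
                                       : (1ULL << nr_mask_bits) - 1;

    return (reg >> (63 - end)) & mask;
}

/* Little-endian start + length, modelled on extract64 (length > 0) */
static inline uint64_t extract64_le(uint64_t value, int start, int length)
{
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Hypothetical wrapper: big-endian range expressed via extract64_le */
static inline uint64_t extract_be_via_le(uint64_t reg, unsigned start,
                                         unsigned end)
{
    return extract64_le(reg, 63 - end, end - start + 1);
}
```

For example, big-endian bits 58..63 are the low six bits, that is, little-endian start 0 and length 6.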

>> +#define MASK(size, max_val)                                     \
>> +static inline uint##size##_t mask_u##size(uint##size##_t start, \
>> +                                uint##size##_t end)             \
>> +{                                                               \
>> +    uint##size##_t ret, max_bit = size - 1;                     \
>> +                                                                \
>> +    if (likely(start == 0)) {                                   \
>> +        ret = max_val << (max_bit - end);                       \
>> +    } else if (likely(end == max_bit)) {                        \
>> +        ret = max_val >> start;                                 \
>> +    } else {                                                    \
>> +        ret = (((uint##size##_t)(-1ULL)) >> (start)) ^          \
>> +            (((uint##size##_t)(-1ULL) >> (end)) >> 1);          \
>> +        if (unlikely(start > end)) {                            \
>> +            return ~ret;                                        \
>> +        }                                                       \
>> +    }                                                           \
>
> Why the two likely cases?  Doesn't the third case cover them?
>
> Also, (uint##size##_t)(-1ULL) should be just (uint##size##_t)-1.
> Please remove all the other unnecessary parentheses too.
>
> Hmph.  I see you've copied all this silliness from translate.c, so...
> nevermind, I guess.  Let's leave this a near-exact copy.

Ok.

>> +#define LEFT_ROTATE(size)                                            \
>> +static inline uint##size##_t left_rotate_u##size(uint##size##_t val, \
>> +                                              uint##size##_t shift)  \
>> +{                                                                    \
>> +    if (!shift) {                                                    \
>> +        return val;                                                  \
>> +    }                                                                \
>> +                                                                     \
>> +    uint##size##_t left_val = extract_bits_u##size(val, 0, shift - 1); \
>> +    uint##size##_t right_val = val & mask_u##size(shift, size - 1);    \
>> +                                                                     \
>> +    return right_val << shift | left_val;                            \
>> +}
>> +
>> +LEFT_ROTATE(32);
>> +LEFT_ROTATE(64);
>
> We already have rol32 and rol64.
>
> Which I see are broken for shift == 0.  Let's please fix that, as a separate
> patch, like so:
>
>   return (word << shift) | (word >> ((32 - shift) & 31));

Sure.

Regards
Nikunj


* Re: [Qemu-devel] [PATCH 3/4] target-ppc: add vrldnmi and vrlwmi instructions
  2016-10-25  4:08     ` Nikunj A Dadhania
@ 2016-10-25  4:21       ` Richard Henderson
  2016-10-25  4:44         ` Nikunj A Dadhania
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2016-10-25  4:21 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david
  Cc: qemu-devel, bharata, hegdevasant, sandipandas1990, ego

On 10/24/2016 09:08 PM, Nikunj A Dadhania wrote:
> Richard Henderson <rth@twiddle.net> writes:
>
>> On 10/24/2016 02:14 AM, Nikunj A Dadhania wrote:
>>> +#define EXTRACT_BITS(size)                                              \
>>> +static inline uint##size##_t extract_bits_u##size(uint##size##_t reg,   \
>>> +                                                  uint##size##_t start, \
>>> +                                                  uint##size##_t end)   \
>>> +{                                                                       \
>>> +    uint##size##_t nr_mask_bits = end - start + 1;                      \
>>> +    uint##size##_t val = 1;                                             \
>>> +    uint##size##_t mask = (val << nr_mask_bits) - 1;                    \
>>> +    uint##size##_t shifted_reg = reg  >> ((size - 1)  - end);           \
>>> +    return shifted_reg & mask;                                          \
>>> +}
>>> +
>>> +EXTRACT_BITS(64);
>>> +EXTRACT_BITS(32);
>>
>> We already have extract32 and extract64, which you're (nearly) duplicating.
>
> The bit position numbering is different, which is why I am using the
> above routine: it assumes MSB=0 and LSB=63,
>
> while the one below assumes MSB=63 and LSB=0:
>
> static inline uint64_t extract64(uint64_t value, int start, int length)
> {
>     assert(start >= 0 && length > 0 && length <= 64 - start);
>     return (value >> start) & (~0ULL >> (64 - length));
> }
>
> Let me know if I am missing something here.

Since the arguments to extract_bits_uN are completely under your control, via 
the arguments to VRLMI, this is a non-argument.  Just change them to 
little-endian position + length.

(And, after you do that conversion for vrldmi and vrlwmi, you'll see why
big-endian bit numbering is the spawn of the devil.  ;-)


r~


* Re: [Qemu-devel] [PATCH 3/4] target-ppc: add vrldnmi and vrlwmi instructions
  2016-10-25  4:21       ` Richard Henderson
@ 2016-10-25  4:44         ` Nikunj A Dadhania
  0 siblings, 0 replies; 14+ messages in thread
From: Nikunj A Dadhania @ 2016-10-25  4:44 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc, david
  Cc: qemu-devel, bharata, hegdevasant, sandipandas1990, ego

Richard Henderson <rth@twiddle.net> writes:

> On 10/24/2016 09:08 PM, Nikunj A Dadhania wrote:
>> Richard Henderson <rth@twiddle.net> writes:
>>
>>> On 10/24/2016 02:14 AM, Nikunj A Dadhania wrote:
>>>> +#define EXTRACT_BITS(size)                                              \
>>>> +static inline uint##size##_t extract_bits_u##size(uint##size##_t reg,   \
>>>> +                                                  uint##size##_t start, \
>>>> +                                                  uint##size##_t end)   \
>>>> +{                                                                       \
>>>> +    uint##size##_t nr_mask_bits = end - start + 1;                      \
>>>> +    uint##size##_t val = 1;                                             \
>>>> +    uint##size##_t mask = (val << nr_mask_bits) - 1;                    \
>>>> +    uint##size##_t shifted_reg = reg  >> ((size - 1)  - end);           \
>>>> +    return shifted_reg & mask;                                          \
>>>> +}
>>>> +
>>>> +EXTRACT_BITS(64);
>>>> +EXTRACT_BITS(32);
>>>
>>> We already have extract32 and extract64, which you're (nearly) duplicating.
>>
>> The bit position numbering is different, which is why I am using the
>> above routine: it assumes MSB=0 and LSB=63,
>>
>> while the one below assumes MSB=63 and LSB=0:
>>
>> static inline uint64_t extract64(uint64_t value, int start, int length)
>> {
>>     assert(start >= 0 && length > 0 && length <= 64 - start);
>>     return (value >> start) & (~0ULL >> (64 - length));
>> }
>>
>> Let me know if I am missing something here.
>
> Since the arguments to extract_bits_uN are completely under your control, via 
> the arguments to VRLMI, this is a non-argument.  Just change them to 
> little-endian position + length.

Sure, I was already trying that. Here is the changed version:

#define VRLMI(name, size, element,                                  \
              begin_last,                                           \
              end_last,                                             \
              shift_last, num_bits, insert)                         \
void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)        \
{                                                                   \
    int i;                                                          \
    for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
        uint##size##_t src1 = a->element[i];                        \
        uint##size##_t src2 = b->element[i];                        \
        uint##size##_t src3 = r->element[i];                        \
        uint##size##_t begin, end, shift, mask, rot_val;            \
                                                                    \
        begin = extract##size(src2, size - begin_last - 1, num_bits);   \
        end = extract##size(src2, size - end_last - 1, num_bits);       \
        shift = extract##size(src2, size - shift_last - 1, num_bits);   \
        rot_val = rol##size(src1, shift);                               \
        mask = mask_u##size(begin, end);                            \
        if (insert) {                                               \
            r->element[i] = (rot_val & mask) | (src3 & ~mask);      \
        } else {                                                    \
            r->element[i] = (rot_val & mask);                       \
        }                                                           \
    }                                                               \
}

VRLMI(vrldmi, 64, u64,
      47,  /* begin_last */
      55,  /* end_last */
      63,  /* shift_last */
      6,   /* num_bits */
      1);  /* mask and insert */
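
To make the arithmetic in the vrldmi instantiation concrete, here is a minimal sketch of how the three control fields of one doubleword come out with these parameters. `extract64_sketch` is a local stand-in for extract64 without the assert, and `struct vrld_fields` and `vrld_decode` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Local stand-in for extract64 (little-endian start + length) */
static inline uint64_t extract64_sketch(uint64_t value, int start, int length)
{
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Hypothetical container for the control fields of one element */
struct vrld_fields {
    uint64_t begin;
    uint64_t end;
    uint64_t shift;
};

static inline struct vrld_fields vrld_decode(uint64_t src2)
{
    struct vrld_fields f;

    /* size - *_last - 1 with size = 64 and num_bits = 6 */
    f.begin = extract64_sketch(src2, 64 - 47 - 1, 6); /* start 16 */
    f.end   = extract64_sketch(src2, 64 - 55 - 1, 6); /* start 8  */
    f.shift = extract64_sketch(src2, 64 - 63 - 1, 6); /* start 0  */
    return f;
}
```

So the begin, end and shift fields sit at little-endian bit offsets 16, 8 and 0, each six bits wide. Assuming the word variant keeps its original field ends (15, 23, 31) with num_bits of 5, the same formula with size = 32 gives the same offsets of 16, 8 and 0.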

>
> (And, after you do that conversion for vrldmi and vrlwmi, you'll see why
> big-endian bit numbering is the spawn of the devil.  ;-)

That bit numbering gives nightmares ;-)

Regards
Nikunj


* Re: [Qemu-devel] [PATCH 3/4] target-ppc: add vrldnmi and vrlwmi instructions
  2016-10-24 16:16   ` Richard Henderson
  2016-10-25  4:08     ` Nikunj A Dadhania
@ 2016-10-25  6:02     ` Nikunj A Dadhania
  2016-10-25 16:33       ` Richard Henderson
  1 sibling, 1 reply; 14+ messages in thread
From: Nikunj A Dadhania @ 2016-10-25  6:02 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc, david
  Cc: qemu-devel, bharata, hegdevasant, sandipandas1990, ego

Richard Henderson <rth@twiddle.net> writes:


> We already have rol32 and rol64.
>
> Which I see are broken for shift == 0.

I tried different shift values (including 0) in a test program, and the
result is as expected:

0: ccddeeff

#include <stdio.h>

static inline unsigned int rol32(unsigned int word, unsigned int shift)
{
  return (word << shift) | (word >> (32 - shift));
}

int main(void)
{
  unsigned int value32 = 0xCCDDEEFF;

  for (int i = 0; i < 32; i++)
    printf("%d: %08x\n", i, rol32(value32, i));
  return 0;
}

> Let's please fix that, as a separate patch, like so:
>
>   return (word << shift) | (word >> ((32 - shift) & 31));

That doesn't seem to be necessary.

Regards
Nikunj


* Re: [Qemu-devel] [PATCH 3/4] target-ppc: add vrldnmi and vrlwmi instructions
  2016-10-25  6:02     ` Nikunj A Dadhania
@ 2016-10-25 16:33       ` Richard Henderson
  2016-10-26  4:44         ` Nikunj A Dadhania
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2016-10-25 16:33 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-ppc, david
  Cc: qemu-devel, bharata, hegdevasant, sandipandas1990, ego

On 10/24/2016 11:02 PM, Nikunj A Dadhania wrote:
> Richard Henderson <rth@twiddle.net> writes:
> 
> 
>> We already have rol32 and rol64.
>>
>> Which I see are broken for shift == 0.
> 
> I tried different shift values (including 0) in a test program, and the
> result is as expected:
> 
> 0: ccddeeff
> 
> static inline unsigned int rol32(unsigned int word, unsigned int shift)
> {
>   return (word << shift) | (word >> (32 - shift));
> }

Technically, a shift by 32 is invalid.  Practically, there are two common
cases: either a shift >= 32 produces zero, or the shift count is truncated to
the word size; both produce the correct result here.

That said, there's also the case of clang's sanitizers, which will in fact
signal this as a runtime error.
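
The two practical behaviours can be modelled without relying on the undefined shift itself. A minimal sketch: the two `shr_*_model` functions are hypothetical models of what hardware commonly does, not a statement about any particular CPU, and the `rol32_model_*` functions plug them into the unmasked rotate.

```c
#include <stdint.h>

/* Model A (assumption): a right shift by 32 or more yields zero */
static inline uint32_t shr_zero_model(uint32_t word, unsigned n)
{
    return n >= 32 ? 0 : word >> n;
}

/* Model B (assumption): the shift count is truncated to 5 bits */
static inline uint32_t shr_truncate_model(uint32_t word, unsigned n)
{
    return word >> (n & 31);
}

/* Unmasked rotate, with the right shift replaced by each model */
static inline uint32_t rol32_model_a(uint32_t word, unsigned shift)
{
    return (word << shift) | shr_zero_model(word, 32 - shift);
}

static inline uint32_t rol32_model_b(uint32_t word, unsigned shift)
{
    return (word << shift) | shr_truncate_model(word, 32 - shift);
}
```

For shift == 0, model A computes word | 0 and model B computes word | word, so both return the input. That explains why a simple test passes, while a sanitizer build (for instance clang with -fsanitize=undefined) can still report the shift by 32 in the unmasked version.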


r~


* Re: [Qemu-devel] [PATCH 3/4] target-ppc: add vrldnmi and vrlwmi instructions
  2016-10-25 16:33       ` Richard Henderson
@ 2016-10-26  4:44         ` Nikunj A Dadhania
  0 siblings, 0 replies; 14+ messages in thread
From: Nikunj A Dadhania @ 2016-10-26  4:44 UTC (permalink / raw)
  To: Richard Henderson, qemu-ppc, david
  Cc: qemu-devel, bharata, hegdevasant, sandipandas1990, ego

Richard Henderson <rth@twiddle.net> writes:

> On 10/24/2016 11:02 PM, Nikunj A Dadhania wrote:
>> Richard Henderson <rth@twiddle.net> writes:
>> 
>> 
>>> We already have rol32 and rol64.
>>>
>>> Which I see are broken for shift == 0.
>> 
>> I tried different shift values (including 0) in a test program, and the
>> result is as expected:
>> 
>> 0: ccddeeff
>> 
>> static inline unsigned int rol32(unsigned int word, unsigned int shift)
>> {
>>   return (word << shift) | (word >> (32 - shift));
>> }
>
> Technically, a shift by 32 is invalid.  Practically, there are two common
> cases: either a shift >= 32 produces zero, or the shift count is truncated to
> the word size; both produce the correct result here.
>
> That said, there's also the case of clang's sanitizers, which will in fact
> signal this as a runtime error.

In that case, I will send a patch updating them as part of my next revision.

Regards
Nikunj


Thread overview: 14+ messages
2016-10-24  9:14 [Qemu-devel] [PATCH 0/4] POWER9 TCG enablements - part7 Nikunj A Dadhania
2016-10-24  9:14 ` [Qemu-devel] [PATCH 1/4] target-ppc: add xscmp[eq, gt, ge, ne]dp instructions Nikunj A Dadhania
2016-10-24  9:14 ` [Qemu-devel] [PATCH 2/4] target-ppc: add vmul10[u, eu, cu, ecu]q instructions Nikunj A Dadhania
2016-10-24 16:04   ` Richard Henderson
2016-10-25  2:38     ` David Gibson
2016-10-24  9:14 ` [Qemu-devel] [PATCH 3/4] target-ppc: add vrldnmi and vrlwmi instructions Nikunj A Dadhania
2016-10-24 16:16   ` Richard Henderson
2016-10-25  4:08     ` Nikunj A Dadhania
2016-10-25  4:21       ` Richard Henderson
2016-10-25  4:44         ` Nikunj A Dadhania
2016-10-25  6:02     ` Nikunj A Dadhania
2016-10-25 16:33       ` Richard Henderson
2016-10-26  4:44         ` Nikunj A Dadhania
2016-10-24  9:15 ` [Qemu-devel] [PATCH 4/4] target-ppc: add vrldnm and vrlwnm instructions Nikunj A Dadhania
