* [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch
@ 2022-02-22 14:35 matheus.ferst
  2022-02-22 14:35 ` [PATCH v4 01/47] target/ppc: Introduce TRANS*FLAGS macros matheus.ferst
                   ` (46 more replies)
  0 siblings, 47 replies; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:35 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

This patch series implements 5 missing instructions from PowerISA v3.0
and 56 new instructions from PowerISA v3.1, moving 87 other instructions
to decodetree along the way.

Patches without review: 2-5, 7, 9-12, 14-16, 18-36, 38-47.

This series can also be found at:
https://github.com/PPC64/qemu/tree/ppc-isa31-2112-v4

v4:
 - Rebase on master;
 - 16 new instructions: vs[lr]q, vrlq, vextsd2q, lxvr[bhwd]x/stxvr[bhwd]x,
   plxssp/pstxssp and plxsd/pstxsd;
 - Multiple fixes/optimizations (rth)

v3:
 - Dropped patch 33, which caused a regression in xxperm[r]

v2:
 - New patch (30) to remove xscmpnedp

Leandro Lupori (2):
  target/ppc: implement plxsd/pstxsd
  target/ppc: implement plxssp/pstxssp

Lucas Coutinho (3):
  target/ppc: Move vexts[bhw]2[wd] to decodetree
  target/ppc: Implement vextsd2q
  target/ppc: implement lxvr[bhwd]/stxvr[bhwd]x

Lucas Mateus Castro (alqotel) (3):
  target/ppc: moved vector even and odd multiplication to decodetree
  target/ppc: Moved vector multiply high and low to decodetree
  target/ppc: vmulh* instructions without helpers

Luis Pires (1):
  target/ppc: Introduce TRANS*FLAGS macros

Matheus Ferst (27):
  target/ppc: Move Vector Compare Equal/Not Equal/Greater Than to
    decodetree
  target/ppc: Move Vector Compare Not Equal or Zero to decodetree
  target/ppc: Implement Vector Compare Equal Quadword
  target/ppc: Implement Vector Compare Greater Than Quadword
  target/ppc: Implement Vector Compare Quadword
  target/ppc: implement vstri[bh][lr]
  target/ppc: implement vclrlb
  target/ppc: implement vclrrb
  target/ppc: implement vcntmb[bhwd]
  target/ppc: implement vgnb
  target/ppc: move vs[lr][a][bhwd] to decodetree
  target/ppc: implement vslq
  target/ppc: implement vsrq
  target/ppc: implement vsraq
  target/ppc: move vrl[bhwd] to decodetree
  target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi to decodetree
  target/ppc: implement vrlq
  target/ppc: Move vsel and vperm/vpermr to decodetree
  target/ppc: Move xxsel to decodetree
  target/ppc: move xxperm/xxpermr to decodetree
  target/ppc: Move xxpermdi to decodetree
  target/ppc: Implement xxpermx instruction
  tcg/tcg-op-gvec.c: Introduce tcg_gen_gvec_4i
  target/ppc: Implement xxeval
  target/ppc: Implement xxgenpcv[bhwd]m instruction
  target/ppc: move xs[n]madd[am][ds]p/xs[n]msub[am][ds]p to decodetree
  target/ppc: implement xs[n]maddqp[o]/xs[n]msubqp[o]

Víctor Colombo (11):
  target/ppc: Implement vmsumcud instruction
  target/ppc: Implement vmsumudm instruction
  target/ppc: Implement xvtlsbb instruction
  target/ppc: Remove xscmpnedp instruction
  target/ppc: Refactor VSX_SCALAR_CMP_DP
  target/ppc: Implement xscmp{eq,ge,gt}qp
  target/ppc: Move xscmp{eq,ge,gt}dp to decodetree
  target/ppc: Move xs{max, min}[cj]dp to use do_helper_XX3
  target/ppc: Refactor VSX_MAX_MINC helper
  target/ppc: Implement xs{max,min}cqp
  target/ppc: Implement xvcvbf16spn and xvcvspbf16 instructions

 include/tcg/tcg-op-gvec.h           |   22 +
 target/ppc/fpu_helper.c             |  171 ++--
 target/ppc/helper.h                 |  147 ++--
 target/ppc/insn32.decode            |  232 ++++-
 target/ppc/insn64.decode            |   56 +-
 target/ppc/int_helper.c             |  419 +++++----
 target/ppc/translate.c              |   58 +-
 target/ppc/translate/vmx-impl.c.inc | 1272 ++++++++++++++++++++++++---
 target/ppc/translate/vmx-ops.c.inc  |   59 +-
 target/ppc/translate/vsx-impl.c.inc |  726 ++++++++++++---
 target/ppc/translate/vsx-ops.c.inc  |   67 --
 tcg/ppc/tcg-target.c.inc            |    6 +
 tcg/tcg-op-gvec.c                   |  146 +++
 13 files changed, 2632 insertions(+), 749 deletions(-)

-- 
2.25.1




* [PATCH v4 01/47] target/ppc: Introduce TRANS*FLAGS macros
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
@ 2022-02-22 14:35 ` matheus.ferst
  2022-02-22 14:36 ` [PATCH v4 02/47] target/ppc: moved vector even and odd multiplication to decodetree matheus.ferst
                   ` (45 subsequent siblings)
  46 siblings, 0 replies; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:35 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Luis Pires, clg,
	Matheus Ferst, david

From: Luis Pires <luis.pires@eldorado.org.br>

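Add TRANS_FLAGS and TRANS_FLAGS2, which check the CPU's insns_flags and
insns_flags2, respectively, before calling the translation function,
and TRANS64_FLAGS2, which additionally requires a 64-bit target.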

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
[ferst: - TRANS_FLAGS2 instead of TRANS_FLAGS_E
        - Use the new macros in load/store vector insns ]
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/translate.c              | 19 +++++++++++++++
 target/ppc/translate/vsx-impl.c.inc | 37 ++++++++++-------------------
 2 files changed, 31 insertions(+), 25 deletions(-)

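For illustration only, here is a standalone sketch of what TRANS_FLAGS2
expands to, compiled outside QEMU. DisasContext, arg_D, the PPC2_* bit
values, REQUIRE_INSNS_FLAGS2 and the do_lstxv_D body are simplified
stand-ins written for this example (assumptions, not QEMU's
definitions); only the TRANS_FLAGS2 macro itself is taken from this
patch. It shows the point of the change: the ISA check moves out of the
shared worker and into each generated trans_* function, so one worker
can serve instructions from different ISA levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's flag bits and DisasContext. */
#define PPC2_ISA300 (1ULL << 0)
#define PPC2_ISA310 (1ULL << 1)

typedef struct DisasContext {
    uint64_t insns_flags2;   /* features of the emulated CPU */
    int last_paired;         /* records what the worker was asked to do */
} DisasContext;

typedef struct arg_D {
    int rt;
} arg_D;

/* decodetree emits one arg_<NAME> type per pattern; here they alias arg_D. */
typedef arg_D arg_LXV;
typedef arg_D arg_LXVP;

/* Simplified check: return false (insn not translated) if the flag is missing. */
#define REQUIRE_INSNS_FLAGS2(CTX, NAME)                 \
    do {                                                \
        if (((CTX)->insns_flags2 & PPC2_##NAME) == 0) { \
            return false;                               \
        }                                               \
    } while (0)

/* The macro introduced by this patch. */
#define TRANS_FLAGS2(FLAGS2, NAME, FUNC, ...) \
    static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
    {                                                          \
        REQUIRE_INSNS_FLAGS2(ctx, FLAGS2);                     \
        return FUNC(ctx, a, __VA_ARGS__);                      \
    }

/* Hypothetical worker standing in for do_lstxv_D: no ISA checks of its own. */
static bool do_lstxv_D(DisasContext *ctx, arg_D *a, bool store, bool paired)
{
    (void)a;
    (void)store;
    ctx->last_paired = paired;
    return true;
}

TRANS_FLAGS2(ISA300, LXV, do_lstxv_D, false, false)
TRANS_FLAGS2(ISA310, LXVP, do_lstxv_D, false, true)
```

With insns_flags2 set to PPC2_ISA300 only, trans_LXV() reaches the
worker, while trans_LXVP() returns false before reaching it, so the
paired load is not translated on an ISA v3.0 CPU.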
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 2eaffd432a..b647430012 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6604,10 +6604,29 @@ static int times_16(DisasContext *ctx, int x)
 #define TRANS(NAME, FUNC, ...) \
     static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
     { return FUNC(ctx, a, __VA_ARGS__); }
+#define TRANS_FLAGS(FLAGS, NAME, FUNC, ...) \
+    static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
+    {                                                          \
+        REQUIRE_INSNS_FLAGS(ctx, FLAGS);                       \
+        return FUNC(ctx, a, __VA_ARGS__);                      \
+    }
+#define TRANS_FLAGS2(FLAGS2, NAME, FUNC, ...) \
+    static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
+    {                                                          \
+        REQUIRE_INSNS_FLAGS2(ctx, FLAGS2);                     \
+        return FUNC(ctx, a, __VA_ARGS__);                      \
+    }
 
 #define TRANS64(NAME, FUNC, ...) \
     static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
     { REQUIRE_64BIT(ctx); return FUNC(ctx, a, __VA_ARGS__); }
+#define TRANS64_FLAGS2(FLAGS2, NAME, FUNC, ...) \
+    static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
+    {                                                          \
+        REQUIRE_64BIT(ctx);                                    \
+        REQUIRE_INSNS_FLAGS2(ctx, FLAGS2);                     \
+        return FUNC(ctx, a, __VA_ARGS__);                      \
+    }
 
 /* TODO: More TRANS* helpers for extra insn_flags checks. */
 
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 128968b5e7..e8a4ba0cfa 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2072,12 +2072,6 @@ static bool do_lstxv(DisasContext *ctx, int ra, TCGv displ,
 
 static bool do_lstxv_D(DisasContext *ctx, arg_D *a, bool store, bool paired)
 {
-    if (paired) {
-        REQUIRE_INSNS_FLAGS2(ctx, ISA310);
-    } else {
-        REQUIRE_INSNS_FLAGS2(ctx, ISA300);
-    }
-
     if (paired || a->rt >= 32) {
         REQUIRE_VSX(ctx);
     } else {
@@ -2091,7 +2085,6 @@ static bool do_lstxv_PLS_D(DisasContext *ctx, arg_PLS_D *a,
                            bool store, bool paired)
 {
     arg_D d;
-    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
     REQUIRE_VSX(ctx);
 
     if (!resolve_PLS_D(ctx, &d, a)) {
@@ -2103,12 +2096,6 @@ static bool do_lstxv_PLS_D(DisasContext *ctx, arg_PLS_D *a,
 
 static bool do_lstxv_X(DisasContext *ctx, arg_X *a, bool store, bool paired)
 {
-    if (paired) {
-        REQUIRE_INSNS_FLAGS2(ctx, ISA310);
-    } else {
-        REQUIRE_INSNS_FLAGS2(ctx, ISA300);
-    }
-
     if (paired || a->rt >= 32) {
         REQUIRE_VSX(ctx);
     } else {
@@ -2118,18 +2105,18 @@ static bool do_lstxv_X(DisasContext *ctx, arg_X *a, bool store, bool paired)
     return do_lstxv(ctx, a->ra, cpu_gpr[a->rb], a->rt, store, paired);
 }
 
-TRANS(STXV, do_lstxv_D, true, false)
-TRANS(LXV, do_lstxv_D, false, false)
-TRANS(STXVP, do_lstxv_D, true, true)
-TRANS(LXVP, do_lstxv_D, false, true)
-TRANS(STXVX, do_lstxv_X, true, false)
-TRANS(LXVX, do_lstxv_X, false, false)
-TRANS(STXVPX, do_lstxv_X, true, true)
-TRANS(LXVPX, do_lstxv_X, false, true)
-TRANS64(PSTXV, do_lstxv_PLS_D, true, false)
-TRANS64(PLXV, do_lstxv_PLS_D, false, false)
-TRANS64(PSTXVP, do_lstxv_PLS_D, true, true)
-TRANS64(PLXVP, do_lstxv_PLS_D, false, true)
+TRANS_FLAGS2(ISA300, STXV, do_lstxv_D, true, false)
+TRANS_FLAGS2(ISA300, LXV, do_lstxv_D, false, false)
+TRANS_FLAGS2(ISA310, STXVP, do_lstxv_D, true, true)
+TRANS_FLAGS2(ISA310, LXVP, do_lstxv_D, false, true)
+TRANS_FLAGS2(ISA300, STXVX, do_lstxv_X, true, false)
+TRANS_FLAGS2(ISA300, LXVX, do_lstxv_X, false, false)
+TRANS_FLAGS2(ISA310, STXVPX, do_lstxv_X, true, true)
+TRANS_FLAGS2(ISA310, LXVPX, do_lstxv_X, false, true)
+TRANS64_FLAGS2(ISA310, PSTXV, do_lstxv_PLS_D, true, false)
+TRANS64_FLAGS2(ISA310, PLXV, do_lstxv_PLS_D, false, false)
+TRANS64_FLAGS2(ISA310, PSTXVP, do_lstxv_PLS_D, true, true)
+TRANS64_FLAGS2(ISA310, PLXVP, do_lstxv_PLS_D, false, true)
 
 static void gen_xxblendv_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b,
                              TCGv_vec c)
-- 
2.25.1




* [PATCH v4 02/47] target/ppc: moved vector even and odd multiplication to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
  2022-02-22 14:35 ` [PATCH v4 01/47] target/ppc: Introduce TRANS*FLAGS macros matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 18:19   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 03/47] target/ppc: Moved vector multiply high and low " matheus.ferst
                   ` (44 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: Lucas Mateus Castro (alqotel),
	danielhb413, richard.henderson, groug, Lucas Mateus Castro, clg,
	Matheus Ferst, david

From: "Lucas Mateus Castro (alqotel)" <lucas.castro@eldorado.org.br>

Moved the instructions vmulesb, vmulosb, vmuleub, vmuloub,
vmulesh, vmulosh, vmuleuh, vmulouh, vmulesw, vmulosw,
vmuleuw and vmulouw from the legacy decoder to decodetree, and
implemented the new instructions vmulesd, vmulosd, vmuleud and vmuloud.

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/helper.h                 | 28 +++++++++-------
 target/ppc/insn32.decode            | 22 ++++++++++++
 target/ppc/int_helper.c             | 36 ++++++++++++++------
 target/ppc/translate/vmx-impl.c.inc | 52 +++++++++++++++++++----------
 target/ppc/translate/vmx-ops.c.inc  | 15 ++-------
 tcg/ppc/tcg-target.c.inc            |  6 ++++
 6 files changed, 107 insertions(+), 52 deletions(-)

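As a reference for what these helpers compute, the sketch below models
the unsigned word and doubleword cases in plain C. It assumes vectors
are stored in ISA (big-endian) element order, with element 0 the most
significant, and it uses the GCC/Clang unsigned __int128 extension
where QEMU uses mulu64(). The names vr128, mul_even_uw, mul_odd_uw and
mul_even_ud are hypothetical. Even forms take elements 0, 2, ...; odd
forms take elements 1, 3, ...; each product is twice as wide as its
inputs, so the doubleword forms fill the whole register, with the high
half in VsrD(0) and the low half in VsrD(1), as in helper_VMULEUD.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit VR as two doublewords; index 0 is the most significant (ISA order). */
typedef struct {
    uint64_t d[2];
} vr128;

/* vmuleuw-style: multiply the even words (0 and 2) into two doublewords. */
static vr128 mul_even_uw(const uint32_t a[4], const uint32_t b[4])
{
    vr128 r;
    for (int i = 0; i < 2; i++) {
        r.d[i] = (uint64_t)a[2 * i] * b[2 * i];
    }
    return r;
}

/* vmulouw-style: same, but with the odd words (1 and 3). */
static vr128 mul_odd_uw(const uint32_t a[4], const uint32_t b[4])
{
    vr128 r;
    for (int i = 0; i < 2; i++) {
        r.d[i] = (uint64_t)a[2 * i + 1] * b[2 * i + 1];
    }
    return r;
}

/* vmuleud-style: the even doubleword product fills all 128 bits. */
static vr128 mul_even_ud(const vr128 *a, const vr128 *b)
{
    unsigned __int128 p = (unsigned __int128)a->d[0] * b->d[0];
    vr128 r = { { (uint64_t)(p >> 64), (uint64_t)p } };  /* high half first */
    return r;
}
```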
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index ab008c9d4e..04689522f8 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -190,18 +190,22 @@ DEF_HELPER_3(vmrglw, void, avr, avr, avr)
 DEF_HELPER_3(vmrghb, void, avr, avr, avr)
 DEF_HELPER_3(vmrghh, void, avr, avr, avr)
 DEF_HELPER_3(vmrghw, void, avr, avr, avr)
-DEF_HELPER_3(vmulesb, void, avr, avr, avr)
-DEF_HELPER_3(vmulesh, void, avr, avr, avr)
-DEF_HELPER_3(vmulesw, void, avr, avr, avr)
-DEF_HELPER_3(vmuleub, void, avr, avr, avr)
-DEF_HELPER_3(vmuleuh, void, avr, avr, avr)
-DEF_HELPER_3(vmuleuw, void, avr, avr, avr)
-DEF_HELPER_3(vmulosb, void, avr, avr, avr)
-DEF_HELPER_3(vmulosh, void, avr, avr, avr)
-DEF_HELPER_3(vmulosw, void, avr, avr, avr)
-DEF_HELPER_3(vmuloub, void, avr, avr, avr)
-DEF_HELPER_3(vmulouh, void, avr, avr, avr)
-DEF_HELPER_3(vmulouw, void, avr, avr, avr)
+DEF_HELPER_3(VMULESB, void, avr, avr, avr)
+DEF_HELPER_3(VMULESH, void, avr, avr, avr)
+DEF_HELPER_3(VMULESW, void, avr, avr, avr)
+DEF_HELPER_3(VMULESD, void, avr, avr, avr)
+DEF_HELPER_3(VMULEUB, void, avr, avr, avr)
+DEF_HELPER_3(VMULEUH, void, avr, avr, avr)
+DEF_HELPER_3(VMULEUW, void, avr, avr, avr)
+DEF_HELPER_3(VMULEUD, void, avr, avr, avr)
+DEF_HELPER_3(VMULOSB, void, avr, avr, avr)
+DEF_HELPER_3(VMULOSH, void, avr, avr, avr)
+DEF_HELPER_3(VMULOSW, void, avr, avr, avr)
+DEF_HELPER_3(VMULOSD, void, avr, avr, avr)
+DEF_HELPER_3(VMULOUB, void, avr, avr, avr)
+DEF_HELPER_3(VMULOUH, void, avr, avr, avr)
+DEF_HELPER_3(VMULOUW, void, avr, avr, avr)
+DEF_HELPER_3(VMULOUD, void, avr, avr, avr)
 DEF_HELPER_3(vmulhsw, void, avr, avr, avr)
 DEF_HELPER_3(vmulhuw, void, avr, avr, avr)
 DEF_HELPER_3(vmulhsd, void, avr, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 2a9c91a423..092ea79618 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -440,6 +440,28 @@ VEXTRACTWM      000100 ..... 01010 ..... 11001000010    @VX_tb
 VEXTRACTDM      000100 ..... 01011 ..... 11001000010    @VX_tb
 VEXTRACTQM      000100 ..... 01100 ..... 11001000010    @VX_tb
 
+## Vector Multiply Instruction
+
+VMULESB         000100 ..... ..... ..... 01100001000    @VX
+VMULOSB         000100 ..... ..... ..... 00100001000    @VX
+VMULEUB         000100 ..... ..... ..... 01000001000    @VX
+VMULOUB         000100 ..... ..... ..... 00000001000    @VX
+
+VMULESH         000100 ..... ..... ..... 01101001000    @VX
+VMULOSH         000100 ..... ..... ..... 00101001000    @VX
+VMULEUH         000100 ..... ..... ..... 01001001000    @VX
+VMULOUH         000100 ..... ..... ..... 00001001000    @VX
+
+VMULESW         000100 ..... ..... ..... 01110001000    @VX
+VMULOSW         000100 ..... ..... ..... 00110001000    @VX
+VMULEUW         000100 ..... ..... ..... 01010001000    @VX
+VMULOUW         000100 ..... ..... ..... 00010001000    @VX
+
+VMULESD         000100 ..... ..... ..... 01111001000    @VX
+VMULOSD         000100 ..... ..... ..... 00111001000    @VX
+VMULEUD         000100 ..... ..... ..... 01011001000    @VX
+VMULOUD         000100 ..... ..... ..... 00011001000    @VX
+
 # VSX Load/Store Instructions
 
 LXV             111101 ..... ..... ............ . 001   @DQ_TSX
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index d1b12788b2..7d925418d4 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1063,7 +1063,7 @@ void helper_vmsumuhs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
 }
 
 #define VMUL_DO_EVN(name, mul_element, mul_access, prod_access, cast)   \
-    void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)       \
+    void helper_V##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)       \
     {                                                                   \
         int i;                                                          \
                                                                         \
@@ -1074,7 +1074,7 @@ void helper_vmsumuhs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
     }
 
 #define VMUL_DO_ODD(name, mul_element, mul_access, prod_access, cast)   \
-    void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)       \
+    void helper_V##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)       \
     {                                                                   \
         int i;                                                          \
                                                                         \
@@ -1085,17 +1085,33 @@ void helper_vmsumuhs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
     }
 
 #define VMUL(suffix, mul_element, mul_access, prod_access, cast)       \
-    VMUL_DO_EVN(mule##suffix, mul_element, mul_access, prod_access, cast)  \
-    VMUL_DO_ODD(mulo##suffix, mul_element, mul_access, prod_access, cast)
-VMUL(sb, s8, VsrSB, VsrSH, int16_t)
-VMUL(sh, s16, VsrSH, VsrSW, int32_t)
-VMUL(sw, s32, VsrSW, VsrSD, int64_t)
-VMUL(ub, u8, VsrB, VsrH, uint16_t)
-VMUL(uh, u16, VsrH, VsrW, uint32_t)
-VMUL(uw, u32, VsrW, VsrD, uint64_t)
+    VMUL_DO_EVN(MULE##suffix, mul_element, mul_access, prod_access, cast)  \
+    VMUL_DO_ODD(MULO##suffix, mul_element, mul_access, prod_access, cast)
+VMUL(SB, s8, VsrSB, VsrSH, int16_t)
+VMUL(SH, s16, VsrSH, VsrSW, int32_t)
+VMUL(SW, s32, VsrSW, VsrSD, int64_t)
+VMUL(UB, u8, VsrB, VsrH, uint16_t)
+VMUL(UH, u16, VsrH, VsrW, uint32_t)
+VMUL(UW, u32, VsrW, VsrD, uint64_t)
 #undef VMUL_DO_EVN
 #undef VMUL_DO_ODD
 #undef VMUL
+void helper_VMULESD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+    muls64(&r->VsrD(1), &r->VsrD(0), a->VsrSD(0), b->VsrSD(0));
+}
+void helper_VMULOSD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+    muls64(&r->VsrD(1), &r->VsrD(0), a->VsrSD(1), b->VsrSD(1));
+}
+void helper_VMULEUD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+    mulu64(&r->VsrD(1), &r->VsrD(0), a->VsrD(0), b->VsrD(0));
+}
+void helper_VMULOUD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+    mulu64(&r->VsrD(1), &r->VsrD(0), a->VsrD(1), b->VsrD(1));
+}
 
 void helper_vmulhsw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index d5e02fd7f2..430579addd 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -798,29 +798,11 @@ static void trans_vclzd(DisasContext *ctx)
     tcg_temp_free_i64(avr);
 }
 
-GEN_VXFORM(vmuloub, 4, 0);
-GEN_VXFORM(vmulouh, 4, 1);
-GEN_VXFORM(vmulouw, 4, 2);
 GEN_VXFORM_V(vmuluwm, MO_32, tcg_gen_gvec_mul, 4, 2);
-GEN_VXFORM_DUAL(vmulouw, PPC_ALTIVEC, PPC_NONE,
-                vmuluwm, PPC_NONE, PPC2_ALTIVEC_207)
-GEN_VXFORM(vmulosb, 4, 4);
-GEN_VXFORM(vmulosh, 4, 5);
-GEN_VXFORM(vmulosw, 4, 6);
 GEN_VXFORM_V(vmulld, MO_64, tcg_gen_gvec_mul, 4, 7);
-GEN_VXFORM(vmuleub, 4, 8);
-GEN_VXFORM(vmuleuh, 4, 9);
-GEN_VXFORM(vmuleuw, 4, 10);
 GEN_VXFORM(vmulhuw, 4, 10);
 GEN_VXFORM(vmulhud, 4, 11);
-GEN_VXFORM_DUAL(vmuleuw, PPC_ALTIVEC, PPC_NONE,
-                vmulhuw, PPC_NONE, PPC2_ISA310);
-GEN_VXFORM(vmulesb, 4, 12);
-GEN_VXFORM(vmulesh, 4, 13);
-GEN_VXFORM(vmulesw, 4, 14);
 GEN_VXFORM(vmulhsw, 4, 14);
-GEN_VXFORM_DUAL(vmulesw, PPC_ALTIVEC, PPC_NONE,
-                vmulhsw, PPC_NONE, PPC2_ISA310);
 GEN_VXFORM(vmulhsd, 4, 15);
 GEN_VXFORM_V(vslb, MO_8, tcg_gen_gvec_shlv, 2, 4);
 GEN_VXFORM_V(vslh, MO_16, tcg_gen_gvec_shlv, 2, 5);
@@ -2104,6 +2086,40 @@ static bool trans_VPEXTD(DisasContext *ctx, arg_VX *a)
     return true;
 }
 
+static bool do_vx_helper(DisasContext *ctx, arg_VX *a,
+                         void (*gen_helper) (TCGv_ptr, TCGv_ptr, TCGv_ptr))
+{
+    TCGv_ptr ra, rb, rd;
+    REQUIRE_VECTOR(ctx);
+
+    ra = gen_avr_ptr(a->vra);
+    rb = gen_avr_ptr(a->vrb);
+    rd = gen_avr_ptr(a->vrt);
+    gen_helper(rd, ra, rb);
+    tcg_temp_free_ptr(ra);
+    tcg_temp_free_ptr(rb);
+    tcg_temp_free_ptr(rd);
+
+    return true;
+}
+
+TRANS_FLAGS(ALTIVEC, VMULESB, do_vx_helper, gen_helper_VMULESB)
+TRANS_FLAGS(ALTIVEC, VMULOSB, do_vx_helper, gen_helper_VMULOSB)
+TRANS_FLAGS(ALTIVEC, VMULEUB, do_vx_helper, gen_helper_VMULEUB)
+TRANS_FLAGS(ALTIVEC, VMULOUB, do_vx_helper, gen_helper_VMULOUB)
+TRANS_FLAGS(ALTIVEC, VMULESH, do_vx_helper, gen_helper_VMULESH)
+TRANS_FLAGS(ALTIVEC, VMULOSH, do_vx_helper, gen_helper_VMULOSH)
+TRANS_FLAGS(ALTIVEC, VMULEUH, do_vx_helper, gen_helper_VMULEUH)
+TRANS_FLAGS(ALTIVEC, VMULOUH, do_vx_helper, gen_helper_VMULOUH)
+TRANS_FLAGS2(ALTIVEC_207, VMULESW, do_vx_helper, gen_helper_VMULESW)
+TRANS_FLAGS2(ALTIVEC_207, VMULOSW, do_vx_helper, gen_helper_VMULOSW)
+TRANS_FLAGS2(ALTIVEC_207, VMULEUW, do_vx_helper, gen_helper_VMULEUW)
+TRANS_FLAGS2(ALTIVEC_207, VMULOUW, do_vx_helper, gen_helper_VMULOUW)
+TRANS_FLAGS2(ISA310, VMULESD, do_vx_helper, gen_helper_VMULESD)
+TRANS_FLAGS2(ISA310, VMULOSD, do_vx_helper, gen_helper_VMULOSD)
+TRANS_FLAGS2(ISA310, VMULEUD, do_vx_helper, gen_helper_VMULEUD)
+TRANS_FLAGS2(ISA310, VMULOUD, do_vx_helper, gen_helper_VMULOUD)
+
 #undef GEN_VR_LDX
 #undef GEN_VR_STX
 #undef GEN_VR_LVE
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index 25ee715b43..f310b2fbde 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -101,20 +101,11 @@ GEN_VXFORM_DUAL(vmrgow, vextuwlx, 6, 26, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM_300(vextubrx, 6, 28),
 GEN_VXFORM_300(vextuhrx, 6, 29),
 GEN_VXFORM_DUAL(vmrgew, vextuwrx, 6, 30, PPC_NONE, PPC2_ALTIVEC_207),
-GEN_VXFORM(vmuloub, 4, 0),
-GEN_VXFORM(vmulouh, 4, 1),
-GEN_VXFORM_DUAL(vmulouw, vmuluwm, 4, 2, PPC_ALTIVEC, PPC_NONE),
-GEN_VXFORM(vmulosb, 4, 4),
-GEN_VXFORM(vmulosh, 4, 5),
-GEN_VXFORM_207(vmulosw, 4, 6),
+GEN_VXFORM_207(vmuluwm, 4, 2),
 GEN_VXFORM_310(vmulld, 4, 7),
-GEN_VXFORM(vmuleub, 4, 8),
-GEN_VXFORM(vmuleuh, 4, 9),
-GEN_VXFORM_DUAL(vmuleuw, vmulhuw, 4, 10, PPC_ALTIVEC, PPC_NONE),
+GEN_VXFORM_310(vmulhuw, 4, 10),
 GEN_VXFORM_310(vmulhud, 4, 11),
-GEN_VXFORM(vmulesb, 4, 12),
-GEN_VXFORM(vmulesh, 4, 13),
-GEN_VXFORM_DUAL(vmulesw, vmulhsw, 4, 14, PPC_ALTIVEC, PPC_NONE),
+GEN_VXFORM_310(vmulhsw, 4, 14),
 GEN_VXFORM_310(vmulhsd, 4, 15),
 GEN_VXFORM(vslb, 2, 4),
 GEN_VXFORM(vslh, 2, 5),
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index dea24f23c4..69d22e08cb 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -3987,3 +3987,9 @@ void tcg_register_jit(const void *buf, size_t buf_size)
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
 #endif /* __ELF__ */
+#undef VMULEUB
+#undef VMULEUH
+#undef VMULEUW
+#undef VMULOUB
+#undef VMULOUH
+#undef VMULOUW
-- 
2.25.1


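Each @VX pattern added to insn32.decode in this patch fixes primary
opcode 4 in the six most significant bits and an 11-bit extended opcode
in the least significant bits, leaving three 5-bit register fields
(vrt, vra, vrb). The hand-written decoder below is a hypothetical sketch
of that layout only; decodetree generates its own decoder from
insn32.decode, and the names vx_fields, decode_vx and encode_vx are
illustrative. For example, the VMULESD pattern's extended opcode
01111001000 is 968 (0x3c8).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of a VX-form instruction, as named by the @VX format. */
typedef struct {
    unsigned vrt, vra, vrb, xo;
} vx_fields;

/* Decode a VX-form word; returns false if the primary opcode is not 4. */
static bool decode_vx(uint32_t insn, vx_fields *f)
{
    if ((insn >> 26) != 4) {
        return false;
    }
    f->vrt = (insn >> 21) & 0x1f;
    f->vra = (insn >> 16) & 0x1f;
    f->vrb = (insn >> 11) & 0x1f;
    f->xo = insn & 0x7ff;   /* e.g. 0b01111001000 (968) for VMULESD */
    return true;
}

/* Build a VX-form word from its fields (the inverse of decode_vx). */
static uint32_t encode_vx(unsigned vrt, unsigned vra, unsigned vrb, unsigned xo)
{
    return (4u << 26) | (vrt << 21) | (vra << 16) | (vrb << 11) | xo;
}
```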


* [PATCH v4 03/47] target/ppc: Moved vector multiply high and low to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
  2022-02-22 14:35 ` [PATCH v4 01/47] target/ppc: Introduce TRANS*FLAGS macros matheus.ferst
  2022-02-22 14:36 ` [PATCH v4 02/47] target/ppc: moved vector even and odd multiplication to decodetree matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 18:19   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 04/47] target/ppc: vmulh* instructions without helpers matheus.ferst
                   ` (43 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: Lucas Mateus Castro (alqotel),
	danielhb413, richard.henderson, groug, Lucas Mateus Castro, clg,
	Matheus Ferst, david

From: "Lucas Mateus Castro (alqotel)" <lucas.castro@eldorado.org.br>

Moved the instructions vmulld, vmulhuw, vmulhsw, vmulhud and vmulhsd
to decodetree.

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/helper.h                 |  8 ++++----
 target/ppc/insn32.decode            |  6 ++++++
 target/ppc/int_helper.c             |  8 ++++----
 target/ppc/translate/vmx-impl.c.inc | 21 ++++++++++++++++-----
 target/ppc/translate/vmx-ops.c.inc  |  5 -----
 5 files changed, 30 insertions(+), 18 deletions(-)

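The split between VMULLD and the VMULH* instructions comes from which
half of each 128-bit doubleword product is kept. VMULLD keeps the low
64 bits, which is an ordinary wrapping 64-bit multiply, so a lane-wise
tcg_gen_gvec_mul at MO_64 is enough; VMULHUD and VMULHSD keep the high
64 bits, which is why they still go through helpers built on
mulu64()/muls64() at this point in the series. A plain-C model follows,
assuming the GCC/Clang __int128 extension; the function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* vmulld-style: low 64 bits of each doubleword product (wraps modulo 2^64). */
static void mul_low_d(uint64_t r[2], const uint64_t a[2], const uint64_t b[2])
{
    for (int i = 0; i < 2; i++) {
        r[i] = a[i] * b[i];
    }
}

/* vmulhud-style: high 64 bits of each unsigned 128-bit product. */
static void mul_high_ud(uint64_t r[2], const uint64_t a[2], const uint64_t b[2])
{
    for (int i = 0; i < 2; i++) {
        r[i] = (uint64_t)(((unsigned __int128)a[i] * b[i]) >> 64);
    }
}

/*
 * vmulhsd-style: high 64 bits of each signed 128-bit product.
 * Right-shifting a negative __int128 is implementation-defined in C;
 * GCC and Clang shift arithmetically, which this relies on.
 */
static void mul_high_sd(int64_t r[2], const int64_t a[2], const int64_t b[2])
{
    for (int i = 0; i < 2; i++) {
        r[i] = (int64_t)(((__int128)a[i] * b[i]) >> 64);
    }
}
```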
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 04689522f8..5d11158f1f 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -206,10 +206,10 @@ DEF_HELPER_3(VMULOUB, void, avr, avr, avr)
 DEF_HELPER_3(VMULOUH, void, avr, avr, avr)
 DEF_HELPER_3(VMULOUW, void, avr, avr, avr)
 DEF_HELPER_3(VMULOUD, void, avr, avr, avr)
-DEF_HELPER_3(vmulhsw, void, avr, avr, avr)
-DEF_HELPER_3(vmulhuw, void, avr, avr, avr)
-DEF_HELPER_3(vmulhsd, void, avr, avr, avr)
-DEF_HELPER_3(vmulhud, void, avr, avr, avr)
+DEF_HELPER_3(VMULHSW, void, avr, avr, avr)
+DEF_HELPER_3(VMULHUW, void, avr, avr, avr)
+DEF_HELPER_3(VMULHSD, void, avr, avr, avr)
+DEF_HELPER_3(VMULHUD, void, avr, avr, avr)
 DEF_HELPER_3(vslo, void, avr, avr, avr)
 DEF_HELPER_3(vsro, void, avr, avr, avr)
 DEF_HELPER_3(vsrv, void, avr, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 092ea79618..d817e44c71 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -462,6 +462,12 @@ VMULOSD         000100 ..... ..... ..... 00111001000    @VX
 VMULEUD         000100 ..... ..... ..... 01011001000    @VX
 VMULOUD         000100 ..... ..... ..... 00011001000    @VX
 
+VMULHSW         000100 ..... ..... ..... 01110001001    @VX
+VMULHUW         000100 ..... ..... ..... 01010001001    @VX
+VMULHSD         000100 ..... ..... ..... 01111001001    @VX
+VMULHUD         000100 ..... ..... ..... 01011001001    @VX
+VMULLD          000100 ..... ..... ..... 00111001001    @VX
+
 # VSX Load/Store Instructions
 
 LXV             111101 ..... ..... ............ . 001   @DQ_TSX
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 7d925418d4..8ddeccef12 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1113,7 +1113,7 @@ void helper_VMULOUD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     mulu64(&r->VsrD(1), &r->VsrD(0), a->VsrD(1), b->VsrD(1));
 }
 
-void helper_vmulhsw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+void helper_VMULHSW(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
     int i;
 
@@ -1122,7 +1122,7 @@ void helper_vmulhsw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     }
 }
 
-void helper_vmulhuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+void helper_VMULHUW(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
     int i;
 
@@ -1132,7 +1132,7 @@ void helper_vmulhuw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     }
 }
 
-void helper_vmulhsd(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+void helper_VMULHSD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
     uint64_t discard;
 
@@ -1140,7 +1140,7 @@ void helper_vmulhsd(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     muls64(&discard, &r->u64[1], a->s64[1], b->s64[1]);
 }
 
-void helper_vmulhud(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+void helper_VMULHUD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
     uint64_t discard;
 
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 430579addd..62d0642226 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -799,11 +799,6 @@ static void trans_vclzd(DisasContext *ctx)
 }
 
 GEN_VXFORM_V(vmuluwm, MO_32, tcg_gen_gvec_mul, 4, 2);
-GEN_VXFORM_V(vmulld, MO_64, tcg_gen_gvec_mul, 4, 7);
-GEN_VXFORM(vmulhuw, 4, 10);
-GEN_VXFORM(vmulhud, 4, 11);
-GEN_VXFORM(vmulhsw, 4, 14);
-GEN_VXFORM(vmulhsd, 4, 15);
 GEN_VXFORM_V(vslb, MO_8, tcg_gen_gvec_shlv, 2, 4);
 GEN_VXFORM_V(vslh, MO_16, tcg_gen_gvec_shlv, 2, 5);
 GEN_VXFORM_V(vslw, MO_32, tcg_gen_gvec_shlv, 2, 6);
@@ -2103,6 +2098,17 @@ static bool do_vx_helper(DisasContext *ctx, arg_VX *a,
     return true;
 }
 
+static bool trans_VMULLD(DisasContext *ctx, arg_VX *a)
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    tcg_gen_gvec_mul(MO_64, avr_full_offset(a->vrt), avr_full_offset(a->vra),
+                     avr_full_offset(a->vrb), 16, 16);
+
+    return true;
+}
+
 TRANS_FLAGS2(ALTIVEC_207, VMULESB, do_vx_helper, gen_helper_VMULESB)
 TRANS_FLAGS2(ALTIVEC_207, VMULOSB, do_vx_helper, gen_helper_VMULOSB)
 TRANS_FLAGS2(ALTIVEC_207, VMULEUB, do_vx_helper, gen_helper_VMULEUB)
@@ -2120,6 +2126,11 @@ TRANS_FLAGS2(ISA310, VMULOSD, do_vx_helper, gen_helper_VMULOSD)
 TRANS_FLAGS2(ISA310, VMULEUD, do_vx_helper, gen_helper_VMULEUD)
 TRANS_FLAGS2(ISA310, VMULOUD, do_vx_helper, gen_helper_VMULOUD)
 
+TRANS_FLAGS2(ISA310, VMULHSW, do_vx_helper, gen_helper_VMULHSW)
+TRANS_FLAGS2(ISA310, VMULHSD, do_vx_helper, gen_helper_VMULHSD)
+TRANS_FLAGS2(ISA310, VMULHUW, do_vx_helper, gen_helper_VMULHUW)
+TRANS_FLAGS2(ISA310, VMULHUD, do_vx_helper, gen_helper_VMULHUD)
+
 #undef GEN_VR_LDX
 #undef GEN_VR_STX
 #undef GEN_VR_LVE
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index f310b2fbde..914e68e5b0 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -102,11 +102,6 @@ GEN_VXFORM_300(vextubrx, 6, 28),
 GEN_VXFORM_300(vextuhrx, 6, 29),
 GEN_VXFORM_DUAL(vmrgew, vextuwrx, 6, 30, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM_207(vmuluwm, 4, 2),
-GEN_VXFORM_310(vmulld, 4, 7),
-GEN_VXFORM_310(vmulhuw, 4, 10),
-GEN_VXFORM_310(vmulhud, 4, 11),
-GEN_VXFORM_310(vmulhsw, 4, 14),
-GEN_VXFORM_310(vmulhsd, 4, 15),
 GEN_VXFORM(vslb, 2, 4),
 GEN_VXFORM(vslh, 2, 5),
 GEN_VXFORM_DUAL(vslw, vrlwnm, 2, 6, PPC_ALTIVEC, PPC_NONE),
-- 
2.25.1




* [PATCH v4 04/47] target/ppc: vmulh* instructions without helpers
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (2 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 03/47] target/ppc: Moved vector multiply high and low " matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 18:23   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 05/47] target/ppc: Implement vmsumcud instruction matheus.ferst
                   ` (42 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: Lucas Mateus Castro (alqotel),
	danielhb413, richard.henderson, groug, Lucas Mateus Castro, clg,
	Matheus Ferst, david

From: "Lucas Mateus Castro (alqotel)" <lucas.castro@eldorado.org.br>

Changed vmulhuw, vmulhud, vmulhsw and vmulhsd to be implemented with
inline TCG ops instead of helpers.

Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
Changes in v4:
Changed from gvec to i64 operations. This improved performance on a
Power host for all four instructions, and on an x86 host for vmulhsw
and vmulhuw, but made vmulhsd and vmulhud slower on an x86 host.
---
 target/ppc/helper.h                 |   4 -
 target/ppc/int_helper.c             |  35 --------
 target/ppc/translate/vmx-impl.c.inc | 123 +++++++++++++++++++++++++++-
 3 files changed, 119 insertions(+), 43 deletions(-)

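The i64 approach can be modeled in plain C to check the bit
manipulation in do_vx_vmulhw_i64. Each 64-bit lane holds two 32-bit
words: the two low words and the two high words are multiplied
separately into 64-bit products (a 32x32 product always fits in 64
bits), then bits 63:32 of the low product are shifted down and bits
63:32 of the high product are kept in place. The function name
mulhw_pair is hypothetical, and the signed path assumes the
two's-complement conversions and arithmetic right shifts that GCC and
Clang provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Plain-C model of do_vx_vmulhw_i64: a and b each hold two 32-bit words;
 * the result holds the high 32 bits of each word-by-word product.
 */
static uint64_t mulhw_pair(uint64_t a, uint64_t b, bool sign)
{
    uint64_t lh, hh;

    if (sign) {
        /* ext32s for the low words, arithmetic shift for the high words */
        lh = (uint64_t)((int64_t)(int32_t)a * (int32_t)b);
        hh = (uint64_t)(((int64_t)a >> 32) * ((int64_t)b >> 32));
    } else {
        /* zero-extend the low words, logical shift for the high words */
        lh = (a & 0xFFFFFFFFu) * (b & 0xFFFFFFFFu);
        hh = (a >> 32) * (b >> 32);
    }
    /* High half of the low product moves down; high half of the high product stays. */
    return (hh & 0xFFFFFFFF00000000ull) | (lh >> 32);
}
```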
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 5d11158f1f..d0c5a3fef1 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -206,10 +206,6 @@ DEF_HELPER_3(VMULOUB, void, avr, avr, avr)
 DEF_HELPER_3(VMULOUH, void, avr, avr, avr)
 DEF_HELPER_3(VMULOUW, void, avr, avr, avr)
 DEF_HELPER_3(VMULOUD, void, avr, avr, avr)
-DEF_HELPER_3(VMULHSW, void, avr, avr, avr)
-DEF_HELPER_3(VMULHUW, void, avr, avr, avr)
-DEF_HELPER_3(VMULHSD, void, avr, avr, avr)
-DEF_HELPER_3(VMULHUD, void, avr, avr, avr)
 DEF_HELPER_3(vslo, void, avr, avr, avr)
 DEF_HELPER_3(vsro, void, avr, avr, avr)
 DEF_HELPER_3(vsrv, void, avr, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 8ddeccef12..64c87d9418 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1113,41 +1113,6 @@ void helper_VMULOUD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     mulu64(&r->VsrD(1), &r->VsrD(0), a->VsrD(1), b->VsrD(1));
 }
 
-void helper_VMULHSW(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-{
-    int i;
-
-    for (i = 0; i < 4; i++) {
-        r->s32[i] = (int32_t)(((int64_t)a->s32[i] * (int64_t)b->s32[i]) >> 32);
-    }
-}
-
-void helper_VMULHUW(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-{
-    int i;
-
-    for (i = 0; i < 4; i++) {
-        r->u32[i] = (uint32_t)(((uint64_t)a->u32[i] *
-                               (uint64_t)b->u32[i]) >> 32);
-    }
-}
-
-void helper_VMULHSD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-{
-    uint64_t discard;
-
-    muls64(&discard, &r->u64[0], a->s64[0], b->s64[0]);
-    muls64(&discard, &r->u64[1], a->s64[1], b->s64[1]);
-}
-
-void helper_VMULHUD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-{
-    uint64_t discard;
-
-    mulu64(&discard, &r->u64[0], a->u64[0], b->u64[0]);
-    mulu64(&discard, &r->u64[1], a->u64[1], b->u64[1]);
-}
-
 void helper_vperm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
                   ppc_avr_t *c)
 {
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 62d0642226..3951ae124a 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -2126,10 +2126,125 @@ TRANS_FLAGS2(ISA310, VMULOSD, do_vx_helper, gen_helper_VMULOSD)
 TRANS_FLAGS2(ISA310, VMULEUD, do_vx_helper, gen_helper_VMULEUD)
 TRANS_FLAGS2(ISA310, VMULOUD, do_vx_helper, gen_helper_VMULOUD)
 
-TRANS_FLAGS2(ISA310, VMULHSW, do_vx_helper, gen_helper_VMULHSW)
-TRANS_FLAGS2(ISA310, VMULHSD, do_vx_helper, gen_helper_VMULHSD)
-TRANS_FLAGS2(ISA310, VMULHUW, do_vx_helper, gen_helper_VMULHUW)
-TRANS_FLAGS2(ISA310, VMULHUD, do_vx_helper, gen_helper_VMULHUD)
+static void do_vx_vmulhw_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b, bool sign)
+{
+    TCGv_i64 hh, lh, temp;
+
+    uint64_t c;
+    hh = tcg_temp_new_i64();
+    lh = tcg_temp_new_i64();
+    temp = tcg_temp_new_i64();
+
+    c = 0xFFFFFFFF;
+
+    if (sign) {
+        tcg_gen_ext32s_i64(lh, a);
+        tcg_gen_ext32s_i64(temp, b);
+    } else {
+        tcg_gen_andi_i64(lh, a, c);
+        tcg_gen_andi_i64(temp, b, c);
+    }
+    tcg_gen_mul_i64(lh, lh, temp);
+
+    if (sign) {
+        tcg_gen_sari_i64(hh, a, 32);
+        tcg_gen_sari_i64(temp, b, 32);
+    } else {
+        tcg_gen_shri_i64(hh, a, 32);
+        tcg_gen_shri_i64(temp, b, 32);
+    }
+    tcg_gen_mul_i64(hh, hh, temp);
+
+    tcg_gen_shri_i64(lh, lh, 32);
+    tcg_gen_andi_i64(hh, hh, c << 32);
+    tcg_gen_or_i64(t, hh, lh);
+
+    tcg_temp_free_i64(hh);
+    tcg_temp_free_i64(lh);
+    tcg_temp_free_i64(temp);
+}
+
+static void do_vx_vmulhd_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b, bool sign)
+{
+    TCGv_i64 a1, b1, mask, w, k;
+    void (*tcg_gen_shift_imm)(TCGv_i64, TCGv_i64, int64_t);
+
+    a1 = tcg_temp_new_i64();
+    b1 = tcg_temp_new_i64();
+    w  = tcg_temp_new_i64();
+    k  = tcg_temp_new_i64();
+    mask = tcg_temp_new_i64();
+    if (sign) {
+        tcg_gen_shift_imm = tcg_gen_sari_i64;
+    } else {
+        tcg_gen_shift_imm = tcg_gen_shri_i64;
+    }
+
+    tcg_gen_movi_i64(mask, 0xFFFFFFFF);
+    tcg_gen_and_i64(a1, a, mask);
+    tcg_gen_and_i64(b1, b, mask);
+    tcg_gen_mul_i64(t, a1, b1);
+    tcg_gen_shri_i64(k, t, 32);
+
+    tcg_gen_shift_imm(a1, a, 32);
+    tcg_gen_mul_i64(t, a1, b1);
+    tcg_gen_add_i64(t, t, k);
+    tcg_gen_and_i64(k, t, mask);
+    tcg_gen_shift_imm(w, t, 32);
+
+    tcg_gen_and_i64(a1, a, mask);
+    tcg_gen_shift_imm(b1, b, 32);
+    tcg_gen_mul_i64(t, a1, b1);
+    tcg_gen_add_i64(t, t, k);
+    tcg_gen_shift_imm(k, t, 32);
+
+    tcg_gen_shift_imm(a1, a, 32);
+    tcg_gen_mul_i64(t, a1, b1);
+    tcg_gen_add_i64(t, t, w);
+    tcg_gen_add_i64(t, t, k);
+
+    tcg_temp_free_i64(a1);
+    tcg_temp_free_i64(b1);
+    tcg_temp_free_i64(w);
+    tcg_temp_free_i64(k);
+    tcg_temp_free_i64(mask);
+}
+
+static bool do_vx_mulh(DisasContext *ctx, arg_VX *a, bool sign,
+                       void (*func)(TCGv_i64, TCGv_i64, TCGv_i64, bool))
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    TCGv_i64 vra, vrb, vrt;
+    int i;
+
+    vra = tcg_temp_new_i64();
+    vrb = tcg_temp_new_i64();
+    vrt = tcg_temp_new_i64();
+
+    for (i = 0; i < 2; i++) {
+        get_avr64(vra, a->vra, i);
+        get_avr64(vrb, a->vrb, i);
+        get_avr64(vrt, a->vrt, i);
+
+        func(vrt, vra, vrb, sign);
+
+        set_avr64(a->vrt, vrt, i);
+    }
+
+    tcg_temp_free_i64(vra);
+    tcg_temp_free_i64(vrb);
+    tcg_temp_free_i64(vrt);
+
+    return true;
+
+}
+
+TRANS(VMULHSW, do_vx_mulh, true, do_vx_vmulhw_i64)
+TRANS(VMULHSD, do_vx_mulh, true, do_vx_vmulhd_i64)
+TRANS(VMULHUW, do_vx_mulh, false, do_vx_vmulhw_i64)
+TRANS(VMULHUD, do_vx_mulh, false, do_vx_vmulhd_i64)
 
 #undef GEN_VR_LDX
 #undef GEN_VR_STX
-- 
2.25.1




* [PATCH v4 05/47] target/ppc: Implement vmsumcud instruction
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (3 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 04/47] target/ppc: vmulh* instructions without helpers matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 18:28   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 06/47] target/ppc: Implement vmsumudm instruction matheus.ferst
                   ` (41 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Víctor Colombo, clg,
	Matheus Ferst, david

From: Víctor Colombo <victor.colombo@eldorado.org.br>

Based on [1] by Lijun Pan <ljp@linux.ibm.com>, which was never merged
into master.

[1]: https://lists.gnu.org/archive/html/qemu-ppc/2020-07/msg00419.html
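
For review, a short C reference model of the value trans_VMSUMCUD()
produces, matching the CHOP128(temp >> 128) comment in the code: the carry
out of the 128-bit sum of both doubleword products and VRC. The function
name vmsumcud_ref is hypothetical, and the sketch relies on the GCC/Clang
unsigned __int128 extension.

```c
#include <stdint.h>

/* GCC/Clang extension, used only to keep the model short. */
typedef unsigned __int128 u128;

/*
 * vmsumcud: add the two 128-bit products of the unsigned doublewords and
 * the 128-bit addend c, and return the carry out of bit 127 of that sum
 * (0, 1 or 2). The translation stores this value in the low doubleword
 * of VRT and zeroes the high doubleword.
 */
static uint64_t vmsumcud_ref(const uint64_t a[2], const uint64_t b[2], u128 c)
{
    u128 p0 = (u128)a[0] * b[0];
    u128 p1 = (u128)a[1] * b[1];
    u128 sum = p1 + c;
    uint64_t carry = sum < p1;   /* the first addition wrapped around */

    sum += p0;
    carry += sum < p0;           /* the second addition wrapped around */
    return carry;
}
```

With all inputs at their maximum the carry reaches 2, which is why the
TCG code has to keep accumulating into tmp0 rather than using a single
carry bit.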

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
Changes in v4:

Fixed dead move into tmp1
---
 target/ppc/insn32.decode            |  4 +++
 target/ppc/translate/vmx-impl.c.inc | 53 +++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index d817e44c71..e85a75db2f 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -468,6 +468,10 @@ VMULHSD         000100 ..... ..... ..... 01111001001    @VX
 VMULHUD         000100 ..... ..... ..... 01011001001    @VX
 VMULLD          000100 ..... ..... ..... 00111001001    @VX
 
+## Vector Multiply-Sum Instructions
+
+VMSUMCUD        000100 ..... ..... ..... ..... 010111   @VA
+
 # VSX Load/Store Instructions
 
 LXV             111101 ..... ..... ............ . 001   @DQ_TSX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 3951ae124a..e029873ae0 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -2081,6 +2081,59 @@ static bool trans_VPEXTD(DisasContext *ctx, arg_VX *a)
     return true;
 }
 
+static bool trans_VMSUMCUD(DisasContext *ctx, arg_VA *a)
+{
+    TCGv_i64 tmp0, tmp1, prod1h, prod1l, prod0h, prod0l, zero;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    tmp0 = tcg_temp_new_i64();
+    tmp1 = tcg_temp_new_i64();
+    prod1h = tcg_temp_new_i64();
+    prod1l = tcg_temp_new_i64();
+    prod0h = tcg_temp_new_i64();
+    prod0l = tcg_temp_new_i64();
+    zero = tcg_constant_i64(0);
+
+    /* prod1 = vsr[vra+32].dw[1] * vsr[vrb+32].dw[1] */
+    get_avr64(tmp0, a->vra, false);
+    get_avr64(tmp1, a->vrb, false);
+    tcg_gen_mulu2_i64(prod1l, prod1h, tmp0, tmp1);
+
+    /* prod0 = vsr[vra+32].dw[0] * vsr[vrb+32].dw[0] */
+    get_avr64(tmp0, a->vra, true);
+    get_avr64(tmp1, a->vrb, true);
+    tcg_gen_mulu2_i64(prod0l, prod0h, tmp0, tmp1);
+
+    /* Sum the lower 64-bit elements */
+    get_avr64(tmp1, a->rc, false);
+    tcg_gen_add2_i64(tmp1, tmp0, tmp1, zero, prod1l, zero);
+    tcg_gen_add2_i64(tmp1, tmp0, tmp1, tmp0, prod0l, zero);
+
+    /*
+     * Discard lower 64-bits, leaving the carry into bit 64.
+     * Then sum the higher 64-bit elements.
+     */
+    get_avr64(tmp1, a->rc, true);
+    tcg_gen_add2_i64(tmp1, tmp0, tmp0, zero, tmp1, zero);
+    tcg_gen_add2_i64(tmp1, tmp0, tmp1, tmp0, prod1h, zero);
+    tcg_gen_add2_i64(tmp1, tmp0, tmp1, tmp0, prod0h, zero);
+
+    /* Discard 64 more bits to complete the CHOP128(temp >> 128) */
+    set_avr64(a->vrt, tmp0, false);
+    set_avr64(a->vrt, zero, true);
+
+    tcg_temp_free_i64(tmp0);
+    tcg_temp_free_i64(tmp1);
+    tcg_temp_free_i64(prod1h);
+    tcg_temp_free_i64(prod1l);
+    tcg_temp_free_i64(prod0h);
+    tcg_temp_free_i64(prod0l);
+
+    return true;
+}
+
 static bool do_vx_helper(DisasContext *ctx, arg_VX *a,
                          void (*gen_helper) (TCGv_ptr, TCGv_ptr, TCGv_ptr))
 {
-- 
2.25.1




* [PATCH v4 06/47] target/ppc: Implement vmsumudm instruction
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (4 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 05/47] target/ppc: Implement vmsumcud instruction matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 14:36 ` [PATCH v4 07/47] target/ppc: Move vexts[bhw]2[wd] to decodetree matheus.ferst
                   ` (40 subsequent siblings)
  46 siblings, 0 replies; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Víctor Colombo, clg,
	Matheus Ferst, david

From: Víctor Colombo <victor.colombo@eldorado.org.br>

Based on [1] by Lijun Pan <ljp@linux.ibm.com>, which was never merged
into master.

[1]: https://lists.gnu.org/archive/html/qemu-ppc/2020-07/msg00419.html
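
For review, a C sketch of the loop in trans_VMSUMUDM(): VRT receives
(a0*b0 + a1*b1 + VRC) mod 2^128, accumulated in two 64-bit halves. The
helper add2 is a hypothetical stand-in for tcg_gen_add2_i64 (it does not
share that function's signature), and the 64x64->128 multiply that
tcg_gen_mulu2_i64 provides is modelled with the GCC/Clang unsigned __int128
extension.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* (rh:rl) += (ah:al), propagating the carry out of the low half. */
static void add2(uint64_t *rl, uint64_t *rh, uint64_t al, uint64_t ah)
{
    uint64_t lo = *rl + al;

    *rh = *rh + ah + (lo < al);   /* lo < al means the low add wrapped */
    *rl = lo;
}

/*
 * vmsumudm: on entry rl/rh hold the low/high doublewords of VRC; on exit
 * they hold the low/high doublewords of VRT.
 */
static void vmsumudm_ref(const uint64_t a[2], const uint64_t b[2],
                         uint64_t *rl, uint64_t *rh)
{
    for (int dw = 0; dw < 2; dw++) {
        u128 p = (u128)a[dw] * b[dw];   /* what mulu2 produces */

        add2(rl, rh, (uint64_t)p, (uint64_t)(p >> 64));
    }
}
```

Unlike vmsumcud, the carry out of bit 127 is simply discarded here.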

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            |  1 +
 target/ppc/translate/vmx-impl.c.inc | 34 +++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index e85a75db2f..732a2bb79e 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -471,6 +471,7 @@ VMULLD          000100 ..... ..... ..... 00111001001    @VX
 ## Vector Multiply-Sum Instructions
 
 VMSUMCUD        000100 ..... ..... ..... ..... 010111   @VA
+VMSUMUDM        000100 ..... ..... ..... ..... 100011   @VA
 
 # VSX Load/Store Instructions
 
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index e029873ae0..afe895ab7f 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -2081,6 +2081,40 @@ static bool trans_VPEXTD(DisasContext *ctx, arg_VX *a)
     return true;
 }
 
+static bool trans_VMSUMUDM(DisasContext *ctx, arg_VA *a)
+{
+    TCGv_i64 rl, rh, src1, src2;
+    int dw;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+    REQUIRE_VECTOR(ctx);
+
+    rh = tcg_temp_new_i64();
+    rl = tcg_temp_new_i64();
+    src1 = tcg_temp_new_i64();
+    src2 = tcg_temp_new_i64();
+
+    get_avr64(rl, a->rc, false);
+    get_avr64(rh, a->rc, true);
+
+    for (dw = 0; dw < 2; dw++) {
+        get_avr64(src1, a->vra, dw);
+        get_avr64(src2, a->vrb, dw);
+        tcg_gen_mulu2_i64(src1, src2, src1, src2);
+        tcg_gen_add2_i64(rl, rh, rl, rh, src1, src2);
+    }
+
+    set_avr64(a->vrt, rl, false);
+    set_avr64(a->vrt, rh, true);
+
+    tcg_temp_free_i64(rl);
+    tcg_temp_free_i64(rh);
+    tcg_temp_free_i64(src1);
+    tcg_temp_free_i64(src2);
+
+    return true;
+}
+
 static bool trans_VMSUMCUD(DisasContext *ctx, arg_VA *a)
 {
     TCGv_i64 tmp0, tmp1, prod1h, prod1l, prod0h, prod0l, zero;
-- 
2.25.1




* [PATCH v4 07/47] target/ppc: Move vexts[bhw]2[wd] to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (5 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 06/47] target/ppc: Implement vmsumudm instruction matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 18:34   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 08/47] target/ppc: Implement vextsd2q matheus.ferst
                   ` (39 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst,
	Lucas Coutinho, david

From: Lucas Coutinho <lucas.coutinho@eldorado.org.br>

Move the following instructions to decodetree:
vextsb2w: Vector Extend Sign Byte To Word
vextsh2w: Vector Extend Sign Halfword To Word
vextsb2d: Vector Extend Sign Byte To Doubleword
vextsh2d: Vector Extend Sign Halfword To Doubleword
vextsw2d: Vector Extend Sign Word To Doubleword
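
The new gen_vexts_* functions sign-extend each element by shifting it left
so the source field's sign bit lands in the element's top bit, then
shifting back arithmetically by the same amount. A scalar C sketch of that
trick (vexts_i64/vexts_i32 are hypothetical names; it assumes GCC/Clang
behaviour for the signed conversion and for >> on negative values):

```c
#include <stdint.h>

/*
 * Sign-extend the low (64 - s) bits of a doubleword element:
 * s = 56 (byte), 48 (halfword) or 32 (word) for the *2d forms.
 */
static int64_t vexts_i64(uint64_t x, int s)
{
    return (int64_t)(x << s) >> s;
}

/* Word-element variant for the *2w forms: s = 24 (byte) or 16 (halfword). */
static int32_t vexts_i32(uint32_t x, int s)
{
    return (int32_t)(x << s) >> s;
}
```

These shift amounts are the immediates passed through TRANS() to do_vexts().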

Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/helper.h                 |  5 ---
 target/ppc/insn32.decode            |  8 ++++
 target/ppc/int_helper.c             | 15 --------
 target/ppc/translate/vmx-impl.c.inc | 60 ++++++++++++++++++++++++++---
 target/ppc/translate/vmx-ops.c.inc  |  5 ---
 5 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index d0c5a3fef1..6ac72868bb 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -244,11 +244,6 @@ DEF_HELPER_4(VINSBLX, void, env, avr, i64, tl)
 DEF_HELPER_4(VINSHLX, void, env, avr, i64, tl)
 DEF_HELPER_4(VINSWLX, void, env, avr, i64, tl)
 DEF_HELPER_4(VINSDLX, void, env, avr, i64, tl)
-DEF_HELPER_2(vextsb2w, void, avr, avr)
-DEF_HELPER_2(vextsh2w, void, avr, avr)
-DEF_HELPER_2(vextsb2d, void, avr, avr)
-DEF_HELPER_2(vextsh2d, void, avr, avr)
-DEF_HELPER_2(vextsw2d, void, avr, avr)
 DEF_HELPER_2(vnegw, void, avr, avr)
 DEF_HELPER_2(vnegd, void, avr, avr)
 DEF_HELPER_2(vupkhpx, void, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 732a2bb79e..1dcf9c61e9 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -419,6 +419,14 @@ VINSWVRX        000100 ..... ..... ..... 00110001111    @VX
 VSLDBI          000100 ..... ..... ..... 00 ... 010110  @VN
 VSRDBI          000100 ..... ..... ..... 01 ... 010110  @VN
 
+## Vector Integer Arithmetic Instructions
+
+VEXTSB2W        000100 ..... 10000 ..... 11000000010    @VX_tb
+VEXTSH2W        000100 ..... 10001 ..... 11000000010    @VX_tb
+VEXTSB2D        000100 ..... 11000 ..... 11000000010    @VX_tb
+VEXTSH2D        000100 ..... 11001 ..... 11000000010    @VX_tb
+VEXTSW2D        000100 ..... 11010 ..... 11000000010    @VX_tb
+
 ## Vector Mask Manipulation Instructions
 
 MTVSRBM         000100 ..... 10000 ..... 11001000010    @VX_tb
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 64c87d9418..ade2b28795 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1646,21 +1646,6 @@ XXBLEND(W, 32)
 XXBLEND(D, 64)
 #undef XXBLEND
 
-#define VEXT_SIGNED(name, element, cast)                            \
-void helper_##name(ppc_avr_t *r, ppc_avr_t *b)                      \
-{                                                                   \
-    int i;                                                          \
-    for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
-        r->element[i] = (cast)b->element[i];                        \
-    }                                                               \
-}
-VEXT_SIGNED(vextsb2w, s32, int8_t)
-VEXT_SIGNED(vextsb2d, s64, int8_t)
-VEXT_SIGNED(vextsh2w, s32, int16_t)
-VEXT_SIGNED(vextsh2d, s64, int16_t)
-VEXT_SIGNED(vextsw2d, s64, int32_t)
-#undef VEXT_SIGNED
-
 #define VNEG(name, element)                                         \
 void helper_##name(ppc_avr_t *r, ppc_avr_t *b)                      \
 {                                                                   \
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index afe895ab7f..522f8ac142 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1772,11 +1772,61 @@ GEN_VXFORM_TRANS(vclzw, 1, 30)
 GEN_VXFORM_TRANS(vclzd, 1, 31)
 GEN_VXFORM_NOA_2(vnegw, 1, 24, 6)
 GEN_VXFORM_NOA_2(vnegd, 1, 24, 7)
-GEN_VXFORM_NOA_2(vextsb2w, 1, 24, 16)
-GEN_VXFORM_NOA_2(vextsh2w, 1, 24, 17)
-GEN_VXFORM_NOA_2(vextsb2d, 1, 24, 24)
-GEN_VXFORM_NOA_2(vextsh2d, 1, 24, 25)
-GEN_VXFORM_NOA_2(vextsw2d, 1, 24, 26)
+
+static void gen_vexts_i64(TCGv_i64 t, TCGv_i64 b, int64_t s)
+{
+    tcg_gen_shli_i64(t, b, s);
+    tcg_gen_sari_i64(t, t, s);
+}
+
+static void gen_vexts_i32(TCGv_i32 t, TCGv_i32 b, int32_t s)
+{
+    tcg_gen_shli_i32(t, b, s);
+    tcg_gen_sari_i32(t, t, s);
+}
+
+static void gen_vexts_vec(unsigned vece, TCGv_vec t, TCGv_vec b, int64_t s)
+{
+    tcg_gen_shli_vec(vece, t, b, s);
+    tcg_gen_sari_vec(vece, t, t, s);
+}
+
+static bool do_vexts(DisasContext *ctx, arg_VX_tb *a, unsigned vece, int64_t s)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_shli_vec, INDEX_op_sari_vec, 0
+    };
+
+    static const GVecGen2i op[2] = {
+        {
+            .fni4 = gen_vexts_i32,
+            .fniv = gen_vexts_vec,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        },
+        {
+            .fni8 = gen_vexts_i64,
+            .fniv = gen_vexts_vec,
+            .opt_opc = vecop_list,
+            .vece = MO_64
+        },
+    };
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+    REQUIRE_VECTOR(ctx);
+
+    tcg_gen_gvec_2i(avr_full_offset(a->vrt), avr_full_offset(a->vrb),
+                    16, 16, s, &op[vece - MO_32]);
+
+    return true;
+}
+
+TRANS(VEXTSB2W, do_vexts, MO_32, 24);
+TRANS(VEXTSH2W, do_vexts, MO_32, 16);
+TRANS(VEXTSB2D, do_vexts, MO_64, 56);
+TRANS(VEXTSH2D, do_vexts, MO_64, 48);
+TRANS(VEXTSW2D, do_vexts, MO_64, 32);
+
 GEN_VXFORM_NOA_2(vctzb, 1, 24, 28)
 GEN_VXFORM_NOA_2(vctzh, 1, 24, 29)
 GEN_VXFORM_NOA_2(vctzw, 1, 24, 30)
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index 914e68e5b0..6787327f56 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -216,11 +216,6 @@ GEN_VXFORM(vspltish, 6, 13),
 GEN_VXFORM(vspltisw, 6, 14),
 GEN_VXFORM_300_EO(vnegw, 0x01, 0x18, 0x06),
 GEN_VXFORM_300_EO(vnegd, 0x01, 0x18, 0x07),
-GEN_VXFORM_300_EO(vextsb2w, 0x01, 0x18, 0x10),
-GEN_VXFORM_300_EO(vextsh2w, 0x01, 0x18, 0x11),
-GEN_VXFORM_300_EO(vextsb2d, 0x01, 0x18, 0x18),
-GEN_VXFORM_300_EO(vextsh2d, 0x01, 0x18, 0x19),
-GEN_VXFORM_300_EO(vextsw2d, 0x01, 0x18, 0x1A),
 GEN_VXFORM_300_EO(vctzb, 0x01, 0x18, 0x1C),
 GEN_VXFORM_300_EO(vctzh, 0x01, 0x18, 0x1D),
 GEN_VXFORM_300_EO(vctzw, 0x01, 0x18, 0x1E),
-- 
2.25.1




* [PATCH v4 08/47] target/ppc: Implement vextsd2q
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (6 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 07/47] target/ppc: Move vexts[bhw]2[wd] to decodetree matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 14:36 ` [PATCH v4 09/47] target/ppc: Move Vector Compare Equal/Not Equal/Greater Than to decodetree matheus.ferst
                   ` (38 subsequent siblings)
  46 siblings, 0 replies; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst,
	Lucas Coutinho, david

From: Lucas Coutinho <lucas.coutinho@eldorado.org.br>

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            |  1 +
 target/ppc/translate/vmx-impl.c.inc | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 1dcf9c61e9..cba680075b 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -426,6 +426,7 @@ VEXTSH2W        000100 ..... 10001 ..... 11000000010    @VX_tb
 VEXTSB2D        000100 ..... 11000 ..... 11000000010    @VX_tb
 VEXTSH2D        000100 ..... 11001 ..... 11000000010    @VX_tb
 VEXTSW2D        000100 ..... 11010 ..... 11000000010    @VX_tb
+VEXTSD2Q        000100 ..... 11011 ..... 11000000010    @VX_tb
 
 ## Vector Mask Manipulation Instructions
 
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 522f8ac142..cf69f4c412 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1827,6 +1827,24 @@ TRANS(VEXTSB2D, do_vexts, MO_64, 56);
 TRANS(VEXTSH2D, do_vexts, MO_64, 48);
 TRANS(VEXTSW2D, do_vexts, MO_64, 32);
 
+static bool trans_VEXTSD2Q(DisasContext *ctx, arg_VX_tb *a)
+{
+    TCGv_i64 tmp;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    tmp = tcg_temp_new_i64();
+
+    get_avr64(tmp, a->vrb, false);
+    set_avr64(a->vrt, tmp, false);
+    tcg_gen_sari_i64(tmp, tmp, 63);
+    set_avr64(a->vrt, tmp, true);
+
+    tcg_temp_free_i64(tmp);
+    return true;
+}
+
 GEN_VXFORM_NOA_2(vctzb, 1, 24, 28)
 GEN_VXFORM_NOA_2(vctzh, 1, 24, 29)
 GEN_VXFORM_NOA_2(vctzw, 1, 24, 30)
-- 
2.25.1




* [PATCH v4 09/47] target/ppc: Move Vector Compare Equal/Not Equal/Greater Than to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (7 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 08/47] target/ppc: Implement vextsd2q matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 18:37   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 10/47] target/ppc: Move Vector Compare Not Equal or Zero " matheus.ferst
                   ` (37 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>
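
The new do_vcmp_rc() derives CR6 from the two doublewords of the compare
result instead of tracking per-element flags like the removed VCMP_DO
helper. Because every element of a compare result is either all ones or
all zeros, ANDing the doublewords shows whether all elements compared true
and ORing them shows whether none did. A scalar C sketch of that reduction
(vcmp_cr6 is a hypothetical name, not a QEMU function):

```c
#include <stdint.h>

/*
 * CR6 after a vector integer compare with Rc=1:
 *   0b1000 if every element compared true,
 *   0b0010 if every element compared false.
 */
static uint32_t vcmp_cr6(uint64_t hi, uint64_t lo)
{
    uint64_t set = hi & lo;   /* all ones only if every bit is set */
    uint64_t clr = hi | lo;   /* zero only if every bit is clear */

    return ((uint32_t)(set == UINT64_MAX) << 3) | ((uint32_t)(clr == 0) << 1);
}
```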

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/helper.h                 | 30 ----------
 target/ppc/insn32.decode            | 24 ++++++++
 target/ppc/int_helper.c             | 54 -----------------
 target/ppc/translate/vmx-impl.c.inc | 89 ++++++++++++++++++++---------
 target/ppc/translate/vmx-ops.c.inc  | 15 +----
 5 files changed, 88 insertions(+), 124 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 6ac72868bb..fb421dd343 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -140,46 +140,16 @@ DEF_HELPER_3(vabsduw, void, avr, avr, avr)
 DEF_HELPER_3(vavgsb, void, avr, avr, avr)
 DEF_HELPER_3(vavgsh, void, avr, avr, avr)
 DEF_HELPER_3(vavgsw, void, avr, avr, avr)
-DEF_HELPER_4(vcmpequb, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpequh, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpequw, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpequd, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpneb, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpneh, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpnew, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpnezb, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpnezh, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpnezw, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtub, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtuh, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtuw, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtud, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtsb, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtsh, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtsw, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtsd, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpeqfp, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgefp, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgtfp, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpbfp, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpequb_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpequh_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpequw_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpequd_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpneb_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpneh_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpnew_dot, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpnezb_dot, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpnezh_dot, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpnezw_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtub_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtuh_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtuw_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtud_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtsb_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtsh_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtsw_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpgtsd_dot, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpeqfp_dot, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgefp_dot, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgtfp_dot, void, env, avr, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index cba680075b..5443ee0394 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -51,6 +51,9 @@
 &VA             vrt vra vrb rc
 @VA             ...... vrt:5 vra:5 vrb:5 rc:5 ......    &VA
 
+&VC             vrt vra vrb rc:bool
+@VC             ...... vrt:5 vra:5 vrb:5 rc:1 ..........        &VC
+
 &VN             vrt vra vrb sh
 @VN             ...... vrt:5 vra:5 vrb:5 .. sh:3 ......         &VN
 
@@ -373,6 +376,27 @@ DSCLIQ          111111 ..... ..... ...... 001000010 .   @Z22_tap_sh_rc
 DSCRI           111011 ..... ..... ...... 001100010 .   @Z22_ta_sh_rc
 DSCRIQ          111111 ..... ..... ...... 001100010 .   @Z22_tap_sh_rc
 
+## Vector Integer Instructions
+
+VCMPEQUB        000100 ..... ..... ..... . 0000000110   @VC
+VCMPEQUH        000100 ..... ..... ..... . 0001000110   @VC
+VCMPEQUW        000100 ..... ..... ..... . 0010000110   @VC
+VCMPEQUD        000100 ..... ..... ..... . 0011000111   @VC
+
+VCMPGTSB        000100 ..... ..... ..... . 1100000110   @VC
+VCMPGTSH        000100 ..... ..... ..... . 1101000110   @VC
+VCMPGTSW        000100 ..... ..... ..... . 1110000110   @VC
+VCMPGTSD        000100 ..... ..... ..... . 1111000111   @VC
+
+VCMPGTUB        000100 ..... ..... ..... . 1000000110   @VC
+VCMPGTUH        000100 ..... ..... ..... . 1001000110   @VC
+VCMPGTUW        000100 ..... ..... ..... . 1010000110   @VC
+VCMPGTUD        000100 ..... ..... ..... . 1011000111   @VC
+
+VCMPNEB         000100 ..... ..... ..... . 0000000111   @VC
+VCMPNEH         000100 ..... ..... ..... . 0001000111   @VC
+VCMPNEW         000100 ..... ..... ..... . 0010000111   @VC
+
 ## Vector Bit Manipulation Instruction
 
 VCFUGED         000100 ..... ..... ..... 10101001101    @VX
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index ade2b28795..c9e64014dc 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -662,57 +662,6 @@ VCF(ux, uint32_to_float32, u32)
 VCF(sx, int32_to_float32, s32)
 #undef VCF
 
-#define VCMP_DO(suffix, compare, element, record)                       \
-    void helper_vcmp##suffix(CPUPPCState *env, ppc_avr_t *r,            \
-                             ppc_avr_t *a, ppc_avr_t *b)                \
-    {                                                                   \
-        uint64_t ones = (uint64_t)-1;                                   \
-        uint64_t all = ones;                                            \
-        uint64_t none = 0;                                              \
-        int i;                                                          \
-                                                                        \
-        for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
-            uint64_t result = (a->element[i] compare b->element[i] ?    \
-                               ones : 0x0);                             \
-            switch (sizeof(a->element[0])) {                            \
-            case 8:                                                     \
-                r->u64[i] = result;                                     \
-                break;                                                  \
-            case 4:                                                     \
-                r->u32[i] = result;                                     \
-                break;                                                  \
-            case 2:                                                     \
-                r->u16[i] = result;                                     \
-                break;                                                  \
-            case 1:                                                     \
-                r->u8[i] = result;                                      \
-                break;                                                  \
-            }                                                           \
-            all &= result;                                              \
-            none |= result;                                             \
-        }                                                               \
-        if (record) {                                                   \
-            env->crf[6] = ((all != 0) << 3) | ((none == 0) << 1);       \
-        }                                                               \
-    }
-#define VCMP(suffix, compare, element)          \
-    VCMP_DO(suffix, compare, element, 0)        \
-    VCMP_DO(suffix##_dot, compare, element, 1)
-VCMP(equb, ==, u8)
-VCMP(equh, ==, u16)
-VCMP(equw, ==, u32)
-VCMP(equd, ==, u64)
-VCMP(gtub, >, u8)
-VCMP(gtuh, >, u16)
-VCMP(gtuw, >, u32)
-VCMP(gtud, >, u64)
-VCMP(gtsb, >, s8)
-VCMP(gtsh, >, s16)
-VCMP(gtsw, >, s32)
-VCMP(gtsd, >, s64)
-#undef VCMP_DO
-#undef VCMP
-
 #define VCMPNE_DO(suffix, element, etype, cmpzero, record)              \
 void helper_vcmpne##suffix(CPUPPCState *env, ppc_avr_t *r,              \
                             ppc_avr_t *a, ppc_avr_t *b)                 \
@@ -751,9 +700,6 @@ void helper_vcmpne##suffix(CPUPPCState *env, ppc_avr_t *r,              \
 VCMPNE(zb, u8, uint8_t, 1)
 VCMPNE(zh, u16, uint16_t, 1)
 VCMPNE(zw, u32, uint32_t, 1)
-VCMPNE(b, u8, uint8_t, 0)
-VCMPNE(h, u16, uint16_t, 0)
-VCMPNE(w, u32, uint32_t, 0)
 #undef VCMPNE_DO
 #undef VCMPNE
 
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index cf69f4c412..e007003f14 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -985,41 +985,74 @@ static void glue(gen_, name0##_##name1)(DisasContext *ctx)             \
     }                                                                  \
 }
 
-GEN_VXRFORM(vcmpequb, 3, 0)
-GEN_VXRFORM(vcmpequh, 3, 1)
-GEN_VXRFORM(vcmpequw, 3, 2)
-GEN_VXRFORM(vcmpequd, 3, 3)
 GEN_VXRFORM(vcmpnezb, 3, 4)
 GEN_VXRFORM(vcmpnezh, 3, 5)
 GEN_VXRFORM(vcmpnezw, 3, 6)
-GEN_VXRFORM(vcmpgtsb, 3, 12)
-GEN_VXRFORM(vcmpgtsh, 3, 13)
-GEN_VXRFORM(vcmpgtsw, 3, 14)
-GEN_VXRFORM(vcmpgtsd, 3, 15)
-GEN_VXRFORM(vcmpgtub, 3, 8)
-GEN_VXRFORM(vcmpgtuh, 3, 9)
-GEN_VXRFORM(vcmpgtuw, 3, 10)
-GEN_VXRFORM(vcmpgtud, 3, 11)
+
+static void do_vcmp_rc(int vrt)
+{
+    TCGv_i64 tmp, set, clr;
+
+    tmp = tcg_temp_new_i64();
+    set = tcg_temp_new_i64();
+    clr = tcg_temp_new_i64();
+
+    get_avr64(tmp, vrt, true);
+    tcg_gen_mov_i64(set, tmp);
+    get_avr64(tmp, vrt, false);
+    tcg_gen_or_i64(clr, set, tmp);
+    tcg_gen_and_i64(set, set, tmp);
+
+    tcg_gen_setcondi_i64(TCG_COND_EQ, clr, clr, 0);
+    tcg_gen_shli_i64(clr, clr, 1);
+
+    tcg_gen_setcondi_i64(TCG_COND_EQ, set, set, -1);
+    tcg_gen_shli_i64(set, set, 3);
+
+    tcg_gen_or_i64(tmp, set, clr);
+    tcg_gen_extrl_i64_i32(cpu_crf[6], tmp);
+
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(set);
+    tcg_temp_free_i64(clr);
+}
+
+static bool do_vcmp(DisasContext *ctx, arg_VC *a, TCGCond cond, int vece)
+{
+    REQUIRE_VECTOR(ctx);
+
+    tcg_gen_gvec_cmp(cond, vece, avr_full_offset(a->vrt),
+                     avr_full_offset(a->vra), avr_full_offset(a->vrb), 16, 16);
+
+    if (a->rc) {
+        do_vcmp_rc(a->vrt);
+    }
+
+    return true;
+}
+
+TRANS_FLAGS(ALTIVEC, VCMPEQUB, do_vcmp, TCG_COND_EQ, MO_8)
+TRANS_FLAGS(ALTIVEC, VCMPEQUH, do_vcmp, TCG_COND_EQ, MO_16)
+TRANS_FLAGS(ALTIVEC, VCMPEQUW, do_vcmp, TCG_COND_EQ, MO_32)
+TRANS_FLAGS2(ALTIVEC_207, VCMPEQUD, do_vcmp, TCG_COND_EQ, MO_64)
+
+TRANS_FLAGS(ALTIVEC, VCMPGTSB, do_vcmp, TCG_COND_GT, MO_8)
+TRANS_FLAGS(ALTIVEC, VCMPGTSH, do_vcmp, TCG_COND_GT, MO_16)
+TRANS_FLAGS(ALTIVEC, VCMPGTSW, do_vcmp, TCG_COND_GT, MO_32)
+TRANS_FLAGS2(ALTIVEC_207, VCMPGTSD, do_vcmp, TCG_COND_GT, MO_64)
+TRANS_FLAGS(ALTIVEC, VCMPGTUB, do_vcmp, TCG_COND_GTU, MO_8)
+TRANS_FLAGS(ALTIVEC, VCMPGTUH, do_vcmp, TCG_COND_GTU, MO_16)
+TRANS_FLAGS(ALTIVEC, VCMPGTUW, do_vcmp, TCG_COND_GTU, MO_32)
+TRANS_FLAGS2(ALTIVEC_207, VCMPGTUD, do_vcmp, TCG_COND_GTU, MO_64)
+
+TRANS_FLAGS2(ISA300, VCMPNEB, do_vcmp, TCG_COND_NE, MO_8)
+TRANS_FLAGS2(ISA300, VCMPNEH, do_vcmp, TCG_COND_NE, MO_16)
+TRANS_FLAGS2(ISA300, VCMPNEW, do_vcmp, TCG_COND_NE, MO_32)
+
 GEN_VXRFORM(vcmpeqfp, 3, 3)
 GEN_VXRFORM(vcmpgefp, 3, 7)
 GEN_VXRFORM(vcmpgtfp, 3, 11)
 GEN_VXRFORM(vcmpbfp, 3, 15)
-GEN_VXRFORM(vcmpneb, 3, 0)
-GEN_VXRFORM(vcmpneh, 3, 1)
-GEN_VXRFORM(vcmpnew, 3, 2)
-
-GEN_VXRFORM_DUAL(vcmpequb, PPC_ALTIVEC, PPC_NONE, \
-                 vcmpneb, PPC_NONE, PPC2_ISA300)
-GEN_VXRFORM_DUAL(vcmpequh, PPC_ALTIVEC, PPC_NONE, \
-                 vcmpneh, PPC_NONE, PPC2_ISA300)
-GEN_VXRFORM_DUAL(vcmpequw, PPC_ALTIVEC, PPC_NONE, \
-                 vcmpnew, PPC_NONE, PPC2_ISA300)
-GEN_VXRFORM_DUAL(vcmpeqfp, PPC_ALTIVEC, PPC_NONE, \
-                 vcmpequd, PPC_NONE, PPC2_ALTIVEC_207)
-GEN_VXRFORM_DUAL(vcmpbfp, PPC_ALTIVEC, PPC_NONE, \
-                 vcmpgtsd, PPC_NONE, PPC2_ALTIVEC_207)
-GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \
-                 vcmpgtud, PPC_NONE, PPC2_ALTIVEC_207)
 
 static void gen_vsplti(DisasContext *ctx, int vece)
 {
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index 6787327f56..80d460c34e 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -187,19 +187,10 @@ GEN_HANDLER2_E(name, str, 0x4, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ISA300),
 GEN_VXRFORM_300(vcmpnezb, 3, 4)
 GEN_VXRFORM_300(vcmpnezh, 3, 5)
 GEN_VXRFORM_300(vcmpnezw, 3, 6)
-GEN_VXRFORM(vcmpgtsb, 3, 12)
-GEN_VXRFORM(vcmpgtsh, 3, 13)
-GEN_VXRFORM(vcmpgtsw, 3, 14)
-GEN_VXRFORM(vcmpgtub, 3, 8)
-GEN_VXRFORM(vcmpgtuh, 3, 9)
-GEN_VXRFORM(vcmpgtuw, 3, 10)
-GEN_VXRFORM_DUAL(vcmpeqfp, vcmpequd, 3, 3, PPC_ALTIVEC, PPC_NONE)
+GEN_VXRFORM(vcmpeqfp, 3, 3)
 GEN_VXRFORM(vcmpgefp, 3, 7)
-GEN_VXRFORM_DUAL(vcmpgtfp, vcmpgtud, 3, 11, PPC_ALTIVEC, PPC_NONE)
-GEN_VXRFORM_DUAL(vcmpbfp, vcmpgtsd, 3, 15, PPC_ALTIVEC, PPC_NONE)
-GEN_VXRFORM_DUAL(vcmpequb, vcmpneb, 3, 0, PPC_ALTIVEC, PPC_NONE)
-GEN_VXRFORM_DUAL(vcmpequh, vcmpneh, 3, 1, PPC_ALTIVEC, PPC_NONE)
-GEN_VXRFORM_DUAL(vcmpequw, vcmpnew, 3, 2, PPC_ALTIVEC, PPC_NONE)
+GEN_VXRFORM(vcmpgtfp, 3, 11)
+GEN_VXRFORM(vcmpbfp, 3, 15)
 
 #define GEN_VXFORM_DUAL_INV(name0, name1, opc2, opc3, inval0, inval1, type) \
 GEN_OPCODE_DUAL(name0##_##name1, 0x04, opc2, opc3, inval0, inval1, type, \
-- 
2.25.1
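Note for readers checking the Rc=1 path: the CR6 value that do_vcmp_rc builds can be modeled in plain C as below. This is a standalone sketch, not QEMU code. The name `vcmp_cr6` and its arguments (the high and low doublewords of VRT after the compare) are hypothetical. The model relies on one fact: after a vector compare, every element of VRT is either all ones or all zeros.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of do_vcmp_rc(): only the two doublewords of the
 * compare result are needed.
 *   0b1000: every element compared true  (hi & lo is all ones)
 *   0b0010: every element compared false (hi | lo is zero)
 */
static uint32_t vcmp_cr6(uint64_t hi, uint64_t lo)
{
    uint32_t all_true  = (hi & lo) == UINT64_MAX;  /* setcondi EQ -1 */
    uint32_t all_false = (hi | lo) == 0;           /* setcondi EQ 0 */

    return (all_true << 3) | (all_false << 1);
}
```

A mixed result, for example vcmp_cr6(UINT64_MAX, 0), sets neither bit.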




* [PATCH v4 10/47] target/ppc: Move Vector Compare Not Equal or Zero to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (8 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 09/47] target/ppc: Move Vector Compare Equal/Not Equal/Greater Than to decodetree matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 19:04   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 11/47] target/ppc: Implement Vector Compare Equal Quadword matheus.ferst
                   ` (36 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/helper.h                 |  9 ++--
 target/ppc/insn32.decode            |  4 ++
 target/ppc/int_helper.c             | 50 +++++-----------------
 target/ppc/translate/vmx-impl.c.inc | 66 +++++++++++++++++++++++++++--
 target/ppc/translate/vmx-ops.c.inc  |  3 --
 5 files changed, 80 insertions(+), 52 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index fb421dd343..303a29fb5a 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -140,16 +140,13 @@ DEF_HELPER_3(vabsduw, void, avr, avr, avr)
 DEF_HELPER_3(vavgsb, void, avr, avr, avr)
 DEF_HELPER_3(vavgsh, void, avr, avr, avr)
 DEF_HELPER_3(vavgsw, void, avr, avr, avr)
-DEF_HELPER_4(vcmpnezb, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpnezh, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpnezw, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpeqfp, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgefp, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgtfp, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpbfp, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpnezb_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpnezh_dot, void, env, avr, avr, avr)
-DEF_HELPER_4(vcmpnezw_dot, void, env, avr, avr, avr)
+DEF_HELPER_4(VCMPNEZB, void, avr, avr, avr, i32)
+DEF_HELPER_4(VCMPNEZH, void, avr, avr, avr, i32)
+DEF_HELPER_4(VCMPNEZW, void, avr, avr, avr, i32)
 DEF_HELPER_4(vcmpeqfp_dot, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgefp_dot, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgtfp_dot, void, env, avr, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 5443ee0394..be9e05cc73 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -397,6 +397,10 @@ VCMPNEB         000100 ..... ..... ..... . 0000000111   @VC
 VCMPNEH         000100 ..... ..... ..... . 0001000111   @VC
 VCMPNEW         000100 ..... ..... ..... . 0010000111   @VC
 
+VCMPNEZB        000100 ..... ..... ..... . 0100000111   @VC
+VCMPNEZH        000100 ..... ..... ..... . 0101000111   @VC
+VCMPNEZW        000100 ..... ..... ..... . 0110000111   @VC
+
 ## Vector Bit Manipulation Instruction
 
 VCFUGED         000100 ..... ..... ..... 10101001101    @VX
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index c9e64014dc..fce782499f 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -662,46 +662,18 @@ VCF(ux, uint32_to_float32, u32)
 VCF(sx, int32_to_float32, s32)
 #undef VCF
 
-#define VCMPNE_DO(suffix, element, etype, cmpzero, record)              \
-void helper_vcmpne##suffix(CPUPPCState *env, ppc_avr_t *r,              \
-                            ppc_avr_t *a, ppc_avr_t *b)                 \
-{                                                                       \
-    etype ones = (etype)-1;                                             \
-    etype all = ones;                                                   \
-    etype result, none = 0;                                             \
-    int i;                                                              \
-                                                                        \
-    for (i = 0; i < ARRAY_SIZE(r->element); i++) {                      \
-        if (cmpzero) {                                                  \
-            result = ((a->element[i] == 0)                              \
-                           || (b->element[i] == 0)                      \
-                           || (a->element[i] != b->element[i]) ?        \
-                           ones : 0x0);                                 \
-        } else {                                                        \
-            result = (a->element[i] != b->element[i]) ? ones : 0x0;     \
-        }                                                               \
-        r->element[i] = result;                                         \
-        all &= result;                                                  \
-        none |= result;                                                 \
-    }                                                                   \
-    if (record) {                                                       \
-        env->crf[6] = ((all != 0) << 3) | ((none == 0) << 1);           \
-    }                                                                   \
+#define VCMPNEZ(NAME, ELEM) \
+void helper_##NAME(ppc_vsr_t *t, ppc_vsr_t *a, ppc_vsr_t *b, uint32_t desc) \
+{                                                                           \
+    for (int i = 0; i < ARRAY_SIZE(t->ELEM); i++) {                         \
+        t->ELEM[i] = ((a->ELEM[i] == 0) || (b->ELEM[i] == 0) ||             \
+                      (a->ELEM[i] != b->ELEM[i])) ? -1 : 0;                 \
+    }                                                                       \
 }
-
-/*
- * VCMPNEZ - Vector compare not equal to zero
- *   suffix  - instruction mnemonic suffix (b: byte, h: halfword, w: word)
- *   element - element type to access from vector
- */
-#define VCMPNE(suffix, element, etype, cmpzero)         \
-    VCMPNE_DO(suffix, element, etype, cmpzero, 0)       \
-    VCMPNE_DO(suffix##_dot, element, etype, cmpzero, 1)
-VCMPNE(zb, u8, uint8_t, 1)
-VCMPNE(zh, u16, uint16_t, 1)
-VCMPNE(zw, u32, uint32_t, 1)
-#undef VCMPNE_DO
-#undef VCMPNE
+VCMPNEZ(VCMPNEZB, u8)
+VCMPNEZ(VCMPNEZH, u16)
+VCMPNEZ(VCMPNEZW, u32)
+#undef VCMPNEZ
 
 #define VCMPFP_DO(suffix, compare, order, record)                       \
     void helper_vcmp##suffix(CPUPPCState *env, ppc_avr_t *r,            \
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index e007003f14..d7f807b81d 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -985,10 +985,6 @@ static void glue(gen_, name0##_##name1)(DisasContext *ctx)             \
     }                                                                  \
 }
 
-GEN_VXRFORM(vcmpnezb, 3, 4)
-GEN_VXRFORM(vcmpnezh, 3, 5)
-GEN_VXRFORM(vcmpnezw, 3, 6)
-
 static void do_vcmp_rc(int vrt)
 {
     TCGv_i64 tmp, set, clr;
@@ -1049,6 +1045,68 @@ TRANS_FLAGS2(ISA300, VCMPNEB, do_vcmp, TCG_COND_NE, MO_8)
 TRANS_FLAGS2(ISA300, VCMPNEH, do_vcmp, TCG_COND_NE, MO_16)
 TRANS_FLAGS2(ISA300, VCMPNEW, do_vcmp, TCG_COND_NE, MO_32)
 
+static void gen_vcmpnez_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t0, t1, zero;
+
+    t0 = tcg_temp_new_vec_matching(t);
+    t1 = tcg_temp_new_vec_matching(t);
+    zero = tcg_constant_vec_matching(t, vece, 0);
+
+    tcg_gen_cmp_vec(TCG_COND_EQ, vece, t0, a, zero);
+    tcg_gen_cmp_vec(TCG_COND_EQ, vece, t1, b, zero);
+    tcg_gen_cmp_vec(TCG_COND_NE, vece, t, a, b);
+
+    tcg_gen_or_vec(vece, t, t, t0);
+    tcg_gen_or_vec(vece, t, t, t1);
+
+    tcg_temp_free_vec(t0);
+    tcg_temp_free_vec(t1);
+}
+
+static bool do_vcmpnez(DisasContext *ctx, arg_VC *a, int vece)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_cmp_vec, 0
+    };
+    static const GVecGen3 ops[3] = {
+        {
+            .fniv = gen_vcmpnez_vec,
+            .fno = gen_helper_VCMPNEZB,
+            .opt_opc = vecop_list,
+            .vece = MO_8
+        },
+        {
+            .fniv = gen_vcmpnez_vec,
+            .fno = gen_helper_VCMPNEZH,
+            .opt_opc = vecop_list,
+            .vece = MO_16
+        },
+        {
+            .fniv = gen_vcmpnez_vec,
+            .fno = gen_helper_VCMPNEZW,
+            .opt_opc = vecop_list,
+            .vece = MO_32
+        }
+    };
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+    REQUIRE_VECTOR(ctx);
+
+    tcg_gen_gvec_3(avr_full_offset(a->vrt), avr_full_offset(a->vra),
+                   avr_full_offset(a->vrb), 16, 16, &ops[vece]);
+
+    if (a->rc) {
+        do_vcmp_rc(a->vrt);
+    }
+
+    return true;
+}
+
+TRANS(VCMPNEZB, do_vcmpnez, MO_8)
+TRANS(VCMPNEZH, do_vcmpnez, MO_16)
+TRANS(VCMPNEZW, do_vcmpnez, MO_32)
+
 GEN_VXRFORM(vcmpeqfp, 3, 3)
 GEN_VXRFORM(vcmpgefp, 3, 7)
 GEN_VXRFORM(vcmpgtfp, 3, 11)
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index 80d460c34e..cb4c5bb953 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -184,9 +184,6 @@ GEN_HANDLER2_E(name, str, 0x4, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ISA300),
     GEN_VXRFORM1_300(name, name, #name, opc2, opc3)                         \
     GEN_VXRFORM1_300(name##_dot, name##_, #name ".", opc2, (opc3 | (0x1 << 4)))
 
-GEN_VXRFORM_300(vcmpnezb, 3, 4)
-GEN_VXRFORM_300(vcmpnezh, 3, 5)
-GEN_VXRFORM_300(vcmpnezw, 3, 6)
 GEN_VXRFORM(vcmpeqfp, 3, 3)
 GEN_VXRFORM(vcmpgefp, 3, 7)
 GEN_VXRFORM(vcmpgtfp, 3, 11)
-- 
2.25.1
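Note on the per-element predicate: the VCMPNEZ helper loop and gen_vcmpnez_vec both compute the same result. The helper spells it out as a single condition. The vector version ORs three compares together: a != b, a == 0 and b == 0. A scalar sketch of the byte form follows. `vcmpnezb_model` is a hypothetical name, and the arrays simply hold the 16 byte elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of VCMPNEZB: each result byte is 0xff when
 * a[i] == 0, b[i] == 0 or a[i] != b[i], and 0x00 otherwise.  Written
 * as three masks OR-ed together, mirroring gen_vcmpnez_vec().
 */
static void vcmpnezb_model(uint8_t t[16], const uint8_t a[16],
                           const uint8_t b[16])
{
    for (size_t i = 0; i < 16; i++) {
        uint8_t ne = (a[i] != b[i]) ? 0xff : 0x00;
        uint8_t az = (a[i] == 0) ? 0xff : 0x00;
        uint8_t bz = (b[i] == 0) ? 0xff : 0x00;

        t[i] = ne | az | bz;
    }
}
```

The only case that yields 0x00 is a pair of equal, non-zero elements.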




* [PATCH v4 11/47] target/ppc: Implement Vector Compare Equal Quadword
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (9 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 10/47] target/ppc: Move Vector Compare Not Equal or Zero " matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 19:05   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 12/47] target/ppc: Implement Vector Compare Greater Than Quadword matheus.ferst
                   ` (35 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
vcmpequq: Vector Compare Equal Quadword

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 - Branchless implementation (rth)
---
 target/ppc/insn32.decode            |  1 +
 target/ppc/translate/vmx-impl.c.inc | 36 +++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index be9e05cc73..437a3e29e0 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -382,6 +382,7 @@ VCMPEQUB        000100 ..... ..... ..... . 0000000110   @VC
 VCMPEQUH        000100 ..... ..... ..... . 0001000110   @VC
 VCMPEQUW        000100 ..... ..... ..... . 0010000110   @VC
 VCMPEQUD        000100 ..... ..... ..... . 0011000111   @VC
+VCMPEQUQ        000100 ..... ..... ..... . 0111000111   @VC
 
 VCMPGTSB        000100 ..... ..... ..... . 1100000110   @VC
 VCMPGTSH        000100 ..... ..... ..... . 1101000110   @VC
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index d7f807b81d..d66a642b67 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1107,6 +1107,42 @@ TRANS(VCMPNEZB, do_vcmpnez, MO_8)
 TRANS(VCMPNEZH, do_vcmpnez, MO_16)
 TRANS(VCMPNEZW, do_vcmpnez, MO_32)
 
+static bool trans_VCMPEQUQ(DisasContext *ctx, arg_VC *a)
+{
+    TCGv_i64 t0, t1, t2;
+
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+
+    get_avr64(t0, a->vra, true);
+    get_avr64(t1, a->vrb, true);
+    tcg_gen_xor_i64(t2, t0, t1);
+
+    get_avr64(t0, a->vra, false);
+    get_avr64(t1, a->vrb, false);
+    tcg_gen_xor_i64(t1, t0, t1);
+
+    tcg_gen_or_i64(t1, t1, t2);
+    tcg_gen_setcondi_i64(TCG_COND_EQ, t1, t1, 0);
+    tcg_gen_neg_i64(t1, t1);
+
+    set_avr64(a->vrt, t1, true);
+    set_avr64(a->vrt, t1, false);
+
+    if (a->rc) {
+        tcg_gen_extrl_i64_i32(cpu_crf[6], t1);
+        tcg_gen_andi_i32(cpu_crf[6], cpu_crf[6], 0xa);
+        tcg_gen_xori_i32(cpu_crf[6], cpu_crf[6], 0x2);
+    }
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+
+    return true;
+}
+
 GEN_VXRFORM(vcmpeqfp, 3, 3)
 GEN_VXRFORM(vcmpgefp, 3, 7)
 GEN_VXRFORM(vcmpgtfp, 3, 11)
-- 
2.25.1
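Note on the branchless sequence: the steps in trans_VCMPEQUQ can be mirrored in plain C as below.

1. XOR the matching doublewords and OR the differences.
2. Turn "difference is zero" into an all-ones or all-zeros doubleword.
3. Derive CR6 from the result with (t & 0xa) ^ 0x2.

The function names are hypothetical, and this is a sketch for checking the arithmetic, not the translator itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the VRT value produced by trans_VCMPEQUQ(). */
static uint64_t vcmpequq_model(uint64_t ahi, uint64_t alo,
                               uint64_t bhi, uint64_t blo)
{
    uint64_t diff = (ahi ^ bhi) | (alo ^ blo);

    return -(uint64_t)(diff == 0);   /* setcondi EQ 0, then neg */
}

/* CR6 from the all-ones/all-zeros result: 0b1000 if equal, else 0b0010. */
static uint32_t vcmpq_cr6(uint64_t t)
{
    return ((uint32_t)t & 0xa) ^ 0x2;
}
```

The AND/XOR pair avoids a branch. 0xa keeps bits 3 and 1 of an all-ones value, and XOR with 0x2 then clears bit 1 (equal case) or sets it (not-equal case).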




* [PATCH v4 12/47] target/ppc: Implement Vector Compare Greater Than Quadword
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (10 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 11/47] target/ppc: Implement Vector Compare Equal Quadword matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 19:07   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 13/47] target/ppc: Implement Vector Compare Quadword matheus.ferst
                   ` (34 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
vcmpgtsq: Vector Compare Greater Than Signed Quadword
vcmpgtuq: Vector Compare Greater Than Unsigned Quadword

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 - Branchless implementation (rth)
---
 target/ppc/insn32.decode            |  2 ++
 target/ppc/translate/vmx-impl.c.inc | 39 +++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 437a3e29e0..07a4ef9103 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -388,11 +388,13 @@ VCMPGTSB        000100 ..... ..... ..... . 1100000110   @VC
 VCMPGTSH        000100 ..... ..... ..... . 1101000110   @VC
 VCMPGTSW        000100 ..... ..... ..... . 1110000110   @VC
 VCMPGTSD        000100 ..... ..... ..... . 1111000111   @VC
+VCMPGTSQ        000100 ..... ..... ..... . 1110000111   @VC
 
 VCMPGTUB        000100 ..... ..... ..... . 1000000110   @VC
 VCMPGTUH        000100 ..... ..... ..... . 1001000110   @VC
 VCMPGTUW        000100 ..... ..... ..... . 1010000110   @VC
 VCMPGTUD        000100 ..... ..... ..... . 1011000111   @VC
+VCMPGTUQ        000100 ..... ..... ..... . 1010000111   @VC
 
 VCMPNEB         000100 ..... ..... ..... . 0000000111   @VC
 VCMPNEH         000100 ..... ..... ..... . 0001000111   @VC
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index d66a642b67..4a76e370fc 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1143,6 +1143,45 @@ static bool trans_VCMPEQUQ(DisasContext *ctx, arg_VC *a)
     return true;
 }
 
+static bool do_vcmpgtq(DisasContext *ctx, arg_VC *a, bool sign)
+{
+    TCGv_i64 t0, t1, t2;
+
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+
+    get_avr64(t0, a->vra, false);
+    get_avr64(t1, a->vrb, false);
+    tcg_gen_setcond_i64(TCG_COND_GTU, t2, t0, t1);
+
+    get_avr64(t0, a->vra, true);
+    get_avr64(t1, a->vrb, true);
+    tcg_gen_movcond_i64(TCG_COND_EQ, t2, t0, t1, t2, tcg_constant_i64(0));
+    tcg_gen_setcond_i64(sign ? TCG_COND_GT : TCG_COND_GTU, t1, t0, t1);
+
+    tcg_gen_or_i64(t1, t1, t2);
+    tcg_gen_neg_i64(t1, t1);
+
+    set_avr64(a->vrt, t1, true);
+    set_avr64(a->vrt, t1, false);
+
+    if (a->rc) {
+        tcg_gen_extrl_i64_i32(cpu_crf[6], t1);
+        tcg_gen_andi_i32(cpu_crf[6], cpu_crf[6], 0xa);
+        tcg_gen_xori_i32(cpu_crf[6], cpu_crf[6], 0x2);
+    }
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+
+    return true;
+}
+
+TRANS(VCMPGTSQ, do_vcmpgtq, true)
+TRANS(VCMPGTUQ, do_vcmpgtq, false)
+
 GEN_VXRFORM(vcmpeqfp, 3, 3)
 GEN_VXRFORM(vcmpgefp, 3, 7)
 GEN_VXRFORM(vcmpgtfp, 3, 11)
-- 
2.25.1
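Note on do_vcmpgtq: a plain-C model of the branchless 128-bit "greater than" is shown below. Each value is split into high and low doublewords. The low halves are always compared unsigned; only the high-half compare honours `sign`. `vcmpgtq_model` is a hypothetical name used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the VRT value produced by do_vcmpgtq(). */
static uint64_t vcmpgtq_model(uint64_t ahi, uint64_t alo,
                              uint64_t bhi, uint64_t blo, bool sign)
{
    uint64_t lo_gt = alo > blo;                   /* setcond GTU */
    uint64_t t2 = (ahi == bhi) ? lo_gt : 0;       /* movcond EQ */
    uint64_t hi_gt = sign ? ((int64_t)ahi > (int64_t)bhi)
                          : (ahi > bhi);          /* setcond GT/GTU */

    return -(hi_gt | t2);                         /* or, then neg */
}
```

For example, with A = -1 (both halves all ones) and B = 0, the signed form gives 0 and the unsigned form gives all ones.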




* [PATCH v4 13/47] target/ppc: Implement Vector Compare Quadword
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (11 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 12/47] target/ppc: Implement Vector Compare Greater Than Quadword matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 14:36 ` [PATCH v4 14/47] target/ppc: implement vstri[bh][lr] matheus.ferst
                   ` (33 subsequent siblings)
  46 siblings, 0 replies; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
vcmpsq: Vector Compare Signed Quadword
vcmpuq: Vector Compare Unsigned Quadword

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            |  6 ++++
 target/ppc/translate/vmx-impl.c.inc | 45 +++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 07a4ef9103..f0cb6602e2 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -60,6 +60,9 @@
 &VX             vrt vra vrb
 @VX             ...... vrt:5 vra:5 vrb:5 .......... .   &VX
 
+&VX_bf          bf vra vrb
+@VX_bf          ...... bf:3 .. vra:5 vrb:5 ...........          &VX_bf
+
 &VX_uim4        vrt uim vrb
 @VX_uim4        ...... vrt:5 . uim:4 vrb:5 ...........  &VX_uim4
 
@@ -404,6 +407,9 @@ VCMPNEZB        000100 ..... ..... ..... . 0100000111   @VC
 VCMPNEZH        000100 ..... ..... ..... . 0101000111   @VC
 VCMPNEZW        000100 ..... ..... ..... . 0110000111   @VC
 
+VCMPSQ          000100 ... -- ..... ..... 00101000001   @VX_bf
+VCMPUQ          000100 ... -- ..... ..... 00100000001   @VX_bf
+
 ## Vector Bit Manipulation Instruction
 
 VCFUGED         000100 ..... ..... ..... 10101001101    @VX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 4a76e370fc..335bef56ff 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1182,6 +1182,51 @@ static bool do_vcmpgtq(DisasContext *ctx, arg_VC *a, bool sign)
 TRANS(VCMPGTSQ, do_vcmpgtq, true)
 TRANS(VCMPGTUQ, do_vcmpgtq, false)
 
+static bool do_vcmpq(DisasContext *ctx, arg_VX_bf *a, bool sign)
+{
+    TCGv_i64 vra, vrb;
+    TCGLabel *gt, *lt, *done;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    vra = tcg_temp_local_new_i64();
+    vrb = tcg_temp_local_new_i64();
+    gt = gen_new_label();
+    lt = gen_new_label();
+    done = gen_new_label();
+
+    get_avr64(vra, a->vra, true);
+    get_avr64(vrb, a->vrb, true);
+    tcg_gen_brcond_i64((sign ? TCG_COND_GT : TCG_COND_GTU), vra, vrb, gt);
+    tcg_gen_brcond_i64((sign ? TCG_COND_LT : TCG_COND_LTU), vra, vrb, lt);
+
+    get_avr64(vra, a->vra, false);
+    get_avr64(vrb, a->vrb, false);
+    tcg_gen_brcond_i64(TCG_COND_GTU, vra, vrb, gt);
+    tcg_gen_brcond_i64(TCG_COND_LTU, vra, vrb, lt);
+
+    tcg_gen_movi_i32(cpu_crf[a->bf], CRF_EQ);
+    tcg_gen_br(done);
+
+    gen_set_label(gt);
+    tcg_gen_movi_i32(cpu_crf[a->bf], CRF_GT);
+    tcg_gen_br(done);
+
+    gen_set_label(lt);
+    tcg_gen_movi_i32(cpu_crf[a->bf], CRF_LT);
+    tcg_gen_br(done);
+
+    gen_set_label(done);
+    tcg_temp_free_i64(vra);
+    tcg_temp_free_i64(vrb);
+
+    return true;
+}
+
+TRANS(VCMPSQ, do_vcmpq, true)
+TRANS(VCMPUQ, do_vcmpq, false)
+
 GEN_VXRFORM(vcmpeqfp, 3, 3)
 GEN_VXRFORM(vcmpgefp, 3, 7)
 GEN_VXRFORM(vcmpgtfp, 3, 11)
-- 
2.25.1
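Note on do_vcmpq: the branch structure reduces to the three-way compare sketched below.

- If the high doublewords differ, they decide the result (signed or unsigned).
- Otherwise, an unsigned compare of the low doublewords decides.

The CR_LT/CR_GT/CR_EQ values below are the usual PowerPC CR field bit weights. They are assumed here to match QEMU's CRF_LT/CRF_GT/CRF_EQ. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed CR field encodings (LT = 0b1000, GT = 0b0100, EQ = 0b0010). */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2 };

/* Hypothetical model of the CR field value written by do_vcmpq(). */
static uint32_t vcmpq_model(uint64_t ahi, uint64_t alo,
                            uint64_t bhi, uint64_t blo, bool sign)
{
    if (ahi != bhi) {
        bool gt = sign ? ((int64_t)ahi > (int64_t)bhi) : (ahi > bhi);
        return gt ? CR_GT : CR_LT;
    }
    if (alo != blo) {
        return (alo > blo) ? CR_GT : CR_LT;   /* low halves: unsigned */
    }
    return CR_EQ;
}
```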




* [PATCH v4 14/47] target/ppc: implement vstri[bh][lr]
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (12 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 13/47] target/ppc: Implement Vector Compare Quadword matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 19:13   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 15/47] target/ppc: implement vclrlb matheus.ferst
                   ` (32 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 - vstri helpers return CR field (rth)
---
 target/ppc/helper.h                 |  4 ++++
 target/ppc/insn32.decode            | 10 ++++++++++
 target/ppc/int_helper.c             | 28 +++++++++++++++++++++++++++
 target/ppc/translate/vmx-impl.c.inc | 30 +++++++++++++++++++++++++++++
 4 files changed, 72 insertions(+)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 303a29fb5a..269150b197 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -211,6 +211,10 @@ DEF_HELPER_4(VINSBLX, void, env, avr, i64, tl)
 DEF_HELPER_4(VINSHLX, void, env, avr, i64, tl)
 DEF_HELPER_4(VINSWLX, void, env, avr, i64, tl)
 DEF_HELPER_4(VINSDLX, void, env, avr, i64, tl)
+DEF_HELPER_2(VSTRIBL, i32, avr, avr)
+DEF_HELPER_2(VSTRIBR, i32, avr, avr)
+DEF_HELPER_2(VSTRIHL, i32, avr, avr)
+DEF_HELPER_2(VSTRIHR, i32, avr, avr)
 DEF_HELPER_2(vnegw, void, avr, avr)
 DEF_HELPER_2(vnegd, void, avr, avr)
 DEF_HELPER_2(vupkhpx, void, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index f0cb6602e2..d844d86829 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -63,6 +63,9 @@
 &VX_bf          bf vra vrb
 @VX_bf          ...... bf:3 .. vra:5 vrb:5 ...........          &VX_bf
 
+&VX_tb_rc       vrt vrb rc:bool
+@VX_tb_rc       ...... vrt:5 ..... vrb:5 rc:1 ..........        &VX_tb_rc
+
 &VX_uim4        vrt uim vrb
 @VX_uim4        ...... vrt:5 . uim:4 vrb:5 ...........  &VX_uim4
 
@@ -519,6 +522,13 @@ VMULLD          000100 ..... ..... ..... 00111001001    @VX
 VMSUMCUD        000100 ..... ..... ..... ..... 010111   @VA
 VMSUMUDM        000100 ..... ..... ..... ..... 100011   @VA
 
+## Vector String Instructions
+
+VSTRIBL         000100 ..... 00000 ..... . 0000001101   @VX_tb_rc
+VSTRIBR         000100 ..... 00001 ..... . 0000001101   @VX_tb_rc
+VSTRIHL         000100 ..... 00010 ..... . 0000001101   @VX_tb_rc
+VSTRIHR         000100 ..... 00011 ..... . 0000001101   @VX_tb_rc
+
 # VSX Load/Store Instructions
 
 LXV             111101 ..... ..... ............ . 001   @DQ_TSX
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index fce782499f..0a094b535a 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1518,6 +1518,34 @@ VEXTRACT(uw, u32)
 VEXTRACT(d, u64)
 #undef VEXTRACT
 
+#define VSTRI(NAME, ELEM, NUM_ELEMS, LEFT) \
+uint32_t helper_##NAME(ppc_avr_t *t, ppc_avr_t *b) \
+{                                                   \
+    int i, idx, crf = 0;                            \
+                                                    \
+    for (i = 0; i < NUM_ELEMS; i++) {               \
+        idx = LEFT ? i : NUM_ELEMS - i - 1;         \
+        if (b->Vsr##ELEM(idx)) {                    \
+            t->Vsr##ELEM(idx) = b->Vsr##ELEM(idx);  \
+        } else {                                    \
+            crf = 0b0010;                           \
+            break;                                  \
+        }                                           \
+    }                                               \
+                                                    \
+    for (; i < NUM_ELEMS; i++) {                    \
+        idx = LEFT ? i : NUM_ELEMS - i - 1;         \
+        t->Vsr##ELEM(idx) = 0;                      \
+    }                                               \
+                                                    \
+    return crf;                                     \
+}
+VSTRI(VSTRIBL, B, 16, true)
+VSTRI(VSTRIBR, B, 16, false)
+VSTRI(VSTRIHL, H, 8, true)
+VSTRI(VSTRIHR, H, 8, false)
+#undef VSTRI
+
 void helper_xxextractuw(CPUPPCState *env, ppc_vsr_t *xt,
                         ppc_vsr_t *xb, uint32_t index)
 {
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 335bef56ff..1a69931d36 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1910,6 +1910,36 @@ static bool trans_MTVSRBMI(DisasContext *ctx, arg_DX_b *a)
     return true;
 }
 
+static bool do_vstri(DisasContext *ctx, arg_VX_tb_rc *a,
+                     void (*gen_helper)(TCGv_i32, TCGv_ptr, TCGv_ptr))
+{
+    TCGv_ptr vrt, vrb;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    vrt = gen_avr_ptr(a->vrt);
+    vrb = gen_avr_ptr(a->vrb);
+
+    if (a->rc) {
+        gen_helper(cpu_crf[6], vrt, vrb);
+    } else {
+        TCGv_i32 discard = tcg_temp_new_i32();
+        gen_helper(discard, vrt, vrb);
+        tcg_temp_free_i32(discard);
+    }
+
+    tcg_temp_free_ptr(vrt);
+    tcg_temp_free_ptr(vrb);
+
+    return true;
+}
+
+TRANS(VSTRIBL, do_vstri, gen_helper_VSTRIBL)
+TRANS(VSTRIBR, do_vstri, gen_helper_VSTRIBR)
+TRANS(VSTRIHL, do_vstri, gen_helper_VSTRIHL)
+TRANS(VSTRIHR, do_vstri, gen_helper_VSTRIHR)
+
 #define GEN_VAFORM_PAIRED(name0, name1, opc2)                           \
 static void glue(gen_, name0##_##name1)(DisasContext *ctx)              \
     {                                                                   \
-- 
2.25.1
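Note on the VSTRI helper: the byte variants can be modeled on a plain 16-byte array, as in the sketch below.

- The array is kept in ISA element order, the order that Vsr##ELEM(idx) indexes (index 0 is the leftmost byte).
- Bytes are copied from the chosen edge up to the first zero byte.
- That zero and everything past it are cleared.
- The return value is what lands in CR6 when Rc=1.

`vstrib_model` is a hypothetical name; this is not the QEMU helper itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of VSTRIBL (left = true) / VSTRIBR (left = false).
 * Returns 0b0010 if a zero byte terminated the copy, 0 otherwise.
 */
static uint32_t vstrib_model(uint8_t t[16], const uint8_t b[16], bool left)
{
    uint32_t crf = 0;
    int i;

    for (i = 0; i < 16; i++) {
        int idx = left ? i : 16 - i - 1;
        if (b[idx] == 0) {
            crf = 0x2;
            break;
        }
        t[idx] = b[idx];
    }
    for (; i < 16; i++) {          /* clear the terminator and the rest */
        int idx = left ? i : 16 - i - 1;
        t[idx] = 0;
    }
    return crf;
}
```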




* [PATCH v4 15/47] target/ppc: implement vclrlb
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (13 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 14/47] target/ppc: implement vstri[bh][lr] matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 19:15   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 16/47] target/ppc: implement vclrrb matheus.ferst
                   ` (31 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 - Branchless implementation (rth)
---
 target/ppc/insn32.decode            |  2 ++
 target/ppc/translate/vmx-impl.c.inc | 40 +++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index d844d86829..31cdbba86b 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -529,6 +529,8 @@ VSTRIBR         000100 ..... 00001 ..... . 0000001101   @VX_tb_rc
 VSTRIHL         000100 ..... 00010 ..... . 0000001101   @VX_tb_rc
 VSTRIHR         000100 ..... 00011 ..... . 0000001101   @VX_tb_rc
 
+VCLRLB          000100 ..... ..... ..... 00110001101    @VX
+
 # VSX Load/Store Instructions
 
 LXV             111101 ..... ..... ............ . 001   @DQ_TSX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 1a69931d36..8f12d78071 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1940,6 +1940,46 @@ TRANS(VSTRIBR, do_vstri, gen_helper_VSTRIBR)
 TRANS(VSTRIHL, do_vstri, gen_helper_VSTRIHL)
 TRANS(VSTRIHR, do_vstri, gen_helper_VSTRIHR)
 
+static bool trans_VCLRLB(DisasContext *ctx, arg_VX *a)
+{
+    TCGv_i64 rb, mh, ml, tmp,
+             ones = tcg_constant_i64(-1),
+             zero = tcg_constant_i64(0);
+
+    rb = tcg_temp_new_i64();
+    mh = tcg_temp_new_i64();
+    ml = tcg_temp_new_i64();
+    tmp = tcg_temp_new_i64();
+
+    tcg_gen_extu_tl_i64(rb, cpu_gpr[a->vrb]);
+    tcg_gen_andi_i64(tmp, rb, 7);
+    tcg_gen_shli_i64(tmp, tmp, 3);
+    tcg_gen_shl_i64(tmp, tcg_constant_i64(-1), tmp);
+    tcg_gen_not_i64(tmp, tmp);
+
+    tcg_gen_movcond_i64(TCG_COND_LTU, ml, rb, tcg_constant_i64(8),
+                        tmp, ones);
+    tcg_gen_movcond_i64(TCG_COND_LTU, mh, rb, tcg_constant_i64(8),
+                        zero, tmp);
+    tcg_gen_movcond_i64(TCG_COND_LTU, mh, rb, tcg_constant_i64(16),
+                        mh, ones);
+
+    get_avr64(tmp, a->vra, true);
+    tcg_gen_and_i64(tmp, tmp, mh);
+    set_avr64(a->vrt, tmp, true);
+
+    get_avr64(tmp, a->vra, false);
+    tcg_gen_and_i64(tmp, tmp, ml);
+    set_avr64(a->vrt, tmp, false);
+
+    tcg_temp_free_i64(rb);
+    tcg_temp_free_i64(mh);
+    tcg_temp_free_i64(ml);
+    tcg_temp_free_i64(tmp);
+
+    return true;
+}
+
 #define GEN_VAFORM_PAIRED(name0, name1, opc2)                           \
 static void glue(gen_, name0##_##name1)(DisasContext *ctx)              \
     {                                                                   \
-- 
2.25.1
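To help check the mask arithmetic, here is a minimal host-side C sketch of what trans_VCLRLB computes. It assumes a 64-bit target, so RB is zero-extended without truncation. The vr128 type and vclrlb_model() are hypothetical names used only for illustration. hi is the most significant doubleword of VRA and lo the least significant.

```c
#include <stdint.h>

/* Hypothetical model type: hi = most significant doubleword of the VR. */
typedef struct {
    uint64_t hi, lo;
} vr128;

/* Hypothetical host-side model of the masks built by trans_VCLRLB. */
static vr128 vclrlb_model(vr128 vra, uint64_t rb)
{
    /* Low (rb & 7) bytes set, as produced by the andi/shli/shl/not sequence. */
    uint64_t tmp = ~(~0ULL << ((rb & 7) * 8));

    /* The three movcond operations from the patch. */
    uint64_t ml = rb < 8 ? tmp : ~0ULL;
    uint64_t mh = rb < 8 ? 0 : tmp;
    mh = rb < 16 ? mh : ~0ULL;

    vr128 r = { vra.hi & mh, vra.lo & ml };
    return r;
}
```

With RB = 3, only the three rightmost bytes of VRA survive. With RB >= 16, the whole register is copied unchanged.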




* [PATCH v4 16/47] target/ppc: implement vclrrb
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (14 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 15/47] target/ppc: implement vclrlb matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 19:17   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 17/47] target/ppc: implement vcntmb[bhwd] matheus.ferst
                   ` (30 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            |  1 +
 target/ppc/translate/vmx-impl.c.inc | 32 +++++++++++++++++++++--------
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 31cdbba86b..b20f1eaa8e 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -530,6 +530,7 @@ VSTRIHL         000100 ..... 00010 ..... . 0000001101   @VX_tb_rc
 VSTRIHR         000100 ..... 00011 ..... . 0000001101   @VX_tb_rc
 
 VCLRLB          000100 ..... ..... ..... 00110001101    @VX
+VCLRRB          000100 ..... ..... ..... 00111001101    @VX
 
 # VSX Load/Store Instructions
 
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 8f12d78071..4510b4ecde 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1940,7 +1940,7 @@ TRANS(VSTRIBR, do_vstri, gen_helper_VSTRIBR)
 TRANS(VSTRIHL, do_vstri, gen_helper_VSTRIHL)
 TRANS(VSTRIHR, do_vstri, gen_helper_VSTRIHR)
 
-static bool trans_VCLRLB(DisasContext *ctx, arg_VX *a)
+static bool do_vclrb(DisasContext *ctx, arg_VX *a, bool right)
 {
     TCGv_i64 rb, mh, ml, tmp,
              ones = tcg_constant_i64(-1),
@@ -1954,15 +1954,28 @@ static bool trans_VCLRLB(DisasContext *ctx, arg_VX *a)
     tcg_gen_extu_tl_i64(rb, cpu_gpr[a->vrb]);
     tcg_gen_andi_i64(tmp, rb, 7);
     tcg_gen_shli_i64(tmp, tmp, 3);
-    tcg_gen_shl_i64(tmp, tcg_constant_i64(-1), tmp);
+    if (right) {
+        tcg_gen_shr_i64(tmp, tcg_constant_i64(-1), tmp);
+    } else {
+        tcg_gen_shl_i64(tmp, tcg_constant_i64(-1), tmp);
+    }
     tcg_gen_not_i64(tmp, tmp);
 
-    tcg_gen_movcond_i64(TCG_COND_LTU, ml, rb, tcg_constant_i64(8),
-                        tmp, ones);
-    tcg_gen_movcond_i64(TCG_COND_LTU, mh, rb, tcg_constant_i64(8),
-                        zero, tmp);
-    tcg_gen_movcond_i64(TCG_COND_LTU, mh, rb, tcg_constant_i64(16),
-                        mh, ones);
+    if (right) {
+        tcg_gen_movcond_i64(TCG_COND_LTU, mh, rb, tcg_constant_i64(8),
+                            tmp, ones);
+        tcg_gen_movcond_i64(TCG_COND_LTU, ml, rb, tcg_constant_i64(8),
+                            zero, tmp);
+        tcg_gen_movcond_i64(TCG_COND_LTU, ml, rb, tcg_constant_i64(16),
+                            ml, ones);
+    } else {
+        tcg_gen_movcond_i64(TCG_COND_LTU, ml, rb, tcg_constant_i64(8),
+                            tmp, ones);
+        tcg_gen_movcond_i64(TCG_COND_LTU, mh, rb, tcg_constant_i64(8),
+                            zero, tmp);
+        tcg_gen_movcond_i64(TCG_COND_LTU, mh, rb, tcg_constant_i64(16),
+                            mh, ones);
+    }
 
     get_avr64(tmp, a->vra, true);
     tcg_gen_and_i64(tmp, tmp, mh);
@@ -1980,6 +1993,9 @@ static bool trans_VCLRLB(DisasContext *ctx, arg_VX *a)
     return true;
 }
 
+TRANS(VCLRLB, do_vclrb, false)
+TRANS(VCLRRB, do_vclrb, true)
+
 #define GEN_VAFORM_PAIRED(name0, name1, opc2)                           \
 static void glue(gen_, name0##_##name1)(DisasContext *ctx)              \
     {                                                                   \
-- 
2.25.1
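The following sketch cross-checks the right-hand variant. It compares a byte-by-byte reading of vclrrb (keep the leftmost min(RB, 16) bytes and zero the rest) against the mask construction used by do_vclrb(ctx, a, true). Both functions and the vr128 type are hypothetical helpers, not QEMU code.

```c
#include <stdint.h>

/* Hypothetical model type: hi holds bytes 0..7 (leftmost), lo bytes 8..15. */
typedef struct {
    uint64_t hi, lo;
} vr128;

/* Byte-wise reference: keep the leftmost min(rb, 16) bytes, zero the rest. */
static vr128 vclrrb_bytewise(vr128 vra, uint64_t rb)
{
    uint64_t n = rb < 16 ? rb : 16;
    vr128 r = { 0, 0 };

    for (unsigned i = 0; i < 16; i++) {
        /* Byte i, numbered from the most significant end of the vector. */
        uint64_t *src = i < 8 ? &vra.hi : &vra.lo;
        uint64_t *dst = i < 8 ? &r.hi : &r.lo;
        unsigned shift = (7 - (i % 8)) * 8;

        if (i < n) {
            *dst |= *src & (0xFFULL << shift);
        }
    }
    return r;
}

/* Mask-based version mirroring do_vclrb(ctx, a, true). */
static vr128 vclrrb_masks(vr128 vra, uint64_t rb)
{
    /* Top (rb & 7) bytes set: shr of all-ones, then not. */
    uint64_t tmp = ~(~0ULL >> ((rb & 7) * 8));
    uint64_t mh = rb < 8 ? tmp : ~0ULL;
    uint64_t ml = rb < 8 ? 0 : tmp;

    ml = rb < 16 ? ml : ~0ULL;

    vr128 r = { vra.hi & mh, vra.lo & ml };
    return r;
}
```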




* [PATCH v4 17/47] target/ppc: implement vcntmb[bhwd]
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (15 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 16/47] target/ppc: implement vclrrb matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 14:36 ` [PATCH v4 18/47] target/ppc: implement vgnb matheus.ferst
                   ` (29 subsequent siblings)
  46 siblings, 0 replies; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            |  8 ++++++++
 target/ppc/translate/vmx-impl.c.inc | 32 +++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index b20f1eaa8e..31a3c3b508 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -63,6 +63,9 @@
 &VX_bf          bf vra vrb
 @VX_bf          ...... bf:3 .. vra:5 vrb:5 ...........          &VX_bf
 
+&VX_mp          rt mp:bool vrb
+@VX_mp          ...... rt:5 .... mp:1 vrb:5 ...........         &VX_mp
+
 &VX_tb_rc       vrt vrb rc:bool
 @VX_tb_rc       ...... vrt:5 ..... vrb:5 rc:1 ..........        &VX_tb_rc
 
@@ -489,6 +492,11 @@ VEXTRACTWM      000100 ..... 01010 ..... 11001000010    @VX_tb
 VEXTRACTDM      000100 ..... 01011 ..... 11001000010    @VX_tb
 VEXTRACTQM      000100 ..... 01100 ..... 11001000010    @VX_tb
 
+VCNTMBB         000100 ..... 1100 . ..... 11001000010   @VX_mp
+VCNTMBH         000100 ..... 1101 . ..... 11001000010   @VX_mp
+VCNTMBW         000100 ..... 1110 . ..... 11001000010   @VX_mp
+VCNTMBD         000100 ..... 1111 . ..... 11001000010   @VX_mp
+
 ## Vector Multiply Instruction
 
 VMULESB         000100 ..... ..... ..... 01100001000    @VX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 4510b4ecde..17fc25d1bd 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1910,6 +1910,38 @@ static bool trans_MTVSRBMI(DisasContext *ctx, arg_DX_b *a)
     return true;
 }
 
+static bool do_vcntmb(DisasContext *ctx, arg_VX_mp *a, int vece)
+{
+    TCGv_i64 rt, vrb, mask;
+    rt = tcg_const_i64(0);
+    vrb = tcg_temp_new_i64();
+    mask = tcg_constant_i64(dup_const(vece, 1ULL << ((8 << vece) - 1)));
+
+    for (int i = 0; i < 2; i++) {
+        get_avr64(vrb, a->vrb, i);
+        if (a->mp) {
+            tcg_gen_and_i64(vrb, mask, vrb);
+        } else {
+            tcg_gen_andc_i64(vrb, mask, vrb);
+        }
+        tcg_gen_ctpop_i64(vrb, vrb);
+        tcg_gen_add_i64(rt, rt, vrb);
+    }
+
+    tcg_gen_shli_i64(rt, rt, TARGET_LONG_BITS - 8 + vece);
+    tcg_gen_trunc_i64_tl(cpu_gpr[a->rt], rt);
+
+    tcg_temp_free_i64(vrb);
+    tcg_temp_free_i64(rt);
+
+    return true;
+}
+
+TRANS(VCNTMBB, do_vcntmb, MO_8)
+TRANS(VCNTMBH, do_vcntmb, MO_16)
+TRANS(VCNTMBW, do_vcntmb, MO_32)
+TRANS(VCNTMBD, do_vcntmb, MO_64)
+
 static bool do_vstri(DisasContext *ctx, arg_VX_tb_rc *a,
                      void (*gen_helper)(TCGv_i32, TCGv_ptr, TCGv_ptr))
 {
-- 
2.25.1
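Below is a scalar sketch of the vcntmb[bhwd] result. It assumes TARGET_LONG_BITS == 64, so shifting the count left by 56 + vece places it in the top 8 - vece bits of RT. The names popcount64(), msb_mask() and vcntmb_model() are illustrative. msb_mask() is meant to produce the same value as dup_const(vece, 1ULL << ((8 << vece) - 1)).

```c
#include <stdint.h>

/* Portable popcount, so the sketch does not depend on compiler builtins. */
static unsigned popcount64(uint64_t x)
{
    unsigned c = 0;
    while (x) {
        x &= x - 1;     /* clear the lowest set bit */
        c++;
    }
    return c;
}

/* MSB of every (8 << vece)-bit element, replicated across 64 bits. */
static uint64_t msb_mask(int vece)
{
    unsigned esize = 8u << vece;
    uint64_t m = 0;
    for (unsigned i = 0; i < 64; i += esize) {
        m |= 1ULL << (i + esize - 1);
    }
    return m;
}

/* Hypothetical model: count elements whose MSB equals mp. */
static uint64_t vcntmb_model(uint64_t hi, uint64_t lo, int vece, int mp)
{
    uint64_t mask = msb_mask(vece);
    uint64_t v[2] = { hi, lo };
    uint64_t count = 0;

    for (int i = 0; i < 2; i++) {
        /* mp=1: and (set MSBs); mp=0: andc, i.e. mask & ~v (clear MSBs). */
        count += popcount64(mp ? (mask & v[i]) : (mask & ~v[i]));
    }
    return count << (64 - 8 + vece);
}
```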




* [PATCH v4 18/47] target/ppc: implement vgnb
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (16 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 17/47] target/ppc: implement vcntmb[bhwd] matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 21:58   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 19/47] target/ppc: move vs[lr][a][bhwd] to decodetree matheus.ferst
                   ` (28 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 - Optimized implementation (rth)
---
 target/ppc/insn32.decode            |   5 ++
 target/ppc/translate/vmx-impl.c.inc | 135 ++++++++++++++++++++++++++++
 2 files changed, 140 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 31a3c3b508..02df4a98e6 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -66,6 +66,9 @@
 &VX_mp          rt mp:bool vrb
 @VX_mp          ...... rt:5 .... mp:1 vrb:5 ...........         &VX_mp
 
+&VX_n           rt vrb n
+@VX_n           ...... rt:5 .. n:3 vrb:5 ...........            &VX_n
+
 &VX_tb_rc       vrt vrb rc:bool
 @VX_tb_rc       ...... vrt:5 ..... vrb:5 rc:1 ..........        &VX_tb_rc
 
@@ -418,6 +421,8 @@ VCMPUQ          000100 ... -- ..... ..... 00100000001   @VX_bf
 
 ## Vector Bit Manipulation Instruction
 
+VGNB            000100 ..... -- ... ..... 10011001100   @VX_n
+
 VCFUGED         000100 ..... ..... ..... 10101001101    @VX
 VCLZDM          000100 ..... ..... ..... 11110000100    @VX
 VCTZDM          000100 ..... ..... ..... 11111000100    @VX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 17fc25d1bd..19219b0010 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1416,6 +1416,141 @@ GEN_VXFORM_DUAL(vsplth, PPC_ALTIVEC, PPC_NONE,
 GEN_VXFORM_DUAL(vspltw, PPC_ALTIVEC, PPC_NONE,
                 vextractuw, PPC_NONE, PPC2_ISA300);
 
+static bool trans_VGNB(DisasContext *ctx, arg_VX_n *a)
+{
+    /*
+     * Similar to do_vextractm, we'll use a sequence of mask-shift-or operations
+     * to gather the bits. The masks can be created with
+     *
+     * uint64_t mask(uint64_t n, uint64_t step)
+     * {
+     *     uint64_t p = ((1UL << (1UL << step)) - 1UL) << ((n - 1UL) << step),
+     *                  plen = n << step, m = 0;
+     *     for(int i = 0; i < 64/plen; i++) {
+     *         m |= p;
+     *         m = ror64(m, plen);
+     *     }
+     *     p >>= plen * DIV_ROUND_UP(64, plen) - 64;
+     *     return m | p;
+     * }
+     *
+     * But since there are few values of N, we'll use a lookup table to avoid
+     * these calculations at runtime.
+     */
+    static const uint64_t mask[6][5] = {
+        {
+            0xAAAAAAAAAAAAAAAAULL, 0xccccccccccccccccULL, 0xf0f0f0f0f0f0f0f0ULL,
+            0xff00ff00ff00ff00ULL, 0xffff0000ffff0000ULL
+        },
+        {
+            0x9249249249249249ULL, 0xC30C30C30C30C30CULL, 0xF00F00F00F00F00FULL,
+            0xFF0000FF0000FF00ULL, 0xFFFF00000000FFFFULL
+        },
+        {
+            /* For N >= 4, some mask operations can be elided */
+            0x8888888888888888ULL, 0, 0xf000f000f000f000ULL, 0,
+            0xFFFF000000000000ULL
+        },
+        {
+            0x8421084210842108ULL, 0, 0xF0000F0000F0000FULL, 0, 0
+        },
+        {
+            0x8208208208208208ULL, 0, 0xF00000F00000F000ULL, 0, 0
+        },
+        {
+            0x8102040810204081ULL, 0, 0xF000000F000000F0ULL, 0, 0
+        }
+    };
+    uint64_t m;
+    int i, sh, nbits = DIV_ROUND_UP(64, a->n);
+    TCGv_i64 hi, lo, t0, t1;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    if (a->n < 2) {
+        /*
+         * "N can be any value between 2 and 7, inclusive." Otherwise, the
+         * result is undefined, so we don't need to change RT. Also, N > 7 is
+         * impossible since the immediate field is 3 bits only.
+         */
+        return true;
+    }
+
+    hi = tcg_temp_new_i64();
+    lo = tcg_temp_new_i64();
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+
+    get_avr64(hi, a->vrb, true);
+    get_avr64(lo, a->vrb, false);
+
+    /* Align the lower doubleword so we can use the same mask */
+    tcg_gen_shli_i64(lo, lo, a->n * nbits - 64);
+
+    /*
+     * Starting from the most significant bit, gather every Nth bit with a
+     * sequence of mask-shift-or operations. E.g., for N=3:
+     * AxxBxxCxxDxxExxFxxGxxHxxIxxJxxKxxLxxMxxNxxOxxPxxQxxRxxSxxTxxUxxV
+     *     & rep(0b100)
+     * A..B..C..D..E..F..G..H..I..J..K..L..M..N..O..P..Q..R..S..T..U..V
+     *     << 2
+     * .B..C..D..E..F..G..H..I..J..K..L..M..N..O..P..Q..R..S..T..U..V..
+     *     |
+     * AB.BC.CD.DE.EF.FG.GH.HI.IJ.JK.KL.LM.MN.NO.OP.PQ.QR.RS.ST.TU.UV.V
+     *  & rep(0b110000)
+     * AB....CD....EF....GH....IJ....KL....MN....OP....QR....ST....UV..
+     *     << 4
+     * ..CD....EF....GH....IJ....KL....MN....OP....QR....ST....UV......
+     *     |
+     * ABCD..CDEF..EFGH..GHIJ..IJKL..KLMN..MNOP..OPQR..QRST..STUV..UV..
+     *     & rep(0b111100000000)
+     * ABCD........EFGH........IJKL........MNOP........QRST........UV..
+     *     << 8
+     * ....EFGH........IJKL........MNOP........QRST........UV..........
+     *     |
+     * ABCDEFGH....EFGHIJKL....IJKLMNOP....MNOPQRST....QRSTUV......UV..
+     *  & rep(0b111111110000000000000000)
+     * ABCDEFGH................IJKLMNOP................QRSTUV..........
+     *     << 16
+     * ........IJKLMNOP................QRSTUV..........................
+     *     |
+     * ABCDEFGHIJKLMNOP........IJKLMNOPQRSTUV..........QRSTUV..........
+     *     & rep(0b111111111111111100000000000000000000000000000000)
+     * ABCDEFGHIJKLMNOP................................QRSTUV..........
+     *     << 32
+     * ................QRSTUV..........................................
+     *     |
+     * ABCDEFGHIJKLMNOPQRSTUV..........................QRSTUV..........
+     */
+    for (i = 0, sh = a->n - 1; i < 5; i++, sh <<= 1) {
+        m = mask[a->n - 2][i];
+        if (m) {
+            tcg_gen_andi_i64(hi, hi, m);
+            tcg_gen_andi_i64(lo, lo, m);
+        }
+        if (sh < 64) {
+            tcg_gen_shli_i64(t0, hi, sh);
+            tcg_gen_shli_i64(t1, lo, sh);
+            tcg_gen_or_i64(hi, t0, hi);
+            tcg_gen_or_i64(lo, t1, lo);
+        }
+    }
+
+    tcg_gen_andi_i64(hi, hi, ~(~0ULL >> nbits));
+    tcg_gen_andi_i64(lo, lo, ~(~0ULL >> nbits));
+    tcg_gen_shri_i64(lo, lo, nbits);
+    tcg_gen_or_i64(hi, hi, lo);
+    tcg_gen_trunc_i64_tl(cpu_gpr[a->rt], hi);
+
+    tcg_temp_free_i64(hi);
+    tcg_temp_free_i64(lo);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+
+    return true;
+}
+
 static bool do_vextdx(DisasContext *ctx, arg_VA *a, int size, bool right,
                void (*gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv))
 {
-- 
2.25.1
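The mask() formula in the vgnb comment can be evaluated on the host to confirm the lookup table. The sketch below makes two assumptions. First, it uses 64-bit literals (1ULL rather than 1UL). Second, it only checks entries that the table does not elide, because some elided entries (for example N = 5, step 4) would need shifts of 64 bits or more. vgnb_ref() is a naive bit-by-bit reference for the instruction itself. All names are hypothetical.

```c
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static uint64_t ror64(uint64_t x, unsigned r)
{
    r &= 63;
    return r ? (x >> r) | (x << (64 - r)) : x;
}

/* The mask() formula from the patch comment, with 64-bit literals. */
static uint64_t vgnb_mask(uint64_t n, uint64_t step)
{
    uint64_t p = ((1ULL << (1ULL << step)) - 1ULL) << ((n - 1ULL) << step);
    uint64_t plen = n << step, m = 0;

    for (uint64_t i = 0; i < 64 / plen; i++) {
        m |= p;
        m = ror64(m, (unsigned)plen);
    }
    p >>= plen * DIV_ROUND_UP(64, plen) - 64;
    return m | p;
}

/* Bit i of the 128-bit vector, numbered from the most significant end. */
static unsigned vr_bit(uint64_t hi, uint64_t lo, unsigned i)
{
    return i < 64 ? (hi >> (63 - i)) & 1 : (lo >> (127 - i)) & 1;
}

/* Naive vgnb: gather bits 0, N, 2N, ... into the top of the result (N >= 2). */
static uint64_t vgnb_ref(uint64_t hi, uint64_t lo, unsigned n)
{
    uint64_t r = 0;
    unsigned k = 0;

    for (unsigned i = 0; i < 128; i += n, k++) {
        r |= (uint64_t)vr_bit(hi, lo, i) << (63 - k);
    }
    return r;
}
```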




* [PATCH v4 19/47] target/ppc: move vs[lr][a][bhwd] to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (17 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 18/47] target/ppc: implement vgnb matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:01   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 20/47] target/ppc: implement vslq matheus.ferst
                   ` (27 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 -  New in v4.
---
 target/ppc/insn32.decode            | 17 ++++++++++++
 target/ppc/translate/vmx-impl.c.inc | 41 +++++++++++++++++++----------
 target/ppc/translate/vmx-ops.c.inc  | 13 +--------
 3 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 02df4a98e6..88baebe35e 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -467,6 +467,23 @@ VINSWVRX        000100 ..... ..... ..... 00110001111    @VX
 VSLDBI          000100 ..... ..... ..... 00 ... 010110  @VN
 VSRDBI          000100 ..... ..... ..... 01 ... 010110  @VN
 
+## Vector Integer Shift Instruction
+
+VSLB            000100 ..... ..... ..... 00100000100    @VX
+VSLH            000100 ..... ..... ..... 00101000100    @VX
+VSLW            000100 ..... ..... ..... 00110000100    @VX
+VSLD            000100 ..... ..... ..... 10111000100    @VX
+
+VSRB            000100 ..... ..... ..... 01000000100    @VX
+VSRH            000100 ..... ..... ..... 01001000100    @VX
+VSRW            000100 ..... ..... ..... 01010000100    @VX
+VSRD            000100 ..... ..... ..... 11011000100    @VX
+
+VSRAB           000100 ..... ..... ..... 01100000100    @VX
+VSRAH           000100 ..... ..... ..... 01101000100    @VX
+VSRAW           000100 ..... ..... ..... 01110000100    @VX
+VSRAD           000100 ..... ..... ..... 01111000100    @VX
+
 ## Vector Integer Arithmetic Instructions
 
 VEXTSB2W        000100 ..... 10000 ..... 11000000010    @VX_tb
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 19219b0010..ec4f0e7654 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -799,21 +799,7 @@ static void trans_vclzd(DisasContext *ctx)
 }
 
 GEN_VXFORM_V(vmuluwm, MO_32, tcg_gen_gvec_mul, 4, 2);
-GEN_VXFORM_V(vslb, MO_8, tcg_gen_gvec_shlv, 2, 4);
-GEN_VXFORM_V(vslh, MO_16, tcg_gen_gvec_shlv, 2, 5);
-GEN_VXFORM_V(vslw, MO_32, tcg_gen_gvec_shlv, 2, 6);
 GEN_VXFORM(vrlwnm, 2, 6);
-GEN_VXFORM_DUAL(vslw, PPC_ALTIVEC, PPC_NONE, \
-                vrlwnm, PPC_NONE, PPC2_ISA300)
-GEN_VXFORM_V(vsld, MO_64, tcg_gen_gvec_shlv, 2, 23);
-GEN_VXFORM_V(vsrb, MO_8, tcg_gen_gvec_shrv, 2, 8);
-GEN_VXFORM_V(vsrh, MO_16, tcg_gen_gvec_shrv, 2, 9);
-GEN_VXFORM_V(vsrw, MO_32, tcg_gen_gvec_shrv, 2, 10);
-GEN_VXFORM_V(vsrd, MO_64, tcg_gen_gvec_shrv, 2, 27);
-GEN_VXFORM_V(vsrab, MO_8, tcg_gen_gvec_sarv, 2, 12);
-GEN_VXFORM_V(vsrah, MO_16, tcg_gen_gvec_sarv, 2, 13);
-GEN_VXFORM_V(vsraw, MO_32, tcg_gen_gvec_sarv, 2, 14);
-GEN_VXFORM_V(vsrad, MO_64, tcg_gen_gvec_sarv, 2, 15);
 GEN_VXFORM(vsrv, 2, 28);
 GEN_VXFORM(vslv, 2, 29);
 GEN_VXFORM(vslo, 6, 16);
@@ -821,6 +807,33 @@ GEN_VXFORM(vsro, 6, 17);
 GEN_VXFORM(vaddcuw, 0, 6);
 GEN_VXFORM(vsubcuw, 0, 22);
 
+static bool do_vector_gvec3_VX(DisasContext *ctx, arg_VX *a, int vece,
+                               void (*gen_gvec)(unsigned, uint32_t, uint32_t,
+                                                uint32_t, uint32_t, uint32_t))
+{
+    REQUIRE_VECTOR(ctx);
+
+    gen_gvec(vece, avr_full_offset(a->vrt), avr_full_offset(a->vra),
+             avr_full_offset(a->vrb), 16, 16);
+
+    return true;
+}
+
+TRANS_FLAGS(ALTIVEC, VSLB, do_vector_gvec3_VX, MO_8, tcg_gen_gvec_shlv);
+TRANS_FLAGS(ALTIVEC, VSLH, do_vector_gvec3_VX, MO_16, tcg_gen_gvec_shlv);
+TRANS_FLAGS(ALTIVEC, VSLW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_shlv);
+TRANS_FLAGS2(ALTIVEC_207, VSLD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_shlv);
+
+TRANS_FLAGS(ALTIVEC, VSRB, do_vector_gvec3_VX, MO_8, tcg_gen_gvec_shrv);
+TRANS_FLAGS(ALTIVEC, VSRH, do_vector_gvec3_VX, MO_16, tcg_gen_gvec_shrv);
+TRANS_FLAGS(ALTIVEC, VSRW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_shrv);
+TRANS_FLAGS2(ALTIVEC_207, VSRD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_shrv);
+
+TRANS_FLAGS(ALTIVEC, VSRAB, do_vector_gvec3_VX, MO_8, tcg_gen_gvec_sarv);
+TRANS_FLAGS(ALTIVEC, VSRAH, do_vector_gvec3_VX, MO_16, tcg_gen_gvec_sarv);
+TRANS_FLAGS(ALTIVEC, VSRAW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_sarv);
+TRANS_FLAGS2(ALTIVEC_207, VSRAD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_sarv);
+
 #define GEN_VXFORM_SAT(NAME, VECE, NORM, SAT, OPC2, OPC3)               \
 static void glue(glue(gen_, NAME), _vec)(unsigned vece, TCGv_vec t,     \
                                          TCGv_vec sat, TCGv_vec a,      \
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index cb4c5bb953..878bce92c6 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -102,18 +102,7 @@ GEN_VXFORM_300(vextubrx, 6, 28),
 GEN_VXFORM_300(vextuhrx, 6, 29),
 GEN_VXFORM_DUAL(vmrgew, vextuwrx, 6, 30, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM_207(vmuluwm, 4, 2),
-GEN_VXFORM(vslb, 2, 4),
-GEN_VXFORM(vslh, 2, 5),
-GEN_VXFORM_DUAL(vslw, vrlwnm, 2, 6, PPC_ALTIVEC, PPC_NONE),
-GEN_VXFORM_207(vsld, 2, 23),
-GEN_VXFORM(vsrb, 2, 8),
-GEN_VXFORM(vsrh, 2, 9),
-GEN_VXFORM(vsrw, 2, 10),
-GEN_VXFORM_207(vsrd, 2, 27),
-GEN_VXFORM(vsrab, 2, 12),
-GEN_VXFORM(vsrah, 2, 13),
-GEN_VXFORM(vsraw, 2, 14),
-GEN_VXFORM_207(vsrad, 2, 15),
+GEN_VXFORM_300(vrlwnm, 2, 6),
 GEN_VXFORM_300(vsrv, 2, 28),
 GEN_VXFORM_300(vslv, 2, 29),
 GEN_VXFORM(vslo, 6, 16),
-- 
2.25.1
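Mapping these instructions directly to tcg_gen_gvec_shlv/shrv/sarv relies on one property of those helpers, as I understand their documentation in tcg-op-gvec.h: they shift each element by the corresponding element of the second operand modulo the element width. That matches the Altivec rule of using only the low-order bits of each VRB element. Below is a per-byte sketch of that rule for vslb and vsrab. The function names are hypothetical, and plain byte arrays stand in for vector registers.

```c
#include <stdint.h>

/* Hypothetical model of vslb: each byte shifted left by b[i] mod 8. */
static void vslb_model(uint8_t d[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++) {
        d[i] = (uint8_t)(a[i] << (b[i] % 8));
    }
}

/* Hypothetical model of vsrab: arithmetic right shift by b[i] mod 8. */
static void vsrab_model(uint8_t d[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned sh = b[i] % 8;
        uint8_t v = (uint8_t)(a[i] >> sh);

        /* Replicate the sign bit without relying on signed >> in C. */
        if (a[i] & 0x80) {
            v |= (uint8_t)(0xFF << (8 - sh));   /* sh == 0 yields 0 here */
        }
        d[i] = v;
    }
}
```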




* [PATCH v4 20/47] target/ppc: implement vslq
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (18 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 19/47] target/ppc: move vs[lr][a][bhwd] to decodetree matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:14   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 21/47] target/ppc: implement vsrq matheus.ferst
                   ` (26 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 -  New in v4.
---
 target/ppc/insn32.decode            |  1 +
 target/ppc/translate/vmx-impl.c.inc | 40 +++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 88baebe35e..3799065508 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -473,6 +473,7 @@ VSLB            000100 ..... ..... ..... 00100000100    @VX
 VSLH            000100 ..... ..... ..... 00101000100    @VX
 VSLW            000100 ..... ..... ..... 00110000100    @VX
 VSLD            000100 ..... ..... ..... 10111000100    @VX
+VSLQ            000100 ..... ..... ..... 00100000101    @VX
 
 VSRB            000100 ..... ..... ..... 01000000100    @VX
 VSRH            000100 ..... ..... ..... 01001000100    @VX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index ec4f0e7654..ca98a545ef 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -834,6 +834,46 @@ TRANS_FLAGS(ALTIVEC, VSRAH, do_vector_gvec3_VX, MO_16, tcg_gen_gvec_sarv);
 TRANS_FLAGS(ALTIVEC, VSRAW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_sarv);
 TRANS_FLAGS2(ALTIVEC_207, VSRAD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_sarv);
 
+static bool trans_VSLQ(DisasContext *ctx, arg_VX *a)
+{
+    TCGv_i64 hi, lo, tmp, n, sf = tcg_constant_i64(64);
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    n = tcg_temp_new_i64();
+    hi = tcg_temp_new_i64();
+    lo = tcg_temp_new_i64();
+    tmp = tcg_const_i64(0);
+
+    get_avr64(lo, a->vra, false);
+    get_avr64(hi, a->vra, true);
+
+    get_avr64(n, a->vrb, true);
+    tcg_gen_andi_i64(n, n, 0x7F);
+
+    tcg_gen_movcond_i64(TCG_COND_GE, hi, n, sf, lo, hi);
+    tcg_gen_movcond_i64(TCG_COND_GE, lo, n, sf, tmp, lo);
+    tcg_gen_andi_i64(n, n, ~64ULL);
+
+    tcg_gen_shl_i64(tmp, lo, n);
+    set_avr64(a->vrt, tmp, false);
+
+    tcg_gen_shl_i64(hi, hi, n);
+    tcg_gen_xori_i64(n, n, 63);
+    tcg_gen_shr_i64(lo, lo, n);
+    tcg_gen_shri_i64(lo, lo, 1);
+    tcg_gen_or_i64(hi, hi, lo);
+    set_avr64(a->vrt, hi, true);
+
+    tcg_temp_free_i64(hi);
+    tcg_temp_free_i64(lo);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(n);
+
+    return true;
+}
+
 #define GEN_VXFORM_SAT(NAME, VECE, NORM, SAT, OPC2, OPC3)               \
 static void glue(glue(gen_, NAME), _vec)(unsigned vece, TCGv_vec t,     \
                                          TCGv_vec sat, TCGv_vec a,      \
-- 
2.25.1
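Here is a host-side sketch of the trans_VSLQ sequence on two 64-bit halves. It assumes the shift count is the low 7 bits of VRB's most significant doubleword, which is how the patch reads it with get_avr64(n, a->vrb, true). The u128 type and vslq_model() are illustrative only. The interesting detail is the carry from lo into hi. In C, lo >> (64 - n) is undefined for n == 0, so the sequence computes (lo >> (n ^ 63)) >> 1 instead; n ^ 63 == 63 - n for 0 <= n <= 63.

```c
#include <stdint.h>

/* Hypothetical model type: hi = most significant doubleword. */
typedef struct {
    uint64_t hi, lo;
} u128;

static u128 vslq_model(u128 a, uint64_t vrb_hi)
{
    uint64_t n = vrb_hi & 0x7F;     /* low 7 bits of VRB's high doubleword */
    uint64_t hi = a.hi, lo = a.lo;

    if (n >= 64) {                  /* the two movcond ops: move lo into hi */
        hi = lo;
        lo = 0;
    }
    n &= ~64ULL;                    /* now 0 <= n <= 63 */

    u128 r;
    r.lo = lo << n;
    /* Carry bits from lo into hi, safe for n == 0. */
    r.hi = (hi << n) | ((lo >> (n ^ 63)) >> 1);
    return r;
}
```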




* [PATCH v4 21/47] target/ppc: implement vsrq
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (19 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 20/47] target/ppc: implement vslq matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:15   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 22/47] target/ppc: implement vsraq matheus.ferst
                   ` (25 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 -  New in v4.
---
 target/ppc/insn32.decode            |  1 +
 target/ppc/translate/vmx-impl.c.inc | 40 +++++++++++++++++++++--------
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 3799065508..96ee730242 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -479,6 +479,7 @@ VSRB            000100 ..... ..... ..... 01000000100    @VX
 VSRH            000100 ..... ..... ..... 01001000100    @VX
 VSRW            000100 ..... ..... ..... 01010000100    @VX
 VSRD            000100 ..... ..... ..... 11011000100    @VX
+VSRQ            000100 ..... ..... ..... 01000000101    @VX
 
 VSRAB           000100 ..... ..... ..... 01100000100    @VX
 VSRAH           000100 ..... ..... ..... 01101000100    @VX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index ca98a545ef..ec2b47b4aa 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -834,11 +834,10 @@ TRANS_FLAGS(ALTIVEC, VSRAH, do_vector_gvec3_VX, MO_16, tcg_gen_gvec_sarv);
 TRANS_FLAGS(ALTIVEC, VSRAW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_sarv);
 TRANS_FLAGS2(ALTIVEC_207, VSRAD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_sarv);
 
-static bool trans_VSLQ(DisasContext *ctx, arg_VX *a)
+static bool do_vector_shift_quad(DisasContext *ctx, arg_VX *a, bool right)
 {
     TCGv_i64 hi, lo, tmp, n, sf = tcg_constant_i64(64);
 
-    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
     REQUIRE_VECTOR(ctx);
 
     n = tcg_temp_new_i64();
@@ -852,19 +851,37 @@ static bool trans_VSLQ(DisasContext *ctx, arg_VX *a)
     get_avr64(n, a->vrb, true);
     tcg_gen_andi_i64(n, n, 0x7F);
 
-    tcg_gen_movcond_i64(TCG_COND_GE, hi, n, sf, lo, hi);
-    tcg_gen_movcond_i64(TCG_COND_GE, lo, n, sf, tmp, lo);
+    if (right) {
+        tcg_gen_movcond_i64(TCG_COND_GE, lo, n, sf, hi, lo);
+        tcg_gen_movcond_i64(TCG_COND_GE, hi, n, sf, tmp, hi);
+    } else {
+        tcg_gen_movcond_i64(TCG_COND_GE, hi, n, sf, lo, hi);
+        tcg_gen_movcond_i64(TCG_COND_GE, lo, n, sf, tmp, lo);
+    }
     tcg_gen_andi_i64(n, n, ~64ULL);
 
-    tcg_gen_shl_i64(tmp, lo, n);
-    set_avr64(a->vrt, tmp, false);
+    if (right) {
+        tcg_gen_shr_i64(tmp, hi, n);
+    } else {
+        tcg_gen_shl_i64(tmp, lo, n);
+    }
+    set_avr64(a->vrt, tmp, right);
 
-    tcg_gen_shl_i64(hi, hi, n);
+    if (right) {
+        tcg_gen_shr_i64(lo, lo, n);
+    } else {
+        tcg_gen_shl_i64(hi, hi, n);
+    }
     tcg_gen_xori_i64(n, n, 63);
-    tcg_gen_shr_i64(lo, lo, n);
-    tcg_gen_shri_i64(lo, lo, 1);
+    if (right) {
+        tcg_gen_shl_i64(hi, hi, n);
+        tcg_gen_shli_i64(hi, hi, 1);
+    } else {
+        tcg_gen_shr_i64(lo, lo, n);
+        tcg_gen_shri_i64(lo, lo, 1);
+    }
     tcg_gen_or_i64(hi, hi, lo);
-    set_avr64(a->vrt, hi, true);
+    set_avr64(a->vrt, hi, !right);
 
     tcg_temp_free_i64(hi);
     tcg_temp_free_i64(lo);
@@ -874,6 +891,9 @@ static bool trans_VSLQ(DisasContext *ctx, arg_VX *a)
     return true;
 }
 
+TRANS_FLAGS2(ISA310, VSLQ, do_vector_shift_quad, false);
+TRANS_FLAGS2(ISA310, VSRQ, do_vector_shift_quad, true);
+
 #define GEN_VXFORM_SAT(NAME, VECE, NORM, SAT, OPC2, OPC3)               \
 static void glue(glue(gen_, NAME), _vec)(unsigned vece, TCGv_vec t,     \
                                          TCGv_vec sat, TCGv_vec a,      \
-- 
2.25.1




* [PATCH v4 22/47] target/ppc: implement vsraq
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (20 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 21/47] target/ppc: implement vsrq matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:19   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 23/47] target/ppc: move vrl[bhwd] to decodetree matheus.ferst
                   ` (24 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 -  New in v4.
---
 target/ppc/insn32.decode            |  1 +
 target/ppc/translate/vmx-impl.c.inc | 17 +++++++++++++----
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 96ee730242..7a9fc1dffa 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -485,6 +485,7 @@ VSRAB           000100 ..... ..... ..... 01100000100    @VX
 VSRAH           000100 ..... ..... ..... 01101000100    @VX
 VSRAW           000100 ..... ..... ..... 01110000100    @VX
 VSRAD           000100 ..... ..... ..... 01111000100    @VX
+VSRAQ           000100 ..... ..... ..... 01100000101    @VX
 
 ## Vector Integer Arithmetic Instructions
 
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index ec2b47b4aa..2eee187499 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -834,7 +834,8 @@ TRANS_FLAGS(ALTIVEC, VSRAH, do_vector_gvec3_VX, MO_16, tcg_gen_gvec_sarv);
 TRANS_FLAGS(ALTIVEC, VSRAW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_sarv);
 TRANS_FLAGS2(ALTIVEC_207, VSRAD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_sarv);
 
-static bool do_vector_shift_quad(DisasContext *ctx, arg_VX *a, bool right)
+static bool do_vector_shift_quad(DisasContext *ctx, arg_VX *a, bool right,
+                                 bool alg)
 {
     TCGv_i64 hi, lo, tmp, n, sf = tcg_constant_i64(64);
 
@@ -853,6 +854,9 @@ static bool do_vector_shift_quad(DisasContext *ctx, arg_VX *a, bool right)
 
     if (right) {
         tcg_gen_movcond_i64(TCG_COND_GE, lo, n, sf, hi, lo);
+        if (alg) {
+            tcg_gen_sari_i64(tmp, lo, 63);
+        }
         tcg_gen_movcond_i64(TCG_COND_GE, hi, n, sf, tmp, hi);
     } else {
         tcg_gen_movcond_i64(TCG_COND_GE, hi, n, sf, lo, hi);
@@ -861,7 +865,11 @@ static bool do_vector_shift_quad(DisasContext *ctx, arg_VX *a, bool right)
     tcg_gen_andi_i64(n, n, ~64ULL);
 
     if (right) {
-        tcg_gen_shr_i64(tmp, hi, n);
+        if (alg) {
+            tcg_gen_sar_i64(tmp, hi, n);
+        } else {
+            tcg_gen_shr_i64(tmp, hi, n);
+        }
     } else {
         tcg_gen_shl_i64(tmp, lo, n);
     }
@@ -891,8 +899,9 @@ static bool do_vector_shift_quad(DisasContext *ctx, arg_VX *a, bool right)
     return true;
 }
 
-TRANS_FLAGS2(ISA310, VSLQ, do_vector_shift_quad, false);
-TRANS_FLAGS2(ISA310, VSRQ, do_vector_shift_quad, true);
+TRANS_FLAGS2(ISA310, VSLQ, do_vector_shift_quad, false, false);
+TRANS_FLAGS2(ISA310, VSRQ, do_vector_shift_quad, true, false);
+TRANS_FLAGS2(ISA310, VSRAQ, do_vector_shift_quad, true, true);
 
 #define GEN_VXFORM_SAT(NAME, VECE, NORM, SAT, OPC2, OPC3)               \
 static void glue(glue(gen_, NAME), _vec)(unsigned vece, TCGv_vec t,     \
-- 
2.25.1

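Reviewer aid (not part of the patch): below is a plain-C reference model of what the TCG sequence in do_vector_shift_quad() computes for VSRQ (alg = false) and VSRAQ (alg = true). It is a sketch under stated assumptions. The u128_pair type and shift_right_q() are hypothetical names. The 128-bit value is held as hi:lo doublewords, and n is the already-masked 7-bit count. The model also assumes that >> on a negative int64_t is arithmetic, as GCC and Clang implement it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into big-endian doublewords hi:lo. */
typedef struct {
    uint64_t hi, lo;
} u128_pair;

/*
 * Shift v right by n bits (0..127). alg != 0 selects an arithmetic shift
 * (vsraq), alg == 0 a logical one (vsrq). Assumes >> on a negative int64_t
 * is arithmetic, as GCC and Clang implement it.
 */
static u128_pair shift_right_q(u128_pair v, unsigned n, int alg)
{
    uint64_t fill = alg ? (uint64_t)((int64_t)v.hi >> 63) : 0;

    assert(n < 128);
    if (n >= 64) {
        /* Whole-doubleword step: hi moves into lo, fill enters hi. */
        v.lo = v.hi;
        v.hi = fill;
        n -= 64;
    }
    if (n == 0) {
        return v;   /* avoids a shift by 64 below */
    }
    /* Bits shifted out of hi enter the top of lo. */
    v.lo = (v.lo >> n) | (v.hi << (64 - n));
    v.hi = alg ? (uint64_t)((int64_t)v.hi >> n) : v.hi >> n;
    return v;
}
```

The two instructions differ only in the fill value: zeros for vsrq and copies of the sign bit for vsraq. In the patch, that difference is carried by the added tcg_gen_sari_i64() and tcg_gen_sar_i64() calls.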



* [PATCH v4 23/47] target/ppc: move vrl[bhwd] to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (21 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 22/47] target/ppc: implement vsraq matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:20   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi " matheus.ferst
                   ` (23 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 -  New in v4.
---
 target/ppc/insn32.decode            |  5 +++++
 target/ppc/translate/vmx-impl.c.inc | 13 +++++--------
 target/ppc/translate/vmx-ops.c.inc  |  6 ++----
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 7a9fc1dffa..d918e2d0f2 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -487,6 +487,11 @@ VSRAW           000100 ..... ..... ..... 01110000100    @VX
 VSRAD           000100 ..... ..... ..... 01111000100    @VX
 VSRAQ           000100 ..... ..... ..... 01100000101    @VX
 
+VRLB            000100 ..... ..... ..... 00000000100    @VX
+VRLH            000100 ..... ..... ..... 00001000100    @VX
+VRLW            000100 ..... ..... ..... 00010000100    @VX
+VRLD            000100 ..... ..... ..... 00011000100    @VX
+
 ## Vector Integer Arithmetic Instructions
 
 VEXTSB2W        000100 ..... 10000 ..... 11000000010    @VX_tb
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 2eee187499..9dcac4243f 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -834,6 +834,11 @@ TRANS_FLAGS(ALTIVEC, VSRAH, do_vector_gvec3_VX, MO_16, tcg_gen_gvec_sarv);
 TRANS_FLAGS(ALTIVEC, VSRAW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_sarv);
 TRANS_FLAGS2(ALTIVEC_207, VSRAD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_sarv);
 
+TRANS_FLAGS(ALTIVEC, VRLB, do_vector_gvec3_VX, MO_8, tcg_gen_gvec_rotlv)
+TRANS_FLAGS(ALTIVEC, VRLH, do_vector_gvec3_VX, MO_16, tcg_gen_gvec_rotlv)
+TRANS_FLAGS(ALTIVEC, VRLW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_rotlv)
+TRANS_FLAGS2(ALTIVEC_207, VRLD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_rotlv)
+
 static bool do_vector_shift_quad(DisasContext *ctx, arg_VX *a, bool right,
                                  bool alg)
 {
@@ -968,16 +973,8 @@ GEN_VXFORM3(vsubeuqm, 31, 0);
 GEN_VXFORM3(vsubecuq, 31, 0);
 GEN_VXFORM_DUAL(vsubeuqm, PPC_NONE, PPC2_ALTIVEC_207, \
             vsubecuq, PPC_NONE, PPC2_ALTIVEC_207)
-GEN_VXFORM_V(vrlb, MO_8, tcg_gen_gvec_rotlv, 2, 0);
-GEN_VXFORM_V(vrlh, MO_16, tcg_gen_gvec_rotlv, 2, 1);
-GEN_VXFORM_V(vrlw, MO_32, tcg_gen_gvec_rotlv, 2, 2);
 GEN_VXFORM(vrlwmi, 2, 2);
-GEN_VXFORM_DUAL(vrlw, PPC_ALTIVEC, PPC_NONE, \
-                vrlwmi, PPC_NONE, PPC2_ISA300)
-GEN_VXFORM_V(vrld, MO_64, tcg_gen_gvec_rotlv, 2, 3);
 GEN_VXFORM(vrldmi, 2, 3);
-GEN_VXFORM_DUAL(vrld, PPC_NONE, PPC2_ALTIVEC_207, \
-                vrldmi, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM_TRANS(vsl, 2, 7);
 GEN_VXFORM(vrldnm, 2, 7);
 GEN_VXFORM_DUAL(vsl, PPC_ALTIVEC, PPC_NONE, \
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index 878bce92c6..a7acea3ca7 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -133,10 +133,8 @@ GEN_VXFORM_DUAL(vaddeuqm, vaddecuq, 30, 0xFF, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM_DUAL(vsubuqm, bcdtrunc, 0, 20, PPC2_ALTIVEC_207, PPC2_ISA300),
 GEN_VXFORM_DUAL(vsubcuq, bcdutrunc, 0, 21, PPC2_ALTIVEC_207, PPC2_ISA300),
 GEN_VXFORM_DUAL(vsubeuqm, vsubecuq, 31, 0xFF, PPC_NONE, PPC2_ALTIVEC_207),
-GEN_VXFORM(vrlb, 2, 0),
-GEN_VXFORM(vrlh, 2, 1),
-GEN_VXFORM_DUAL(vrlw, vrlwmi, 2, 2, PPC_ALTIVEC, PPC_NONE),
-GEN_VXFORM_DUAL(vrld, vrldmi, 2, 3, PPC_NONE, PPC2_ALTIVEC_207),
+GEN_VXFORM_300(vrlwmi, 2, 2),
+GEN_VXFORM_300(vrldmi, 2, 3),
 GEN_VXFORM_DUAL(vsl, vrldnm, 2, 7, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM(vsr, 2, 11),
 GEN_VXFORM(vpkuhum, 7, 0),
-- 
2.25.1

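For reference, here is a scalar sketch of the per-element operation that tcg_gen_gvec_rotlv performs for VRLB/VRLH/VRLW/VRLD. Each element of vA is rotated left by the matching element of vB, and the count is reduced modulo the element width (8 << vece bits). rotl_elem() is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of one element of tcg_gen_gvec_rotlv:
 * rotate x (an element of 'bits' width) left by n modulo the width.
 */
static uint64_t rotl_elem(uint64_t x, uint64_t n, unsigned bits)
{
    uint64_t mask;

    assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
    mask = bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
    x &= mask;
    n &= bits - 1;          /* only log2(bits) count bits are used */
    if (n == 0) {
        return x;
    }
    return ((x << n) | (x >> (bits - n))) & mask;
}
```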



* [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (22 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 23/47] target/ppc: move vrl[bhwd] to decodetree matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:30   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 25/47] target/ppc: implement vrlq matheus.ferst
                   ` (22 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 -  New in v4.
---
 target/ppc/helper.h                 |   8 +-
 target/ppc/insn32.decode            |   6 ++
 target/ppc/int_helper.c             |  50 ++++-----
 target/ppc/translate/vmx-impl.c.inc | 152 ++++++++++++++++++++++++++--
 target/ppc/translate/vmx-ops.c.inc  |   5 +-
 5 files changed, 182 insertions(+), 39 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 269150b197..a2a0d461dd 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -275,10 +275,10 @@ DEF_HELPER_4(vmaxfp, void, env, avr, avr, avr)
 DEF_HELPER_4(vminfp, void, env, avr, avr, avr)
 DEF_HELPER_3(vrefp, void, env, avr, avr)
 DEF_HELPER_3(vrsqrtefp, void, env, avr, avr)
-DEF_HELPER_3(vrlwmi, void, avr, avr, avr)
-DEF_HELPER_3(vrldmi, void, avr, avr, avr)
-DEF_HELPER_3(vrldnm, void, avr, avr, avr)
-DEF_HELPER_3(vrlwnm, void, avr, avr, avr)
+DEF_HELPER_4(VRLWMI, void, avr, avr, avr, i32)
+DEF_HELPER_4(VRLDMI, void, avr, avr, avr, i32)
+DEF_HELPER_4(VRLDNM, void, avr, avr, avr, i32)
+DEF_HELPER_4(VRLWNM, void, avr, avr, avr, i32)
 DEF_HELPER_5(vmaddfp, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vnmsubfp, void, env, avr, avr, avr, avr)
 DEF_HELPER_3(vexptefp, void, env, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index d918e2d0f2..e788dc5152 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -492,6 +492,12 @@ VRLH            000100 ..... ..... ..... 00001000100    @VX
 VRLW            000100 ..... ..... ..... 00010000100    @VX
 VRLD            000100 ..... ..... ..... 00011000100    @VX
 
+VRLWMI          000100 ..... ..... ..... 00010000101    @VX
+VRLDMI          000100 ..... ..... ..... 00011000101    @VX
+
+VRLWNM          000100 ..... ..... ..... 00110000101    @VX
+VRLDNM          000100 ..... ..... ..... 00111000101    @VX
+
 ## Vector Integer Arithmetic Instructions
 
 VEXTSB2W        000100 ..... 10000 ..... 11000000010    @VX_tb
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 0a094b535a..58e57b2563 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1291,33 +1291,33 @@ void helper_vrsqrtefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
     }
 }
 
-#define VRLMI(name, size, element, insert)                            \
-void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)          \
-{                                                                     \
-    int i;                                                            \
-    for (i = 0; i < ARRAY_SIZE(r->element); i++) {                    \
-        uint##size##_t src1 = a->element[i];                          \
-        uint##size##_t src2 = b->element[i];                          \
-        uint##size##_t src3 = r->element[i];                          \
-        uint##size##_t begin, end, shift, mask, rot_val;              \
-                                                                      \
-        shift = extract##size(src2, 0, 6);                            \
-        end   = extract##size(src2, 8, 6);                            \
-        begin = extract##size(src2, 16, 6);                           \
-        rot_val = rol##size(src1, shift);                             \
-        mask = mask_u##size(begin, end);                              \
-        if (insert) {                                                 \
-            r->element[i] = (rot_val & mask) | (src3 & ~mask);        \
-        } else {                                                      \
-            r->element[i] = (rot_val & mask);                         \
-        }                                                             \
-    }                                                                 \
+#define VRLMI(name, size, element, insert)                                  \
+void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t desc) \
+{                                                                           \
+    int i;                                                                  \
+    for (i = 0; i < ARRAY_SIZE(r->element); i++) {                          \
+        uint##size##_t src1 = a->element[i];                                \
+        uint##size##_t src2 = b->element[i];                                \
+        uint##size##_t src3 = r->element[i];                                \
+        uint##size##_t begin, end, shift, mask, rot_val;                    \
+                                                                            \
+        shift = extract##size(src2, 0, 6);                                  \
+        end   = extract##size(src2, 8, 6);                                  \
+        begin = extract##size(src2, 16, 6);                                 \
+        rot_val = rol##size(src1, shift);                                   \
+        mask = mask_u##size(begin, end);                                    \
+        if (insert) {                                                       \
+            r->element[i] = (rot_val & mask) | (src3 & ~mask);              \
+        } else {                                                            \
+            r->element[i] = (rot_val & mask);                               \
+        }                                                                   \
+    }                                                                       \
 }
 
-VRLMI(vrldmi, 64, u64, 1);
-VRLMI(vrlwmi, 32, u32, 1);
-VRLMI(vrldnm, 64, u64, 0);
-VRLMI(vrlwnm, 32, u32, 0);
+VRLMI(VRLDMI, 64, u64, 1);
+VRLMI(VRLWMI, 32, u32, 1);
+VRLMI(VRLDNM, 64, u64, 0);
+VRLMI(VRLWNM, 32, u32, 0);
 
 void helper_vsel(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
                  ppc_avr_t *c)
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 9dcac4243f..a025404032 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -799,7 +799,6 @@ static void trans_vclzd(DisasContext *ctx)
 }
 
 GEN_VXFORM_V(vmuluwm, MO_32, tcg_gen_gvec_mul, 4, 2);
-GEN_VXFORM(vrlwnm, 2, 6);
 GEN_VXFORM(vsrv, 2, 28);
 GEN_VXFORM(vslv, 2, 29);
 GEN_VXFORM(vslo, 6, 16);
@@ -839,6 +838,152 @@ TRANS_FLAGS(ALTIVEC, VRLH, do_vector_gvec3_VX, MO_16, tcg_gen_gvec_rotlv)
 TRANS_FLAGS(ALTIVEC, VRLW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_rotlv)
 TRANS_FLAGS2(ALTIVEC_207, VRLD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_rotlv)
 
+static TCGv_vec do_vrl_mask_vec(unsigned vece, TCGv_vec vrb)
+{
+    TCGv_vec t0 = tcg_temp_new_vec_matching(vrb),
+             t1 = tcg_temp_new_vec_matching(vrb),
+             t2 = tcg_temp_new_vec_matching(vrb),
+             ones = tcg_constant_vec_matching(vrb, vece, -1);
+
+    /* Extract b and e */
+    tcg_gen_dupi_vec(vece, t2, (8 << vece) - 1);
+
+    tcg_gen_shri_vec(vece, t0, vrb, 16);
+    tcg_gen_and_vec(vece, t0, t0, t2);
+
+    tcg_gen_shri_vec(vece, t1, vrb, 8);
+    tcg_gen_and_vec(vece, t1, t1, t2);
+
+    /* Compare b and e to negate the mask where begin > end */
+    tcg_gen_cmp_vec(TCG_COND_GT, vece, t2, t0, t1);
+
+    /* Create the mask with (~0 >> b) ^ ((~0 >> e) >> 1) */
+    tcg_gen_shrv_vec(vece, t0, ones, t0);
+    tcg_gen_shrv_vec(vece, t1, ones, t1);
+    tcg_gen_shri_vec(vece, t1, t1, 1);
+    tcg_gen_xor_vec(vece, t0, t0, t1);
+
+    /* negate the mask */
+    tcg_gen_xor_vec(vece, t0, t0, t2);
+
+    tcg_temp_free_vec(t1);
+    tcg_temp_free_vec(t2);
+
+    return t0;
+}
+
+static void gen_vrlnm_vec(unsigned vece, TCGv_vec vrt, TCGv_vec vra,
+                          TCGv_vec vrb)
+{
+    TCGv_vec mask, n = tcg_temp_new_vec_matching(vrt);
+
+    /* Create the mask */
+    mask = do_vrl_mask_vec(vece, vrb);
+
+    /* Extract n */
+    tcg_gen_dupi_vec(vece, n, (8 << vece) - 1);
+    tcg_gen_and_vec(vece, n, vrb, n);
+
+    /* Rotate and mask */
+    tcg_gen_rotlv_vec(vece, vrt, vra, n);
+    tcg_gen_and_vec(vece, vrt, vrt, mask);
+
+    tcg_temp_free_vec(n);
+    tcg_temp_free_vec(mask);
+}
+
+static bool do_vrlnm(DisasContext *ctx, arg_VX *a, int vece)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_cmp_vec, INDEX_op_rotlv_vec, INDEX_op_sari_vec,
+        INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_shrv_vec, 0
+    };
+    static const GVecGen3 ops[2] = {
+        {
+            .fniv = gen_vrlnm_vec,
+            .fno = gen_helper_VRLWNM,
+            .opt_opc = vecop_list,
+            .load_dest = true,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vrlnm_vec,
+            .fno = gen_helper_VRLDNM,
+            .opt_opc = vecop_list,
+            .load_dest = true,
+            .vece = MO_64
+        }
+    };
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+    REQUIRE_VECTOR(ctx);
+
+    tcg_gen_gvec_3(avr_full_offset(a->vrt), avr_full_offset(a->vra),
+                   avr_full_offset(a->vrb), 16, 16, &ops[vece - 2]);
+
+    return true;
+}
+
+TRANS(VRLWNM, do_vrlnm, MO_32)
+TRANS(VRLDNM, do_vrlnm, MO_64)
+
+static void gen_vrlmi_vec(unsigned vece, TCGv_vec vrt, TCGv_vec vra,
+                          TCGv_vec vrb)
+{
+    TCGv_vec mask, n = tcg_temp_new_vec_matching(vrt),
+             tmp = tcg_temp_new_vec_matching(vrt);
+
+    /* Create the mask */
+    mask = do_vrl_mask_vec(vece, vrb);
+
+    /* Extract n */
+    tcg_gen_dupi_vec(vece, n, (8 << vece) - 1);
+    tcg_gen_and_vec(vece, n, vrb, n);
+
+    /* Rotate and insert */
+    tcg_gen_rotlv_vec(vece, tmp, vra, n);
+    tcg_gen_bitsel_vec(vece, vrt, mask, tmp, vrt);
+
+    tcg_temp_free_vec(n);
+    tcg_temp_free_vec(tmp);
+    tcg_temp_free_vec(mask);
+}
+
+static bool do_vrlmi(DisasContext *ctx, arg_VX *a, int vece)
+{
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_cmp_vec, INDEX_op_rotlv_vec, INDEX_op_sari_vec,
+        INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_shrv_vec, 0
+    };
+    static const GVecGen3 ops[2] = {
+        {
+            .fniv = gen_vrlmi_vec,
+            .fno = gen_helper_VRLWMI,
+            .opt_opc = vecop_list,
+            .load_dest = true,
+            .vece = MO_32
+        },
+        {
+            .fniv = gen_vrlmi_vec,
+            .fno = gen_helper_VRLDMI,
+            .opt_opc = vecop_list,
+            .load_dest = true,
+            .vece = MO_64
+        }
+    };
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+    REQUIRE_VECTOR(ctx);
+
+    tcg_gen_gvec_3(avr_full_offset(a->vrt), avr_full_offset(a->vra),
+                   avr_full_offset(a->vrb), 16, 16, &ops[vece - 2]);
+
+    return true;
+}
+
+TRANS(VRLWMI, do_vrlmi, MO_32)
+TRANS(VRLDMI, do_vrlmi, MO_64)
+
 static bool do_vector_shift_quad(DisasContext *ctx, arg_VX *a, bool right,
                                  bool alg)
 {
@@ -973,12 +1118,7 @@ GEN_VXFORM3(vsubeuqm, 31, 0);
 GEN_VXFORM3(vsubecuq, 31, 0);
 GEN_VXFORM_DUAL(vsubeuqm, PPC_NONE, PPC2_ALTIVEC_207, \
             vsubecuq, PPC_NONE, PPC2_ALTIVEC_207)
-GEN_VXFORM(vrlwmi, 2, 2);
-GEN_VXFORM(vrldmi, 2, 3);
 GEN_VXFORM_TRANS(vsl, 2, 7);
-GEN_VXFORM(vrldnm, 2, 7);
-GEN_VXFORM_DUAL(vsl, PPC_ALTIVEC, PPC_NONE, \
-                vrldnm, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM_TRANS(vsr, 2, 11);
 GEN_VXFORM_ENV(vpkuhum, 7, 0);
 GEN_VXFORM_ENV(vpkuwum, 7, 1);
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index a7acea3ca7..3a8a9cc564 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -102,7 +102,6 @@ GEN_VXFORM_300(vextubrx, 6, 28),
 GEN_VXFORM_300(vextuhrx, 6, 29),
 GEN_VXFORM_DUAL(vmrgew, vextuwrx, 6, 30, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM_207(vmuluwm, 4, 2),
-GEN_VXFORM_300(vrlwnm, 2, 6),
 GEN_VXFORM_300(vsrv, 2, 28),
 GEN_VXFORM_300(vslv, 2, 29),
 GEN_VXFORM(vslo, 6, 16),
@@ -133,9 +132,7 @@ GEN_VXFORM_DUAL(vaddeuqm, vaddecuq, 30, 0xFF, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM_DUAL(vsubuqm, bcdtrunc, 0, 20, PPC2_ALTIVEC_207, PPC2_ISA300),
 GEN_VXFORM_DUAL(vsubcuq, bcdutrunc, 0, 21, PPC2_ALTIVEC_207, PPC2_ISA300),
 GEN_VXFORM_DUAL(vsubeuqm, vsubecuq, 31, 0xFF, PPC_NONE, PPC2_ALTIVEC_207),
-GEN_VXFORM_300(vrlwmi, 2, 2),
-GEN_VXFORM_300(vrldmi, 2, 3),
-GEN_VXFORM_DUAL(vsl, vrldnm, 2, 7, PPC_ALTIVEC, PPC_NONE),
+GEN_VXFORM(vsl, 2, 7),
 GEN_VXFORM(vsr, 2, 11),
 GEN_VXFORM(vpkuhum, 7, 0),
 GEN_VXFORM(vpkuwum, 7, 1),
-- 
2.25.1

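A scalar model of one 32-bit lane may help when reviewing do_vrl_mask_vec(), gen_vrlnm_vec() and gen_vrlmi_vec(). It is a sketch, not QEMU code, and the function names are hypothetical. b and e use IBM bit numbering, where bit 0 is the most significant bit. The control-word layout is read off the shifts in do_vrl_mask_vec(): counting from the LSB, n is in bits 0-4, e in bits 8-12 and b in bits 16-20. The mask is (~0 >> b) ^ ((~0 >> e) >> 1), complemented when b > e so that the run of ones wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of do_vrl_mask_vec() for one 32-bit lane. */
static uint32_t vrl_mask32(unsigned b, unsigned e)
{
    uint32_t ones = UINT32_MAX;
    uint32_t m;

    assert(b < 32 && e < 32);
    m = (ones >> b) ^ ((ones >> e) >> 1);
    /* When b > e the run of ones wraps around: take the complement. */
    return b > e ? ~m : m;
}

/* vrlwnm-style lane: rotate left by n, then AND with the mask. */
static uint32_t vrlwnm_lane(uint32_t a, uint32_t ctl)
{
    unsigned n = ctl & 31, e = (ctl >> 8) & 31, b = (ctl >> 16) & 31;
    uint32_t rot = n ? (a << n) | (a >> (32 - n)) : a;

    return rot & vrl_mask32(b, e);
}

/* vrlwmi-style lane: rotate left by n, then insert into the old target. */
static uint32_t vrlwmi_lane(uint32_t t, uint32_t a, uint32_t ctl)
{
    unsigned n = ctl & 31, e = (ctl >> 8) & 31, b = (ctl >> 16) & 31;
    uint32_t rot = n ? (a << n) | (a >> (32 - n)) : a;
    uint32_t m = vrl_mask32(b, e);

    return (rot & m) | (t & ~m);
}
```

The nm and mi forms differ only in the final step. nm ANDs the rotated value with the mask, while mi inserts it into the previous contents of vD. That is why the mi forms need gen_vrlmi_vec() rather than gen_vrlnm_vec().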



* [PATCH v4 25/47] target/ppc: implement vrlq
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (23 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi " matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:33   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 26/47] target/ppc: Move vsel and vperm/vpermr to decodetree matheus.ferst
                   ` (21 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v4:
 -  New in v4.
---
 target/ppc/insn32.decode            |  1 +
 target/ppc/translate/vmx-impl.c.inc | 49 +++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index e788dc5152..c3d47a8815 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -491,6 +491,7 @@ VRLB            000100 ..... ..... ..... 00000000100    @VX
 VRLH            000100 ..... ..... ..... 00001000100    @VX
 VRLW            000100 ..... ..... ..... 00010000100    @VX
 VRLD            000100 ..... ..... ..... 00011000100    @VX
+VRLQ            000100 ..... ..... ..... 00000000101    @VX
 
 VRLWMI          000100 ..... ..... ..... 00010000101    @VX
 VRLDMI          000100 ..... ..... ..... 00011000101    @VX
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index a025404032..6b68a81706 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1053,6 +1053,55 @@ TRANS_FLAGS2(ISA310, VSLQ, do_vector_shift_quad, false, false);
 TRANS_FLAGS2(ISA310, VSRQ, do_vector_shift_quad, true, false);
 TRANS_FLAGS2(ISA310, VSRAQ, do_vector_shift_quad, true, true);
 
+static bool trans_VRLQ(DisasContext *ctx, arg_VX *a)
+{
+    TCGv_i64 ah, al, n, t0, t1, sf = tcg_constant_i64(64);
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VECTOR(ctx);
+
+    ah = tcg_temp_new_i64();
+    al = tcg_temp_new_i64();
+    n = tcg_temp_new_i64();
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+
+    get_avr64(ah, a->vra, true);
+    get_avr64(al, a->vra, false);
+    get_avr64(n, a->vrb, true);
+
+    tcg_gen_andi_i64(n, n, 0x7F);
+
+    tcg_gen_mov_i64(t0, ah);
+    tcg_gen_movcond_i64(TCG_COND_GE, ah, n, sf, al, ah);
+    tcg_gen_movcond_i64(TCG_COND_GE, al, n, sf, t0, al);
+    tcg_gen_andi_i64(n, n, ~64ULL);
+
+    tcg_gen_shl_i64(t0, ah, n);
+    tcg_gen_shl_i64(t1, al, n);
+
+    tcg_gen_xori_i64(n, n, 63);
+
+    tcg_gen_shr_i64(al, al, n);
+    tcg_gen_shri_i64(al, al, 1);
+    tcg_gen_or_i64(t0, al, t0);
+
+    tcg_gen_shr_i64(ah, ah, n);
+    tcg_gen_shri_i64(ah, ah, 1);
+    tcg_gen_or_i64(t1, ah, t1);
+
+    set_avr64(a->vrt, t0, true);
+    set_avr64(a->vrt, t1, false);
+
+    tcg_temp_free_i64(ah);
+    tcg_temp_free_i64(al);
+    tcg_temp_free_i64(n);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+
+    return true;
+}
+
 #define GEN_VXFORM_SAT(NAME, VECE, NORM, SAT, OPC2, OPC3)               \
 static void glue(glue(gen_, NAME), _vec)(unsigned vece, TCGv_vec t,     \
                                          TCGv_vec sat, TCGv_vec a,      \
-- 
2.25.1

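Below is a plain-C sketch of the same algorithm as trans_VRLQ(), useful for checking the edge cases. The vrlq_pair type and rotl128_model() are hypothetical. A count of 64 or more is handled by swapping the doublewords. The (x >> (n ^ 63)) >> 1 form computes x >> (64 - n) for n in 1..63 and yields 0 for n == 0. As a result, no single shift ever reaches 64 bits, which would be undefined behaviour in C and an unspecified result in TCG.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into big-endian doublewords hi:lo. */
typedef struct {
    uint64_t hi, lo;
} vrlq_pair;

/* Rotate v left by the low 7 bits of count, mirroring trans_VRLQ(). */
static vrlq_pair rotl128_model(vrlq_pair v, uint64_t count)
{
    unsigned n = count & 0x7f;
    uint64_t hi, lo;

    if (n >= 64) {
        /* Rotating by 64 just swaps the doublewords. */
        uint64_t t = v.hi;
        v.hi = v.lo;
        v.lo = t;
        n &= ~64u;
    }
    /*
     * For n in 0..63, n ^ 63 == 63 - n, so the double shift equals
     * x >> (64 - n), and gives 0 when n == 0 instead of a 64-bit shift.
     */
    hi = (v.hi << n) | ((v.lo >> (n ^ 63)) >> 1);
    lo = (v.lo << n) | ((v.hi >> (n ^ 63)) >> 1);
    return (vrlq_pair){ hi, lo };
}
```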



* [PATCH v4 26/47] target/ppc: Move vsel and vperm/vpermr to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (24 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 25/47] target/ppc: implement vrlq matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:37   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 27/47] target/ppc: Move xxsel " matheus.ferst
                   ` (20 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/helper.h                 |  5 +--
 target/ppc/insn32.decode            |  5 +++
 target/ppc/int_helper.c             | 13 +-----
 target/ppc/translate/vmx-impl.c.inc | 69 ++++++++++++++++++++++-------
 target/ppc/translate/vmx-ops.c.inc  |  2 -
 5 files changed, 62 insertions(+), 32 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index a2a0d461dd..c57b3035ae 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -227,9 +227,8 @@ DEF_HELPER_2(vupklsh, void, avr, avr)
 DEF_HELPER_2(vupklsw, void, avr, avr)
 DEF_HELPER_5(vmsumubm, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vmsummbm, void, env, avr, avr, avr, avr)
-DEF_HELPER_5(vsel, void, env, avr, avr, avr, avr)
-DEF_HELPER_5(vperm, void, env, avr, avr, avr, avr)
-DEF_HELPER_5(vpermr, void, env, avr, avr, avr, avr)
+DEF_HELPER_4(VPERM, void, avr, avr, avr, avr)
+DEF_HELPER_4(VPERMR, void, avr, avr, avr, avr)
 DEF_HELPER_4(vpkshss, void, env, avr, avr, avr)
 DEF_HELPER_4(vpkshus, void, env, avr, avr, avr)
 DEF_HELPER_4(vpkswss, void, env, avr, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index c3d47a8815..1456fa2b9d 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -467,6 +467,11 @@ VINSWVRX        000100 ..... ..... ..... 00110001111    @VX
 VSLDBI          000100 ..... ..... ..... 00 ... 010110  @VN
 VSRDBI          000100 ..... ..... ..... 01 ... 010110  @VN
 
+VPERM           000100 ..... ..... ..... ..... 101011   @VA
+VPERMR          000100 ..... ..... ..... ..... 111011   @VA
+
+VSEL            000100 ..... ..... ..... ..... 101010   @VA
+
 ## Vector Integer Shift Instruction
 
 VSLB            000100 ..... ..... ..... 00100000100    @VX
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 58e57b2563..05978b686d 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1031,8 +1031,7 @@ void helper_VMULOUD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     mulu64(&r->VsrD(1), &r->VsrD(0), a->VsrD(1), b->VsrD(1));
 }
 
-void helper_vperm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
-                  ppc_avr_t *c)
+void helper_VPERM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
     ppc_avr_t result;
     int i;
@@ -1050,8 +1049,7 @@ void helper_vperm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
     *r = result;
 }
 
-void helper_vpermr(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
-                  ppc_avr_t *c)
+void helper_VPERMR(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
     ppc_avr_t result;
     int i;
@@ -1319,13 +1317,6 @@ VRLMI(VRLWMI, 32, u32, 1);
 VRLMI(VRLDNM, 64, u64, 0);
 VRLMI(VRLWNM, 32, u32, 0);
 
-void helper_vsel(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
-                 ppc_avr_t *c)
-{
-    r->u64[0] = (a->u64[0] & ~c->u64[0]) | (b->u64[0] & c->u64[0]);
-    r->u64[1] = (a->u64[1] & ~c->u64[1]) | (b->u64[1] & c->u64[1]);
-}
-
 void helper_vexptefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 6b68a81706..f734f449e0 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -2474,28 +2474,65 @@ static void gen_vmladduhm(DisasContext *ctx)
     tcg_temp_free_ptr(rd);
 }
 
-static void gen_vpermr(DisasContext *ctx)
+static bool trans_VPERM(DisasContext *ctx, arg_VA *a)
 {
-    TCGv_ptr ra, rb, rc, rd;
-    if (unlikely(!ctx->altivec_enabled)) {
-        gen_exception(ctx, POWERPC_EXCP_VPU);
-        return;
-    }
-    ra = gen_avr_ptr(rA(ctx->opcode));
-    rb = gen_avr_ptr(rB(ctx->opcode));
-    rc = gen_avr_ptr(rC(ctx->opcode));
-    rd = gen_avr_ptr(rD(ctx->opcode));
-    gen_helper_vpermr(cpu_env, rd, ra, rb, rc);
-    tcg_temp_free_ptr(ra);
-    tcg_temp_free_ptr(rb);
-    tcg_temp_free_ptr(rc);
-    tcg_temp_free_ptr(rd);
+    TCGv_ptr vrt, vra, vrb, vrc;
+
+    REQUIRE_INSNS_FLAGS(ctx, ALTIVEC);
+    REQUIRE_VECTOR(ctx);
+
+    vrt = gen_avr_ptr(a->vrt);
+    vra = gen_avr_ptr(a->vra);
+    vrb = gen_avr_ptr(a->vrb);
+    vrc = gen_avr_ptr(a->rc);
+
+    gen_helper_VPERM(vrt, vra, vrb, vrc);
+
+    tcg_temp_free_ptr(vrt);
+    tcg_temp_free_ptr(vra);
+    tcg_temp_free_ptr(vrb);
+    tcg_temp_free_ptr(vrc);
+
+    return true;
+}
+
+static bool trans_VPERMR(DisasContext *ctx, arg_VA *a)
+{
+    TCGv_ptr vrt, vra, vrb, vrc;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+    REQUIRE_VECTOR(ctx);
+
+    vrt = gen_avr_ptr(a->vrt);
+    vra = gen_avr_ptr(a->vra);
+    vrb = gen_avr_ptr(a->vrb);
+    vrc = gen_avr_ptr(a->rc);
+
+    gen_helper_VPERMR(vrt, vra, vrb, vrc);
+
+    tcg_temp_free_ptr(vrt);
+    tcg_temp_free_ptr(vra);
+    tcg_temp_free_ptr(vrb);
+    tcg_temp_free_ptr(vrc);
+
+    return true;
+}
+
+static bool trans_VSEL(DisasContext *ctx, arg_VA *a)
+{
+    REQUIRE_INSNS_FLAGS(ctx, ALTIVEC);
+    REQUIRE_VECTOR(ctx);
+
+    tcg_gen_gvec_bitsel(MO_64, avr_full_offset(a->vrt), avr_full_offset(a->rc),
+                        avr_full_offset(a->vrb), avr_full_offset(a->vra),
+                        16, 16);
+
+    return true;
 }
 
 GEN_VAFORM_PAIRED(vmsumubm, vmsummbm, 18)
 GEN_VAFORM_PAIRED(vmsumuhm, vmsumuhs, 19)
 GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20)
-GEN_VAFORM_PAIRED(vsel, vperm, 21)
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23)
 
 GEN_VXFORM_NOA(vclzb, 1, 28)
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index 3a8a9cc564..d960648d52 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -194,7 +194,6 @@ GEN_VXFORM_300_EO(vctzw, 0x01, 0x18, 0x1E),
 GEN_VXFORM_300_EO(vctzd, 0x01, 0x18, 0x1F),
 GEN_VXFORM_300_EO(vclzlsbb, 0x01, 0x18, 0x0),
 GEN_VXFORM_300_EO(vctzlsbb, 0x01, 0x18, 0x1),
-GEN_VXFORM_300(vpermr, 0x1D, 0xFF),
 
 #define GEN_VXFORM_NOA(name, opc2, opc3)                                \
     GEN_HANDLER(name, 0x04, opc2, opc3, 0x001f0000, PPC_ALTIVEC)
@@ -229,7 +228,6 @@ GEN_VAFORM_PAIRED(vmhaddshs, vmhraddshs, 16),
 GEN_VAFORM_PAIRED(vmsumubm, vmsummbm, 18),
 GEN_VAFORM_PAIRED(vmsumuhm, vmsumuhs, 19),
 GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20),
-GEN_VAFORM_PAIRED(vsel, vperm, 21),
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23),
 
 GEN_VXFORM_DUAL(vclzb, vpopcntb, 1, 28, PPC_NONE, PPC2_ALTIVEC_207),
-- 
2.25.1

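Below is a scalar sketch of the semantics this patch maps onto decodetree.

tcg_gen_gvec_bitsel() is documented as d = (b & a) | (c & ~a). trans_VSEL() therefore passes vC as the selector, vB as the value taken where a selector bit is 1, and vA elsewhere, which matches the removed helper_vsel().

The vperm model works on big-endian byte arrays, where element 0 is the leftmost byte, rather than on QEMU's host-ordered ppc_avr_t. vpermr indexes from the right, so it uses 31 minus the 5-bit index.

bitsel64(), vsel64() and vperm_model() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of tcg_gen_gvec_bitsel(): d = (b & a) | (c & ~a). */
static uint64_t bitsel64(uint64_t a, uint64_t b, uint64_t c)
{
    return (b & a) | (c & ~a);
}

/* vsel on one doubleword, as the removed helper_vsel() computed it. */
static uint64_t vsel64(uint64_t vra, uint64_t vrb, uint64_t vrc)
{
    return (vra & ~vrc) | (vrb & vrc);
}

/*
 * Hypothetical byte-permute model over big-endian byte arrays.
 * right == 0 models vperm, right != 0 models vpermr.
 */
static void vperm_model(uint8_t r[16], const uint8_t a[16],
                        const uint8_t b[16], const uint8_t c[16], int right)
{
    uint8_t tmp[16];

    for (int i = 0; i < 16; i++) {
        unsigned idx = c[i] & 0x1f;   /* 5-bit index into a:b */
        if (right) {
            idx = 31 - idx;
        }
        tmp[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
    for (int i = 0; i < 16; i++) {
        r[i] = tmp[i];                /* r may alias a, b or c */
    }
}
```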



* [PATCH v4 27/47] target/ppc: Move xxsel to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (25 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 26/47] target/ppc: Move vsel and vperm/vpermr to decodetree matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:38   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 28/47] target/ppc: move xxperm/xxpermr " matheus.ferst
                   ` (19 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            |  6 ++++
 target/ppc/insn64.decode            | 24 ++++++++--------
 target/ppc/translate/vsx-impl.c.inc | 20 ++++++--------
 target/ppc/translate/vsx-ops.c.inc  | 43 -----------------------------
 4 files changed, 26 insertions(+), 67 deletions(-)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 1456fa2b9d..ad2aa0257c 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -148,12 +148,16 @@
 %xx_xt          0:1 21:5
 %xx_xb          1:1 11:5
 %xx_xa          2:1 16:5
+%xx_xc          3:1 6:5
 &XX2            xt xb uim:uint8_t
 @XX2            ...... ..... ... uim:2 ..... ......... ..       &XX2 xt=%xx_xt xb=%xx_xb
 
 &XX3            xt xa xb
 @XX3            ...... ..... ..... ..... ........ ...           &XX3 xt=%xx_xt xa=%xx_xa xb=%xx_xb
 
+&XX4            xt xa xb xc
+@XX4            ...... ..... ..... ..... ..... .. ....          &XX4 xt=%xx_xt xa=%xx_xa xb=%xx_xb xc=%xx_xc
+
 &Z22_bf_fra     bf fra dm
 @Z22_bf_fra     ...... bf:3 .. fra:5 dm:6 ......... .           &Z22_bf_fra
 
@@ -598,6 +602,8 @@ STXVPX          011111 ..... ..... ..... 0111001101 -   @X_TSXP
 XXSPLTIB        111100 ..... 00 ........ 0101101000 .   @X_imm8
 XXSPLTW         111100 ..... ---.. ..... 010100100 . .  @XX2
 
+XXSEL           111100 ..... ..... ..... ..... 11 ....  @XX4
+
 ## VSX Vector Load Special Value Instruction
 
 LXVKQ           111100 ..... 11111 ..... 0101101000 .   @X_uim5
diff --git a/target/ppc/insn64.decode b/target/ppc/insn64.decode
index 39e610913d..9e4f531fb9 100644
--- a/target/ppc/insn64.decode
+++ b/target/ppc/insn64.decode
@@ -44,15 +44,15 @@
                 ...... ..... ....  . ................ \
                 &8RR_D si=%8rr_si xt=%8rr_xt
 
-# Format XX4
-&XX4            xt xa xb xc
-%xx4_xt         0:1 21:5
-%xx4_xa         2:1 16:5
-%xx4_xb         1:1 11:5
-%xx4_xc         3:1  6:5
-@XX4            ........ ........ ........ ........ \
+# Format 8RR:XX4
+%8rr_xx_xt      0:1 21:5
+%8rr_xx_xa      2:1 16:5
+%8rr_xx_xb      1:1 11:5
+%8rr_xx_xc      3:1  6:5
+&8RR_XX4        xt xa xb xc
+@8RR_XX4        ........ ........ ........ ........ \
                 ...... ..... ..... ..... ..... .. .... \
-                &XX4 xt=%xx4_xt xa=%xx4_xa xb=%xx4_xb xc=%xx4_xc
+                &8RR_XX4 xt=%8rr_xx_xt xa=%8rr_xx_xa xb=%8rr_xx_xb xc=%8rr_xx_xc
 
 ### Fixed-Point Load Instructions
 
@@ -187,10 +187,10 @@ XXSPLTI32DX     000001 01 0000 -- -- ................ \
                 100000 ..... 000 .. ................    @8RR_D_IX
 
 XXBLENDVD       000001 01 0000 -- ------------------ \
-                100001 ..... ..... ..... ..... 11 ....  @XX4
+                100001 ..... ..... ..... ..... 11 ....  @8RR_XX4
 XXBLENDVW       000001 01 0000 -- ------------------ \
-                100001 ..... ..... ..... ..... 10 ....  @XX4
+                100001 ..... ..... ..... ..... 10 ....  @8RR_XX4
 XXBLENDVH       000001 01 0000 -- ------------------ \
-                100001 ..... ..... ..... ..... 01 ....  @XX4
+                100001 ..... ..... ..... ..... 01 ....  @8RR_XX4
 XXBLENDVB       000001 01 0000 -- ------------------ \
-                100001 ..... ..... ..... ..... 00 ....  @XX4
+                100001 ..... ..... ..... ..... 00 ....  @8RR_XX4
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index e8a4ba0cfa..48e4a2e266 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1422,19 +1422,15 @@ static void glue(gen_, name)(DisasContext *ctx)             \
 VSX_XXMRG(xxmrghw, 1)
 VSX_XXMRG(xxmrglw, 0)
 
-static void gen_xxsel(DisasContext *ctx)
+static bool trans_XXSEL(DisasContext *ctx, arg_XX4 *a)
 {
-    int rt = xT(ctx->opcode);
-    int ra = xA(ctx->opcode);
-    int rb = xB(ctx->opcode);
-    int rc = xC(ctx->opcode);
+    REQUIRE_INSNS_FLAGS2(ctx, VSX);
+    REQUIRE_VSX(ctx);
 
-    if (unlikely(!ctx->vsx_enabled)) {
-        gen_exception(ctx, POWERPC_EXCP_VSXU);
-        return;
-    }
-    tcg_gen_gvec_bitsel(MO_64, vsr_full_offset(rt), vsr_full_offset(rc),
-                        vsr_full_offset(rb), vsr_full_offset(ra), 16, 16);
+    tcg_gen_gvec_bitsel(MO_64, vsr_full_offset(a->xt), vsr_full_offset(a->xc),
+                        vsr_full_offset(a->xb), vsr_full_offset(a->xa), 16, 16);
+
+    return true;
 }
 
 static bool trans_XXSPLTW(DisasContext *ctx, arg_XX2 *a)
@@ -2127,7 +2123,7 @@ static void gen_xxblendv_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b,
     tcg_temp_free_vec(tmp);
 }
 
-static bool do_xxblendv(DisasContext *ctx, arg_XX4 *a, unsigned vece)
+static bool do_xxblendv(DisasContext *ctx, arg_8RR_XX4 *a, unsigned vece)
 {
     static const TCGOpcode vecop_list[] = {
         INDEX_op_sari_vec, 0
diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc
index c974324c4c..b0dbb38c80 100644
--- a/target/ppc/translate/vsx-ops.c.inc
+++ b/target/ppc/translate/vsx-ops.c.inc
@@ -347,47 +347,4 @@ GEN_XX3FORM_DM(xxsldwi, 0x08, 0x00),
 GEN_XX2FORM_EXT(xxextractuw, 0x0A, 0x0A, PPC2_ISA300),
 GEN_XX2FORM_EXT(xxinsertw, 0x0A, 0x0B, PPC2_ISA300),
 
-#define GEN_XXSEL_ROW(opc3) \
-GEN_HANDLER2_E(xxsel, "xxsel", 0x3C, 0x18, opc3, 0, PPC_NONE, PPC2_VSX), \
-GEN_HANDLER2_E(xxsel, "xxsel", 0x3C, 0x19, opc3, 0, PPC_NONE, PPC2_VSX), \
-GEN_HANDLER2_E(xxsel, "xxsel", 0x3C, 0x1A, opc3, 0, PPC_NONE, PPC2_VSX), \
-GEN_HANDLER2_E(xxsel, "xxsel", 0x3C, 0x1B, opc3, 0, PPC_NONE, PPC2_VSX), \
-GEN_HANDLER2_E(xxsel, "xxsel", 0x3C, 0x1C, opc3, 0, PPC_NONE, PPC2_VSX), \
-GEN_HANDLER2_E(xxsel, "xxsel", 0x3C, 0x1D, opc3, 0, PPC_NONE, PPC2_VSX), \
-GEN_HANDLER2_E(xxsel, "xxsel", 0x3C, 0x1E, opc3, 0, PPC_NONE, PPC2_VSX), \
-GEN_HANDLER2_E(xxsel, "xxsel", 0x3C, 0x1F, opc3, 0, PPC_NONE, PPC2_VSX), \
-
-GEN_XXSEL_ROW(0x00)
-GEN_XXSEL_ROW(0x01)
-GEN_XXSEL_ROW(0x02)
-GEN_XXSEL_ROW(0x03)
-GEN_XXSEL_ROW(0x04)
-GEN_XXSEL_ROW(0x05)
-GEN_XXSEL_ROW(0x06)
-GEN_XXSEL_ROW(0x07)
-GEN_XXSEL_ROW(0x08)
-GEN_XXSEL_ROW(0x09)
-GEN_XXSEL_ROW(0x0A)
-GEN_XXSEL_ROW(0x0B)
-GEN_XXSEL_ROW(0x0C)
-GEN_XXSEL_ROW(0x0D)
-GEN_XXSEL_ROW(0x0E)
-GEN_XXSEL_ROW(0x0F)
-GEN_XXSEL_ROW(0x10)
-GEN_XXSEL_ROW(0x11)
-GEN_XXSEL_ROW(0x12)
-GEN_XXSEL_ROW(0x13)
-GEN_XXSEL_ROW(0x14)
-GEN_XXSEL_ROW(0x15)
-GEN_XXSEL_ROW(0x16)
-GEN_XXSEL_ROW(0x17)
-GEN_XXSEL_ROW(0x18)
-GEN_XXSEL_ROW(0x19)
-GEN_XXSEL_ROW(0x1A)
-GEN_XXSEL_ROW(0x1B)
-GEN_XXSEL_ROW(0x1C)
-GEN_XXSEL_ROW(0x1D)
-GEN_XXSEL_ROW(0x1E)
-GEN_XXSEL_ROW(0x1F)
-
 GEN_XX3FORM_DM(xxpermdi, 0x08, 0x01),
-- 
2.25.1
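This is a minimal plain-C sketch (not QEMU code) of what XXSEL computes for one register, included as a reference for the operand order in trans_XXSEL. It assumes two hypothetical names: a vsr128 type that holds the register as two doublewords, and an xxsel_ref() helper. tcg_gen_gvec_bitsel(vece, d, a, b, c) computes d = (b & a) | (c & ~a). Passing (xt, xc, xb, xa) therefore yields XT = (XB & XC) | (XA & ~XC), which is what the sketch computes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit VSR as two doublewords. */
typedef struct {
    uint64_t dw[2];
} vsr128;

/*
 * Reference model of xxsel: each result bit is taken from xb where the
 * corresponding xc bit is 1, and from xa where it is 0.
 */
static vsr128 xxsel_ref(vsr128 xa, vsr128 xb, vsr128 xc)
{
    vsr128 t;

    for (int i = 0; i < 2; i++) {
        t.dw[i] = (xa.dw[i] & ~xc.dw[i]) | (xb.dw[i] & xc.dw[i]);
    }
    return t;
}
```

Since the operation is purely bitwise, the element size passed to tcg_gen_gvec_bitsel (MO_64 in the patch) does not change the result.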




* [PATCH v4 28/47] target/ppc: move xxperm/xxpermr to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (26 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 27/47] target/ppc: Move xxsel " matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:40   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 29/47] target/ppc: Move xxpermdi " matheus.ferst
                   ` (18 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c             | 21 ---------------
 target/ppc/helper.h                 |  2 --
 target/ppc/insn32.decode            |  5 ++++
 target/ppc/translate/vsx-impl.c.inc | 42 +++++++++++++++++++++++++++--
 target/ppc/translate/vsx-ops.c.inc  |  2 --
 5 files changed, 45 insertions(+), 27 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index bd76bee7f1..0fd285defc 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -3055,27 +3055,6 @@ uint64_t helper_xsrsp(CPUPPCState *env, uint64_t xb)
     return xt;
 }
 
-#define VSX_XXPERM(op, indexed)                                       \
-void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                     \
-                 ppc_vsr_t *xa, ppc_vsr_t *pcv)                       \
-{                                                                     \
-    ppc_vsr_t t = *xt;                                                \
-    int i, idx;                                                       \
-                                                                      \
-    for (i = 0; i < 16; i++) {                                        \
-        idx = pcv->VsrB(i) & 0x1F;                                    \
-        if (indexed) {                                                \
-            idx = 31 - idx;                                           \
-        }                                                             \
-        t.VsrB(i) = (idx <= 15) ? xa->VsrB(idx)                       \
-                                : xt->VsrB(idx - 16);                 \
-    }                                                                 \
-    *xt = t;                                                          \
-}
-
-VSX_XXPERM(xxperm, 0)
-VSX_XXPERM(xxpermr, 1)
-
 void helper_xvxsigsp(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)
 {
     ppc_vsr_t t = { };
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index c57b3035ae..7514eebf6a 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -496,8 +496,6 @@ DEF_HELPER_3(xvrspic, void, env, vsr, vsr)
 DEF_HELPER_3(xvrspim, void, env, vsr, vsr)
 DEF_HELPER_3(xvrspip, void, env, vsr, vsr)
 DEF_HELPER_3(xvrspiz, void, env, vsr, vsr)
-DEF_HELPER_4(xxperm, void, env, vsr, vsr, vsr)
-DEF_HELPER_4(xxpermr, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xxextractuw, void, env, vsr, vsr, i32)
 DEF_HELPER_4(xxinsertw, void, env, vsr, vsr, i32)
 DEF_HELPER_3(xvxsigsp, void, env, vsr, vsr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index ad2aa0257c..5fc29eabc6 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -602,6 +602,11 @@ STXVPX          011111 ..... ..... ..... 0111001101 -   @X_TSXP
 XXSPLTIB        111100 ..... 00 ........ 0101101000 .   @X_imm8
 XXSPLTW         111100 ..... ---.. ..... 010100100 . .  @XX2
 
+## VSX Permute Instructions
+
+XXPERM          111100 ..... ..... ..... 00011010 ...   @XX3
+XXPERMR         111100 ..... ..... ..... 00111010 ...   @XX3
+
 XXSEL           111100 ..... ..... ..... ..... 11 ....  @XX4
 
 ## VSX Vector Load Special Value Instruction
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 48e4a2e266..7ce90f18a5 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1200,8 +1200,46 @@ GEN_VSX_HELPER_X2(xvrspip, 0x12, 0x0A, 0, PPC2_VSX)
 GEN_VSX_HELPER_X2(xvrspiz, 0x12, 0x09, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvtstdcsp, 0x14, 0x1A, 0, PPC2_VSX)
 GEN_VSX_HELPER_2(xvtstdcdp, 0x14, 0x1E, 0, PPC2_VSX)
-GEN_VSX_HELPER_X3(xxperm, 0x08, 0x03, 0, PPC2_ISA300)
-GEN_VSX_HELPER_X3(xxpermr, 0x08, 0x07, 0, PPC2_ISA300)
+
+static bool trans_XXPERM(DisasContext *ctx, arg_XX3 *a)
+{
+    TCGv_ptr xt, xa, xb;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+    REQUIRE_VSX(ctx);
+
+    xt = gen_vsr_ptr(a->xt);
+    xa = gen_vsr_ptr(a->xa);
+    xb = gen_vsr_ptr(a->xb);
+
+    gen_helper_VPERM(xt, xa, xt, xb);
+
+    tcg_temp_free_ptr(xt);
+    tcg_temp_free_ptr(xa);
+    tcg_temp_free_ptr(xb);
+
+    return true;
+}
+
+static bool trans_XXPERMR(DisasContext *ctx, arg_XX3 *a)
+{
+    TCGv_ptr xt, xa, xb;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+    REQUIRE_VSX(ctx);
+
+    xt = gen_vsr_ptr(a->xt);
+    xa = gen_vsr_ptr(a->xa);
+    xb = gen_vsr_ptr(a->xb);
+
+    gen_helper_VPERMR(xt, xa, xt, xb);
+
+    tcg_temp_free_ptr(xt);
+    tcg_temp_free_ptr(xa);
+    tcg_temp_free_ptr(xb);
+
+    return true;
+}
 
 #define GEN_VSX_HELPER_VSX_MADD(name, op1, aop, mop, inval, type)             \
 static void gen_##name(DisasContext *ctx)                                     \
diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc
index b0dbb38c80..86ed1a996a 100644
--- a/target/ppc/translate/vsx-ops.c.inc
+++ b/target/ppc/translate/vsx-ops.c.inc
@@ -341,8 +341,6 @@ VSX_LOGICAL(xxlnand, 0x8, 0x16, PPC2_VSX207),
 VSX_LOGICAL(xxlorc, 0x8, 0x15, PPC2_VSX207),
 GEN_XX3FORM(xxmrghw, 0x08, 0x02, PPC2_VSX),
 GEN_XX3FORM(xxmrglw, 0x08, 0x06, PPC2_VSX),
-GEN_XX3FORM(xxperm, 0x08, 0x03, PPC2_ISA300),
-GEN_XX3FORM(xxpermr, 0x08, 0x07, PPC2_ISA300),
 GEN_XX3FORM_DM(xxsldwi, 0x08, 0x00),
 GEN_XX2FORM_EXT(xxextractuw, 0x0A, 0x0A, PPC2_ISA300),
 GEN_XX2FORM_EXT(xxinsertw, 0x0A, 0x0B, PPC2_ISA300),
-- 
2.25.1
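This is a hedged plain-C model of the byte selection performed by the removed VSX_XXPERM helper. It shows why trans_XXPERM can call gen_helper_VPERM(xt, xa, xt, xb): XT is both the destination and the second half of the 32-byte source XA || XT, and XB supplies the permute control. Two details are assumptions for illustration: the name xxperm_ref(), and the representation of each register as 16 bytes with index 0 as the most significant byte, as with VsrB(0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Reference model of xxperm/xxpermr, mirroring the removed VSX_XXPERM
 * helper. The 32-byte source is xa || xt; only the low 5 bits of each
 * pcv byte are used as an index. xxpermr reverses the index (31 - idx).
 */
static void xxperm_ref(uint8_t xt[16], const uint8_t xa[16],
                       const uint8_t pcv[16], bool reversed)
{
    uint8_t t[16];

    for (int i = 0; i < 16; i++) {
        int idx = pcv[i] & 0x1f;
        if (reversed) {
            idx = 31 - idx;
        }
        t[i] = idx <= 15 ? xa[idx] : xt[idx - 16];
    }
    /* Write back only after every byte is read: xt is also a source. */
    for (int i = 0; i < 16; i++) {
        xt[i] = t[i];
    }
}
```

The temporary matters because XT is an input as well as the output. helper_VPERM likewise builds its result in a local variable before storing it, so passing xt as both r and b is safe.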




* [PATCH v4 29/47] target/ppc: Move xxpermdi to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (27 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 28/47] target/ppc: move xxperm/xxpermr " matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:42   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 30/47] target/ppc: Implement xxpermx instruction matheus.ferst
                   ` (17 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            |  4 ++
 target/ppc/translate/vsx-impl.c.inc | 71 +++++++++++++----------------
 target/ppc/translate/vsx-ops.c.inc  |  2 -
 3 files changed, 36 insertions(+), 41 deletions(-)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 5fc29eabc6..185d697458 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -155,6 +155,9 @@
 &XX3            xt xa xb
 @XX3            ...... ..... ..... ..... ........ ...           &XX3 xt=%xx_xt xa=%xx_xa xb=%xx_xb
 
+&XX3_dm         xt xa xb dm
+@XX3_dm         ...... ..... ..... ..... . dm:2 ..... ...       &XX3_dm xt=%xx_xt xa=%xx_xa xb=%xx_xb
+
 &XX4            xt xa xb xc
 @XX4            ...... ..... ..... ..... ..... .. ....          &XX4 xt=%xx_xt xa=%xx_xa xb=%xx_xb xc=%xx_xc
 
@@ -606,6 +609,7 @@ XXSPLTW         111100 ..... ---.. ..... 010100100 . .  @XX2
 
 XXPERM          111100 ..... ..... ..... 00011010 ...   @XX3
 XXPERMR         111100 ..... ..... ..... 00111010 ...   @XX3
+XXPERMDI        111100 ..... ..... ..... 0 .. 01010 ... @XX3_dm
 
 XXSEL           111100 ..... ..... ..... ..... 11 ....  @XX4
 
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 7ce90f18a5..cdefa13590 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -665,45 +665,6 @@ static void gen_mtvsrws(DisasContext *ctx)
 
 #endif
 
-static void gen_xxpermdi(DisasContext *ctx)
-{
-    TCGv_i64 xh, xl;
-
-    if (unlikely(!ctx->vsx_enabled)) {
-        gen_exception(ctx, POWERPC_EXCP_VSXU);
-        return;
-    }
-
-    xh = tcg_temp_new_i64();
-    xl = tcg_temp_new_i64();
-
-    if (unlikely((xT(ctx->opcode) == xA(ctx->opcode)) ||
-                 (xT(ctx->opcode) == xB(ctx->opcode)))) {
-        get_cpu_vsr(xh, xA(ctx->opcode), (DM(ctx->opcode) & 2) == 0);
-        get_cpu_vsr(xl, xB(ctx->opcode), (DM(ctx->opcode) & 1) == 0);
-
-        set_cpu_vsr(xT(ctx->opcode), xh, true);
-        set_cpu_vsr(xT(ctx->opcode), xl, false);
-    } else {
-        if ((DM(ctx->opcode) & 2) == 0) {
-            get_cpu_vsr(xh, xA(ctx->opcode), true);
-            set_cpu_vsr(xT(ctx->opcode), xh, true);
-        } else {
-            get_cpu_vsr(xh, xA(ctx->opcode), false);
-            set_cpu_vsr(xT(ctx->opcode), xh, true);
-        }
-        if ((DM(ctx->opcode) & 1) == 0) {
-            get_cpu_vsr(xl, xB(ctx->opcode), true);
-            set_cpu_vsr(xT(ctx->opcode), xl, false);
-        } else {
-            get_cpu_vsr(xl, xB(ctx->opcode), false);
-            set_cpu_vsr(xT(ctx->opcode), xl, false);
-        }
-    }
-    tcg_temp_free_i64(xh);
-    tcg_temp_free_i64(xl);
-}
-
 #define OP_ABS 1
 #define OP_NABS 2
 #define OP_NEG 3
@@ -1241,6 +1202,38 @@ static bool trans_XXPERMR(DisasContext *ctx, arg_XX3 *a)
     return true;
 }
 
+static bool trans_XXPERMDI(DisasContext *ctx, arg_XX3_dm *a)
+{
+    TCGv_i64 t0, t1;
+
+    REQUIRE_INSNS_FLAGS2(ctx, VSX);
+    REQUIRE_VSX(ctx);
+
+    t0 = tcg_temp_new_i64();
+
+    if (unlikely(a->xt == a->xa || a->xt == a->xb)) {
+        t1 = tcg_temp_new_i64();
+
+        get_cpu_vsr(t0, a->xa, (a->dm & 2) == 0);
+        get_cpu_vsr(t1, a->xb, (a->dm & 1) == 0);
+
+        set_cpu_vsr(a->xt, t0, true);
+        set_cpu_vsr(a->xt, t1, false);
+
+        tcg_temp_free_i64(t1);
+    } else {
+        get_cpu_vsr(t0, a->xa, (a->dm & 2) == 0);
+        set_cpu_vsr(a->xt, t0, true);
+
+        get_cpu_vsr(t0, a->xb, (a->dm & 1) == 0);
+        set_cpu_vsr(a->xt, t0, false);
+    }
+
+    tcg_temp_free_i64(t0);
+
+    return true;
+}
+
 #define GEN_VSX_HELPER_VSX_MADD(name, op1, aop, mop, inval, type)             \
 static void gen_##name(DisasContext *ctx)                                     \
 {                                                                             \
diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc
index 86ed1a996a..0a6b2b31ac 100644
--- a/target/ppc/translate/vsx-ops.c.inc
+++ b/target/ppc/translate/vsx-ops.c.inc
@@ -344,5 +344,3 @@ GEN_XX3FORM(xxmrglw, 0x08, 0x06, PPC2_VSX),
 GEN_XX3FORM_DM(xxsldwi, 0x08, 0x00),
 GEN_XX2FORM_EXT(xxextractuw, 0x0A, 0x0A, PPC2_ISA300),
 GEN_XX2FORM_EXT(xxinsertw, 0x0A, 0x0B, PPC2_ISA300),
-
-GEN_XX3FORM_DM(xxpermdi, 0x08, 0x01),
-- 
2.25.1
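The DM field handling in trans_XXPERMDI can be summarized with a small plain-C model. The vsr128 register-file array and xxpermdi_ref() are hypothetical. Here dw[0] stands for the high doubleword, matching get_cpu_vsr(..., true). The model reads both source doublewords before writing XT, which mirrors the aliasing path in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical VSR model: dw[0] is the high doubleword, dw[1] the low. */
typedef struct {
    uint64_t dw[2];
} vsr128;

/*
 * Reference model of xxpermdi: DM bit value 2 selects which XA doubleword
 * goes to XT's high half, DM bit value 1 selects which XB doubleword goes
 * to XT's low half (0 = high doubleword, 1 = low doubleword).
 * Both sources are read before XT is written, so XT may alias XA or XB.
 */
static void xxpermdi_ref(vsr128 *regs, int xt, int xa, int xb, int dm)
{
    uint64_t hi = regs[xa].dw[(dm >> 1) & 1];
    uint64_t lo = regs[xb].dw[dm & 1];

    regs[xt].dw[0] = hi;
    regs[xt].dw[1] = lo;
}
```

If XT aliases XB and the high half of XT is written before XB's high doubleword is read, the code reads back the value it just stored. Applying xxswapd in place (xxpermdi XT,XA,XA,2 with XT == XA) is one such aliased case.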




* [PATCH v4 30/47] target/ppc: Implement xxpermx instruction
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (28 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 29/47] target/ppc: Move xxpermdi " matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 22:46   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 31/47] tcg/tcg-op-gvec.c: Introduce tcg_gen_gvec_4i matheus.ferst
                   ` (16 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/helper.h                 |  1 +
 target/ppc/insn64.decode            |  8 ++++++++
 target/ppc/int_helper.c             | 20 ++++++++++++++++++++
 target/ppc/translate/vsx-impl.c.inc | 22 ++++++++++++++++++++++
 4 files changed, 51 insertions(+)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 7514eebf6a..85a13057ca 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -497,6 +497,7 @@ DEF_HELPER_3(xvrspim, void, env, vsr, vsr)
 DEF_HELPER_3(xvrspip, void, env, vsr, vsr)
 DEF_HELPER_3(xvrspiz, void, env, vsr, vsr)
 DEF_HELPER_4(xxextractuw, void, env, vsr, vsr, i32)
+DEF_HELPER_5(XXPERMX, void, vsr, vsr, vsr, vsr, tl)
 DEF_HELPER_4(xxinsertw, void, env, vsr, vsr, i32)
 DEF_HELPER_3(xvxsigsp, void, env, vsr, vsr)
 DEF_HELPER_5(XXBLENDVB, void, vsr, vsr, vsr, vsr, i32)
diff --git a/target/ppc/insn64.decode b/target/ppc/insn64.decode
index 9e4f531fb9..0963e064b1 100644
--- a/target/ppc/insn64.decode
+++ b/target/ppc/insn64.decode
@@ -54,6 +54,11 @@
                 ...... ..... ..... ..... ..... .. .... \
                 &8RR_XX4 xt=%8rr_xx_xt xa=%8rr_xx_xa xb=%8rr_xx_xb xc=%8rr_xx_xc
 
+&8RR_XX4_uim3   xt xa xb xc uim3
+@8RR_XX4_uim3   ...... .. .... .. ............... uim3:3 \
+                ...... ..... ..... ..... ..... .. ....   \
+                &8RR_XX4_uim3 xt=%8rr_xx_xt xa=%8rr_xx_xa xb=%8rr_xx_xb xc=%8rr_xx_xc
+
 ### Fixed-Point Load Instructions
 
 PLBZ            000001 10 0--.-- .................. \
@@ -194,3 +199,6 @@ XXBLENDVH       000001 01 0000 -- ------------------ \
                 100001 ..... ..... ..... ..... 01 ....  @8RR_XX4
 XXBLENDVB       000001 01 0000 -- ------------------ \
                 100001 ..... ..... ..... ..... 00 ....  @8RR_XX4
+
+XXPERMX         000001 01 0000 -- --------------- ... \
+                100010 ..... ..... ..... ..... 00 ....  @8RR_XX4_uim3
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 05978b686d..a92a006c6d 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1031,6 +1031,26 @@ void helper_VMULOUD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
     mulu64(&r->VsrD(1), &r->VsrD(0), a->VsrD(1), b->VsrD(1));
 }
 
+void helper_XXPERMX(ppc_vsr_t *t, ppc_vsr_t *s0, ppc_vsr_t *s1, ppc_vsr_t *pcv,
+                    target_ulong uim)
+{
+    int i, idx;
+    ppc_vsr_t tmp = { .u64 = {0, 0} };
+
+    for (i = 0; i < ARRAY_SIZE(t->u8); i++) {
+        if ((pcv->VsrB(i) >> 5) == uim) {
+            idx = pcv->VsrB(i) & 0x1f;
+            if (idx < ARRAY_SIZE(t->u8)) {
+                tmp.VsrB(i) = s0->VsrB(idx);
+            } else {
+                tmp.VsrB(i) = s1->VsrB(idx - ARRAY_SIZE(t->u8));
+            }
+        }
+    }
+
+    *t = tmp;
+}
+
 void helper_VPERM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
     ppc_avr_t result;
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index cdefa13590..92851b8926 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1234,6 +1234,28 @@ static bool trans_XXPERMDI(DisasContext *ctx, arg_XX3_dm *a)
     return true;
 }
 
+static bool trans_XXPERMX(DisasContext *ctx, arg_8RR_XX4_uim3 *a)
+{
+    TCGv_ptr xt, xa, xb, xc;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+
+    xt = gen_vsr_ptr(a->xt);
+    xa = gen_vsr_ptr(a->xa);
+    xb = gen_vsr_ptr(a->xb);
+    xc = gen_vsr_ptr(a->xc);
+
+    gen_helper_XXPERMX(xt, xa, xb, xc, tcg_constant_tl(a->uim3));
+
+    tcg_temp_free_ptr(xt);
+    tcg_temp_free_ptr(xa);
+    tcg_temp_free_ptr(xb);
+    tcg_temp_free_ptr(xc);
+
+    return true;
+}
+
 #define GEN_VSX_HELPER_VSX_MADD(name, op1, aop, mop, inval, type)             \
 static void gen_##name(DisasContext *ctx)                                     \
 {                                                                             \
-- 
2.25.1
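This plain-C sketch models the selection rule of helper_XXPERMX, using the same byte order as VsrB(), where index 0 is the most significant byte. For a byte to be taken from the 32-byte source S0 || S1, the top three bits of its control byte must equal UIM; otherwise the result byte is zero. The name xxpermx_ref() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reference model of xxpermx, mirroring helper_XXPERMX: a result byte is
 * selected from s0 || s1 only when the top 3 bits of its control byte
 * equal uim; otherwise it is zero.
 */
static void xxpermx_ref(uint8_t t[16], const uint8_t s0[16],
                        const uint8_t s1[16], const uint8_t pcv[16],
                        unsigned uim)
{
    uint8_t tmp[16] = { 0 };

    for (int i = 0; i < 16; i++) {
        if ((unsigned)(pcv[i] >> 5) == uim) {
            int idx = pcv[i] & 0x1f;
            tmp[i] = idx < 16 ? s0[idx] : s1[idx - 16];
        }
    }
    /* Build the result first, since t may alias a source. */
    for (int i = 0; i < 16; i++) {
        t[i] = tmp[i];
    }
}
```

The model also suggests one way to use UIM. Run the operation for UIM = 0..7 over consecutive 32-byte windows of a 256-byte table and OR the partial results together. Because exactly one UIM value matches each control byte, this yields a full 256-entry byte lookup.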




* [PATCH v4 31/47] tcg/tcg-op-gvec.c: Introduce tcg_gen_gvec_4i
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (29 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 30/47] target/ppc: Implement xxpermx instruction matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 23:04   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 32/47] target/ppc: Implement xxeval matheus.ferst
                   ` (15 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Following the pattern of tcg_gen_gvec_3i, add an expansion method for
operations that take four vector operands and an immediate operand.

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 include/tcg/tcg-op-gvec.h |  22 ++++++
 tcg/tcg-op-gvec.c         | 146 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 168 insertions(+)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index da55fed870..28cafbcc5c 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -218,6 +218,25 @@ typedef struct {
     bool write_aofs;
 } GVecGen4;
 
+typedef struct {
+    /*
+     * Expand inline as a 64-bit or 32-bit integer. Only one of these will be
+     * non-NULL.
+     */
+    void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64, int64_t);
+    void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32, int32_t);
+    /* Expand inline with a host vector type.  */
+    void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, TCGv_vec, int64_t);
+    /* Expand out-of-line helper w/descriptor, data in descriptor.  */
+    gen_helper_gvec_4 *fno;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
+    /* The vector element size, if applicable.  */
+    uint8_t vece;
+    /* Prefer i64 to v64.  */
+    bool prefer_i64;
+} GVecGen4i;
+
 void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen2 *);
 void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
@@ -231,6 +250,9 @@ void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                      const GVecGen3i *);
 void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen4 *);
+void tcg_gen_gvec_4i(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
+                     uint32_t oprsz, uint32_t maxsz, int64_t c,
+                     const GVecGen4i *);
 
 /* Expand a specific vector operation.  */
 
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index ffe55e908f..079a761b04 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -836,6 +836,30 @@ static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_i32(t0);
 }
 
+static void expand_4i_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                          uint32_t cofs, uint32_t oprsz, int32_t c,
+                          void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32,
+                                      int32_t))
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+    TCGv_i32 t3 = tcg_temp_new_i32();
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += 4) {
+        tcg_gen_ld_i32(t1, cpu_env, aofs + i);
+        tcg_gen_ld_i32(t2, cpu_env, bofs + i);
+        tcg_gen_ld_i32(t3, cpu_env, cofs + i);
+        fni(t0, t1, t2, t3, c);
+        tcg_gen_st_i32(t0, cpu_env, dofs + i);
+    }
+    tcg_temp_free_i32(t3);
+    tcg_temp_free_i32(t2);
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t0);
+}
+
 /* Expand OPSZ bytes worth of two-operand operations using i64 elements.  */
 static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
                          bool load_dest, void (*fni)(TCGv_i64, TCGv_i64))
@@ -971,6 +995,30 @@ static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     tcg_temp_free_i64(t0);
 }
 
+static void expand_4i_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                          uint32_t cofs, uint32_t oprsz, int64_t c,
+                          void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64,
+                                      int64_t))
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += 8) {
+        tcg_gen_ld_i64(t1, cpu_env, aofs + i);
+        tcg_gen_ld_i64(t2, cpu_env, bofs + i);
+        tcg_gen_ld_i64(t3, cpu_env, cofs + i);
+        fni(t0, t1, t2, t3, c);
+        tcg_gen_st_i64(t0, cpu_env, dofs + i);
+    }
+    tcg_temp_free_i64(t3);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t0);
+}
+
 /* Expand OPSZ bytes worth of two-operand operations using host vectors.  */
 static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
                          uint32_t oprsz, uint32_t tysz, TCGType type,
@@ -1121,6 +1169,35 @@ static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
     tcg_temp_free_vec(t0);
 }
 
+/*
+ * Expand OPSZ bytes worth of four-vector operands and an immediate operand
+ * using host vectors.
+ */
+static void expand_4i_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
+                          uint32_t bofs, uint32_t cofs, uint32_t oprsz,
+                          uint32_t tysz, TCGType type, int64_t c,
+                          void (*fni)(unsigned, TCGv_vec, TCGv_vec,
+                                     TCGv_vec, TCGv_vec, int64_t))
+{
+    TCGv_vec t0 = tcg_temp_new_vec(type);
+    TCGv_vec t1 = tcg_temp_new_vec(type);
+    TCGv_vec t2 = tcg_temp_new_vec(type);
+    TCGv_vec t3 = tcg_temp_new_vec(type);
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += tysz) {
+        tcg_gen_ld_vec(t1, cpu_env, aofs + i);
+        tcg_gen_ld_vec(t2, cpu_env, bofs + i);
+        tcg_gen_ld_vec(t3, cpu_env, cofs + i);
+        fni(vece, t0, t1, t2, t3, c);
+        tcg_gen_st_vec(t0, cpu_env, dofs + i);
+    }
+    tcg_temp_free_vec(t3);
+    tcg_temp_free_vec(t2);
+    tcg_temp_free_vec(t1);
+    tcg_temp_free_vec(t0);
+}
+
 /* Expand a vector two-operand operation.  */
 void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen2 *g)
@@ -1533,6 +1610,75 @@ void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
     }
 }
 
+/* Expand a vector four-operand operation with an immediate operand.  */
+void tcg_gen_gvec_4i(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
+                     uint32_t oprsz, uint32_t maxsz, int64_t c,
+                     const GVecGen4i *g)
+{
+    const TCGOpcode *this_list = g->opt_opc ? : vecop_list_empty;
+    const TCGOpcode *hold_list = tcg_swap_vecop_list(this_list);
+    TCGType type;
+    uint32_t some;
+
+    check_size_align(oprsz, maxsz, dofs | aofs | bofs | cofs);
+    check_overlap_4(dofs, aofs, bofs, cofs, maxsz);
+
+    type = 0;
+    if (g->fniv) {
+        type = choose_vector_type(g->opt_opc, g->vece, oprsz, g->prefer_i64);
+    }
+    switch (type) {
+    case TCG_TYPE_V256:
+        /*
+         * Recall that ARM SVE allows vector sizes that are not a
+         * power of 2, but always a multiple of 16.  The intent is
+         * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+         */
+        some = QEMU_ALIGN_DOWN(oprsz, 32);
+        expand_4i_vec(g->vece, dofs, aofs, bofs, cofs, some,
+                      32, TCG_TYPE_V256, c, g->fniv);
+        if (some == oprsz) {
+            break;
+        }
+        dofs += some;
+        aofs += some;
+        bofs += some;
+        cofs += some;
+        oprsz -= some;
+        maxsz -= some;
+        /* fallthru */
+    case TCG_TYPE_V128:
+        expand_4i_vec(g->vece, dofs, aofs, bofs, cofs, oprsz,
+                      16, TCG_TYPE_V128, c, g->fniv);
+        break;
+    case TCG_TYPE_V64:
+        expand_4i_vec(g->vece, dofs, aofs, bofs, cofs, oprsz,
+                      8, TCG_TYPE_V64, c, g->fniv);
+        break;
+
+    case 0:
+        if (g->fni8 && check_size_impl(oprsz, 8)) {
+            expand_4i_i64(dofs, aofs, bofs, cofs, oprsz, c, g->fni8);
+        } else if (g->fni4 && check_size_impl(oprsz, 4)) {
+            expand_4i_i32(dofs, aofs, bofs, cofs, oprsz, c, g->fni4);
+        } else {
+            assert(g->fno != NULL);
+            tcg_gen_gvec_4_ool(dofs, aofs, bofs, cofs,
+                               oprsz, maxsz, c, g->fno);
+            oprsz = maxsz;
+        }
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+    tcg_swap_vecop_list(hold_list);
+
+    if (oprsz < maxsz) {
+        expand_clr(dofs + oprsz, maxsz - oprsz);
+    }
+}
+
 /*
  * Expand specific vector operations.
  */
-- 
2.25.1
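The following plain-C analogue of the expand_4i_i64() loop illustrates what the integer fallback of tcg_gen_gvec_4i does. It is not QEMU code. Byte buffers stand in for cpu_env offsets, and fni8_fn mirrors the shape of GVecGen4i.fni8 but operates on plain integers. bitsel_xor_imm() is a hypothetical operation, chosen only to show the same immediate being passed to every lane.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Callback shape analogous to GVecGen4i.fni8, on plain integers. */
typedef uint64_t (*fni8_fn)(uint64_t a, uint64_t b, uint64_t c, int64_t imm);

/*
 * Plain-C analogue of expand_4i_i64(): walk oprsz bytes in 8-byte lanes,
 * load one lane from each source, apply fni with the immediate, and store
 * the result lane to the destination.
 */
static void expand_4i_model(uint8_t *d, const uint8_t *a, const uint8_t *b,
                            const uint8_t *c, uint32_t oprsz, int64_t imm,
                            fni8_fn fni)
{
    for (uint32_t i = 0; i < oprsz; i += 8) {
        uint64_t t0, t1, t2, t3;

        memcpy(&t1, a + i, 8);
        memcpy(&t2, b + i, 8);
        memcpy(&t3, c + i, 8);
        t0 = fni(t1, t2, t3, imm);
        memcpy(d + i, &t0, 8);
    }
}

/* Hypothetical lane operation: select b/c bits by a, then XOR with imm. */
static uint64_t bitsel_xor_imm(uint64_t a, uint64_t b, uint64_t c, int64_t imm)
{
    return ((a & b) | (~a & c)) ^ (uint64_t)imm;
}
```

The vector paths follow the same loop shape, using a host vector of tysz bytes per iteration instead of one 64-bit lane.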




* [PATCH v4 32/47] target/ppc: Implement xxeval
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (30 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 31/47] tcg/tcg-op-gvec.c: Introduce tcg_gen_gvec_4i matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 23:43   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 33/47] target/ppc: Implement xxgenpcv[bhwd]m instruction matheus.ferst
                   ` (14 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/helper.h                 |   1 +
 target/ppc/insn64.decode            |   8 ++
 target/ppc/int_helper.c             |  42 ++++++++++
 target/ppc/translate/vsx-impl.c.inc | 121 ++++++++++++++++++++++++++++
 4 files changed, 172 insertions(+)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 85a13057ca..b8c818f573 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -500,6 +500,7 @@ DEF_HELPER_4(xxextractuw, void, env, vsr, vsr, i32)
 DEF_HELPER_5(XXPERMX, void, vsr, vsr, vsr, vsr, tl)
 DEF_HELPER_4(xxinsertw, void, env, vsr, vsr, i32)
 DEF_HELPER_3(xvxsigsp, void, env, vsr, vsr)
+DEF_HELPER_5(XXEVAL, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XXBLENDVB, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XXBLENDVH, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_5(XXBLENDVW, void, vsr, vsr, vsr, vsr, i32)
diff --git a/target/ppc/insn64.decode b/target/ppc/insn64.decode
index 0963e064b1..fdb859f62d 100644
--- a/target/ppc/insn64.decode
+++ b/target/ppc/insn64.decode
@@ -54,6 +54,11 @@
                 ...... ..... ..... ..... ..... .. .... \
                 &8RR_XX4 xt=%8rr_xx_xt xa=%8rr_xx_xa xb=%8rr_xx_xb xc=%8rr_xx_xc
 
+&8RR_XX4_imm    xt xa xb xc imm
+@8RR_XX4_imm    ........ ........ ........ imm:8 \
+                ...... ..... ..... ..... ..... .. .... \
+                &8RR_XX4_imm xt=%8rr_xx_xt xa=%8rr_xx_xa xb=%8rr_xx_xb xc=%8rr_xx_xc
+
 &8RR_XX4_uim3   xt xa xb xc uim3
 @8RR_XX4_uim3   ...... .. .... .. ............... uim3:3 \
                 ...... ..... ..... ..... ..... .. ....   \
@@ -184,6 +189,9 @@ PLXVP           000001 00 0--.-- .................. \
 PSTXVP          000001 00 0--.-- .................. \
                 111110 ..... ..... ................     @8LS_D_TSXP
 
+XXEVAL          000001 01 0000 -- ---------- ........ \
+                100010 ..... ..... ..... ..... 01 ....  @8RR_XX4_imm
+
 XXSPLTIDP       000001 01 0000 -- -- ................ \
                 100000 ..... 0010 . ................    @8RR_D
 XXSPLTIW        000001 01 0000 -- -- ................ \
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index a92a006c6d..255645ef1d 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -28,6 +28,7 @@
 #include "fpu/softfloat.h"
 #include "qapi/error.h"
 #include "qemu/guest-random.h"
+#include "tcg/tcg-gvec-desc.h"
 
 #include "helper_regs.h"
 /*****************************************************************************/
@@ -1588,6 +1589,47 @@ void helper_xxinsertw(CPUPPCState *env, ppc_vsr_t *xt,
     *xt = t;
 }
 
+void helper_XXEVAL(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c,
+                   uint32_t desc)
+{
+    /*
+     * Instead of processing imm bit-by-bit, we'll skip the computation of
+     * conjunctions whose corresponding bit is unset.
+     */
+    int bit, imm = simd_data(desc);
+    Int128 conj, disj = int128_zero();
+
+    /* Iterate over set bits from the least to the most significant bit */
+    while (imm) {
+        /*
+         * Get the next bit to be processed with ctzl. Invert the result of
+         * ctzl to match the indexing used by PowerISA.
+         */
+        bit = 7 - ctzl(imm);
+        if (bit & 0x4) {
+            conj = a->s128;
+        } else {
+            conj = int128_not(a->s128);
+        }
+        if (bit & 0x2) {
+            conj = int128_and(conj, b->s128);
+        } else {
+            conj = int128_and(conj, int128_not(b->s128));
+        }
+        if (bit & 0x1) {
+            conj = int128_and(conj, c->s128);
+        } else {
+            conj = int128_and(conj, int128_not(c->s128));
+        }
+        disj = int128_or(disj, conj);
+
+        /* Unset the least significant bit that is set */
+        imm &= imm - 1;
+    }
+
+    t->s128 = disj;
+}
+
 #define XXBLEND(name, sz) \
 void glue(helper_XXBLENDV, name)(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b,  \
                                  ppc_avr_t *c, uint32_t desc)               \
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 92851b8926..d389ca2a83 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2167,6 +2167,127 @@ TRANS64_FLAGS2(ISA310, PLXV, do_lstxv_PLS_D, false, false)
 TRANS64_FLAGS2(ISA310, PSTXVP, do_lstxv_PLS_D, true, true)
 TRANS64_FLAGS2(ISA310, PLXVP, do_lstxv_PLS_D, false, true)
 
+static void gen_xxeval_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b, TCGv_i64 c,
+                           int64_t imm)
+{
+    /*
+     * Instead of processing imm bit-by-bit, we'll skip the computation of
+     * conjunctions whose corresponding bit is unset.
+     */
+    int bit;
+    TCGv_i64 conj, disj;
+
+    conj = tcg_temp_new_i64();
+    disj = tcg_temp_new_i64();
+
+    tcg_gen_movi_i64(disj, 0);
+
+    /* Iterate over set bits from the least to the most significant bit */
+    while (imm) {
+        /*
+         * Get the next bit to be processed with ctz64. Invert the result of
+         * ctz64 to match the indexing used by PowerISA.
+         */
+        bit = 7 - ctz64(imm);
+        if (bit & 0x4) {
+            tcg_gen_mov_i64(conj, a);
+        } else {
+            tcg_gen_not_i64(conj, a);
+        }
+        if (bit & 0x2) {
+            tcg_gen_and_i64(conj, conj, b);
+        } else {
+            tcg_gen_andc_i64(conj, conj, b);
+        }
+        if (bit & 0x1) {
+            tcg_gen_and_i64(conj, conj, c);
+        } else {
+            tcg_gen_andc_i64(conj, conj, c);
+        }
+        tcg_gen_or_i64(disj, disj, conj);
+
+        /* Unset the least significant bit that is set */
+        imm &= imm - 1;
+    }
+
+    tcg_gen_mov_i64(t, disj);
+
+    tcg_temp_free_i64(conj);
+    tcg_temp_free_i64(disj);
+}
+
+static void gen_xxeval_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b,
+                           TCGv_vec c, int64_t imm)
+{
+    /*
+     * Instead of processing imm bit-by-bit, we'll skip the computation of
+     * conjunctions whose corresponding bit is unset.
+     */
+    int bit;
+    TCGv_vec disj, conj;
+
+    disj = tcg_temp_new_vec_matching(t);
+    conj = tcg_temp_new_vec_matching(t);
+
+    tcg_gen_dupi_vec(vece, disj, 0);
+
+    /* Iterate over set bits from the least to the most significant bit */
+    while (imm) {
+        /*
+         * Get the next bit to be processed with ctz64. Invert the result of
+         * ctz64 to match the indexing used by PowerISA.
+         */
+        bit = 7 - ctz64(imm);
+        if (bit & 0x4) {
+            tcg_gen_mov_vec(conj, a);
+        } else {
+            tcg_gen_not_vec(vece, conj, a);
+        }
+        if (bit & 0x2) {
+            tcg_gen_and_vec(vece, conj, conj, b);
+        } else {
+            tcg_gen_andc_vec(vece, conj, conj, b);
+        }
+        if (bit & 0x1) {
+            tcg_gen_and_vec(vece, conj, conj, c);
+        } else {
+            tcg_gen_andc_vec(vece, conj, conj, c);
+        }
+        tcg_gen_or_vec(vece, disj, disj, conj);
+
+        /* Unset the least significant bit that is set */
+        imm &= imm - 1;
+    }
+
+    tcg_gen_mov_vec(t, disj);
+
+    tcg_temp_free_vec(disj);
+    tcg_temp_free_vec(conj);
+}
+
+static bool trans_XXEVAL(DisasContext *ctx, arg_8RR_XX4_imm *a)
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+
+    static const TCGOpcode vecop_list[] = {
+        INDEX_op_andc_vec, 0
+    };
+    static const GVecGen4i op = {
+        .fniv = gen_xxeval_vec,
+        .fno = gen_helper_XXEVAL,
+        .fni8 = gen_xxeval_i64,
+        .opt_opc = vecop_list,
+        .vece = MO_64
+    };
+
+    tcg_gen_gvec_4i(vsr_full_offset(a->xt), vsr_full_offset(a->xa),
+                    vsr_full_offset(a->xb), vsr_full_offset(a->xc),
+                    16, 16, a->imm, &op);
+
+    return true;
+}
+
 static void gen_xxblendv_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b,
                              TCGv_vec c)
 {
-- 
2.25.1

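The XXEVAL helper and both TCG expanders treat the 8-bit immediate as a sum of minterms. Each set bit selects one AND-combination of A, B and C (each either true or complemented), and the selected terms are ORed together. The standalone sketch below mirrors that loop on plain 64-bit lanes. The function name `xxeval64` is hypothetical, and it uses the GCC/Clang builtin `__builtin_ctz` in place of QEMU's `ctz64`/`ctzl`.

```c
#include <stdint.h>

/*
 * Hypothetical standalone model of the xxeval loop on one 64-bit lane.
 * The imm bit with PowerISA index i (0 = most significant of the 8 bits)
 * selects the minterm where A is true if (i & 4), B if (i & 2), C if (i & 1).
 */
static uint64_t xxeval64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
    uint64_t disj = 0;
    unsigned int bits = imm;

    /* Visit only the set bits, least significant first */
    while (bits) {
        /* Convert the LSB-based position to PowerISA bit numbering */
        int bit = 7 - __builtin_ctz(bits);
        uint64_t conj;

        conj  = (bit & 0x4) ? a : ~a;
        conj &= (bit & 0x2) ? b : ~b;
        conj &= (bit & 0x1) ? c : ~c;
        disj |= conj;

        /* Clear the lowest set bit */
        bits &= bits - 1;
    }
    return disj;
}
```

With this encoding, imm = 0x01 yields A & B & C, 0x0F yields A, 0x33 yields B, 0x55 yields C and 0x69 yields A ^ B ^ C. Skipping unset bits means an immediate with few set bits costs only a few AND/ANDC/OR operations.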



* [PATCH v4 33/47] target/ppc: Implement xxgenpcv[bhwd]m instruction
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (31 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 32/47] target/ppc: Implement xxeval matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 23:48   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 34/47] target/ppc: move xs[n]madd[am][ds]p/xs[n]msub[am][ds]p to decodetree matheus.ferst
                   ` (13 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/helper.h                 |  4 ++
 target/ppc/insn32.decode            | 10 ++++
 target/ppc/int_helper.c             | 84 +++++++++++++++++++++++++++++
 target/ppc/translate/vsx-impl.c.inc | 29 ++++++++++
 4 files changed, 127 insertions(+)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index b8c818f573..9751871370 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -496,6 +496,10 @@ DEF_HELPER_3(xvrspic, void, env, vsr, vsr)
 DEF_HELPER_3(xvrspim, void, env, vsr, vsr)
 DEF_HELPER_3(xvrspip, void, env, vsr, vsr)
 DEF_HELPER_3(xvrspiz, void, env, vsr, vsr)
+DEF_HELPER_3(XXGENPCVBM, void, vsr, avr, tl)
+DEF_HELPER_3(XXGENPCVHM, void, vsr, avr, tl)
+DEF_HELPER_3(XXGENPCVWM, void, vsr, avr, tl)
+DEF_HELPER_3(XXGENPCVDM, void, vsr, avr, tl)
 DEF_HELPER_4(xxextractuw, void, env, vsr, vsr, i32)
 DEF_HELPER_5(XXPERMX, void, vsr, vsr, vsr, vsr, tl)
 DEF_HELPER_4(xxinsertw, void, env, vsr, vsr, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 185d697458..b11a3ee29a 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -119,6 +119,9 @@
 @X_bfl          ...... bf:3 - l:1 ra:5 rb:5 ..........- &X_bfl
 
 %x_xt           0:1 21:5
+&X_imm5         xt imm:uint8_t vrb
+@X_imm5         ...... ..... imm:5 vrb:5 .......... .           &X_imm5 xt=%x_xt
+
 &X_imm8         xt imm:uint8_t
 @X_imm8         ...... ..... .. imm:8 .......... .              &X_imm8 xt=%x_xt
 
@@ -613,6 +616,13 @@ XXPERMDI        111100 ..... ..... ..... 0 .. 01010 ... @XX3_dm
 
 XXSEL           111100 ..... ..... ..... ..... 11 ....  @XX4
 
+## VSX Vector Generate PCV
+
+XXGENPCVBM      111100 ..... ..... ..... 1110010100 .   @X_imm5
+XXGENPCVHM      111100 ..... ..... ..... 1110010101 .   @X_imm5
+XXGENPCVWM      111100 ..... ..... ..... 1110110100 .   @X_imm5
+XXGENPCVDM      111100 ..... ..... ..... 1110110101 .   @X_imm5
+
 ## VSX Vector Load Special Value Instruction
 
 LXVKQ           111100 ..... 11111 ..... 0101101000 .   @X_uim5
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 255645ef1d..dc106aaab9 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1088,6 +1088,90 @@ void helper_VPERMR(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
     *r = result;
 }
 
+#define XXGENPCV(NAME, SZ) \
+void helper_##NAME(ppc_vsr_t *t, ppc_vsr_t *b, target_ulong imm)            \
+{                                                                           \
+    ppc_vsr_t tmp = { .u64 = { 0, 0 } };                                    \
+                                                                            \
+    switch (imm) {                                                          \
+    case 0b00000: /* Big-Endian expansion */                                \
+        /* Initialize tmp with the result of an all-zeros mask */           \
+        tmp.VsrD(0) = 0x1011121314151617;                                   \
+        tmp.VsrD(1) = 0x18191A1B1C1D1E1F;                                   \
+                                                                            \
+        /* Iterate over the most significant byte of each element */        \
+        for (int i = 0, j = 0; i < ARRAY_SIZE(b->u8); i += SZ) {            \
+            if (b->VsrB(i) & 0x80) {                                        \
+                /* Update each byte of the element */                       \
+                for (int k = 0; k < SZ; k++) {                              \
+                    tmp.VsrB(i + k) = j + k;                                \
+                }                                                           \
+                j += SZ;                                                    \
+            }                                                               \
+        }                                                                   \
+                                                                            \
+        break;                                                              \
+    case 0b00001: /* Big-Endian compression */                              \
+        /* Iterate over the most significant byte of each element */        \
+        for (int i = 0, j = 0; i < ARRAY_SIZE(b->u8); i += SZ) {            \
+            if (b->VsrB(i) & 0x80) {                                        \
+                /* Update each byte of the element */                       \
+                for (int k = 0; k < SZ; k++) {                              \
+                    tmp.VsrB(j + k) = i + k;                                \
+                }                                                           \
+                j += SZ;                                                    \
+            }                                                               \
+        }                                                                   \
+                                                                            \
+        break;                                                              \
+    case 0b00010: /* Little-Endian expansion */                             \
+        /* Initialize tmp with the result of an all-zeros mask */           \
+        tmp.VsrD(0) = 0x1F1E1D1C1B1A1918;                                   \
+        tmp.VsrD(1) = 0x1716151413121110;                                   \
+                                                                            \
+        /* Iterate over the most significant byte of each element */        \
+        for (int i = 0, j = 0; i < ARRAY_SIZE(b->u8); i += SZ) {            \
+            /* Reverse indexing of "i" */                                   \
+            const int idx = ARRAY_SIZE(b->u8) - i - SZ;                     \
+            if (b->VsrB(idx) & 0x80) {                                      \
+                /* Update each byte of the element */                       \
+                for (int k = 0, rk = SZ - 1; k < SZ; k++, rk--) {           \
+                    tmp.VsrB(idx + rk) = j + k;                             \
+                }                                                           \
+                j += SZ;                                                    \
+            }                                                               \
+        }                                                                   \
+                                                                            \
+        break;                                                              \
+    case 0b00011: /* Little-Endian compression */                           \
+        /* Iterate over the most significant byte of each element */        \
+        for (int i = 0, j = 0; i < ARRAY_SIZE(b->u8); i += SZ) {            \
+            if (b->VsrB(ARRAY_SIZE(b->u8) - i - SZ) & 0x80) {               \
+                /* Update each byte of the element */                       \
+                for (int k = 0, rk = SZ - 1; k < SZ; k++, rk--) {           \
+                    /* Reverse indexing of "j" */                           \
+                    const int idx = ARRAY_SIZE(b->u8) - j - SZ;             \
+                    tmp.VsrB(idx + rk) = i + k;                             \
+                }                                                           \
+                j += SZ;                                                    \
+            }                                                               \
+        }                                                                   \
+                                                                            \
+        break;                                                              \
+    default:                                                                \
+        /* Translation code validates IMM before calling this helper */     \
+        g_assert_not_reached();                                             \
+        break;                                                              \
+    }                                                                       \
+                                                                            \
+    *t = tmp;                                                               \
+}
+XXGENPCV(XXGENPCVBM, 1)
+XXGENPCV(XXGENPCVHM, 2)
+XXGENPCV(XXGENPCVWM, 4)
+XXGENPCV(XXGENPCVDM, 8)
+#undef XXGENPCV
+
 #if defined(HOST_WORDS_BIGENDIAN)
 #define VBPERMQ_INDEX(avr, i) ((avr)->u8[(i)])
 #define VBPERMD_INDEX(i) (i)
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index d389ca2a83..a75c4e68f8 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1256,6 +1256,35 @@ static bool trans_XXPERMX(DisasContext *ctx, arg_8RR_XX4_uim3 *a)
     return true;
 }
 
+static bool do_xxgenpcv(DisasContext *ctx, arg_X_imm5 *a,
+                        void (*gen_helper)(TCGv_ptr, TCGv_ptr, TCGv))
+{
+    TCGv_ptr xt, vrb;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+
+    if (a->imm & ~0x3) {
+        gen_invalid(ctx);
+        return true;
+    }
+
+    xt = gen_vsr_ptr(a->xt);
+    vrb = gen_avr_ptr(a->vrb);
+
+    gen_helper(xt, vrb, tcg_constant_tl(a->imm));
+
+    tcg_temp_free_ptr(xt);
+    tcg_temp_free_ptr(vrb);
+
+    return true;
+}
+
+TRANS(XXGENPCVBM, do_xxgenpcv, gen_helper_XXGENPCVBM)
+TRANS(XXGENPCVHM, do_xxgenpcv, gen_helper_XXGENPCVHM)
+TRANS(XXGENPCVWM, do_xxgenpcv, gen_helper_XXGENPCVWM)
+TRANS(XXGENPCVDM, do_xxgenpcv, gen_helper_XXGENPCVDM)
+
 #define GEN_VSX_HELPER_VSX_MADD(name, op1, aop, mop, inval, type)             \
 static void gen_##name(DisasContext *ctx)                                     \
 {                                                                             \
-- 
2.25.1

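The Big-Endian cases of the XXGENPCV helper (IMM = 0b00000 and 0b00001) walk the mask element by element and test the most significant bit of each element's first byte. Expansion starts from the all-zeros-mask result 0x10..0x1F and writes consecutive indices into the selected element positions. Compression starts from zero and packs the byte indices of the selected elements at the front. The sketch below reproduces only those two cases on plain byte arrays. The function name `genpcv_be` is hypothetical, and index 0 stands for `VsrB(0)`, the most significant byte of the register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of the Big-Endian xxgenpcv cases (IMM = 0 and 1).
 * Arrays are indexed like QEMU's VsrB(): index 0 is the most significant
 * byte. sz is the element size in bytes (1, 2, 4 or 8 for b/h/w/d).
 */
static void genpcv_be(uint8_t out[16], const uint8_t mask[16], int sz,
                      int compress)
{
    assert(sz == 1 || sz == 2 || sz == 4 || sz == 8);

    if (compress) {
        /* Positions not written below stay zero */
        memset(out, 0, 16);
    } else {
        /* Result for an all-zeros mask: 0x10, 0x11, ..., 0x1F */
        for (int k = 0; k < 16; k++) {
            out[k] = 0x10 + k;
        }
    }

    /* Test the most significant bit of each element's first byte */
    for (int i = 0, j = 0; i < 16; i += sz) {
        if (mask[i] & 0x80) {
            for (int k = 0; k < sz; k++) {
                if (compress) {
                    out[j + k] = i + k;   /* pack selected elements to the front */
                } else {
                    out[i + k] = j + k;   /* spread consecutive indices out */
                }
            }
            j += sz;
        }
    }
}
```

For byte elements with bytes 0 and 2 selected, expansion gives 0x00, 0x11, 0x01, 0x13, ... and compression gives 0x00, 0x02 followed by zeros. The Little-Endian cases in the patch apply the same logic with reversed indexing.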



* [PATCH v4 34/47] target/ppc: move xs[n]madd[am][ds]p/xs[n]msub[am][ds]p to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (32 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 33/47] target/ppc: Implement xxgenpcv[bhwd]m instruction matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 23:52   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 35/47] target/ppc: implement xs[n]maddqp[o]/xs[n]msubqp[o] matheus.ferst
                   ` (12 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c             | 23 ++++++------
 target/ppc/helper.h                 | 16 ++++-----
 target/ppc/insn32.decode            | 22 ++++++++++++
 target/ppc/translate/vsx-impl.c.inc | 56 ++++++++++++++++++++++++-----
 target/ppc/translate/vsx-ops.c.inc  | 16 ---------
 5 files changed, 90 insertions(+), 43 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 0fd285defc..c8797d8053 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2156,10 +2156,11 @@ VSX_TSQRT(xvtsqrtsp, 4, float32, VsrW(i), -126, 23)
  *   maddflgs - flags for the float*muladd routine that control the
  *           various forms (madd, msub, nmadd, nmsub)
  *   sfprf - set FPRF
+ *   r2sp  - round intermediate double precision result to single precision
  */
 #define VSX_MADD(op, nels, tp, fld, maddflgs, sfprf, r2sp)                    \
 void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
-                 ppc_vsr_t *xa, ppc_vsr_t *b, ppc_vsr_t *c)                   \
+                 ppc_vsr_t *s1, ppc_vsr_t *s2, ppc_vsr_t *s3)                 \
 {                                                                             \
     ppc_vsr_t t = *xt;                                                        \
     int i;                                                                    \
@@ -2175,12 +2176,12 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
              * result to odd.                                                 \
              */                                                               \
             set_float_rounding_mode(float_round_to_zero, &tstat);             \
-            t.fld = tp##_muladd(xa->fld, b->fld, c->fld,                      \
+            t.fld = tp##_muladd(s1->fld, s3->fld, s2->fld,                    \
                                 maddflgs, &tstat);                            \
             t.fld |= (get_float_exception_flags(&tstat) &                     \
                       float_flag_inexact) != 0;                               \
         } else {                                                              \
-            t.fld = tp##_muladd(xa->fld, b->fld, c->fld,                      \
+            t.fld = tp##_muladd(s1->fld, s3->fld, s2->fld,                    \
                                 maddflgs, &tstat);                            \
         }                                                                     \
         env->fp_status.float_exception_flags |= tstat.float_exception_flags;  \
@@ -2202,14 +2203,14 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
     do_float_check_status(env, GETPC());                                      \
 }
 
-VSX_MADD(xsmadddp, 1, float64, VsrD(0), MADD_FLGS, 1, 0)
-VSX_MADD(xsmsubdp, 1, float64, VsrD(0), MSUB_FLGS, 1, 0)
-VSX_MADD(xsnmadddp, 1, float64, VsrD(0), NMADD_FLGS, 1, 0)
-VSX_MADD(xsnmsubdp, 1, float64, VsrD(0), NMSUB_FLGS, 1, 0)
-VSX_MADD(xsmaddsp, 1, float64, VsrD(0), MADD_FLGS, 1, 1)
-VSX_MADD(xsmsubsp, 1, float64, VsrD(0), MSUB_FLGS, 1, 1)
-VSX_MADD(xsnmaddsp, 1, float64, VsrD(0), NMADD_FLGS, 1, 1)
-VSX_MADD(xsnmsubsp, 1, float64, VsrD(0), NMSUB_FLGS, 1, 1)
+VSX_MADD(XSMADDDP, 1, float64, VsrD(0), MADD_FLGS, 1, 0)
+VSX_MADD(XSMSUBDP, 1, float64, VsrD(0), MSUB_FLGS, 1, 0)
+VSX_MADD(XSNMADDDP, 1, float64, VsrD(0), NMADD_FLGS, 1, 0)
+VSX_MADD(XSNMSUBDP, 1, float64, VsrD(0), NMSUB_FLGS, 1, 0)
+VSX_MADD(XSMADDSP, 1, float64, VsrD(0), MADD_FLGS, 1, 1)
+VSX_MADD(XSMSUBSP, 1, float64, VsrD(0), MSUB_FLGS, 1, 1)
+VSX_MADD(XSNMADDSP, 1, float64, VsrD(0), NMADD_FLGS, 1, 1)
+VSX_MADD(XSNMSUBSP, 1, float64, VsrD(0), NMSUB_FLGS, 1, 1)
 
 VSX_MADD(xvmadddp, 2, float64, VsrD(i), MADD_FLGS, 0, 0)
 VSX_MADD(xvmsubdp, 2, float64, VsrD(i), MSUB_FLGS, 0, 0)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 9751871370..fd249a22f0 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -357,10 +357,10 @@ DEF_HELPER_3(xssqrtdp, void, env, vsr, vsr)
 DEF_HELPER_3(xsrsqrtedp, void, env, vsr, vsr)
 DEF_HELPER_4(xstdivdp, void, env, i32, vsr, vsr)
 DEF_HELPER_3(xstsqrtdp, void, env, i32, vsr)
-DEF_HELPER_5(xsmadddp, void, env, vsr, vsr, vsr, vsr)
-DEF_HELPER_5(xsmsubdp, void, env, vsr, vsr, vsr, vsr)
-DEF_HELPER_5(xsnmadddp, void, env, vsr, vsr, vsr, vsr)
-DEF_HELPER_5(xsnmsubdp, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSMADDDP, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSMSUBDP, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSNMADDDP, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSNMSUBDP, void, env, vsr, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpeqdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpgtdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpgedp, void, env, vsr, vsr, vsr)
@@ -420,10 +420,10 @@ DEF_HELPER_3(xsresp, void, env, vsr, vsr)
 DEF_HELPER_2(xsrsp, i64, env, i64)
 DEF_HELPER_3(xssqrtsp, void, env, vsr, vsr)
 DEF_HELPER_3(xsrsqrtesp, void, env, vsr, vsr)
-DEF_HELPER_5(xsmaddsp, void, env, vsr, vsr, vsr, vsr)
-DEF_HELPER_5(xsmsubsp, void, env, vsr, vsr, vsr, vsr)
-DEF_HELPER_5(xsnmaddsp, void, env, vsr, vsr, vsr, vsr)
-DEF_HELPER_5(xsnmsubsp, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSMADDSP, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSMSUBSP, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSNMADDSP, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSNMSUBSP, void, env, vsr, vsr, vsr, vsr)
 
 DEF_HELPER_4(xvadddp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xvsubdp, void, env, vsr, vsr, vsr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index b11a3ee29a..881b7093f6 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -603,6 +603,28 @@ STXVX           011111 ..... ..... ..... 0110001100 .   @X_TSX
 LXVPX           011111 ..... ..... ..... 0101001101 -   @X_TSXP
 STXVPX          011111 ..... ..... ..... 0111001101 -   @X_TSXP
 
+## VSX Scalar Multiply-Add Instructions
+
+XSMADDADP       111100 ..... ..... ..... 00100001 . . . @XX3
+XSMADDMDP       111100 ..... ..... ..... 00101001 . . . @XX3
+XSMADDASP       111100 ..... ..... ..... 00000001 . . . @XX3
+XSMADDMSP       111100 ..... ..... ..... 00001001 . . . @XX3
+
+XSMSUBADP       111100 ..... ..... ..... 00110001 . . . @XX3
+XSMSUBMDP       111100 ..... ..... ..... 00111001 . . . @XX3
+XSMSUBASP       111100 ..... ..... ..... 00010001 . . . @XX3
+XSMSUBMSP       111100 ..... ..... ..... 00011001 . . . @XX3
+
+XSNMADDASP      111100 ..... ..... ..... 10000001 . . . @XX3
+XSNMADDMSP      111100 ..... ..... ..... 10001001 . . . @XX3
+XSNMADDADP      111100 ..... ..... ..... 10100001 . . . @XX3
+XSNMADDMDP      111100 ..... ..... ..... 10101001 . . . @XX3
+
+XSNMSUBASP      111100 ..... ..... ..... 10010001 . . . @XX3
+XSNMSUBMSP      111100 ..... ..... ..... 10011001 . . . @XX3
+XSNMSUBADP      111100 ..... ..... ..... 10110001 . . . @XX3
+XSNMSUBMDP      111100 ..... ..... ..... 10111001 . . . @XX3
+
 ## VSX splat instruction
 
 XXSPLTIB        111100 ..... 00 ........ 0101101000 .   @X_imm8
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index a75c4e68f8..a54afb4dbb 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1285,6 +1285,54 @@ TRANS(XXGENPCVHM, do_xxgenpcv, gen_helper_XXGENPCVHM)
 TRANS(XXGENPCVWM, do_xxgenpcv, gen_helper_XXGENPCVWM)
 TRANS(XXGENPCVDM, do_xxgenpcv, gen_helper_XXGENPCVDM)
 
+static bool do_xsmadd(DisasContext *ctx, int tgt, int src1, int src2, int src3,
+        void (gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))
+{
+    TCGv_ptr t, s1, s2, s3;
+
+    t = gen_vsr_ptr(tgt);
+    s1 = gen_vsr_ptr(src1);
+    s2 = gen_vsr_ptr(src2);
+    s3 = gen_vsr_ptr(src3);
+
+    gen_helper(cpu_env, t, s1, s2, s3);
+
+    tcg_temp_free_ptr(t);
+    tcg_temp_free_ptr(s1);
+    tcg_temp_free_ptr(s2);
+    tcg_temp_free_ptr(s3);
+
+    return true;
+}
+
+static bool do_xsmadd_XX3(DisasContext *ctx, arg_XX3 *a, bool type_a,
+        void (gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))
+{
+    REQUIRE_VSX(ctx);
+
+    if (type_a) {
+        return do_xsmadd(ctx, a->xt, a->xa, a->xt, a->xb, gen_helper);
+    }
+    return do_xsmadd(ctx, a->xt, a->xa, a->xb, a->xt, gen_helper);
+}
+
+TRANS_FLAGS2(VSX, XSMADDADP, do_xsmadd_XX3, true, gen_helper_XSMADDDP)
+TRANS_FLAGS2(VSX, XSMADDMDP, do_xsmadd_XX3, false, gen_helper_XSMADDDP)
+TRANS_FLAGS2(VSX, XSMSUBADP, do_xsmadd_XX3, true, gen_helper_XSMSUBDP)
+TRANS_FLAGS2(VSX, XSMSUBMDP, do_xsmadd_XX3, false, gen_helper_XSMSUBDP)
+TRANS_FLAGS2(VSX, XSNMADDADP, do_xsmadd_XX3, true, gen_helper_XSNMADDDP)
+TRANS_FLAGS2(VSX, XSNMADDMDP, do_xsmadd_XX3, false, gen_helper_XSNMADDDP)
+TRANS_FLAGS2(VSX, XSNMSUBADP, do_xsmadd_XX3, true, gen_helper_XSNMSUBDP)
+TRANS_FLAGS2(VSX, XSNMSUBMDP, do_xsmadd_XX3, false, gen_helper_XSNMSUBDP)
+TRANS_FLAGS2(VSX207, XSMADDASP, do_xsmadd_XX3, true, gen_helper_XSMADDSP)
+TRANS_FLAGS2(VSX207, XSMADDMSP, do_xsmadd_XX3, false, gen_helper_XSMADDSP)
+TRANS_FLAGS2(VSX207, XSMSUBASP, do_xsmadd_XX3, true, gen_helper_XSMSUBSP)
+TRANS_FLAGS2(VSX207, XSMSUBMSP, do_xsmadd_XX3, false, gen_helper_XSMSUBSP)
+TRANS_FLAGS2(VSX207, XSNMADDASP, do_xsmadd_XX3, true, gen_helper_XSNMADDSP)
+TRANS_FLAGS2(VSX207, XSNMADDMSP, do_xsmadd_XX3, false, gen_helper_XSNMADDSP)
+TRANS_FLAGS2(VSX207, XSNMSUBASP, do_xsmadd_XX3, true, gen_helper_XSNMSUBSP)
+TRANS_FLAGS2(VSX207, XSNMSUBMSP, do_xsmadd_XX3, false, gen_helper_XSNMSUBSP)
+
 #define GEN_VSX_HELPER_VSX_MADD(name, op1, aop, mop, inval, type)             \
 static void gen_##name(DisasContext *ctx)                                     \
 {                                                                             \
@@ -1315,14 +1363,6 @@ static void gen_##name(DisasContext *ctx)                                     \
     tcg_temp_free_ptr(c);                                                     \
 }
 
-GEN_VSX_HELPER_VSX_MADD(xsmadddp, 0x04, 0x04, 0x05, 0, PPC2_VSX)
-GEN_VSX_HELPER_VSX_MADD(xsmsubdp, 0x04, 0x06, 0x07, 0, PPC2_VSX)
-GEN_VSX_HELPER_VSX_MADD(xsnmadddp, 0x04, 0x14, 0x15, 0, PPC2_VSX)
-GEN_VSX_HELPER_VSX_MADD(xsnmsubdp, 0x04, 0x16, 0x17, 0, PPC2_VSX)
-GEN_VSX_HELPER_VSX_MADD(xsmaddsp, 0x04, 0x00, 0x01, 0, PPC2_VSX207)
-GEN_VSX_HELPER_VSX_MADD(xsmsubsp, 0x04, 0x02, 0x03, 0, PPC2_VSX207)
-GEN_VSX_HELPER_VSX_MADD(xsnmaddsp, 0x04, 0x10, 0x11, 0, PPC2_VSX207)
-GEN_VSX_HELPER_VSX_MADD(xsnmsubsp, 0x04, 0x12, 0x13, 0, PPC2_VSX207)
 GEN_VSX_HELPER_VSX_MADD(xvmadddp, 0x04, 0x0C, 0x0D, 0, PPC2_VSX)
 GEN_VSX_HELPER_VSX_MADD(xvmsubdp, 0x04, 0x0E, 0x0F, 0, PPC2_VSX)
 GEN_VSX_HELPER_VSX_MADD(xvnmadddp, 0x04, 0x1C, 0x1D, 0, PPC2_VSX)
diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc
index 0a6b2b31ac..9cfec53df0 100644
--- a/target/ppc/translate/vsx-ops.c.inc
+++ b/target/ppc/translate/vsx-ops.c.inc
@@ -186,14 +186,6 @@ GEN_XX2FORM(xssqrtdp,  0x16, 0x04, PPC2_VSX),
 GEN_XX2FORM(xsrsqrtedp,  0x14, 0x04, PPC2_VSX),
 GEN_XX3FORM(xstdivdp,  0x14, 0x07, PPC2_VSX),
 GEN_XX2FORM(xstsqrtdp,  0x14, 0x06, PPC2_VSX),
-GEN_XX3FORM_NAME(xsmadddp, "xsmaddadp", 0x04, 0x04, PPC2_VSX),
-GEN_XX3FORM_NAME(xsmadddp, "xsmaddmdp", 0x04, 0x05, PPC2_VSX),
-GEN_XX3FORM_NAME(xsmsubdp, "xsmsubadp", 0x04, 0x06, PPC2_VSX),
-GEN_XX3FORM_NAME(xsmsubdp, "xsmsubmdp", 0x04, 0x07, PPC2_VSX),
-GEN_XX3FORM_NAME(xsnmadddp, "xsnmaddadp", 0x04, 0x14, PPC2_VSX),
-GEN_XX3FORM_NAME(xsnmadddp, "xsnmaddmdp", 0x04, 0x15, PPC2_VSX),
-GEN_XX3FORM_NAME(xsnmsubdp, "xsnmsubadp", 0x04, 0x16, PPC2_VSX),
-GEN_XX3FORM_NAME(xsnmsubdp, "xsnmsubmdp", 0x04, 0x17, PPC2_VSX),
 GEN_XX3FORM(xscmpeqdp, 0x0C, 0x00, PPC2_ISA300),
 GEN_XX3FORM(xscmpgtdp, 0x0C, 0x01, PPC2_ISA300),
 GEN_XX3FORM(xscmpgedp, 0x0C, 0x02, PPC2_ISA300),
@@ -235,14 +227,6 @@ GEN_XX2FORM(xsresp,  0x14, 0x01, PPC2_VSX207),
 GEN_XX2FORM(xsrsp, 0x12, 0x11, PPC2_VSX207),
 GEN_XX2FORM(xssqrtsp,  0x16, 0x00, PPC2_VSX207),
 GEN_XX2FORM(xsrsqrtesp,  0x14, 0x00, PPC2_VSX207),
-GEN_XX3FORM_NAME(xsmaddsp, "xsmaddasp", 0x04, 0x00, PPC2_VSX207),
-GEN_XX3FORM_NAME(xsmaddsp, "xsmaddmsp", 0x04, 0x01, PPC2_VSX207),
-GEN_XX3FORM_NAME(xsmsubsp, "xsmsubasp", 0x04, 0x02, PPC2_VSX207),
-GEN_XX3FORM_NAME(xsmsubsp, "xsmsubmsp", 0x04, 0x03, PPC2_VSX207),
-GEN_XX3FORM_NAME(xsnmaddsp, "xsnmaddasp", 0x04, 0x10, PPC2_VSX207),
-GEN_XX3FORM_NAME(xsnmaddsp, "xsnmaddmsp", 0x04, 0x11, PPC2_VSX207),
-GEN_XX3FORM_NAME(xsnmsubsp, "xsnmsubasp", 0x04, 0x12, PPC2_VSX207),
-GEN_XX3FORM_NAME(xsnmsubsp, "xsnmsubmsp", 0x04, 0x13, PPC2_VSX207),
 GEN_XX2FORM(xscvsxdsp, 0x10, 0x13, PPC2_VSX207),
 GEN_XX2FORM(xscvuxdsp, 0x10, 0x12, PPC2_VSX207),
 
-- 
2.25.1

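In this patch, do_xsmadd_XX3() lets one helper serve both instruction forms. The reworked VSX_MADD computes s1 * s3 + s2. The A-type forms pass (XA, XT, XB), so XT = XA * XB + XT. The M-type forms pass (XA, XB, XT), so XT = XA * XT + XB. The scalar model of that routing below uses hypothetical names. For simplicity it uses an unfused multiply-add on doubles, whereas the real helper calls softfloat's fused `float64_muladd` with the MADD/MSUB/NMADD/NMSUB flags.

```c
/*
 * Hypothetical model of the operand routing in do_xsmadd_XX3().
 * helper_madd() uses the same argument order as the VSX_MADD helper,
 * which evaluates tp##_muladd(s1, s3, s2), i.e. s1 * s3 + s2.
 * Unfused here for simplicity; the real helper rounds only once.
 */
static double helper_madd(double s1, double s2, double s3)
{
    return s1 * s3 + s2;
}

static double xsmadd(double xt, double xa, double xb, int type_a)
{
    if (type_a) {
        /* A-type: s1 = XA, s2 = XT, s3 = XB  ->  XA * XB + XT */
        return helper_madd(xa, xt, xb);
    }
    /* M-type: s1 = XA, s2 = XB, s3 = XT  ->  XA * XT + XB */
    return helper_madd(xa, xb, xt);
}
```

With XT = 1, XA = 2 and XB = 3, the A-type form yields 7 and the M-type form yields 5.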



* [PATCH v4 35/47] target/ppc: implement xs[n]maddqp[o]/xs[n]msubqp[o]
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (33 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 34/47] target/ppc: move xs[n]madd[am][ds]p/xs[n]msub[am][ds]p to decodetree matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 23:56   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 36/47] target/ppc: Implement xvtlsbb instruction matheus.ferst
                   ` (11 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Implement the following PowerISA v3.0 instructions:
xsmaddqp[o]: VSX Scalar Multiply-Add Quad-Precision [using round to Odd]
xsmsubqp[o]: VSX Scalar Multiply-Subtract Quad-Precision [using round
             to Odd]
xsnmaddqp[o]: VSX Scalar Negative Multiply-Add Quad-Precision [using
              round to Odd]
xsnmsubqp[o]: VSX Scalar Negative Multiply-Subtract Quad-Precision
              [using round to Odd]
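
Here is a minimal sketch of how the four forms differ in sign handling. It uses exact integer arithmetic, so rounding plays no part. The flag names are hypothetical stand-ins for the negate flags passed to float128_muladd(). The operand roles (VRA * VRB, with VRT as the addend) are inferred from the helper's float128_muladd(s1, s3, s2, ...) call together with do_xsmadd(ctx, vrt, vra, vrt, vrb, ...) in the translation code:

```c
#include <stdint.h>

/* Hypothetical flags, modelled on the negate-addend / negate-result idea. */
enum {
    NEGATE_C      = 1,   /* subtract the addend instead of adding it */
    NEGATE_RESULT = 2,   /* negate the final result                  */
};

/*
 * Exact integer stand-in for a fused multiply-add: (a * b) +/- c,
 * optionally negated. Integers are used so that only the sign
 * handling of madd/msub/nmadd/nmsub is illustrated, not rounding.
 */
static int64_t muladd_model(int64_t a, int64_t b, int64_t c, int flags)
{
    int64_t r = a * b + ((flags & NEGATE_C) ? -c : c);

    return (flags & NEGATE_RESULT) ? -r : r;
}
```

With VRA = 3, VRB = 5 and VRT = 2, the flags 0, NEGATE_C, NEGATE_RESULT and NEGATE_C | NEGATE_RESULT give 17, 13, -17 and -13. These correspond to madd, msub, nmadd and nmsub.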

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c             | 42 +++++++++++++++++++++++++++++
 target/ppc/helper.h                 |  9 +++++++
 target/ppc/insn32.decode            |  4 +++
 target/ppc/translate/vsx-impl.c.inc | 25 +++++++++++++++++
 4 files changed, 80 insertions(+)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index c8797d8053..98e9576608 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2222,6 +2222,48 @@ VSX_MADD(xvmsubsp, 4, float32, VsrW(i), MSUB_FLGS, 0, 0)
 VSX_MADD(xvnmaddsp, 4, float32, VsrW(i), NMADD_FLGS, 0, 0)
 VSX_MADD(xvnmsubsp, 4, float32, VsrW(i), NMSUB_FLGS, 0, 0)
 
+/*
+ * VSX_MADDQ - VSX floating point quad-precision multiply/add
+ *   op    - instruction mnemonic
+ *   maddflgs - flags for the float*muladd routine that control the
+ *           various forms (madd, msub, nmadd, nmsub)
+ *   ro    - round to odd
+ */
+#define VSX_MADDQ(op, maddflgs, ro)                                            \
+void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *s1, ppc_vsr_t *s2,\
+                 ppc_vsr_t *s3)                                                \
+{                                                                              \
+    ppc_vsr_t t = *xt;                                                         \
+                                                                               \
+    helper_reset_fpstatus(env);                                                \
+                                                                               \
+    float_status tstat = env->fp_status;                                       \
+    set_float_exception_flags(0, &tstat);                                      \
+    if (ro) {                                                                  \
+        tstat.float_rounding_mode = float_round_to_odd;                        \
+    }                                                                          \
+    t.f128 = float128_muladd(s1->f128, s3->f128, s2->f128, maddflgs, &tstat);  \
+    env->fp_status.float_exception_flags |= tstat.float_exception_flags;       \
+                                                                               \
+    if (unlikely(tstat.float_exception_flags & float_flag_invalid)) {          \
+        float_invalid_op_madd(env, tstat.float_exception_flags,                \
+                              false, GETPC());                                 \
+    }                                                                          \
+                                                                               \
+    helper_compute_fprf_float128(env, t.f128);                                 \
+    *xt = t;                                                                   \
+    do_float_check_status(env, GETPC());                                       \
+}
+
+VSX_MADDQ(XSMADDQP, MADD_FLGS, 0)
+VSX_MADDQ(XSMADDQPO, MADD_FLGS, 1)
+VSX_MADDQ(XSMSUBQP, MSUB_FLGS, 0)
+VSX_MADDQ(XSMSUBQPO, MSUB_FLGS, 1)
+VSX_MADDQ(XSNMADDQP, NMADD_FLGS, 0)
+VSX_MADDQ(XSNMADDQPO, NMADD_FLGS, 1)
+VSX_MADDQ(XSNMSUBQP, NMSUB_FLGS, 0)
+VSX_MADDQ(XSNMSUBQPO, NMSUB_FLGS, 0)
+
 /*
  * VSX_SCALAR_CMP_DP - VSX scalar floating point compare double precision
  *   op    - instruction mnemonic
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index fd249a22f0..1649fffff8 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -425,6 +425,15 @@ DEF_HELPER_5(XSMSUBSP, void, env, vsr, vsr, vsr, vsr)
 DEF_HELPER_5(XSNMADDSP, void, env, vsr, vsr, vsr, vsr)
 DEF_HELPER_5(XSNMSUBSP, void, env, vsr, vsr, vsr, vsr)
 
+DEF_HELPER_5(XSMADDQP, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSMADDQPO, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSMSUBQP, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSMSUBQPO, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSNMADDQP, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSNMADDQPO, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSNMSUBQP, void, env, vsr, vsr, vsr, vsr)
+DEF_HELPER_5(XSNMSUBQPO, void, env, vsr, vsr, vsr, vsr)
+
 DEF_HELPER_4(xvadddp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xvsubdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xvmuldp, void, env, vsr, vsr, vsr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 881b7093f6..1395a91c44 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -609,21 +609,25 @@ XSMADDADP       111100 ..... ..... ..... 00100001 . . . @XX3
 XSMADDMDP       111100 ..... ..... ..... 00101001 . . . @XX3
 XSMADDASP       111100 ..... ..... ..... 00000001 . . . @XX3
 XSMADDMSP       111100 ..... ..... ..... 00001001 . . . @XX3
+XSMADDQP        111111 ..... ..... ..... 0110000100 .   @X_rc
 
 XSMSUBADP       111100 ..... ..... ..... 00110001 . . . @XX3
 XSMSUBMDP       111100 ..... ..... ..... 00111001 . . . @XX3
 XSMSUBASP       111100 ..... ..... ..... 00010001 . . . @XX3
 XSMSUBMSP       111100 ..... ..... ..... 00011001 . . . @XX3
+XSMSUBQP        111111 ..... ..... ..... 0110100100 .   @X_rc
 
 XSNMADDASP      111100 ..... ..... ..... 10000001 . . . @XX3
 XSNMADDMSP      111100 ..... ..... ..... 10001001 . . . @XX3
 XSNMADDADP      111100 ..... ..... ..... 10100001 . . . @XX3
 XSNMADDMDP      111100 ..... ..... ..... 10101001 . . . @XX3
+XSNMADDQP       111111 ..... ..... ..... 0111000100 .   @X_rc
 
 XSNMSUBASP      111100 ..... ..... ..... 10010001 . . . @XX3
 XSNMSUBMSP      111100 ..... ..... ..... 10011001 . . . @XX3
 XSNMSUBADP      111100 ..... ..... ..... 10110001 . . . @XX3
 XSNMSUBMDP      111100 ..... ..... ..... 10111001 . . . @XX3
+XSNMSUBQP       111111 ..... ..... ..... 0111100100 .   @X_rc
 
 ## VSX splat instruction
 
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index a54afb4dbb..9128407365 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1333,6 +1333,31 @@ TRANS_FLAGS2(VSX207, XSNMADDMSP, do_xsmadd_XX3, false, gen_helper_XSNMADDSP)
 TRANS_FLAGS2(VSX207, XSNMSUBASP, do_xsmadd_XX3, true, gen_helper_XSNMSUBSP)
 TRANS_FLAGS2(VSX207, XSNMSUBMSP, do_xsmadd_XX3, false, gen_helper_XSNMSUBSP)
 
+static bool do_xsmadd_X(DisasContext *ctx, arg_X_rc *a,
+        void (gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr),
+        void (gen_helper_ro)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))
+{
+    int vrt, vra, vrb;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+    REQUIRE_VSX(ctx);
+
+    vrt = a->rt + 32;
+    vra = a->ra + 32;
+    vrb = a->rb + 32;
+
+    if (a->rc) {
+        return do_xsmadd(ctx, vrt, vra, vrt, vrb, gen_helper_ro);
+    }
+
+    return do_xsmadd(ctx, vrt, vra, vrt, vrb, gen_helper);
+}
+
+TRANS(XSMADDQP, do_xsmadd_X, gen_helper_XSMADDQP, gen_helper_XSMADDQPO)
+TRANS(XSMSUBQP, do_xsmadd_X, gen_helper_XSMSUBQP, gen_helper_XSMSUBQPO)
+TRANS(XSNMADDQP, do_xsmadd_X, gen_helper_XSNMADDQP, gen_helper_XSNMADDQPO)
+TRANS(XSNMSUBQP, do_xsmadd_X, gen_helper_XSNMSUBQP, gen_helper_XSNMSUBQPO)
+
 #define GEN_VSX_HELPER_VSX_MADD(name, op1, aop, mop, inval, type)             \
 static void gen_##name(DisasContext *ctx)                                     \
 {                                                                             \
-- 
2.25.1
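
The `ro` argument of VSX_MADDQ switches the scratch float_status to float_round_to_odd. The sketch below is a minimal integer analogue of that rounding mode. It assumes that a plain right shift can stand in for discarding mantissa bits: truncate, then force the last kept bit to 1 if any discarded bit was non-zero. Round-to-odd is commonly used for intermediate results because a later rounding to a narrower format then avoids double-rounding errors. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Shift x right by n bits (0 < n < 64) with round-to-odd:
 * truncate, then set the least-significant kept bit if any
 * discarded bit was 1 (i.e. the result was inexact).
 */
static uint64_t shr_round_to_odd(uint64_t x, unsigned n)
{
    uint64_t kept = x >> n;
    uint64_t lost = x & ((UINT64_C(1) << n) - 1);

    return lost ? (kept | 1) : kept;
}
```

For example, shifting 17 right by 2 gives 5 rather than 4, while 16 (exact) stays 4.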




* [PATCH v4 36/47] target/ppc: Implement xvtlsbb instruction
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (34 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 35/47] target/ppc: implement xs[n]maddqp[o]/xs[n]msubqp[o] matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-23  0:07   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 37/47] target/ppc: Remove xscmpnedp instruction matheus.ferst
                   ` (10 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Víctor Colombo, clg,
	Matheus Ferst, david

From: Víctor Colombo <victor.colombo@eldorado.org.br>

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            |  7 ++++++
 target/ppc/translate/vsx-impl.c.inc | 37 +++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 1395a91c44..2617ab8ca4 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -155,6 +155,9 @@
 &XX2            xt xb uim:uint8_t
 @XX2            ...... ..... ... uim:2 ..... ......... ..       &XX2 xt=%xx_xt xb=%xx_xb
 
+&XX2_bf_xb      bf xb
+@XX2_bf_xb      ...... bf:3 .. ..... ..... ......... . .        &XX2_bf_xb xb=%xx_xb
+
 &XX3            xt xa xb
 @XX3            ...... ..... ..... ..... ........ ...           &XX3 xt=%xx_xt xa=%xx_xa xb=%xx_xb
 
@@ -664,6 +667,10 @@ XSMINJDP        111100 ..... ..... ..... 10011000 ...   @XX3
 
 XSCVQPDP        111111 ..... 10100 ..... 1101000100 .   @X_tb_rc
 
+## VSX Vector Test Least-Significant Bit by Byte Instruction
+
+XVTLSBB         111100 ... -- 00010 ..... 111011011 . - @XX2_bf_xb
+
 ### rfebb
 &XL_s           s:uint8_t
 @XL_s           ......-------------- s:1 .......... -   &XL_s
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 9128407365..2aecaa8021 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1690,6 +1690,43 @@ static bool trans_LXVKQ(DisasContext *ctx, arg_X_uim5 *a)
     return true;
 }
 
+static bool trans_XVTLSBB(DisasContext *ctx, arg_XX2_bf_xb *a)
+{
+    TCGv_i64 xb, tmp, all_true, all_false, mask, zero;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+
+    xb = tcg_temp_new_i64();
+    tmp = tcg_temp_new_i64();
+    all_true = tcg_const_i64(0b1000);
+    all_false = tcg_const_i64(0b0010);
+    mask = tcg_constant_i64(dup_const(MO_8, 1));
+    zero = tcg_constant_i64(0);
+
+    for (int dw = 0; dw < 2; dw++) {
+        get_cpu_vsr(xb, a->xb, dw);
+
+        tcg_gen_and_i64(tmp, mask, xb);
+        tcg_gen_movcond_i64(TCG_COND_EQ, all_true, tmp,
+                            mask, all_true, zero);
+
+        tcg_gen_andc_i64(tmp, mask, xb);
+        tcg_gen_movcond_i64(TCG_COND_EQ, all_false, tmp,
+                            mask, all_false, zero);
+    }
+
+    tcg_gen_or_i64(tmp, all_false, all_true);
+    tcg_gen_extrl_i64_i32(cpu_crf[a->bf], tmp);
+
+    tcg_temp_free_i64(xb);
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i64(all_true);
+    tcg_temp_free_i64(all_false);
+
+    return true;
+}
+
 static void gen_xxsldwi(DisasContext *ctx)
 {
     TCGv_i64 xth, xtl;
-- 
2.25.1
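
The commit carries no description, so here is a plain-C sketch of the value trans_XVTLSBB stores in the CR field. It assumes the 128-bit source is passed as its two doublewords. The result is 0b1000 when the least-significant bit of every byte is 1, 0b0010 when every such bit is 0, and 0 otherwise. The function name is hypothetical.

```c
#include <stdint.h>

/* Mirrors the TCG sequence in trans_XVTLSBB using plain 64-bit arithmetic. */
static uint32_t xvtlsbb_crf(uint64_t dw0, uint64_t dw1)
{
    const uint64_t mask = 0x0101010101010101ULL; /* LSB of each byte */
    const uint64_t dws[2] = { dw0, dw1 };
    uint32_t all_true = 0x8, all_false = 0x2;

    for (int i = 0; i < 2; i++) {
        if ((dws[i] & mask) != mask) {
            all_true = 0;   /* some byte has its LSB clear */
        }
        if ((~dws[i] & mask) != mask) {
            all_false = 0;  /* some byte has its LSB set */
        }
    }
    return all_true | all_false;
}
```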




* [PATCH v4 37/47] target/ppc: Remove xscmpnedp instruction
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (35 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 36/47] target/ppc: Implement xvtlsbb instruction matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-22 14:36 ` [PATCH v4 38/47] target/ppc: Refactor VSX_SCALAR_CMP_DP matheus.ferst
                   ` (9 subsequent siblings)
  46 siblings, 0 replies; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Víctor Colombo, clg,
	Matheus Ferst, david

From: Víctor Colombo <victor.colombo@eldorado.org.br>

xscmpnedp was added in ISA v3.0 but removed in v3.0B. This patch
removes this instruction as it was not in the final version of v3.0.

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Acked-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c             | 1 -
 target/ppc/helper.h                 | 1 -
 target/ppc/translate/vsx-impl.c.inc | 1 -
 target/ppc/translate/vsx-ops.c.inc  | 1 -
 4 files changed, 4 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 98e9576608..9b034d1fe4 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2313,7 +2313,6 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
 VSX_SCALAR_CMP_DP(xscmpeqdp, eq, 1, 0)
 VSX_SCALAR_CMP_DP(xscmpgedp, le, 1, 1)
 VSX_SCALAR_CMP_DP(xscmpgtdp, lt, 1, 1)
-VSX_SCALAR_CMP_DP(xscmpnedp, eq, 0, 0)
 
 void helper_xscmpexpdp(CPUPPCState *env, uint32_t opcode,
                        ppc_vsr_t *xa, ppc_vsr_t *xb)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 1649fffff8..ee2a89b89d 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -364,7 +364,6 @@ DEF_HELPER_5(XSNMSUBDP, void, env, vsr, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpeqdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpgtdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpgedp, void, env, vsr, vsr, vsr)
-DEF_HELPER_4(xscmpnedp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpexpdp, void, env, i32, vsr, vsr)
 DEF_HELPER_4(xscmpexpqp, void, env, i32, vsr, vsr)
 DEF_HELPER_4(xscmpodp, void, env, i32, vsr, vsr)
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 2aecaa8021..751b941bac 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1055,7 +1055,6 @@ GEN_VSX_HELPER_X1(xstsqrtdp, 0x14, 0x06, 0, PPC2_VSX)
 GEN_VSX_HELPER_X3(xscmpeqdp, 0x0C, 0x00, 0, PPC2_ISA300)
 GEN_VSX_HELPER_X3(xscmpgtdp, 0x0C, 0x01, 0, PPC2_ISA300)
 GEN_VSX_HELPER_X3(xscmpgedp, 0x0C, 0x02, 0, PPC2_ISA300)
-GEN_VSX_HELPER_X3(xscmpnedp, 0x0C, 0x03, 0, PPC2_ISA300)
 GEN_VSX_HELPER_X2_AB(xscmpexpdp, 0x0C, 0x07, 0, PPC2_ISA300)
 GEN_VSX_HELPER_R2_AB(xscmpexpqp, 0x04, 0x05, 0, PPC2_ISA300)
 GEN_VSX_HELPER_X2_AB(xscmpodp, 0x0C, 0x05, 0, PPC2_VSX)
diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc
index 9cfec53df0..34310c1fb5 100644
--- a/target/ppc/translate/vsx-ops.c.inc
+++ b/target/ppc/translate/vsx-ops.c.inc
@@ -189,7 +189,6 @@ GEN_XX2FORM(xstsqrtdp,  0x14, 0x06, PPC2_VSX),
 GEN_XX3FORM(xscmpeqdp, 0x0C, 0x00, PPC2_ISA300),
 GEN_XX3FORM(xscmpgtdp, 0x0C, 0x01, PPC2_ISA300),
 GEN_XX3FORM(xscmpgedp, 0x0C, 0x02, PPC2_ISA300),
-GEN_XX3FORM(xscmpnedp, 0x0C, 0x03, PPC2_ISA300),
 GEN_XX3FORM(xscmpexpdp, 0x0C, 0x07, PPC2_ISA300),
 GEN_VSX_XFORM_300(xscmpexpqp, 0x04, 0x05, 0x00600001),
 GEN_XX2IFORM(xscmpodp,  0x0C, 0x05, PPC2_VSX),
-- 
2.25.1




* [PATCH v4 38/47] target/ppc: Refactor VSX_SCALAR_CMP_DP
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (36 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 37/47] target/ppc: Remove xscmpnedp instruction matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-23  0:20   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 39/47] target/ppc: Implement xscmp{eq,ge,gt}qp matheus.ferst
                   ` (8 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Víctor Colombo, clg,
	Matheus Ferst, david

From: Víctor Colombo <victor.colombo@eldorado.org.br>

Refactor VSX_SCALAR_CMP_DP, renaming it to VSX_SCALAR_CMP and preparing
the helper to be used for quadword comparisons.
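
The sketch below is a plain-C model of the double-precision compares this macro produces. It ignores NaN and VXSNAN/VXVC handling and uses hypothetical names. The macro evaluates cmp(xb, xa), so "ge" and "gt" are expressed as "le" and "lt" on swapped operands. Only the selected field is set to all ones; the rest of the result is now zero-initialized.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two doublewords of a VSR; dw[0] plays the role of VsrD(0). */
typedef struct {
    uint64_t dw[2];
} vsr_model;

/* All ones in the compared doubleword on a hit, zero elsewhere. */
static vsr_model cmp_result(bool hit)
{
    vsr_model t = { { 0, 0 } };

    if (hit) {
        t.dw[0] = UINT64_MAX;
    }
    return t;
}

/* a >= b is evaluated as b <= a, mirroring tp##_##cmp(xb->fld, xa->fld, ...). */
static vsr_model xscmpgedp_model(double a, double b) { return cmp_result(b <= a); }
static vsr_model xscmpgtdp_model(double a, double b) { return cmp_result(b < a); }
static vsr_model xscmpeqdp_model(double a, double b) { return cmp_result(b == a); }
```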

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 9b034d1fe4..5ebbcfe3b7 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2265,28 +2265,30 @@ VSX_MADDQ(XSNMSUBQP, NMSUB_FLGS, 0)
 VSX_MADDQ(XSNMSUBQPO, NMSUB_FLGS, 0)
 
 /*
- * VSX_SCALAR_CMP_DP - VSX scalar floating point compare double precision
+ * VSX_SCALAR_CMP - VSX scalar floating point compare
  *   op    - instruction mnemonic
+ *   tp    - type
  *   cmp   - comparison operation
  *   exp   - expected result of comparison
+ *   fld   - vsr_t field
  *   svxvc - set VXVC bit
  */
-#define VSX_SCALAR_CMP_DP(op, cmp, exp, svxvc)                                \
+#define VSX_SCALAR_CMP(op, tp, cmp, fld, exp, svxvc)                          \
 void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
                  ppc_vsr_t *xa, ppc_vsr_t *xb)                                \
 {                                                                             \
-    ppc_vsr_t t = *xt;                                                        \
+    ppc_vsr_t t = { };                                                        \
     bool vxsnan_flag = false, vxvc_flag = false, vex_flag = false;            \
                                                                               \
-    if (float64_is_signaling_nan(xa->VsrD(0), &env->fp_status) ||             \
-        float64_is_signaling_nan(xb->VsrD(0), &env->fp_status)) {             \
+    if (tp##_is_signaling_nan(xa->fld, &env->fp_status) ||                    \
+        tp##_is_signaling_nan(xb->fld, &env->fp_status)) {                    \
         vxsnan_flag = true;                                                   \
         if (fpscr_ve == 0 && svxvc) {                                         \
             vxvc_flag = true;                                                 \
         }                                                                     \
     } else if (svxvc) {                                                       \
-        vxvc_flag = float64_is_quiet_nan(xa->VsrD(0), &env->fp_status) ||     \
-            float64_is_quiet_nan(xb->VsrD(0), &env->fp_status);               \
+        vxvc_flag = tp##_is_quiet_nan(xa->fld, &env->fp_status) ||            \
+            tp##_is_quiet_nan(xb->fld, &env->fp_status);                      \
     }                                                                         \
     if (vxsnan_flag) {                                                        \
         float_invalid_op_vxsnan(env, GETPC());                                \
@@ -2297,22 +2299,17 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
     vex_flag = fpscr_ve && (vxvc_flag || vxsnan_flag);                        \
                                                                               \
     if (!vex_flag) {                                                          \
-        if (float64_##cmp(xb->VsrD(0), xa->VsrD(0),                           \
-                          &env->fp_status) == exp) {                          \
-            t.VsrD(0) = -1;                                                   \
-            t.VsrD(1) = 0;                                                    \
-        } else {                                                              \
-            t.VsrD(0) = 0;                                                    \
-            t.VsrD(1) = 0;                                                    \
+        if (tp##_##cmp(xb->fld, xa->fld, &env->fp_status) == exp) {           \
+            memset(&t.fld, 0xFF, sizeof(t.fld));                              \
         }                                                                     \
     }                                                                         \
     *xt = t;                                                                  \
     do_float_check_status(env, GETPC());                                      \
 }
 
-VSX_SCALAR_CMP_DP(xscmpeqdp, eq, 1, 0)
-VSX_SCALAR_CMP_DP(xscmpgedp, le, 1, 1)
-VSX_SCALAR_CMP_DP(xscmpgtdp, lt, 1, 1)
+VSX_SCALAR_CMP(xscmpeqdp, float64, eq, VsrD(0), 1, 0)
+VSX_SCALAR_CMP(xscmpgedp, float64, le, VsrD(0), 1, 1)
+VSX_SCALAR_CMP(xscmpgtdp, float64, lt, VsrD(0), 1, 1)
 
 void helper_xscmpexpdp(CPUPPCState *env, uint32_t opcode,
                        ppc_vsr_t *xa, ppc_vsr_t *xb)
-- 
2.25.1




* [PATCH v4 39/47] target/ppc: Implement xscmp{eq,ge,gt}qp
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (37 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 38/47] target/ppc: Refactor VSX_SCALAR_CMP_DP matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-23  0:21   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 40/47] target/ppc: Move xscmp{eq,ge,gt}dp to decodetree matheus.ferst
                   ` (7 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Víctor Colombo, clg,
	Matheus Ferst, david

From: Víctor Colombo <victor.colombo@eldorado.org.br>

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c             |  4 ++++
 target/ppc/helper.h                 |  3 +++
 target/ppc/insn32.decode            |  3 +++
 target/ppc/translate/vsx-impl.c.inc | 31 +++++++++++++++++++++++++++++
 4 files changed, 41 insertions(+)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 5ebbcfe3b7..eb62ae5455 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2311,6 +2311,10 @@ VSX_SCALAR_CMP(xscmpeqdp, float64, eq, VsrD(0), 1, 0)
 VSX_SCALAR_CMP(xscmpgedp, float64, le, VsrD(0), 1, 1)
 VSX_SCALAR_CMP(xscmpgtdp, float64, lt, VsrD(0), 1, 1)
 
+VSX_SCALAR_CMP(XSCMPEQQP, float128, eq, f128, 1, 0)
+VSX_SCALAR_CMP(XSCMPGEQP, float128, le, f128, 1, 1)
+VSX_SCALAR_CMP(XSCMPGTQP, float128, lt, f128, 1, 1)
+
 void helper_xscmpexpdp(CPUPPCState *env, uint32_t opcode,
                        ppc_vsr_t *xa, ppc_vsr_t *xb)
 {
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index ee2a89b89d..e44de15d07 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -364,6 +364,9 @@ DEF_HELPER_5(XSNMSUBDP, void, env, vsr, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpeqdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpgtdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpgedp, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSCMPEQQP, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSCMPGTQP, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSCMPGEQP, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xscmpexpdp, void, env, i32, vsr, vsr)
 DEF_HELPER_4(xscmpexpqp, void, env, i32, vsr, vsr)
 DEF_HELPER_4(xscmpodp, void, env, i32, vsr, vsr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 2617ab8ca4..d5c3bd13f7 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -662,6 +662,9 @@ XSMAXCDP        111100 ..... ..... ..... 10000000 ...   @XX3
 XSMINCDP        111100 ..... ..... ..... 10001000 ...   @XX3
 XSMAXJDP        111100 ..... ..... ..... 10010000 ...   @XX3
 XSMINJDP        111100 ..... ..... ..... 10011000 ...   @XX3
+XSCMPEQQP       111111 ..... ..... ..... 0001000100 -   @X
+XSCMPGEQP       111111 ..... ..... ..... 0011000100 -   @X
+XSCMPGTQP       111111 ..... ..... ..... 0011100100 -   @X
 
 ## VSX Binary Floating-Point Convert Instructions
 
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 751b941bac..f0d02e61fc 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2499,6 +2499,37 @@ TRANS(XSMINCDP, do_xsmaxmincjdp, gen_helper_xsmincdp)
 TRANS(XSMAXJDP, do_xsmaxmincjdp, gen_helper_xsmaxjdp)
 TRANS(XSMINJDP, do_xsmaxmincjdp, gen_helper_xsminjdp)
 
+static bool do_helper_X(arg_X *a,
+    void (*helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))
+{
+    TCGv_ptr rt, ra, rb;
+
+    rt = gen_avr_ptr(a->rt);
+    ra = gen_avr_ptr(a->ra);
+    rb = gen_avr_ptr(a->rb);
+
+    helper(cpu_env, rt, ra, rb);
+
+    tcg_temp_free_ptr(rt);
+    tcg_temp_free_ptr(ra);
+    tcg_temp_free_ptr(rb);
+
+    return true;
+}
+
+static bool do_xscmpqp(DisasContext *ctx, arg_X *a,
+    void (*helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+
+    return do_helper_X(a, helper);
+}
+
+TRANS(XSCMPEQQP, do_xscmpqp, gen_helper_XSCMPEQQP)
+TRANS(XSCMPGEQP, do_xscmpqp, gen_helper_XSCMPGEQP)
+TRANS(XSCMPGTQP, do_xscmpqp, gen_helper_XSCMPGTQP)
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.25.1
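
The new decode lines use the X form: primary opcode 63 (111111) and a 10-bit extended opcode in ISA bits 21-30. The XO values 68, 196 and 228 are the binary patterns 0001000100, 0011000100 and 0011100100. The sketch below is a minimal decoder under the usual big-endian bit numbering, where ISA bit 0 is the most significant bit of the word. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a 32-bit X-form instruction. */
typedef struct {
    unsigned opcd, rt, ra, rb, xo, rc;
} xform_fields;

static xform_fields decode_xform(uint32_t insn)
{
    xform_fields f;

    f.opcd = insn >> 26;            /* ISA bits 0-5   */
    f.rt   = (insn >> 21) & 0x1f;   /* ISA bits 6-10  */
    f.ra   = (insn >> 16) & 0x1f;   /* ISA bits 11-15 */
    f.rb   = (insn >> 11) & 0x1f;   /* ISA bits 16-20 */
    f.xo   = (insn >> 1) & 0x3ff;   /* ISA bits 21-30 */
    f.rc   = insn & 1;              /* ISA bit 31     */
    return f;
}

static uint32_t encode_xform(unsigned opcd, unsigned rt, unsigned ra,
                             unsigned rb, unsigned xo, unsigned rc)
{
    return (uint32_t)opcd << 26 | (uint32_t)rt << 21 | (uint32_t)ra << 16 |
           (uint32_t)rb << 11 | (uint32_t)xo << 1 | (uint32_t)rc;
}

/* Mnemonic of one of the three quad-precision compares, or NULL. */
static const char *xscmpqp_name(uint32_t insn)
{
    xform_fields f = decode_xform(insn);

    if (f.opcd != 63) {
        return NULL;
    }
    switch (f.xo) {
    case 68:  return "xscmpeqqp";   /* 0001000100 */
    case 196: return "xscmpgeqp";   /* 0011000100 */
    case 228: return "xscmpgtqp";   /* 0011100100 */
    default:  return NULL;
    }
}
```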




* [PATCH v4 40/47] target/ppc: Move xscmp{eq,ge,gt}dp to decodetree
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (38 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 39/47] target/ppc: Implement xscmp{eq,ge,gt}qp matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-23  0:22   ` [PATCH v4 40/47] target/ppc: Move xscmp{eq, ge, gt}dp " Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 41/47] target/ppc: Move xs{max, min}[cj]dp to use do_helper_XX3 matheus.ferst
                   ` (6 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Víctor Colombo, clg,
	Matheus Ferst, david

From: Víctor Colombo <victor.colombo@eldorado.org.br>

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c             |  7 +++----
 target/ppc/helper.h                 |  6 +++---
 target/ppc/insn32.decode            |  3 +++
 target/ppc/translate/vsx-impl.c.inc | 28 +++++++++++++++++++++++++---
 target/ppc/translate/vsx-ops.c.inc  |  3 ---
 5 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index eb62ae5455..bfe49a63f8 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2307,10 +2307,9 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
     do_float_check_status(env, GETPC());                                      \
 }
 
-VSX_SCALAR_CMP(xscmpeqdp, float64, eq, VsrD(0), 1, 0)
-VSX_SCALAR_CMP(xscmpgedp, float64, le, VsrD(0), 1, 1)
-VSX_SCALAR_CMP(xscmpgtdp, float64, lt, VsrD(0), 1, 1)
-
+VSX_SCALAR_CMP(XSCMPEQDP, float64, eq, VsrD(0), 1, 0)
+VSX_SCALAR_CMP(XSCMPGEDP, float64, le, VsrD(0), 1, 1)
+VSX_SCALAR_CMP(XSCMPGTDP, float64, lt, VsrD(0), 1, 1)
 VSX_SCALAR_CMP(XSCMPEQQP, float128, eq, f128, 1, 0)
 VSX_SCALAR_CMP(XSCMPGEQP, float128, le, f128, 1, 1)
 VSX_SCALAR_CMP(XSCMPGTQP, float128, lt, f128, 1, 1)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index e44de15d07..8a57a48200 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -361,9 +361,9 @@ DEF_HELPER_5(XSMADDDP, void, env, vsr, vsr, vsr, vsr)
 DEF_HELPER_5(XSMSUBDP, void, env, vsr, vsr, vsr, vsr)
 DEF_HELPER_5(XSNMADDDP, void, env, vsr, vsr, vsr, vsr)
 DEF_HELPER_5(XSNMSUBDP, void, env, vsr, vsr, vsr, vsr)
-DEF_HELPER_4(xscmpeqdp, void, env, vsr, vsr, vsr)
-DEF_HELPER_4(xscmpgtdp, void, env, vsr, vsr, vsr)
-DEF_HELPER_4(xscmpgedp, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSCMPEQDP, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSCMPGTDP, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSCMPGEDP, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(XSCMPEQQP, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(XSCMPGTQP, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(XSCMPGEQP, void, env, vsr, vsr, vsr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index d5c3bd13f7..a6e3855958 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -662,6 +662,9 @@ XSMAXCDP        111100 ..... ..... ..... 10000000 ...   @XX3
 XSMINCDP        111100 ..... ..... ..... 10001000 ...   @XX3
 XSMAXJDP        111100 ..... ..... ..... 10010000 ...   @XX3
 XSMINJDP        111100 ..... ..... ..... 10011000 ...   @XX3
+XSCMPEQDP       111100 ..... ..... ..... 00000011 ...   @XX3
+XSCMPGEDP       111100 ..... ..... ..... 00010011 ...   @XX3
+XSCMPGTDP       111100 ..... ..... ..... 00001011 ...   @XX3
 XSCMPEQQP       111111 ..... ..... ..... 0001000100 -   @X
 XSCMPGEQP       111111 ..... ..... ..... 0011000100 -   @X
 XSCMPGTQP       111111 ..... ..... ..... 0011100100 -   @X
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index f0d02e61fc..29f04a4178 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1052,9 +1052,6 @@ GEN_VSX_HELPER_X2(xssqrtdp, 0x16, 0x04, 0, PPC2_VSX)
 GEN_VSX_HELPER_X2(xsrsqrtedp, 0x14, 0x04, 0, PPC2_VSX)
 GEN_VSX_HELPER_X2_AB(xstdivdp, 0x14, 0x07, 0, PPC2_VSX)
 GEN_VSX_HELPER_X1(xstsqrtdp, 0x14, 0x06, 0, PPC2_VSX)
-GEN_VSX_HELPER_X3(xscmpeqdp, 0x0C, 0x00, 0, PPC2_ISA300)
-GEN_VSX_HELPER_X3(xscmpgtdp, 0x0C, 0x01, 0, PPC2_ISA300)
-GEN_VSX_HELPER_X3(xscmpgedp, 0x0C, 0x02, 0, PPC2_ISA300)
 GEN_VSX_HELPER_X2_AB(xscmpexpdp, 0x0C, 0x07, 0, PPC2_ISA300)
 GEN_VSX_HELPER_R2_AB(xscmpexpqp, 0x04, 0x05, 0, PPC2_ISA300)
 GEN_VSX_HELPER_X2_AB(xscmpodp, 0x0C, 0x05, 0, PPC2_VSX)
@@ -2473,6 +2470,31 @@ TRANS(XXBLENDVH, do_xxblendv, MO_16)
 TRANS(XXBLENDVW, do_xxblendv, MO_32)
 TRANS(XXBLENDVD, do_xxblendv, MO_64)
 
+static bool do_helper_XX3(DisasContext *ctx, arg_XX3 *a,
+    void (*helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))
+{
+    TCGv_ptr xt, xa, xb;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+    REQUIRE_VSX(ctx);
+
+    xt = gen_vsr_ptr(a->xt);
+    xa = gen_vsr_ptr(a->xa);
+    xb = gen_vsr_ptr(a->xb);
+
+    helper(cpu_env, xt, xa, xb);
+
+    tcg_temp_free_ptr(xt);
+    tcg_temp_free_ptr(xa);
+    tcg_temp_free_ptr(xb);
+
+    return true;
+}
+
+TRANS(XSCMPEQDP, do_helper_XX3, gen_helper_XSCMPEQDP)
+TRANS(XSCMPGEDP, do_helper_XX3, gen_helper_XSCMPGEDP)
+TRANS(XSCMPGTDP, do_helper_XX3, gen_helper_XSCMPGTDP)
+
 static bool do_xsmaxmincjdp(DisasContext *ctx, arg_XX3 *a,
                             void (*helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))
 {
diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc
index 34310c1fb5..b8fd116728 100644
--- a/target/ppc/translate/vsx-ops.c.inc
+++ b/target/ppc/translate/vsx-ops.c.inc
@@ -186,9 +186,6 @@ GEN_XX2FORM(xssqrtdp,  0x16, 0x04, PPC2_VSX),
 GEN_XX2FORM(xsrsqrtedp,  0x14, 0x04, PPC2_VSX),
 GEN_XX3FORM(xstdivdp,  0x14, 0x07, PPC2_VSX),
 GEN_XX2FORM(xstsqrtdp,  0x14, 0x06, PPC2_VSX),
-GEN_XX3FORM(xscmpeqdp, 0x0C, 0x00, PPC2_ISA300),
-GEN_XX3FORM(xscmpgtdp, 0x0C, 0x01, PPC2_ISA300),
-GEN_XX3FORM(xscmpgedp, 0x0C, 0x02, PPC2_ISA300),
 GEN_XX3FORM(xscmpexpdp, 0x0C, 0x07, PPC2_ISA300),
 GEN_VSX_XFORM_300(xscmpexpqp, 0x04, 0x05, 0x00600001),
 GEN_XX2IFORM(xscmpodp,  0x0C, 0x05, PPC2_VSX),
-- 
2.25.1
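
The moved xscmp{eq,ge,gt}dp patterns use the XX3 form. In this form, each 5-bit register field is extended to a 6-bit VSR number by an extra bit at the end of the word: AX is bit 29, BX is bit 30 and TX is bit 31. The patterns 00000011, 00010011 and 00001011 in the decode lines are XO values 3, 19 and 11. The sketch below uses the same big-endian bit-numbering assumption and hypothetical names. It only recognizes these three compares and does not check other XX3 variants.

```c
#include <stddef.h>
#include <stdint.h>

/* XX3-form fields, with 6-bit VSR numbers. */
typedef struct {
    unsigned opcd, xt, xa, xb, xo;
} xx3_fields;

static xx3_fields decode_xx3(uint32_t insn)
{
    xx3_fields f;

    f.opcd = insn >> 26;                                       /* bits 0-5   */
    f.xt = ((insn & 1) << 5) | ((insn >> 21) & 0x1f);          /* TX || T    */
    f.xa = (((insn >> 2) & 1) << 5) | ((insn >> 16) & 0x1f);   /* AX || A    */
    f.xb = (((insn >> 1) & 1) << 5) | ((insn >> 11) & 0x1f);   /* BX || B    */
    f.xo = (insn >> 3) & 0xff;                                 /* bits 21-28 */
    return f;
}

/* Mnemonic of one of the three double-precision compares, or NULL. */
static const char *xscmpdp_name(uint32_t insn)
{
    xx3_fields f = decode_xx3(insn);

    if (f.opcd != 60) {
        return NULL;
    }
    switch (f.xo) {
    case 3:  return "xscmpeqdp";   /* 00000011 */
    case 19: return "xscmpgedp";   /* 00010011 */
    case 11: return "xscmpgtdp";   /* 00001011 */
    default: return NULL;
    }
}
```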




* [PATCH v4 41/47] target/ppc: Move xs{max, min}[cj]dp to use do_helper_XX3
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (39 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 40/47] target/ppc: Move xscmp{eq,ge,gt}dp to decodetree matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-23  0:23   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 42/47] target/ppc: Refactor VSX_MAX_MINC helper matheus.ferst
                   ` (5 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Víctor Colombo, clg,
	Matheus Ferst, david

From: Víctor Colombo <victor.colombo@eldorado.org.br>

Also, capitalize the helper names of these instructions.

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c             |  8 ++++----
 target/ppc/helper.h                 |  8 ++++----
 target/ppc/translate/vsx-impl.c.inc | 30 ++++-------------------------
 3 files changed, 12 insertions(+), 34 deletions(-)
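
A minimal standalone sketch, not taken from the patch, of why do_xsmaxmincjdp can be dropped: TRANS() only forwards to a shared translator and passes the helper pointer as an extra argument, so any XX3-form instruction whose helper takes (env, xt, xa, xb) can reuse do_helper_XX3. DisasContext, arg_XX3 and the helpers below are hypothetical mocks; the arg_XSMAXCDP/arg_XSMINCDP typedefs stand in for the per-instruction aliases decodetree generates, and the ISA300/VSX checks are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mocks of QEMU's translator state and XX3 operand struct. */
typedef struct {
    int insns_flags2;
} DisasContext;

typedef struct {
    int xt, xa, xb;
} arg_XX3;

/* decodetree emits per-instruction aliases of the shared format struct. */
typedef arg_XX3 arg_XSMAXCDP;
typedef arg_XX3 arg_XSMINCDP;

typedef void (*xx3_helper)(int xt, int xa, int xb);

/* Same shape as TRANS(): trans_NAME forwards to FUNC with extra arguments. */
#define TRANS(NAME, FUNC, ...)                                  \
    static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a)  \
    { return FUNC(ctx, a, __VA_ARGS__); }

static int last_helper;   /* records which helper ran, for the demo */

static void helper_XSMAXCDP(int xt, int xa, int xb)
{
    (void)xt; (void)xa; (void)xb;
    last_helper = 1;
}

static void helper_XSMINCDP(int xt, int xa, int xb)
{
    (void)xt; (void)xa; (void)xb;
    last_helper = 2;
}

/* One shared body instead of a per-family copy such as do_xsmaxmincjdp(). */
static bool do_helper_XX3(DisasContext *ctx, arg_XX3 *a, xx3_helper helper)
{
    (void)ctx;   /* the real code checks ISA300 and VSX enablement here */
    helper(a->xt, a->xa, a->xb);
    return true;
}

TRANS(XSMAXCDP, do_helper_XX3, helper_XSMAXCDP)
TRANS(XSMINCDP, do_helper_XX3, helper_XSMINCDP)
```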

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index bfe49a63f8..7ae576cba9 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2568,8 +2568,8 @@ void helper_##name(CPUPPCState *env,                                          \
     }                                                                         \
 }                                                                             \
 
-VSX_MAX_MINC(xsmaxcdp, 1);
-VSX_MAX_MINC(xsmincdp, 0);
+VSX_MAX_MINC(XSMAXCDP, 1);
+VSX_MAX_MINC(XSMINCDP, 0);
 
 #define VSX_MAX_MINJ(name, max)                                               \
 void helper_##name(CPUPPCState *env,                                          \
@@ -2623,8 +2623,8 @@ void helper_##name(CPUPPCState *env,                                          \
     }                                                                         \
 }                                                                             \
 
-VSX_MAX_MINJ(xsmaxjdp, 1);
-VSX_MAX_MINJ(xsminjdp, 0);
+VSX_MAX_MINJ(XSMAXJDP, 1);
+VSX_MAX_MINJ(XSMINJDP, 0);
 
 /*
  * VSX_CMP - VSX floating point compare
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 8a57a48200..3a1cb9abf5 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -375,10 +375,10 @@ DEF_HELPER_4(xscmpoqp, void, env, i32, vsr, vsr)
 DEF_HELPER_4(xscmpuqp, void, env, i32, vsr, vsr)
 DEF_HELPER_4(xsmaxdp, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(xsmindp, void, env, vsr, vsr, vsr)
-DEF_HELPER_4(xsmaxcdp, void, env, vsr, vsr, vsr)
-DEF_HELPER_4(xsmincdp, void, env, vsr, vsr, vsr)
-DEF_HELPER_4(xsmaxjdp, void, env, vsr, vsr, vsr)
-DEF_HELPER_4(xsminjdp, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSMAXCDP, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSMINCDP, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSMAXJDP, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSMINJDP, void, env, vsr, vsr, vsr)
 DEF_HELPER_3(xscvdphp, void, env, vsr, vsr)
 DEF_HELPER_4(xscvdpqp, void, env, i32, vsr, vsr)
 DEF_HELPER_3(xscvdpsp, void, env, vsr, vsr)
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 29f04a4178..730f073cf5 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2494,32 +2494,10 @@ static bool do_helper_XX3(DisasContext *ctx, arg_XX3 *a,
 TRANS(XSCMPEQDP, do_helper_XX3, gen_helper_XSCMPEQDP)
 TRANS(XSCMPGEDP, do_helper_XX3, gen_helper_XSCMPGEDP)
 TRANS(XSCMPGTDP, do_helper_XX3, gen_helper_XSCMPGTDP)
-
-static bool do_xsmaxmincjdp(DisasContext *ctx, arg_XX3 *a,
-                            void (*helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))
-{
-    TCGv_ptr xt, xa, xb;
-
-    REQUIRE_INSNS_FLAGS2(ctx, ISA300);
-    REQUIRE_VSX(ctx);
-
-    xt = gen_vsr_ptr(a->xt);
-    xa = gen_vsr_ptr(a->xa);
-    xb = gen_vsr_ptr(a->xb);
-
-    helper(cpu_env, xt, xa, xb);
-
-    tcg_temp_free_ptr(xt);
-    tcg_temp_free_ptr(xa);
-    tcg_temp_free_ptr(xb);
-
-    return true;
-}
-
-TRANS(XSMAXCDP, do_xsmaxmincjdp, gen_helper_xsmaxcdp)
-TRANS(XSMINCDP, do_xsmaxmincjdp, gen_helper_xsmincdp)
-TRANS(XSMAXJDP, do_xsmaxmincjdp, gen_helper_xsmaxjdp)
-TRANS(XSMINJDP, do_xsmaxmincjdp, gen_helper_xsminjdp)
+TRANS(XSMAXCDP, do_helper_XX3, gen_helper_XSMAXCDP)
+TRANS(XSMINCDP, do_helper_XX3, gen_helper_XSMINCDP)
+TRANS(XSMAXJDP, do_helper_XX3, gen_helper_XSMAXJDP)
+TRANS(XSMINJDP, do_helper_XX3, gen_helper_XSMINJDP)
 
 static bool do_helper_X(arg_X *a,
     void (*helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))
-- 
2.25.1




* [PATCH v4 42/47] target/ppc: Refactor VSX_MAX_MINC helper
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (40 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 41/47] target/ppc: Move xs{max, min}[cj]dp to use do_helper_XX3 matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-23  0:40   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 43/47] target/ppc: Implement xs{max,min}cqp matheus.ferst
                   ` (4 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Víctor Colombo, clg,
	Matheus Ferst, david

From: Víctor Colombo <victor.colombo@eldorado.org.br>

Refactor xs{max,min}cdp VSX_MAX_MINC helper to prepare for
xs{max,min}cqp implementation.

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)
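
The refactor turns the float64-specific body into one parameterized by a softfloat type prefix (tp) and an operation (op), pasted together as tp##_is_any_nan() and tp##_##op(), which is what lets the next patch instantiate it for float128. A minimal sketch of the same token-pasting technique on plain C types; the dbl_*/flt_* primitives are hypothetical stand-ins built on C comparisons rather than softfloat's maxnum/minnum, and the signaling-NaN exception handling of the real helper is left out.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical per-type primitives, named so that tp##_##op resolves to them. */
static bool dbl_is_any_nan(double x) { return isnan(x); }
static double dbl_max(double a, double b) { return a > b ? a : b; }
static double dbl_min(double a, double b) { return a < b ? a : b; }
static bool flt_is_any_nan(float x) { return isnan(x); }
static float flt_max(float a, float b) { return a > b ? a : b; }

/* One body, instantiated per type prefix (tp) and operation (op). */
#define MAX_MINC(name, op, tp, ctype)                           \
static ctype name(ctype a, ctype b)                             \
{                                                               \
    if (tp##_is_any_nan(a) || tp##_is_any_nan(b)) {             \
        return b;   /* any NaN operand: result is the second */ \
    }                                                           \
    return tp##_##op(a, b);                                     \
}

MAX_MINC(maxc_dbl, max, dbl, double)
MAX_MINC(minc_dbl, min, dbl, double)
MAX_MINC(maxc_flt, max, flt, float)
```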

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 7ae576cba9..f6eb8bf2d8 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2536,27 +2536,22 @@ VSX_MAX_MIN(xsmindp, minnum, 1, float64, VsrD(0))
 VSX_MAX_MIN(xvmindp, minnum, 2, float64, VsrD(i))
 VSX_MAX_MIN(xvminsp, minnum, 4, float32, VsrW(i))
 
-#define VSX_MAX_MINC(name, max)                                               \
+#define VSX_MAX_MINC(name, op, tp, fld)                                       \
 void helper_##name(CPUPPCState *env,                                          \
                    ppc_vsr_t *xt, ppc_vsr_t *xa, ppc_vsr_t *xb)               \
 {                                                                             \
     ppc_vsr_t t = { };                                                        \
     bool vxsnan_flag = false, vex_flag = false;                               \
                                                                               \
-    if (unlikely(float64_is_any_nan(xa->VsrD(0)) ||                           \
-                 float64_is_any_nan(xb->VsrD(0)))) {                          \
-        if (float64_is_signaling_nan(xa->VsrD(0), &env->fp_status) ||         \
-            float64_is_signaling_nan(xb->VsrD(0), &env->fp_status)) {         \
+    if (unlikely(tp##_is_any_nan(xa->fld) ||                                  \
+                 tp##_is_any_nan(xb->fld))) {                                 \
+        if (tp##_is_signaling_nan(xa->fld, &env->fp_status) ||                \
+            tp##_is_signaling_nan(xb->fld, &env->fp_status)) {                \
             vxsnan_flag = true;                                               \
         }                                                                     \
-        t.VsrD(0) = xb->VsrD(0);                                              \
-    } else if ((max &&                                                        \
-               !float64_lt(xa->VsrD(0), xb->VsrD(0), &env->fp_status)) ||     \
-               (!max &&                                                       \
-               float64_lt(xa->VsrD(0), xb->VsrD(0), &env->fp_status))) {      \
-        t.VsrD(0) = xa->VsrD(0);                                              \
+        t.fld = xb->fld;                                                      \
     } else {                                                                  \
-        t.VsrD(0) = xb->VsrD(0);                                              \
+        t.fld = tp##_##op(xa->fld, xb->fld, &env->fp_status);                 \
     }                                                                         \
                                                                               \
     vex_flag = fpscr_ve & vxsnan_flag;                                        \
@@ -2568,8 +2563,8 @@ void helper_##name(CPUPPCState *env,                                          \
     }                                                                         \
 }                                                                             \
 
-VSX_MAX_MINC(XSMAXCDP, 1);
-VSX_MAX_MINC(XSMINCDP, 0);
+VSX_MAX_MINC(XSMAXCDP, maxnum, float64, VsrD(0));
+VSX_MAX_MINC(XSMINCDP, minnum, float64, VsrD(0));
 
 #define VSX_MAX_MINJ(name, max)                                               \
 void helper_##name(CPUPPCState *env,                                          \
-- 
2.25.1




* [PATCH v4 43/47] target/ppc: Implement xs{max,min}cqp
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (41 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 42/47] target/ppc: Refactor VSX_MAX_MINC helper matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-23  0:41   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 44/47] target/ppc: Implement xvcvbf16spn and xvcvspbf16 instructions matheus.ferst
                   ` (3 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Víctor Colombo, clg,
	Matheus Ferst, david

From: Víctor Colombo <victor.colombo@eldorado.org.br>

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c             | 2 ++
 target/ppc/helper.h                 | 2 ++
 target/ppc/insn32.decode            | 3 +++
 target/ppc/translate/vsx-impl.c.inc | 2 ++
 4 files changed, 9 insertions(+)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index f6eb8bf2d8..7773333bd7 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2565,6 +2565,8 @@ void helper_##name(CPUPPCState *env,                                          \
 
 VSX_MAX_MINC(XSMAXCDP, maxnum, float64, VsrD(0));
 VSX_MAX_MINC(XSMINCDP, minnum, float64, VsrD(0));
+VSX_MAX_MINC(XSMAXCQP, maxnum, float128, f128);
+VSX_MAX_MINC(XSMINCQP, minnum, float128, f128);
 
 #define VSX_MAX_MINJ(name, max)                                               \
 void helper_##name(CPUPPCState *env,                                          \
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 3a1cb9abf5..d3af130dc2 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -379,6 +379,8 @@ DEF_HELPER_4(XSMAXCDP, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(XSMINCDP, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(XSMAXJDP, void, env, vsr, vsr, vsr)
 DEF_HELPER_4(XSMINJDP, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSMAXCQP, void, env, vsr, vsr, vsr)
+DEF_HELPER_4(XSMINCQP, void, env, vsr, vsr, vsr)
 DEF_HELPER_3(xscvdphp, void, env, vsr, vsr)
 DEF_HELPER_4(xscvdpqp, void, env, i32, vsr, vsr)
 DEF_HELPER_3(xscvdpsp, void, env, vsr, vsr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index a6e3855958..892d4bfd84 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -662,6 +662,9 @@ XSMAXCDP        111100 ..... ..... ..... 10000000 ...   @XX3
 XSMINCDP        111100 ..... ..... ..... 10001000 ...   @XX3
 XSMAXJDP        111100 ..... ..... ..... 10010000 ...   @XX3
 XSMINJDP        111100 ..... ..... ..... 10011000 ...   @XX3
+XSMAXCQP        111111 ..... ..... ..... 1010100100 -   @X
+XSMINCQP        111111 ..... ..... ..... 1011100100 -   @X
+
 XSCMPEQDP       111100 ..... ..... ..... 00000011 ...   @XX3
 XSCMPGEDP       111100 ..... ..... ..... 00010011 ...   @XX3
 XSCMPGTDP       111100 ..... ..... ..... 00001011 ...   @XX3
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 730f073cf5..0546dc736e 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2529,6 +2529,8 @@ static bool do_xscmpqp(DisasContext *ctx, arg_X *a,
 TRANS(XSCMPEQQP, do_xscmpqp, gen_helper_XSCMPEQQP)
 TRANS(XSCMPGEQP, do_xscmpqp, gen_helper_XSCMPGEQP)
 TRANS(XSCMPGTQP, do_xscmpqp, gen_helper_XSCMPGTQP)
+TRANS(XSMAXCQP, do_xscmpqp, gen_helper_XSMAXCQP)
+TRANS(XSMINCQP, do_xscmpqp, gen_helper_XSMINCQP)
 
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
-- 
2.25.1




* [PATCH v4 44/47] target/ppc: Implement xvcvbf16spn and xvcvspbf16 instructions
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (42 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 43/47] target/ppc: Implement xs{max,min}cqp matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-23  3:08   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 45/47] target/ppc: implement plxsd/pstxsd matheus.ferst
                   ` (2 subsequent siblings)
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, Víctor Colombo, clg,
	Matheus Ferst, david

From: Víctor Colombo <victor.colombo@eldorado.org.br>

Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c             | 21 +++++++++++++++++++
 target/ppc/helper.h                 |  1 +
 target/ppc/insn32.decode            | 11 +++++++---
 target/ppc/translate/vsx-impl.c.inc | 31 ++++++++++++++++++++++++++++-
 4 files changed, 60 insertions(+), 4 deletions(-)
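
XVCVBF16SPN needs no helper because a bfloat16 value has the same sign and 8-bit exponent layout as the high half of a float32: shifting each 32-bit lane left by 16 (the tcg_gen_gvec_shli() call, with the bfloat16 in each word's low-order halfword) widens it exactly. The reverse direction must round, which is why XVCVSPBF16 goes through float32_to_bfloat16(). A sketch of both directions on raw bit patterns, assuming round-to-nearest-even and non-NaN inputs; the real helper quiets signaling NaNs first and rounds through env->fp_status, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* bfloat16 -> float32 bits: exact, same as the per-lane shift by 16. */
static uint32_t bf16_to_f32_bits(uint16_t h)
{
    return (uint32_t)h << 16;
}

/*
 * float32 bits -> bfloat16, round to nearest, ties to even.
 * Assumes a non-NaN input (NaN payloads are not preserved here).
 */
static uint16_t f32_bits_to_bf16_rne(uint32_t f)
{
    uint32_t lsb = (f >> 16) & 1;   /* bias ties toward an even result */

    return (uint16_t)((f + 0x7FFFu + lsb) >> 16);
}

/* Reinterpret a bit pattern as a float without aliasing problems. */
static float f32_from_bits(uint32_t u)
{
    float f;

    memcpy(&f, &u, sizeof f);
    return f;
}
```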

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 7773333bd7..d77900fff1 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2790,6 +2790,27 @@ VSX_CVT_FP_TO_FP_HP(xscvhpdp, 1, float16, float64, VsrH(3), VsrD(0), 1)
 VSX_CVT_FP_TO_FP_HP(xvcvsphp, 4, float32, float16, VsrW(i), VsrH(2 * i  + 1), 0)
 VSX_CVT_FP_TO_FP_HP(xvcvhpsp, 4, float16, float32, VsrH(2 * i + 1), VsrW(i), 0)
 
+void helper_XVCVSPBF16(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)
+{
+    ppc_vsr_t t = { };
+    int i;
+
+    helper_reset_fpstatus(env);
+    for (i = 0; i < 4; i++) {
+        if (unlikely(float32_is_signaling_nan(xb->VsrW(i), &env->fp_status))) {
+            float_invalid_op_vxsnan(env, GETPC());
+            t.VsrH(2 * i + 1) = float32_to_bfloat16(
+                float32_snan_to_qnan(xb->VsrW(i)), &env->fp_status);
+        } else {
+            t.VsrH(2 * i + 1) =
+                float32_to_bfloat16(xb->VsrW(i), &env->fp_status);
+        }
+    }
+
+    *xt = t;
+    do_float_check_status(env, GETPC());
+}
+
 void helper_XSCVQPDP(CPUPPCState *env, uint32_t ro, ppc_vsr_t *xt,
                      ppc_vsr_t *xb)
 {
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index d3af130dc2..805a5046d8 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -494,6 +494,7 @@ DEF_HELPER_FLAGS_4(xvcmpnesp, TCG_CALL_NO_RWG, i32, env, vsr, vsr, vsr)
 DEF_HELPER_3(xvcvspdp, void, env, vsr, vsr)
 DEF_HELPER_3(xvcvsphp, void, env, vsr, vsr)
 DEF_HELPER_3(xvcvhpsp, void, env, vsr, vsr)
+DEF_HELPER_3(XVCVSPBF16, void, env, vsr, vsr)
 DEF_HELPER_3(xvcvspsxds, void, env, vsr, vsr)
 DEF_HELPER_3(xvcvspsxws, void, env, vsr, vsr)
 DEF_HELPER_3(xvcvspuxds, void, env, vsr, vsr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 892d4bfd84..8964898f20 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -152,8 +152,11 @@
 %xx_xb          1:1 11:5
 %xx_xa          2:1 16:5
 %xx_xc          3:1 6:5
-&XX2            xt xb uim:uint8_t
-@XX2            ...... ..... ... uim:2 ..... ......... ..       &XX2 xt=%xx_xt xb=%xx_xb
+&XX2            xt xb
+@XX2            ...... ..... ..... ..... ......... ..           &XX2 xt=%xx_xt xb=%xx_xb
+
+&XX2_uim2       xt xb uim:uint8_t
+@XX2_uim2       ...... ..... ... uim:2 ..... ......... ..       &XX2_uim2 xt=%xx_xt xb=%xx_xb
 
 &XX2_bf_xb      bf xb
 @XX2_bf_xb      ...... bf:3 .. ..... ..... ......... . .        &XX2_bf_xb xb=%xx_xb
@@ -635,7 +638,7 @@ XSNMSUBQP       111111 ..... ..... ..... 0111100100 .   @X_rc
 ## VSX splat instruction
 
 XXSPLTIB        111100 ..... 00 ........ 0101101000 .   @X_imm8
-XXSPLTW         111100 ..... ---.. ..... 010100100 . .  @XX2
+XXSPLTW         111100 ..... ---.. ..... 010100100 . .  @XX2_uim2
 
 ## VSX Permute Instructions
 
@@ -675,6 +678,8 @@ XSCMPGTQP       111111 ..... ..... ..... 0011100100 -   @X
 ## VSX Binary Floating-Point Convert Instructions
 
 XSCVQPDP        111111 ..... 10100 ..... 1101000100 .   @X_tb_rc
+XVCVBF16SPN     111100 ..... 10000 ..... 111011011 ..   @XX2
+XVCVSPBF16      111100 ..... 10001 ..... 111011011 ..   @XX2
 
 ## VSX Vector Test Least-Significant Bit by Byte Instruction
 
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 0546dc736e..2930537b8e 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1576,7 +1576,7 @@ static bool trans_XXSEL(DisasContext *ctx, arg_XX4 *a)
     return true;
 }
 
-static bool trans_XXSPLTW(DisasContext *ctx, arg_XX2 *a)
+static bool trans_XXSPLTW(DisasContext *ctx, arg_XX2_uim2 *a)
 {
     int tofs, bofs;
 
@@ -2532,6 +2532,35 @@ TRANS(XSCMPGTQP, do_xscmpqp, gen_helper_XSCMPGTQP)
 TRANS(XSMAXCQP, do_xscmpqp, gen_helper_XSMAXCQP)
 TRANS(XSMINCQP, do_xscmpqp, gen_helper_XSMINCQP)
 
+static bool trans_XVCVSPBF16(DisasContext *ctx, arg_XX2 *a)
+{
+    TCGv_ptr xt, xb;
+
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+
+    xt = gen_vsr_ptr(a->xt);
+    xb = gen_vsr_ptr(a->xb);
+
+    gen_helper_XVCVSPBF16(cpu_env, xt, xb);
+
+    tcg_temp_free_ptr(xt);
+    tcg_temp_free_ptr(xb);
+
+    return true;
+}
+
+static bool trans_XVCVBF16SPN(DisasContext *ctx, arg_XX2 *a)
+{
+    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
+    REQUIRE_VSX(ctx);
+
+    tcg_gen_gvec_shli(MO_32, vsr_full_offset(a->xt), vsr_full_offset(a->xb),
+                      16, 16, 16);
+
+    return true;
+}
+
 #undef GEN_XX2FORM
 #undef GEN_XX3FORM
 #undef GEN_XX2IFORM
-- 
2.25.1




* [PATCH v4 45/47] target/ppc: implement plxsd/pstxsd
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (43 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 44/47] target/ppc: Implement xvcvbf16spn and xvcvspbf16 instructions matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-23  3:14   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 46/47] target/ppc: implement plxssp/pstxssp matheus.ferst
  2022-02-22 14:36 ` [PATCH v4 47/47] target/ppc: implement lxvr[bhwd]/stxvr[bhwd]x matheus.ferst
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: Leandro Lupori, danielhb413, richard.henderson, groug, clg,
	Matheus Ferst, david

From: Leandro Lupori <leandro.lupori@eldorado.org.br>

Implement instructions plxsd/pstxsd and port lxsd/stxsd to
decodetree.

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            |  2 ++
 target/ppc/insn64.decode            | 10 ++++++
 target/ppc/translate.c              | 14 ++------
 target/ppc/translate/vsx-impl.c.inc | 55 +++++++++++++++++++++++++++--
 4 files changed, 67 insertions(+), 14 deletions(-)
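
PLXSD and PSTXSD use the 8LS:D prefixed form: the 34-bit displacement is split into 18 bits in the prefix word and 16 bits in the suffix (%pls_si), and the R bit selects PC-relative addressing, which resolve_PLS_D() deals with before do_lstxsd() runs. A minimal decoding sketch, assuming the prefix is the first (high) word; the pls_d type and both functions are hypothetical, and R=1 with RA!=0 is treated as an invalid form as described in Power ISA v3.1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form; QEMU's decodetree fills arg_PLS_D instead. */
typedef struct {
    int rt, ra;
    bool r;       /* R bit: 1 selects PC-relative addressing */
    int64_t si;   /* 34-bit signed displacement */
} pls_d;

static pls_d decode_8ls_d(uint32_t prefix, uint32_t suffix)
{
    pls_d d;
    int64_t hi = prefix & 0x3FFFF;          /* d0: low 18 bits of the prefix */

    if (hi & 0x20000) {
        hi -= 0x40000;                      /* sign-extend 18 bits */
    }
    d.r = (prefix >> 20) & 1;
    d.rt = (suffix >> 21) & 0x1F;
    d.ra = (suffix >> 16) & 0x1F;
    d.si = hi * 65536 + (suffix & 0xFFFF);  /* d0 || d1 */
    return d;
}

/* EA = CIA + d when R=1 (RA must be 0), else (RA|0) + d. */
static bool pls_ea(const pls_d *d, uint64_t cia, const uint64_t gpr[32],
                   uint64_t *ea)
{
    if (d->r) {
        if (d->ra != 0) {
            return false;                   /* invalid instruction form */
        }
        *ea = cia + (uint64_t)d->si;
    } else {
        *ea = (d->ra ? gpr[d->ra] : 0) + (uint64_t)d->si;
    }
    return true;
}
```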

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 8964898f20..d84ff333ec 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -600,6 +600,8 @@ VCLRRB          000100 ..... ..... ..... 00111001101    @VX
 
 # VSX Load/Store Instructions
 
+LXSD            111001 ..... ..... .............. 10    @DS
+STXSD           111101 ..... ..... .............. 10    @DS
 LXV             111101 ..... ..... ............ . 001   @DQ_TSX
 STXV            111101 ..... ..... ............ . 101   @DQ_TSX
 LXVP            000110 ..... ..... ............ 0000    @DQ_TSXP
diff --git a/target/ppc/insn64.decode b/target/ppc/insn64.decode
index fdb859f62d..b7426f5b24 100644
--- a/target/ppc/insn64.decode
+++ b/target/ppc/insn64.decode
@@ -32,6 +32,10 @@
                 ...... ..... ra:5 ................       \
                 &PLS_D si=%pls_si rt=%rt_tsxp
 
+@8LS_D          ...... .. . .. r:1 .. .................. \
+                ...... rt:5 ra:5 ................        \
+                &PLS_D si=%pls_si
+
 # Format 8RR:D
 %8rr_si         32:s16 0:16
 %8rr_xt         16:1 21:5
@@ -180,6 +184,12 @@ PSTFD           000001 10 0--.-- .................. \
 
 ### VSX instructions
 
+PLXSD           000001 00 0--.-- .................. \
+                101010 ..... ..... ................     @8LS_D
+
+PSTXSD          000001 00 0--.-- .................. \
+                101110 ..... ..... ................     @8LS_D
+
 PLXV            000001 00 0--.-- .................. \
                 11001 ...... ..... ................     @8LS_D_TSX
 PSTXV           000001 00 0--.-- .................. \
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index b647430012..aa860d6bf9 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6668,7 +6668,7 @@ static bool resolve_PLS_D(DisasContext *ctx, arg_D *d, arg_PLS_D *a)
 
 #include "translate/branch-impl.c.inc"
 
-/* Handles lfdp, lxsd, lxssp */
+/* Handles lfdp, lxssp */
 static void gen_dform39(DisasContext *ctx)
 {
     switch (ctx->opcode & 0x3) {
@@ -6677,11 +6677,6 @@ static void gen_dform39(DisasContext *ctx)
             return gen_lfdp(ctx);
         }
         break;
-    case 2: /* lxsd */
-        if (ctx->insns_flags2 & PPC2_ISA300) {
-            return gen_lxsd(ctx);
-        }
-        break;
     case 3: /* lxssp */
         if (ctx->insns_flags2 & PPC2_ISA300) {
             return gen_lxssp(ctx);
@@ -6691,7 +6686,7 @@ static void gen_dform39(DisasContext *ctx)
     return gen_invalid(ctx);
 }
 
-/* handles stfdp, lxv, stxsd, stxssp lxvx */
+/* handles stfdp, lxv, stxssp lxvx */
 static void gen_dform3D(DisasContext *ctx)
 {
     if ((ctx->opcode & 3) != 1) { /* DS-FORM */
@@ -6701,11 +6696,6 @@ static void gen_dform3D(DisasContext *ctx)
                 return gen_stfdp(ctx);
             }
             break;
-        case 2: /* stxsd */
-            if (ctx->insns_flags2 & PPC2_ISA300) {
-                return gen_stxsd(ctx);
-            }
-            break;
         case 3: /* stxssp */
             if (ctx->insns_flags2 & PPC2_ISA300) {
                 return gen_stxssp(ctx);
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 2930537b8e..cabadcf106 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -309,7 +309,6 @@ static void gen_##name(DisasContext *ctx)                         \
     tcg_temp_free_i64(xth);                                       \
 }
 
-VSX_LOAD_SCALAR_DS(lxsd, ld64_i64)
 VSX_LOAD_SCALAR_DS(lxssp, ld32fs)
 
 #define VSX_STORE_SCALAR(name, operation)                     \
@@ -482,7 +481,6 @@ static void gen_##name(DisasContext *ctx)                         \
     tcg_temp_free_i64(xth);                                       \
 }
 
-VSX_STORE_SCALAR_DS(stxsd, st64_i64)
 VSX_STORE_SCALAR_DS(stxssp, st32fs)
 
 static void gen_mfvsrwz(DisasContext *ctx)
@@ -2281,6 +2279,57 @@ static bool do_lstxv_X(DisasContext *ctx, arg_X *a, bool store, bool paired)
     return do_lstxv(ctx, a->ra, cpu_gpr[a->rb], a->rt, store, paired);
 }
 
+static bool do_lstxsd(DisasContext *ctx, int rt, int ra, TCGv displ, bool store)
+{
+    TCGv ea;
+    TCGv_i64 xt;
+    MemOp mop;
+
+    if (store) {
+        REQUIRE_VECTOR(ctx);
+    } else {
+        REQUIRE_VSX(ctx);
+    }
+
+    xt = tcg_temp_new_i64();
+    mop = DEF_MEMOP(MO_UQ);
+
+    gen_set_access_type(ctx, ACCESS_INT);
+    ea = do_ea_calc(ctx, ra, displ);
+
+    if (store) {
+        get_cpu_vsr(xt, rt + 32, true);
+        tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
+    } else {
+        tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
+        set_cpu_vsr(rt + 32, xt, true);
+        set_cpu_vsr(rt + 32, tcg_constant_i64(0), false);
+    }
+
+    tcg_temp_free(ea);
+    tcg_temp_free_i64(xt);
+
+    return true;
+}
+
+static bool do_lstxsd_DS(DisasContext *ctx, arg_D *a, bool store)
+{
+    return do_lstxsd(ctx, a->rt, a->ra, tcg_constant_tl(a->si), store);
+}
+
+static bool do_plstxsd_PLS_D(DisasContext *ctx, arg_PLS_D *a, bool store)
+{
+    arg_D d;
+
+    if (!resolve_PLS_D(ctx, &d, a)) {
+        return true;
+    }
+
+    return do_lstxsd(ctx, d.rt, d.ra, tcg_constant_tl(d.si), store);
+}
+
+TRANS_FLAGS2(ISA300, LXSD, do_lstxsd_DS, false)
+TRANS_FLAGS2(ISA300, STXSD, do_lstxsd_DS, true)
 TRANS_FLAGS2(ISA300, STXV, do_lstxv_D, true, false)
 TRANS_FLAGS2(ISA300, LXV, do_lstxv_D, false, false)
 TRANS_FLAGS2(ISA310, STXVP, do_lstxv_D, true, true)
@@ -2289,6 +2338,8 @@ TRANS_FLAGS2(ISA300, STXVX, do_lstxv_X, true, false)
 TRANS_FLAGS2(ISA300, LXVX, do_lstxv_X, false, false)
 TRANS_FLAGS2(ISA310, STXVPX, do_lstxv_X, true, true)
 TRANS_FLAGS2(ISA310, LXVPX, do_lstxv_X, false, true)
+TRANS64_FLAGS2(ISA310, PLXSD, do_plstxsd_PLS_D, false)
+TRANS64_FLAGS2(ISA310, PSTXSD, do_plstxsd_PLS_D, true)
 TRANS64_FLAGS2(ISA310, PSTXV, do_lstxv_PLS_D, true, false)
 TRANS64_FLAGS2(ISA310, PLXV, do_lstxv_PLS_D, false, false)
 TRANS64_FLAGS2(ISA310, PSTXVP, do_lstxv_PLS_D, true, true)
-- 
2.25.1




* [PATCH v4 46/47] target/ppc: implement plxssp/pstxssp
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (44 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 45/47] target/ppc: implement plxsd/pstxsd matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-23  3:16   ` Richard Henderson
  2022-02-22 14:36 ` [PATCH v4 47/47] target/ppc: implement lxvr[bhwd]/stxvr[bhwd]x matheus.ferst
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: Leandro Lupori, danielhb413, richard.henderson, groug, clg,
	Matheus Ferst, david

From: Leandro Lupori <leandro.lupori@eldorado.org.br>

Implement instructions plxssp/pstxssp and port lxssp/stxssp to
decodetree.

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            |  2 +
 target/ppc/insn64.decode            |  6 ++
 target/ppc/translate.c              | 29 +++------
 target/ppc/translate/vsx-impl.c.inc | 93 +++++++++++++++--------------
 4 files changed, 62 insertions(+), 68 deletions(-)
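
LXSSP loads a single-precision word and places it, converted to double format, in doubleword 0 of VSR[32+RT]; the new do_lstxssp() path also writes zero to doubleword 1, which the removed macro left undefined. A minimal sketch of that data movement; vsr_model and both functions are hypothetical, and NaNs are excluded because a C float-to-double conversion may quiet a signaling NaN, whereas for every other value the conversion is exact.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical VSR model: dw[0] is doubleword 0 (high), dw[1] doubleword 1. */
typedef struct {
    uint64_t dw[2];
} vsr_model;

/* Single-precision bits -> double-precision bits; exact for non-NaN input. */
static uint64_t single_to_double_bits(uint32_t s)
{
    float f;
    double d;
    uint64_t u;

    memcpy(&f, &s, sizeof f);
    d = f;                      /* every float value is representable as a double */
    memcpy(&u, &d, sizeof u);
    return u;
}

/* LXSSP-like data movement: converted value in dw[0], dw[1] cleared. */
static void lxssp_model(vsr_model *vsr, uint32_t mem_word)
{
    vsr->dw[0] = single_to_double_bits(mem_word);
    vsr->dw[1] = 0;
}
```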

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index d84ff333ec..5d3cfadfc6 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -602,6 +602,8 @@ VCLRRB          000100 ..... ..... ..... 00111001101    @VX
 
 LXSD            111001 ..... ..... .............. 10    @DS
 STXSD           111101 ..... ..... .............. 10    @DS
+LXSSP           111001 ..... ..... .............. 11    @DS
+STXSSP          111101 ..... ..... .............. 11    @DS
 LXV             111101 ..... ..... ............ . 001   @DQ_TSX
 STXV            111101 ..... ..... ............ . 101   @DQ_TSX
 LXVP            000110 ..... ..... ............ 0000    @DQ_TSXP
diff --git a/target/ppc/insn64.decode b/target/ppc/insn64.decode
index b7426f5b24..691e8fe6c0 100644
--- a/target/ppc/insn64.decode
+++ b/target/ppc/insn64.decode
@@ -190,6 +190,12 @@ PLXSD           000001 00 0--.-- .................. \
 PSTXSD          000001 00 0--.-- .................. \
                 101110 ..... ..... ................     @8LS_D
 
+PLXSSP          000001 00 0--.-- .................. \
+                101011 ..... ..... ................     @8LS_D
+
+PSTXSSP         000001 00 0--.-- .................. \
+                101111 ..... ..... ................     @8LS_D
+
 PLXV            000001 00 0--.-- .................. \
                 11001 ...... ..... ................     @8LS_D_TSX
 PSTXV           000001 00 0--.-- .................. \
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index aa860d6bf9..589ed8b7c1 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6668,39 +6668,24 @@ static bool resolve_PLS_D(DisasContext *ctx, arg_D *d, arg_PLS_D *a)
 
 #include "translate/branch-impl.c.inc"
 
-/* Handles lfdp, lxssp */
+/* Handles lfdp */
 static void gen_dform39(DisasContext *ctx)
 {
-    switch (ctx->opcode & 0x3) {
-    case 0: /* lfdp */
+    if ((ctx->opcode & 0x3) == 0) {
         if (ctx->insns_flags2 & PPC2_ISA205) {
             return gen_lfdp(ctx);
         }
-        break;
-    case 3: /* lxssp */
-        if (ctx->insns_flags2 & PPC2_ISA300) {
-            return gen_lxssp(ctx);
-        }
-        break;
     }
     return gen_invalid(ctx);
 }
 
-/* handles stfdp, lxv, stxssp lxvx */
+/* Handles stfdp */
 static void gen_dform3D(DisasContext *ctx)
 {
-    if ((ctx->opcode & 3) != 1) { /* DS-FORM */
-        switch (ctx->opcode & 0x3) {
-        case 0: /* stfdp */
-            if (ctx->insns_flags2 & PPC2_ISA205) {
-                return gen_stfdp(ctx);
-            }
-            break;
-        case 3: /* stxssp */
-            if (ctx->insns_flags2 & PPC2_ISA300) {
-                return gen_stxssp(ctx);
-            }
-            break;
+    if ((ctx->opcode & 3) == 0) { /* DS-FORM */
+        /* stfdp */
+        if (ctx->insns_flags2 & PPC2_ISA205) {
+            return gen_stfdp(ctx);
         }
     }
     return gen_invalid(ctx);
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index cabadcf106..48a398da0e 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -288,29 +288,6 @@ VSX_VECTOR_LOAD_STORE_LENGTH(stxvl)
 VSX_VECTOR_LOAD_STORE_LENGTH(stxvll)
 #endif
 
-#define VSX_LOAD_SCALAR_DS(name, operation)                       \
-static void gen_##name(DisasContext *ctx)                         \
-{                                                                 \
-    TCGv EA;                                                      \
-    TCGv_i64 xth;                                                 \
-                                                                  \
-    if (unlikely(!ctx->altivec_enabled)) {                        \
-        gen_exception(ctx, POWERPC_EXCP_VPU);                     \
-        return;                                                   \
-    }                                                             \
-    xth = tcg_temp_new_i64();                                     \
-    gen_set_access_type(ctx, ACCESS_INT);                         \
-    EA = tcg_temp_new();                                          \
-    gen_addr_imm_index(ctx, EA, 0x03);                            \
-    gen_qemu_##operation(ctx, xth, EA);                           \
-    set_cpu_vsr(rD(ctx->opcode) + 32, xth, true);                 \
-    /* NOTE: cpu_vsrl is undefined */                             \
-    tcg_temp_free(EA);                                            \
-    tcg_temp_free_i64(xth);                                       \
-}
-
-VSX_LOAD_SCALAR_DS(lxssp, ld32fs)
-
 #define VSX_STORE_SCALAR(name, operation)                     \
 static void gen_##name(DisasContext *ctx)                     \
 {                                                             \
@@ -460,29 +437,6 @@ static void gen_stxvb16x(DisasContext *ctx)
     tcg_temp_free_i64(xsl);
 }
 
-#define VSX_STORE_SCALAR_DS(name, operation)                      \
-static void gen_##name(DisasContext *ctx)                         \
-{                                                                 \
-    TCGv EA;                                                      \
-    TCGv_i64 xth;                                                 \
-                                                                  \
-    if (unlikely(!ctx->altivec_enabled)) {                        \
-        gen_exception(ctx, POWERPC_EXCP_VPU);                     \
-        return;                                                   \
-    }                                                             \
-    xth = tcg_temp_new_i64();                                     \
-    get_cpu_vsr(xth, rD(ctx->opcode) + 32, true);                 \
-    gen_set_access_type(ctx, ACCESS_INT);                         \
-    EA = tcg_temp_new();                                          \
-    gen_addr_imm_index(ctx, EA, 0x03);                            \
-    gen_qemu_##operation(ctx, xth, EA);                           \
-    /* NOTE: cpu_vsrl is undefined */                             \
-    tcg_temp_free(EA);                                            \
-    tcg_temp_free_i64(xth);                                       \
-}
-
-VSX_STORE_SCALAR_DS(stxssp, st32fs)
-
 static void gen_mfvsrwz(DisasContext *ctx)
 {
     if (xS(ctx->opcode) < 32) {
@@ -2328,8 +2282,53 @@ static bool do_plstxsd_PLS_D(DisasContext *ctx, arg_PLS_D *a, bool store)
     return do_lstxsd(ctx, d.rt, d.ra, tcg_constant_tl(d.si), store);
 }
 
+static bool do_lstxssp(DisasContext *ctx, int rt, int ra, TCGv displ, bool store)
+{
+    TCGv ea;
+    TCGv_i64 xt;
+
+    REQUIRE_VECTOR(ctx);
+
+    xt = tcg_temp_new_i64();
+
+    gen_set_access_type(ctx, ACCESS_INT);
+    ea = do_ea_calc(ctx, ra, displ);
+
+    if (store) {
+        get_cpu_vsr(xt, rt + 32, true);
+        gen_qemu_st32fs(ctx, xt, ea);
+    } else {
+        gen_qemu_ld32fs(ctx, xt, ea);
+        set_cpu_vsr(rt + 32, xt, true);
+        set_cpu_vsr(rt + 32, tcg_constant_i64(0), false);
+    }
+
+    tcg_temp_free(ea);
+    tcg_temp_free_i64(xt);
+
+    return true;
+}
+
+static bool do_lstxssp_DS(DisasContext *ctx, arg_D *a, bool store)
+{
+    return do_lstxssp(ctx, a->rt, a->ra, tcg_constant_tl(a->si), store);
+}
+
+static bool do_plstxssp_PLS_D(DisasContext *ctx, arg_PLS_D *a, bool store)
+{
+    arg_D d;
+
+    if (!resolve_PLS_D(ctx, &d, a)) {
+        return true;
+    }
+
+    return do_lstxssp(ctx, d.rt, d.ra, tcg_constant_tl(d.si), store);
+}
+
 TRANS_FLAGS2(ISA300, LXSD, do_lstxsd_DS, false)
 TRANS_FLAGS2(ISA300, STXSD, do_lstxsd_DS, true)
+TRANS_FLAGS2(ISA300, LXSSP, do_lstxssp_DS, false)
+TRANS_FLAGS2(ISA300, STXSSP, do_lstxssp_DS, true)
 TRANS_FLAGS2(ISA300, STXV, do_lstxv_D, true, false)
 TRANS_FLAGS2(ISA300, LXV, do_lstxv_D, false, false)
 TRANS_FLAGS2(ISA310, STXVP, do_lstxv_D, true, true)
@@ -2340,6 +2339,8 @@ TRANS_FLAGS2(ISA310, STXVPX, do_lstxv_X, true, true)
 TRANS_FLAGS2(ISA310, LXVPX, do_lstxv_X, false, true)
 TRANS64_FLAGS2(ISA310, PLXSD, do_plstxsd_PLS_D, false)
 TRANS64_FLAGS2(ISA310, PSTXSD, do_plstxsd_PLS_D, true)
+TRANS64_FLAGS2(ISA310, PLXSSP, do_plstxssp_PLS_D, false)
+TRANS64_FLAGS2(ISA310, PSTXSSP, do_plstxssp_PLS_D, true)
 TRANS64_FLAGS2(ISA310, PSTXV, do_lstxv_PLS_D, true, false)
 TRANS64_FLAGS2(ISA310, PLXV, do_lstxv_PLS_D, false, false)
 TRANS64_FLAGS2(ISA310, PSTXVP, do_lstxv_PLS_D, true, true)
-- 
2.25.1
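A plain-C model of the register effect of the new do_lstxssp() path follows. It assumes IEEE-754 host floats and uses a hypothetical two-doubleword `vsr128` type. The load widens the single-precision pattern into doubleword 0 and now explicitly zeroes doubleword 1, which the removed macro left untouched. The store narrows doubleword 0 back to single precision. C's float/double conversions may quiet a signaling NaN, so this sketch is only meant for ordinary values.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a VSR: dw0 is the leftmost doubleword. */
typedef struct { uint64_t dw0, dw1; } vsr128;

/* lxssp: widen the single-precision bit pattern to double in DW0, zero DW1. */
static vsr128 model_lxssp(uint32_t sp_bits)
{
    float f;
    double d;
    vsr128 r;

    memcpy(&f, &sp_bits, sizeof(f));
    d = f;                          /* exact for finite, non-NaN inputs */
    memcpy(&r.dw0, &d, sizeof(d));
    r.dw1 = 0;                      /* the patch now clears the low doubleword */
    return r;
}

/* stxssp: narrow DW0 back to single precision; DW1 is ignored. */
static uint32_t model_stxssp(vsr128 v)
{
    double d;
    float f;
    uint32_t bits;

    memcpy(&d, &v.dw0, sizeof(d));
    f = (float)d;
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}
```

For example, the single-precision pattern of 1.5 (0x3FC00000) becomes 0x3FF8000000000000 in doubleword 0 and round-trips through the store model unchanged.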



^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH v4 47/47] target/ppc: implement lxvr[bhwd]/stxvr[bhwd]x
  2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
                   ` (45 preceding siblings ...)
  2022-02-22 14:36 ` [PATCH v4 46/47] target/ppc: implement plxssp/pstxssp matheus.ferst
@ 2022-02-22 14:36 ` matheus.ferst
  2022-02-23  3:23   ` Richard Henderson
  46 siblings, 1 reply; 97+ messages in thread
From: matheus.ferst @ 2022-02-22 14:36 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: danielhb413, richard.henderson, groug, clg, Matheus Ferst,
	Lucas Coutinho, david

From: Lucas Coutinho <lucas.coutinho@eldorado.org.br>

Implement the following PowerISA v3.1 instructions:
lxvrbx: Load VSX Vector Rightmost Byte Indexed X-form
lxvrhx: Load VSX Vector Rightmost Halfword Indexed X-form
lxvrwx: Load VSX Vector Rightmost Word Indexed X-form
lxvrdx: Load VSX Vector Rightmost Doubleword Indexed X-form

stxvrbx: Store VSX Vector Rightmost Byte Indexed X-form
stxvrhx: Store VSX Vector Rightmost Halfword Indexed X-form
stxvrwx: Store VSX Vector Rightmost Word Indexed X-form
stxvrdx: Store VSX Vector Rightmost Doubleword Indexed X-form
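As a rough model of the register effect, assume the VSR is two 64-bit doublewords (dw0 leftmost, dw1 rightmost) and that the memory bytes have already been assembled into an integer according to the current endianness. Under those assumptions, the load forms zero-extend the 1/2/4/8-byte element into the rightmost doubleword and clear everything else. The store forms write the low-order bytes of the rightmost doubleword. The `vsr128` type and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit VSR as two doublewords. */
typedef struct {
    uint64_t dw0; /* leftmost (high) doubleword */
    uint64_t dw1; /* rightmost (low) doubleword */
} vsr128;

/* Mask covering the low `size` bytes (size is 1, 2, 4 or 8). */
static uint64_t elem_mask(unsigned size)
{
    return size >= 8 ? UINT64_MAX : (UINT64_C(1) << (size * 8)) - 1;
}

/* lxvr[bhwd]x: zero-extend the element into the rightmost bits, clear the rest. */
static vsr128 model_lxvr(uint64_t val, unsigned size)
{
    vsr128 r = { .dw0 = 0, .dw1 = val & elem_mask(size) };
    return r;
}

/* stxvr[bhwd]x: the value stored is the rightmost `size` bytes of the VSR. */
static uint64_t model_stxvr(vsr128 v, unsigned size)
{
    return v.dw1 & elem_mask(size);
}
```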

Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/insn32.decode            |  8 +++++++
 target/ppc/translate/vsx-impl.c.inc | 35 +++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 5d3cfadfc6..00c825b856 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -612,6 +612,14 @@ LXVX            011111 ..... ..... ..... 0100 - 01100 . @X_TSX
 STXVX           011111 ..... ..... ..... 0110001100 .   @X_TSX
 LXVPX           011111 ..... ..... ..... 0101001101 -   @X_TSXP
 STXVPX          011111 ..... ..... ..... 0111001101 -   @X_TSXP
+LXVRBX          011111 ..... ..... ..... 0000001101 .   @X_TSX
+LXVRHX          011111 ..... ..... ..... 0000101101 .   @X_TSX
+LXVRWX          011111 ..... ..... ..... 0001001101 .   @X_TSX
+LXVRDX          011111 ..... ..... ..... 0001101101 .   @X_TSX
+STXVRBX         011111 ..... ..... ..... 0010001101 .   @X_TSX
+STXVRHX         011111 ..... ..... ..... 0010101101 .   @X_TSX
+STXVRWX         011111 ..... ..... ..... 0011001101 .   @X_TSX
+STXVRDX         011111 ..... ..... ..... 0011101101 .   @X_TSX
 
 ## VSX Scalar Multiply-Add Instructions
 
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 48a398da0e..55a4a9bd27 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2346,6 +2346,41 @@ TRANS64_FLAGS2(ISA310, PLXV, do_lstxv_PLS_D, false, false)
 TRANS64_FLAGS2(ISA310, PSTXVP, do_lstxv_PLS_D, true, true)
 TRANS64_FLAGS2(ISA310, PLXVP, do_lstxv_PLS_D, false, true)
 
+static bool do_lstrm(DisasContext *ctx, arg_X *a, MemOp mop, bool store)
+{
+    TCGv ea;
+    TCGv_i64 xt;
+
+    REQUIRE_VSX(ctx);
+
+    xt = tcg_temp_new_i64();
+
+    gen_set_access_type(ctx, ACCESS_INT);
+    ea = do_ea_calc(ctx, a->ra, cpu_gpr[a->rb]);
+
+    if (store) {
+        get_cpu_vsr(xt, a->rt, false);
+        tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
+    } else {
+        tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
+        set_cpu_vsr(a->rt, xt, false);
+        set_cpu_vsr(a->rt, tcg_constant_i64(0), true);
+    }
+
+    tcg_temp_free(ea);
+    tcg_temp_free_i64(xt);
+    return true;
+}
+
+TRANS_FLAGS2(ISA310, LXVRBX, do_lstrm, DEF_MEMOP(MO_UB), false)
+TRANS_FLAGS2(ISA310, LXVRHX, do_lstrm, DEF_MEMOP(MO_UW), false)
+TRANS_FLAGS2(ISA310, LXVRWX, do_lstrm, DEF_MEMOP(MO_UL), false)
+TRANS_FLAGS2(ISA310, LXVRDX, do_lstrm, DEF_MEMOP(MO_UQ), false)
+TRANS_FLAGS2(ISA310, STXVRBX, do_lstrm, DEF_MEMOP(MO_UB), true)
+TRANS_FLAGS2(ISA310, STXVRHX, do_lstrm, DEF_MEMOP(MO_UW), true)
+TRANS_FLAGS2(ISA310, STXVRWX, do_lstrm, DEF_MEMOP(MO_UL), true)
+TRANS_FLAGS2(ISA310, STXVRDX, do_lstrm, DEF_MEMOP(MO_UQ), true)
+
 static void gen_xxeval_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b, TCGv_i64 c,
                            int64_t imm)
 {
-- 
2.25.1




* Re: [PATCH v4 02/47] target/ppc: moved vector even and odd multiplication to decodetree
  2022-02-22 14:36 ` [PATCH v4 02/47] target/ppc: moved vector even and odd multiplication to decodetree matheus.ferst
@ 2022-02-22 18:19   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 18:19 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Lucas Mateus Castro (alqotel),
	danielhb413, groug, Lucas Mateus Castro, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: "Lucas Mateus Castro (alqotel)"<lucas.castro@eldorado.org.br>
> 
> Moved the instructions vmulesb, vmulosb, vmuleub, vmuloub,
> vmulesh, vmulosh, vmuleuh, vmulouh, vmulesw, vmulosw,
> vmuleuw and vmulouw from legacy to decodetree. Implemented
> the instructions vmulesd, vmulosd, vmuleud, vmuloud.
> 
> Signed-off-by: Lucas Mateus Castro (alqotel)<lucas.araujo@eldorado.org.br>
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/helper.h                 | 28 +++++++++-------
>   target/ppc/insn32.decode            | 22 ++++++++++++
>   target/ppc/int_helper.c             | 36 ++++++++++++++------
>   target/ppc/translate/vmx-impl.c.inc | 52 +++++++++++++++++++----------
>   target/ppc/translate/vmx-ops.c.inc  | 15 ++-------
>   tcg/ppc/tcg-target.c.inc            |  6 ++++
>   6 files changed, 107 insertions(+), 52 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


> +void helper_VMULESD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
> +{
> +    muls64(&r->VsrD(1), &r->VsrD(0), a->VsrSD(0), b->VsrSD(0));
> +}
> +void helper_VMULOSD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
> +{
> +    muls64(&r->VsrD(1), &r->VsrD(0), a->VsrSD(1), b->VsrSD(1));
> +}
> +void helper_VMULEUD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
> +{
> +    mulu64(&r->VsrD(1), &r->VsrD(0), a->VsrD(0), b->VsrD(0));
> +}
> +void helper_VMULOUD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
> +{
> +    mulu64(&r->VsrD(1), &r->VsrD(0), a->VsrD(1), b->VsrD(1));
> +}

Did I mention before that these are trivially implemented inline?
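For reference, the four helpers compute a full 64x64->128-bit product of the even (doubleword 0) or odd (doubleword 1) elements, with the high half in doubleword 0. The following plain-C model assumes a GCC/Clang compiler with `__int128` and a hypothetical two-doubleword `avr128` type. Inline TCG would presumably obtain the same two halves from tcg_gen_muls2_i64()/tcg_gen_mulu2_i64().

```c
#include <stdint.h>

/* Hypothetical model: d[0] is the leftmost (even) doubleword. */
typedef struct { uint64_t d[2]; } avr128;

/* vmulesd (idx 0) / vmulosd (idx 1): signed 64x64 -> 128 product. */
static avr128 model_vmulsd(avr128 a, avr128 b, int idx)
{
    __int128 p = (__int128)(int64_t)a.d[idx] * (int64_t)b.d[idx];
    avr128 r = { { (uint64_t)((unsigned __int128)p >> 64), (uint64_t)p } };
    return r;
}

/* vmuleud (idx 0) / vmuloud (idx 1): unsigned 64x64 -> 128 product. */
static avr128 model_vmulud(avr128 a, avr128 b, int idx)
{
    unsigned __int128 p = (unsigned __int128)a.d[idx] * b.d[idx];
    avr128 r = { { (uint64_t)(p >> 64), (uint64_t)p } };
    return r;
}
```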


r~



* Re: [PATCH v4 03/47] target/ppc: Moved vector multiply high and low to decodetree
  2022-02-22 14:36 ` [PATCH v4 03/47] target/ppc: Moved vector multiply high and low " matheus.ferst
@ 2022-02-22 18:19   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 18:19 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Lucas Mateus Castro (alqotel),
	danielhb413, groug, Lucas Mateus Castro, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: "Lucas Mateus Castro (alqotel)"<lucas.castro@eldorado.org.br>
> 
> Moved instructions vmulld, vmulhuw, vmulhsw, vmulhud and vmulhsd to
> decodetree
> 
> Signed-off-by: Lucas Mateus Castro (alqotel)<lucas.araujo@eldorado.org.br>
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/helper.h                 |  8 ++++----
>   target/ppc/insn32.decode            |  6 ++++++
>   target/ppc/int_helper.c             |  8 ++++----
>   target/ppc/translate/vmx-impl.c.inc | 21 ++++++++++++++++-----
>   target/ppc/translate/vmx-ops.c.inc  |  5 -----
>   5 files changed, 30 insertions(+), 18 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 04/47] target/ppc: vmulh* instructions without helpers
  2022-02-22 14:36 ` [PATCH v4 04/47] target/ppc: vmulh* instructions without helpers matheus.ferst
@ 2022-02-22 18:23   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 18:23 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Lucas Mateus Castro (alqotel),
	danielhb413, groug, Lucas Mateus Castro, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: "Lucas Mateus Castro (alqotel)" <lucas.castro@eldorado.org.br>
> 
> Changed vmulhuw, vmulhud, vmulhsw, vmulhsd to not
> use helpers.
> 
> Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
> Changes in v4:
> Changed from gvec to i64; this resulted in better performance on
> a Power host for all 4 instructions and better performance for
> vmulhsw and vmulhuw on x86, but worse performance for vmulhsd and
> vmulhud on an x86 host.

Unsurprising.

> +static void do_vx_vmulhd_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b, bool sign)
> +{
> +    TCGv_i64 a1, b1, mask, w, k;
> +    void (*tcg_gen_shift_imm)(TCGv_i64, TCGv_i64, int64_t);
> +
> +    a1 = tcg_temp_new_i64();
> +    b1 = tcg_temp_new_i64();
> +    w  = tcg_temp_new_i64();
> +    k  = tcg_temp_new_i64();
> +    mask = tcg_temp_new_i64();
> +    if (sign) {
> +        tcg_gen_shift_imm = tcg_gen_sari_i64;
> +    } else {
> +        tcg_gen_shift_imm = tcg_gen_shri_i64;
> +    }
> +
> +    tcg_gen_movi_i64(mask, 0xFFFFFFFF);
> +    tcg_gen_and_i64(a1, a, mask);
> +    tcg_gen_and_i64(b1, b, mask);
> +    tcg_gen_mul_i64(t, a1, b1);
> +    tcg_gen_shri_i64(k, t, 32);
> +
> +    tcg_gen_shift_imm(a1, a, 32);
> +    tcg_gen_mul_i64(t, a1, b1);
> +    tcg_gen_add_i64(t, t, k);
> +    tcg_gen_and_i64(k, t, mask);
> +    tcg_gen_shift_imm(w, t, 32);
> +
> +    tcg_gen_and_i64(a1, a, mask);
> +    tcg_gen_shift_imm(b1, b, 32);
> +    tcg_gen_mul_i64(t, a1, b1);
> +    tcg_gen_add_i64(t, t, k);
> +    tcg_gen_shift_imm(k, t, 32);
> +
> +    tcg_gen_shift_imm(a1, a, 32);
> +    tcg_gen_mul_i64(t, a1, b1);
> +    tcg_gen_add_i64(t, t, w);
> +    tcg_gen_add_i64(t, t, k);

You should be using tcg_gen_mul{s,u}2_i64 instead of open-coding the high-part multiplication.
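To illustrate why the four partial products above amount to a high-part multiply, here is a plain-C comparison that assumes `unsigned __int128` is available. `mulhu_limbs` follows the same 32-bit limb decomposition (unsigned case only). `mulhu_wide` is the single widening multiply that a mulu2-style operation provides directly. Both names are hypothetical.

```c
#include <stdint.h>

/* High 64 bits of a*b via one widening multiply. */
static uint64_t mulhu_wide(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* High 64 bits of a*b via 32-bit limbs (schoolbook, unsigned only). */
static uint64_t mulhu_limbs(uint64_t a, uint64_t b)
{
    uint64_t al = a & 0xFFFFFFFF, ah = a >> 32;
    uint64_t bl = b & 0xFFFFFFFF, bh = b >> 32;

    uint64_t t = al * bl;
    uint64_t k = t >> 32;          /* carry out of the lowest partial product */

    t = ah * bl + k;               /* cannot overflow 64 bits */
    k = t & 0xFFFFFFFF;
    uint64_t w = t >> 32;

    t = al * bh + k;
    k = t >> 32;

    return ah * bh + w + k;
}
```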

r~



* Re: [PATCH v4 05/47] target/ppc: Implement vmsumcud instruction
  2022-02-22 14:36 ` [PATCH v4 05/47] target/ppc: Implement vmsumcud instruction matheus.ferst
@ 2022-02-22 18:28   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 18:28 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Víctor Colombo, groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Víctor Colombo<victor.colombo@eldorado.org.br>
> 
> Based on [1] by Lijun Pan<ljp@linux.ibm.com>, which was never merged
> into master.
> 
> [1]:https://lists.gnu.org/archive/html/qemu-ppc/2020-07/msg00419.html
> 
> Signed-off-by: Víctor Colombo<victor.colombo@eldorado.org.br>
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
> Changes in v4:
> 
> Fixed dead move into tmp1
> ---
>   target/ppc/insn32.decode            |  4 +++
>   target/ppc/translate/vmx-impl.c.inc | 53 +++++++++++++++++++++++++++++
>   2 files changed, 57 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 07/47] target/ppc: Move vexts[bhw]2[wd] to decodetree
  2022-02-22 14:36 ` [PATCH v4 07/47] target/ppc: Move vexts[bhw]2[wd] to decodetree matheus.ferst
@ 2022-02-22 18:34   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 18:34 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Lucas Coutinho, groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> +static void gen_vexts_i64(TCGv_i64 t, TCGv_i64 b, int64_t s)
> +{
> +    tcg_gen_shli_i64(t, b, s);
> +    tcg_gen_sari_i64(t, t, s);
> +}
> +
> +static void gen_vexts_i32(TCGv_i32 t, TCGv_i32 b, int32_t s)
> +{
> +    tcg_gen_shli_i32(t, b, s);
> +    tcg_gen_sari_i32(t, t, s);
> +}

tcg_gen_sextract_*.
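The shift pair above is a sign extraction of the low `64 - s` bits, which a sextract op performs in one step. A plain-C check of that equivalence follows. `sextract64_model` mirrors the shape of QEMU's sextract64() bit helper but is written out here as an assumption. It relies on the compiler using an arithmetic right shift for negative values, as GCC and Clang do.

```c
#include <stdint.h>

/* Sign-extend the `len`-bit field at bit `start` (len >= 1, start + len <= 64). */
static int64_t sextract64_model(uint64_t v, int start, int len)
{
    return (int64_t)(v << (64 - len - start)) >> (64 - len);
}

/* The open-coded sequence from the patch: shl by s, then arithmetic shr by s. */
static int64_t shl_sar_model(uint64_t v, int s)
{
    return (int64_t)(v << s) >> s;
}
```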

With that,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH v4 09/47] target/ppc: Move Vector Compare Equal/Not Equal/Greater Than to decodetree
  2022-02-22 14:36 ` [PATCH v4 09/47] target/ppc: Move Vector Compare Equal/Not Equal/Greater Than to decodetree matheus.ferst
@ 2022-02-22 18:37   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 18:37 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/helper.h                 | 30 ----------
>   target/ppc/insn32.decode            | 24 ++++++++
>   target/ppc/int_helper.c             | 54 -----------------
>   target/ppc/translate/vmx-impl.c.inc | 89 ++++++++++++++++++++---------
>   target/ppc/translate/vmx-ops.c.inc  | 15 +----
>   5 files changed, 88 insertions(+), 124 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 10/47] target/ppc: Move Vector Compare Not Equal or Zero to decodetree
  2022-02-22 14:36 ` [PATCH v4 10/47] target/ppc: Move Vector Compare Not Equal or Zero " matheus.ferst
@ 2022-02-22 19:04   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 19:04 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/helper.h                 |  9 ++--
>   target/ppc/insn32.decode            |  4 ++
>   target/ppc/int_helper.c             | 50 +++++-----------------
>   target/ppc/translate/vmx-impl.c.inc | 66 +++++++++++++++++++++++++++--
>   target/ppc/translate/vmx-ops.c.inc  |  3 --
>   5 files changed, 80 insertions(+), 52 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 11/47] target/ppc: Implement Vector Compare Equal Quadword
  2022-02-22 14:36 ` [PATCH v4 11/47] target/ppc: Implement Vector Compare Equal Quadword matheus.ferst
@ 2022-02-22 19:05   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 19:05 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Implement the following PowerISA v3.1 instructions:
> vcmpequq: Vector Compare Equal Quadword
> 
> Suggested-by: Richard Henderson<richard.henderson@linaro.org>
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
> v4:
>   - Branchless implementation (rth)
> ---
>   target/ppc/insn32.decode            |  1 +
>   target/ppc/translate/vmx-impl.c.inc | 36 +++++++++++++++++++++++++++++
>   2 files changed, 37 insertions(+)
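A branchless 128-bit equality can be expressed by OR-ing the XORs of the two halves and turning the zero test into an all-ones/all-zeros mask. The plain-C sketch below illustrates that general idea and is not the patch's TCG code. The function name is hypothetical.

```c
#include <stdint.h>

/* All-ones if ah:al == bh:bl, else 0, with no explicit control flow. */
static uint64_t cmpeq128_mask(uint64_t ah, uint64_t al, uint64_t bh, uint64_t bl)
{
    uint64_t diff = (ah ^ bh) | (al ^ bl);   /* zero only if both halves match */
    return -(uint64_t)(diff == 0);           /* 1 -> all-ones, 0 -> 0 */
}
```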

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 12/47] target/ppc: Implement Vector Compare Greater Than Quadword
  2022-02-22 14:36 ` [PATCH v4 12/47] target/ppc: Implement Vector Compare Greater Than Quadword matheus.ferst
@ 2022-02-22 19:07   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 19:07 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Implement the following PowerISA v3.1 instructions:
> vcmpgtsq: Vector Compare Greater Than Signed Quadword
> vcmpgtuq: Vector Compare Greater Than Unsigned Quadword
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
> v4:
>   - Branchless implementation (rth)
> ---
>   target/ppc/insn32.decode            |  2 ++
>   target/ppc/translate/vmx-impl.c.inc | 39 +++++++++++++++++++++++++++++
>   2 files changed, 41 insertions(+)
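For the quadword greater-than compares, the high halves decide unless they are equal, in which case an unsigned compare of the low halves decides. Only the high-half compare differs between the signed and unsigned forms. A plain-C sketch of that rule producing a mask follows; the names are hypothetical and this is not the patch's TCG sequence.

```c
#include <stdint.h>

/* Unsigned ah:al > bh:bl as an all-ones/all-zeros mask. */
static uint64_t cmpgtu128_mask(uint64_t ah, uint64_t al, uint64_t bh, uint64_t bl)
{
    int gt = (ah > bh) | ((ah == bh) & (al > bl));
    return -(uint64_t)gt;
}

/* Signed variant: the high halves compare as signed, the low halves as unsigned. */
static uint64_t cmpgts128_mask(uint64_t ah, uint64_t al, uint64_t bh, uint64_t bl)
{
    int gt = ((int64_t)ah > (int64_t)bh) | ((ah == bh) & (al > bl));
    return -(uint64_t)gt;
}
```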

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 14/47] target/ppc: implement vstri[bh][lr]
  2022-02-22 14:36 ` [PATCH v4 14/47] target/ppc: implement vstri[bh][lr] matheus.ferst
@ 2022-02-22 19:13   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 19:13 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
> v4:
>   - vstri helpers return CR field (rth)
> ---
>   target/ppc/helper.h                 |  4 ++++
>   target/ppc/insn32.decode            | 10 ++++++++++
>   target/ppc/int_helper.c             | 28 +++++++++++++++++++++++++++
>   target/ppc/translate/vmx-impl.c.inc | 30 +++++++++++++++++++++++++++++
>   4 files changed, 72 insertions(+)
> 
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 303a29fb5a..269150b197 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -211,6 +211,10 @@ DEF_HELPER_4(VINSBLX, void, env, avr, i64, tl)
>   DEF_HELPER_4(VINSHLX, void, env, avr, i64, tl)
>   DEF_HELPER_4(VINSWLX, void, env, avr, i64, tl)
>   DEF_HELPER_4(VINSDLX, void, env, avr, i64, tl)
> +DEF_HELPER_2(VSTRIBL, i32, avr, avr)
> +DEF_HELPER_2(VSTRIBR, i32, avr, avr)
> +DEF_HELPER_2(VSTRIHL, i32, avr, avr)
> +DEF_HELPER_2(VSTRIHR, i32, avr, avr)

Oh, DEF_HELPER_FLAGS_2 with TCG_CALL_NO_RWG.
I should have thought of this wrt the other helpers you're touching in this series -- 
those that only modify vector registers should use this.
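Applied to the declarations quoted above, the suggestion would presumably read as follows (a sketch only). DEF_HELPER_FLAGS_2(name, flags, ret, t1, t2) is QEMU's helper-declaration macro. TCG_CALL_NO_RWG tells TCG that the helper neither reads nor writes TCG globals, so they need not be synced around the call.

```c
DEF_HELPER_FLAGS_2(VSTRIBL, TCG_CALL_NO_RWG, i32, avr, avr)
DEF_HELPER_FLAGS_2(VSTRIBR, TCG_CALL_NO_RWG, i32, avr, avr)
DEF_HELPER_FLAGS_2(VSTRIHL, TCG_CALL_NO_RWG, i32, avr, avr)
DEF_HELPER_FLAGS_2(VSTRIHR, TCG_CALL_NO_RWG, i32, avr, avr)
```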

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH v4 15/47] target/ppc: implement vclrlb
  2022-02-22 14:36 ` [PATCH v4 15/47] target/ppc: implement vclrlb matheus.ferst
@ 2022-02-22 19:15   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 19:15 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> +static bool trans_VCLRLB(DisasContext *ctx, arg_VX *a)
> +{
> +    TCGv_i64 rb, mh, ml, tmp,
> +             ones = tcg_constant_i64(-1),
> +             zero = tcg_constant_i64(0);
> +
> +    rb = tcg_temp_new_i64();
> +    mh = tcg_temp_new_i64();
> +    ml = tcg_temp_new_i64();
> +    tmp = tcg_temp_new_i64();
> +
> +    tcg_gen_extu_tl_i64(rb, cpu_gpr[a->vrb]);
> +    tcg_gen_andi_i64(tmp, rb, 7);
> +    tcg_gen_shli_i64(tmp, tmp, 3);
> +    tcg_gen_shl_i64(tmp, tcg_constant_i64(-1), tmp);

Reuse ones here.  Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH v4 16/47] target/ppc: implement vclrrb
  2022-02-22 14:36 ` [PATCH v4 16/47] target/ppc: implement vclrrb matheus.ferst
@ 2022-02-22 19:17   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 19:17 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/insn32.decode            |  1 +
>   target/ppc/translate/vmx-impl.c.inc | 32 +++++++++++++++++++++--------
>   2 files changed, 25 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 18/47] target/ppc: implement vgnb
  2022-02-22 14:36 ` [PATCH v4 18/47] target/ppc: implement vgnb matheus.ferst
@ 2022-02-22 21:58   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 21:58 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Suggested-by: Richard Henderson<richard.henderson@linaro.org>
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
> v4:
>   - Optimized implementation (rth)
> ---
>   target/ppc/insn32.decode            |   5 ++
>   target/ppc/translate/vmx-impl.c.inc | 135 ++++++++++++++++++++++++++++
>   2 files changed, 140 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 19/47] target/ppc: move vs[lr][a][bhwd] to decodetree
  2022-02-22 14:36 ` [PATCH v4 19/47] target/ppc: move vs[lr][a][bhwd] to decodetree matheus.ferst
@ 2022-02-22 22:01   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:01 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
> v4:
>   -  New in v4.
> ---
>   target/ppc/insn32.decode            | 17 ++++++++++++
>   target/ppc/translate/vmx-impl.c.inc | 41 +++++++++++++++++++----------
>   target/ppc/translate/vmx-ops.c.inc  | 13 +--------
>   3 files changed, 45 insertions(+), 26 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 20/47] target/ppc: implement vslq
  2022-02-22 14:36 ` [PATCH v4 20/47] target/ppc: implement vslq matheus.ferst
@ 2022-02-22 22:14   ` Richard Henderson
  2022-02-23 21:53     ` Matheus K. Ferst
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:14 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
> v4:
>   -  New in v4.
> ---
>   target/ppc/insn32.decode            |  1 +
>   target/ppc/translate/vmx-impl.c.inc | 40 +++++++++++++++++++++++++++++
>   2 files changed, 41 insertions(+)
> 
> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
> index 88baebe35e..3799065508 100644
> --- a/target/ppc/insn32.decode
> +++ b/target/ppc/insn32.decode
> @@ -473,6 +473,7 @@ VSLB            000100 ..... ..... ..... 00100000100    @VX
>   VSLH            000100 ..... ..... ..... 00101000100    @VX
>   VSLW            000100 ..... ..... ..... 00110000100    @VX
>   VSLD            000100 ..... ..... ..... 10111000100    @VX
> +VSLQ            000100 ..... ..... ..... 00100000101    @VX
>   
>   VSRB            000100 ..... ..... ..... 01000000100    @VX
>   VSRH            000100 ..... ..... ..... 01001000100    @VX
> diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
> index ec4f0e7654..ca98a545ef 100644
> --- a/target/ppc/translate/vmx-impl.c.inc
> +++ b/target/ppc/translate/vmx-impl.c.inc
> @@ -834,6 +834,46 @@ TRANS_FLAGS(ALTIVEC, VSRAH, do_vector_gvec3_VX, MO_16, tcg_gen_gvec_sarv);
>   TRANS_FLAGS(ALTIVEC, VSRAW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_sarv);
>   TRANS_FLAGS2(ALTIVEC_207, VSRAD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_sarv);
>   
> +static bool trans_VSLQ(DisasContext *ctx, arg_VX *a)
> +{
> +    TCGv_i64 hi, lo, tmp, n, sf = tcg_constant_i64(64);
> +
> +    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
> +    REQUIRE_VECTOR(ctx);
> +
> +    n = tcg_temp_new_i64();
> +    hi = tcg_temp_new_i64();
> +    lo = tcg_temp_new_i64();
> +    tmp = tcg_const_i64(0);
> +
> +    get_avr64(lo, a->vra, false);
> +    get_avr64(hi, a->vra, true);
> +
> +    get_avr64(n, a->vrb, true);
> +    tcg_gen_andi_i64(n, n, 0x7F);
> +
> +    tcg_gen_movcond_i64(TCG_COND_GE, hi, n, sf, lo, hi);
> +    tcg_gen_movcond_i64(TCG_COND_GE, lo, n, sf, tmp, lo);

Since you have to mask twice anyway, better use (n & 64) != 0.
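Here is a plain-C model of that structure. It assumes the 128-bit value is held as two 64-bit halves and that only the low seven bits of the count matter, as the `andi 0x7F` above implies. Bit 6 of `n` selects a whole-doubleword move, and the remaining `n & 63` bits go through a funnel shift; in TCG the selection would be a movcond on (n & 64) != 0. `shl128_model` is a hypothetical name.

```c
#include <stdint.h>

/* Shift the 128-bit value hi:lo left by n (mod 128). */
static void shl128_model(uint64_t *hi, uint64_t *lo, unsigned n)
{
    unsigned s = n & 63;

    if (n & 64) {               /* shift by >= 64: lo moves into hi */
        *hi = *lo;
        *lo = 0;
    }
    /* funnel shift by 0..63 bits; avoid the undefined shift by 64 when s == 0 */
    *hi = (*hi << s) | (s ? *lo >> (64 - s) : 0);
    *lo <<= s;
}
```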

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH v4 21/47] target/ppc: implement vsrq
  2022-02-22 14:36 ` [PATCH v4 21/47] target/ppc: implement vsrq matheus.ferst
@ 2022-02-22 22:15   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:15 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
> v4:
>   -  New in v4.
> ---
>   target/ppc/insn32.decode            |  1 +
>   target/ppc/translate/vmx-impl.c.inc | 40 +++++++++++++++++++++--------
>   2 files changed, 31 insertions(+), 10 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 22/47] target/ppc: implement vsraq
  2022-02-22 14:36 ` [PATCH v4 22/47] target/ppc: implement vsraq matheus.ferst
@ 2022-02-22 22:19   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:19 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
> v4:
>   -  New in v4.
> ---
>   target/ppc/insn32.decode            |  1 +
>   target/ppc/translate/vmx-impl.c.inc | 17 +++++++++++++----
>   2 files changed, 14 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 23/47] target/ppc: move vrl[bhwd] to decodetree
  2022-02-22 14:36 ` [PATCH v4 23/47] target/ppc: move vrl[bhwd] to decodetree matheus.ferst
@ 2022-02-22 22:20   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:20 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
> v4:
>   -  New in v4.
> ---
>   target/ppc/insn32.decode            |  5 +++++
>   target/ppc/translate/vmx-impl.c.inc | 13 +++++--------
>   target/ppc/translate/vmx-ops.c.inc  |  6 ++----
>   3 files changed, 12 insertions(+), 12 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi to decodetree
  2022-02-22 14:36 ` [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi " matheus.ferst
@ 2022-02-22 22:30   ` Richard Henderson
  2022-02-23 21:43     ` Matheus K. Ferst
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:30 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> +static void gen_vrlnm_vec(unsigned vece, TCGv_vec vrt, TCGv_vec vra,
> +                          TCGv_vec vrb)
> +{
> +    TCGv_vec mask, n = tcg_temp_new_vec_matching(vrt);
> +
> +    /* Create the mask */
> +    mask = do_vrl_mask_vec(vece, vrb);
> +
> +    /* Extract n */
> +    tcg_gen_dupi_vec(vece, n, (8 << vece) - 1);
> +    tcg_gen_and_vec(vece, n, vrb, n);
> +
> +    /* Rotate and mask */
> +    tcg_gen_rotlv_vec(vece, vrt, vra, n);

Note that rotlv does the masking itself:

/*
  * Expand D = A << (B % element bits)
  *
  * Unlike scalar shifts, where it is easy for the target front end
  * to include the modulo as part of the expansion.  If the target
  * naturally includes the modulo as part of the operation, great!
  * If the target has some other behaviour from out-of-range shifts,
  * then it could not use this function anyway, and would need to
  * do its own expansion with custom functions.
  */
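As a rough illustration, not code from the patch, the per-element behaviour that comment documents can be modelled in plain C for 32-bit elements. `rotl32_mod` is a hypothetical name. Because the count is reduced modulo the element width inside the operation, the separate dupi/and step that extracts n in gen_vrlnm_vec would be redundant:

```c
#include <stdint.h>

/* Rotate a 32-bit element left by n, reducing n modulo the element
 * width first, as the rotlv expansion is documented to do. */
static uint32_t rotl32_mod(uint32_t x, uint32_t n)
{
    n &= 31;                      /* B % element bits */
    if (n == 0) {
        return x;                 /* avoid the undefined x >> 32 */
    }
    return (x << n) | (x >> (32 - n));
}
```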

> +static bool do_vrlnm(DisasContext *ctx, arg_VX *a, int vece)
> +{
> +    static const TCGOpcode vecop_list[] = {
> +        INDEX_op_cmp_vec, INDEX_op_rotlv_vec, INDEX_op_sari_vec,
> +        INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_shrv_vec, 0
> +    };

Where is sari used?



r~



* Re: [PATCH v4 25/47] target/ppc: implement vrlq
  2022-02-22 14:36 ` [PATCH v4 25/47] target/ppc: implement vrlq matheus.ferst
@ 2022-02-22 22:33   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:33 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
> v4:
>   -  New in v4.
> ---
>   target/ppc/insn32.decode            |  1 +
>   target/ppc/translate/vmx-impl.c.inc | 49 +++++++++++++++++++++++++++++
>   2 files changed, 50 insertions(+)
...
> +    tcg_gen_andi_i64(n, n, 0x7F);
> +
> +    tcg_gen_mov_i64(t0, ah);
> +    tcg_gen_movcond_i64(TCG_COND_GE, ah, n, sf, al, ah);
> +    tcg_gen_movcond_i64(TCG_COND_GE, al, n, sf, t0, al);

Similar comment re (n & 64) != 0.  Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
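The suggestion appears to be to decide whether the halves swap by testing bit 6 of the masked count directly, instead of comparing it against a register holding 64. A minimal C model of a 128-bit left rotate over two 64-bit halves shows why bit 6 alone decides the swap, after which only a 0..63 rotate of the halves remains. `rotl128` and `u128_halves` are hypothetical names, not the patch's code:

```c
#include <stdint.h>

typedef struct {
    uint64_t hi, lo;
} u128_halves;

/* Rotate the 128-bit value {hi:lo} left by n (only n & 127 matters). */
static u128_halves rotl128(u128_halves v, unsigned n)
{
    n &= 127;
    if (n & 64) {             /* rotating by >= 64 first swaps the halves */
        uint64_t t = v.hi;
        v.hi = v.lo;
        v.lo = t;
    }
    n &= 63;                  /* remaining rotate amount within a half */
    if (n != 0) {
        uint64_t hi = (v.hi << n) | (v.lo >> (64 - n));
        uint64_t lo = (v.lo << n) | (v.hi >> (64 - n));
        v.hi = hi;
        v.lo = lo;
    }
    return v;
}
```

In TCG terms, the condition would presumably become an andi of the count with 64 feeding movcond with TCG_COND_NE against zero; that is a reading of the review comment, not code from the thread.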




* Re: [PATCH v4 26/47] target/ppc: Move vsel and vperm/vpermr to decodetree
  2022-02-22 14:36 ` [PATCH v4 26/47] target/ppc: Move vsel and vperm/vpermr to decodetree matheus.ferst
@ 2022-02-22 22:37   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:37 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/helper.h                 |  5 +--
>   target/ppc/insn32.decode            |  5 +++
>   target/ppc/int_helper.c             | 13 +-----
>   target/ppc/translate/vmx-impl.c.inc | 69 ++++++++++++++++++++++-------
>   target/ppc/translate/vmx-ops.c.inc  |  2 -
>   5 files changed, 62 insertions(+), 32 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 27/47] target/ppc: Move xxsel to decodetree
  2022-02-22 14:36 ` [PATCH v4 27/47] target/ppc: Move xxsel " matheus.ferst
@ 2022-02-22 22:38   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:38 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/insn32.decode            |  6 ++++
>   target/ppc/insn64.decode            | 24 ++++++++--------
>   target/ppc/translate/vsx-impl.c.inc | 20 ++++++--------
>   target/ppc/translate/vsx-ops.c.inc  | 43 -----------------------------
>   4 files changed, 26 insertions(+), 67 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 28/47] target/ppc: move xxperm/xxpermr to decodetree
  2022-02-22 14:36 ` [PATCH v4 28/47] target/ppc: move xxperm/xxpermr " matheus.ferst
@ 2022-02-22 22:40   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:40 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/fpu_helper.c             | 21 ---------------
>   target/ppc/helper.h                 |  2 --
>   target/ppc/insn32.decode            |  5 ++++
>   target/ppc/translate/vsx-impl.c.inc | 42 +++++++++++++++++++++++++++--
>   target/ppc/translate/vsx-ops.c.inc  |  2 --
>   5 files changed, 45 insertions(+), 27 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 29/47] target/ppc: Move xxpermdi to decodetree
  2022-02-22 14:36 ` [PATCH v4 29/47] target/ppc: Move xxpermdi " matheus.ferst
@ 2022-02-22 22:42   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:42 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/insn32.decode            |  4 ++
>   target/ppc/translate/vsx-impl.c.inc | 71 +++++++++++++----------------
>   target/ppc/translate/vsx-ops.c.inc  |  2 -
>   3 files changed, 36 insertions(+), 41 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 30/47] target/ppc: Implement xxpermx instruction
  2022-02-22 14:36 ` [PATCH v4 30/47] target/ppc: Implement xxpermx instruction matheus.ferst
@ 2022-02-22 22:46   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 22:46 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/helper.h                 |  1 +
>   target/ppc/insn64.decode            |  8 ++++++++
>   target/ppc/int_helper.c             | 20 ++++++++++++++++++++
>   target/ppc/translate/vsx-impl.c.inc | 22 ++++++++++++++++++++++
>   4 files changed, 51 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 31/47] tcg/tcg-op-gvec.c: Introduce tcg_gen_gvec_4i
  2022-02-22 14:36 ` [PATCH v4 31/47] tcg/tcg-op-gvec.c: Introduce tcg_gen_gvec_4i matheus.ferst
@ 2022-02-22 23:04   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 23:04 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Following the implementation of tcg_gen_gvec_3i, add a four-vector and
> immediate operand expansion method.
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   include/tcg/tcg-op-gvec.h |  22 ++++++
>   tcg/tcg-op-gvec.c         | 146 ++++++++++++++++++++++++++++++++++++++
>   2 files changed, 168 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 32/47] target/ppc: Implement xxeval
  2022-02-22 14:36 ` [PATCH v4 32/47] target/ppc: Implement xxeval matheus.ferst
@ 2022-02-22 23:43   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 23:43 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> +    tcg_gen_movi_i64(disj, 0);

The initialization here means one more OR is generated than necessary, though perhaps it
gets folded away...

> +
> +    /* Iterate over set bits from the least to the most significant bit */
> +    while (imm) {
> +        /*
> +         * Get the next bit to be processed with ctz64. Invert the result of
> +         * ctz64 to match the indexing used by PowerISA.
> +         */
> +        bit = 7 - ctz64(imm);
> +        if (bit & 0x4) {
> +            tcg_gen_mov_i64(conj, a);
> +        } else {
> +            tcg_gen_not_i64(conj, a);
> +        }
> +        if (bit & 0x2) {
> +            tcg_gen_and_i64(conj, conj, b);
> +        } else {
> +            tcg_gen_andc_i64(conj, conj, b);
> +        }
> +        if (bit & 0x1) {
> +            tcg_gen_and_i64(conj, conj, c);
> +        } else {
> +            tcg_gen_andc_i64(conj, conj, c);
> +        }
> +        tcg_gen_or_i64(disj, disj, conj);
> +
> +        /* Unset the least significant bit that is set */
> +        imm &= imm - 1;

I guess this works, though it's not nearly optimal.
It's certainly a good fallback for the out-of-line function.

Table 145 has the folded equivalent functions.  Implementing all 256 of them as-is, twice (for
both i64 and vec), could be tedious.  But we could cherry-pick the easiest or most commonly used
ones and let all other imm values go through to the out-of-line function.
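For reference, the sum-of-minterms loop above can be restated as a plain C model on one 64-bit lane, together with a few cherry-picked folded forms that fall back to the generic loop, in the spirit of the suggestion. This is a sketch: `xxeval_ref` and `xxeval_fast` are hypothetical names, and the bit numbering (ISA bit 0 is the most significant bit of imm, minterm index a*4 + b*2 + c) follows the patch's `7 - ctz64(imm)` indexing rather than a check against Table 145:

```c
#include <stdint.h>

/* Reference model: OR together the minterms selected by imm, mirroring
 * the loop in the patch (ISA bit 0 is the MSB of imm). */
static uint64_t xxeval_ref(uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
    uint64_t disj = 0;

    for (int bit = 0; bit < 8; bit++) {
        if (!(imm & (0x80 >> bit))) {
            continue;
        }
        uint64_t conj = (bit & 4) ? a : ~a;
        conj &= (bit & 2) ? b : ~b;
        conj &= (bit & 1) ? c : ~c;
        disj |= conj;
    }
    return disj;
}

/* Cherry-picked folded forms for a few imm values; others fall back. */
static uint64_t xxeval_fast(uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
    switch (imm) {
    case 0x00: return 0;            /* no minterms selected */
    case 0xff: return ~0ull;        /* every minterm selected */
    case 0x01: return a & b & c;
    case 0x0f: return a;
    case 0x33: return b;
    case 0x55: return c;
    case 0x69: return a ^ b ^ c;    /* minterms with an odd bit count */
    default:   return xxeval_ref(a, b, c, imm);
    }
}
```

Seeding disj with the first minterm instead of zero would also remove the extra OR mentioned above; the model keeps the zero initialization for brevity.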


r~



* Re: [PATCH v4 33/47] target/ppc: Implement xxgenpcv[bhwd]m instruction
  2022-02-22 14:36 ` [PATCH v4 33/47] target/ppc: Implement xxgenpcv[bhwd]m instruction matheus.ferst
@ 2022-02-22 23:48   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 23:48 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> +#define XXGENPCV(NAME, SZ) \
> +void helper_##NAME(ppc_vsr_t *t, ppc_vsr_t *b, target_ulong imm)            \
> +{                                                                           \
> +    ppc_vsr_t tmp = { .u64 = { 0, 0 } };                                    \
> +                                                                            \
> +    switch (imm) {                                                          \

You should split the helper and not pass down imm.
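One way to read "split the helper" is sketched below with a hypothetical toy operation: a macro instantiates one function per immediate value, and the immediate only selects a function pointer once (at translation time, in QEMU's case), so no switch on imm runs on each call. None of these names come from the patch:

```c
#include <stdint.h>

/* Hypothetical toy operation with two mode bits standing in for imm:
 * bit 1 swaps the 32-bit halves, bit 0 complements the result. */
#define DEFINE_VARIANT(NAME, SWAP, INVERT)    \
static uint64_t NAME(uint64_t x)              \
{                                             \
    if (SWAP) {                               \
        x = (x << 32) | (x >> 32);            \
    }                                         \
    return (INVERT) ? ~x : x;                 \
}

DEFINE_VARIANT(op_plain, 0, 0)
DEFINE_VARIANT(op_inv, 0, 1)
DEFINE_VARIANT(op_swap, 1, 0)
DEFINE_VARIANT(op_swap_inv, 1, 1)

typedef uint64_t (*variant_fn)(uint64_t);

/* Selection happens once, outside the per-call path. */
static variant_fn select_variant(unsigned imm)
{
    static const variant_fn fns[4] = {
        op_plain, op_inv, op_swap, op_swap_inv,
    };
    return fns[imm & 3];
}
```

In QEMU terms this would presumably mean one DEF_HELPER per xxgenpcv variant, with the trans_ function choosing which gen_helper_ call to emit.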


r~



* Re: [PATCH v4 34/47] target/ppc: move xs[n]madd[am][ds]p/xs[n]msub[am][ds]p to decodetree
  2022-02-22 14:36 ` [PATCH v4 34/47] target/ppc: move xs[n]madd[am][ds]p/xs[n]msub[am][ds]p to decodetree matheus.ferst
@ 2022-02-22 23:52   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 23:52 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> +static bool do_xsmadd(DisasContext *ctx, int tgt, int src1, int src2, int src3,
> +        void (gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))

Missing a * before gen_helper.  Somewhat surprised this compiled...
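It compiles because C adjusts a parameter declared with function type to the corresponding pointer-to-function type (C11 6.7.6.3p8), much as an array parameter becomes a pointer. The two declarations below therefore have the same type; the explicit * is still the clearer spelling. The names are illustrative only:

```c
/* A parameter of function type is adjusted to pointer-to-function,
 * so both declarations below have the same type. */
static int apply_implicit(int (fn)(int), int x)
{
    return fn(x);
}

static int apply_explicit(int (*fn)(int), int x)
{
    return fn(x);
}

static int add_one(int x)
{
    return x + 1;
}
```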

> +static bool do_xsmadd_XX3(DisasContext *ctx, arg_XX3 *a, bool type_a,
> +        void (gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))

Likewise.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH v4 35/47] target/ppc: implement xs[n]maddqp[o]/xs[n]msubqp[o]
  2022-02-22 14:36 ` [PATCH v4 35/47] target/ppc: implement xs[n]maddqp[o]/xs[n]msubqp[o] matheus.ferst
@ 2022-02-22 23:56   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-22 23:56 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Implement the following PowerISA v3.0 instructions:
> xsmaddqp[o]: VSX Scalar Multiply-Add Quad-Precision [using round to Odd]
> xsmsubqp[o]: VSX Scalar Multiply-Subtract Quad-Precision [using round
>               to Odd]
> xsnmaddqp[o]: VSX Scalar Negative Multiply-Add Quad-Precision [using
>                round to Odd]
> xsnmsubqp[o]: VSX Scalar Negative Multiply-Subtract Quad-Precision
>                [using round to Odd]
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/fpu_helper.c             | 42 +++++++++++++++++++++++++++++
>   target/ppc/helper.h                 |  9 +++++++
>   target/ppc/insn32.decode            |  4 +++
>   target/ppc/translate/vsx-impl.c.inc | 25 +++++++++++++++++
>   4 files changed, 80 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 36/47] target/ppc: Implement xvtlsbb instruction
  2022-02-22 14:36 ` [PATCH v4 36/47] target/ppc: Implement xvtlsbb instruction matheus.ferst
@ 2022-02-23  0:07   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-23  0:07 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Víctor Colombo, groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> +        tcg_gen_and_i64(tmp, mask, xb);
> +        tcg_gen_movcond_i64(TCG_COND_EQ, all_true, tmp,
> +                            mask, all_true, zero);
> +
> +        tcg_gen_andc_i64(tmp, mask, xb);
> +        tcg_gen_movcond_i64(TCG_COND_EQ, all_false, tmp,
> +                            mask, all_false, zero);

I would unroll this and use fewer conditions.

     t0 = mask & xb[0]
     t1 = mask & xb[1]

     o2 = t0 | t1
     a2 = t0 & t1

     o2 = (o2 == 0) << 1
     a2 = (a2 == mask) << 3
     crf = o2 | a2
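The unrolled form above can be checked with a plain C model, assuming mask selects the least-significant bit of each byte (0x0101010101010101) and xb[0], xb[1] are the two doublewords; `xvtlsbb_crf` is a hypothetical name. The EQ bit (value 2) is set when no selected bit is 1, and the LT bit (value 8) when all of them are:

```c
#include <stdint.h>

#define LSB_OF_EACH_BYTE 0x0101010101010101ull

/* Model of the suggested unrolled xvtlsbb: returns the 4-bit CR field.
 * Value 8 (LT) = every byte's LSB is 1; value 2 (EQ) = every LSB is 0. */
static unsigned xvtlsbb_crf(uint64_t xb0, uint64_t xb1)
{
    const uint64_t mask = LSB_OF_EACH_BYTE;
    uint64_t t0 = mask & xb0;
    uint64_t t1 = mask & xb1;
    unsigned all_false = (t0 | t1) == 0;     /* no LSB set anywhere */
    unsigned all_true = (t0 & t1) == mask;   /* every LSB set */

    return (all_false << 1) | (all_true << 3);
}
```

Each of the two comparisons would map to a single setcond in TCG, independent of the number of byte elements.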


r~



* Re: [PATCH v4 38/47] target/ppc: Refactor VSX_SCALAR_CMP_DP
  2022-02-22 14:36 ` [PATCH v4 38/47] target/ppc: Refactor VSX_SCALAR_CMP_DP matheus.ferst
@ 2022-02-23  0:20   ` Richard Henderson
  2022-02-24 19:16     ` Víctor Colombo
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2022-02-23  0:20 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Víctor Colombo, groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Víctor Colombo <victor.colombo@eldorado.org.br>
> 
> Refactor VSX_SCALAR_CMP_DP, changing its name to VSX_SCALAR_CMP and
> prepare the helper to be used for quadword comparisons.
> 
> Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/fpu_helper.c | 31 ++++++++++++++-----------------
>   1 file changed, 14 insertions(+), 17 deletions(-)
> 
> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
> index 9b034d1fe4..5ebbcfe3b7 100644
> --- a/target/ppc/fpu_helper.c
> +++ b/target/ppc/fpu_helper.c
> @@ -2265,28 +2265,30 @@ VSX_MADDQ(XSNMSUBQP, NMSUB_FLGS, 0)
>   VSX_MADDQ(XSNMSUBQPO, NMSUB_FLGS, 0)
>   
>   /*
> - * VSX_SCALAR_CMP_DP - VSX scalar floating point compare double precision
> + * VSX_SCALAR_CMP - VSX scalar floating point compare
>    *   op    - instruction mnemonic
> + *   tp    - type
>    *   cmp   - comparison operation
>    *   exp   - expected result of comparison
> + *   fld   - vsr_t field
>    *   svxvc - set VXVC bit
>    */
> -#define VSX_SCALAR_CMP_DP(op, cmp, exp, svxvc)                                \
> +#define VSX_SCALAR_CMP(op, tp, cmp, fld, exp, svxvc)                          \
>   void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
>                    ppc_vsr_t *xa, ppc_vsr_t *xb)                                \
>   {                                                                             \
> -    ppc_vsr_t t = *xt;                                                        \
> +    ppc_vsr_t t = { };                                                        \
>       bool vxsnan_flag = false, vxvc_flag = false, vex_flag = false;            \
>                                                                                 \
> -    if (float64_is_signaling_nan(xa->VsrD(0), &env->fp_status) ||             \
> -        float64_is_signaling_nan(xb->VsrD(0), &env->fp_status)) {             \
> +    if (tp##_is_signaling_nan(xa->fld, &env->fp_status) ||                    \
> +        tp##_is_signaling_nan(xb->fld, &env->fp_status)) {                    \
>           vxsnan_flag = true;                                                   \
>           if (fpscr_ve == 0 && svxvc) {                                         \
>               vxvc_flag = true;                                                 \
>           }                                                                     \
>       } else if (svxvc) {                                                       \
> -        vxvc_flag = float64_is_quiet_nan(xa->VsrD(0), &env->fp_status) ||     \
> -            float64_is_quiet_nan(xb->VsrD(0), &env->fp_status);               \
> +        vxvc_flag = tp##_is_quiet_nan(xa->fld, &env->fp_status) ||            \
> +            tp##_is_quiet_nan(xb->fld, &env->fp_status);                      \
>       }                     

Note that this can be simplified further, using the full FloatRelation result and 
float_flag_invalid_snan.

Note that do_scalar_cmp gets half-way there, only checking for NaNs once we have
float_relation_unordered as a comparison result.  But it could go further: check
float_flag_invalid_snan and drop all of the other sNaN and qNaN checks.
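A toy model of that simplification, working on raw binary64 bit patterns, is sketched below. `compare64`, `scalar_cmp`, `enum relation` and `FLAG_INVALID_SNAN` are hypothetical stand-ins for softfloat's FloatRelation and float_flag_invalid_snan, and the VXVC rule restates the macro's logic above: a single comparison yields both the relation and the sticky sNaN flag, and the caller never re-tests the operands:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for FloatRelation and float_flag_invalid_snan. */
enum relation { REL_LESS, REL_EQUAL, REL_GREATER, REL_UNORDERED };
#define FLAG_INVALID_SNAN 1u

static int is_nan64(uint64_t x)
{
    return (x & 0x7ff0000000000000ull) == 0x7ff0000000000000ull &&
           (x & 0x000fffffffffffffull) != 0;
}

/* Compare two binary64 bit patterns once, returning the relation and
 * raising the sticky flag if either operand is a signaling NaN. */
static enum relation compare64(uint64_t a, uint64_t b, unsigned *flags)
{
    double da, db;

    if (is_nan64(a) || is_nan64(b)) {
        /* A NaN with the quiet bit (bit 51) clear is signaling. */
        if ((is_nan64(a) && !(a & 0x0008000000000000ull)) ||
            (is_nan64(b) && !(b & 0x0008000000000000ull))) {
            *flags |= FLAG_INVALID_SNAN;
        }
        return REL_UNORDERED;
    }
    memcpy(&da, &a, sizeof(da));
    memcpy(&db, &b, sizeof(db));
    return da < db ? REL_LESS : da > db ? REL_GREATER : REL_EQUAL;
}

/* Derive VXSNAN and VXVC from that single result. */
static enum relation scalar_cmp(uint64_t a, uint64_t b, int svxvc, int ve,
                                int *vxsnan, int *vxvc)
{
    unsigned flags = 0;
    enum relation r = compare64(a, b, &flags);

    *vxsnan = (flags & FLAG_INVALID_SNAN) != 0;
    *vxvc = svxvc && r == REL_UNORDERED && (!*vxsnan || !ve);
    return r;
}
```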


r~



* Re: [PATCH v4 39/47] target/ppc: Implement xscmp{eq,ge,gt}qp
  2022-02-22 14:36 ` [PATCH v4 39/47] target/ppc: Implement xscmp{eq,ge,gt}qp matheus.ferst
@ 2022-02-23  0:21   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-23  0:21 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Víctor Colombo, groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Víctor Colombo<victor.colombo@eldorado.org.br>
> 
> Signed-off-by: Víctor Colombo<victor.colombo@eldorado.org.br>
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/fpu_helper.c             |  4 ++++
>   target/ppc/helper.h                 |  3 +++
>   target/ppc/insn32.decode            |  3 +++
>   target/ppc/translate/vsx-impl.c.inc | 31 +++++++++++++++++++++++++++++
>   4 files changed, 41 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 40/47] target/ppc: Move xscmp{eq, ge, gt}dp to decodetree
  2022-02-22 14:36 ` [PATCH v4 40/47] target/ppc: Move xscmp{eq,ge,gt}dp to decodetree matheus.ferst
@ 2022-02-23  0:22   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-23  0:22 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Víctor Colombo, groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Víctor Colombo<victor.colombo@eldorado.org.br>
> 
> Signed-off-by: Víctor Colombo<victor.colombo@eldorado.org.br>
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/fpu_helper.c             |  7 +++----
>   target/ppc/helper.h                 |  6 +++---
>   target/ppc/insn32.decode            |  3 +++
>   target/ppc/translate/vsx-impl.c.inc | 28 +++++++++++++++++++++++++---
>   target/ppc/translate/vsx-ops.c.inc  |  3 ---
>   5 files changed, 34 insertions(+), 13 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 41/47] target/ppc: Move xs{max, min}[cj]dp to use do_helper_XX3
  2022-02-22 14:36 ` [PATCH v4 41/47] target/ppc: Move xs{max, min}[cj]dp to use do_helper_XX3 matheus.ferst
@ 2022-02-23  0:23   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-23  0:23 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Víctor Colombo, groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Víctor Colombo<victor.colombo@eldorado.org.br>
> 
> This also fixes these instructions' names not being capitalized.
> 
> Signed-off-by: Víctor Colombo<victor.colombo@eldorado.org.br>
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/fpu_helper.c             |  8 ++++----
>   target/ppc/helper.h                 |  8 ++++----
>   target/ppc/translate/vsx-impl.c.inc | 30 ++++-------------------------
>   3 files changed, 12 insertions(+), 34 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 42/47] target/ppc: Refactor VSX_MAX_MINC helper
  2022-02-22 14:36 ` [PATCH v4 42/47] target/ppc: Refactor VSX_MAX_MINC helper matheus.ferst
@ 2022-02-23  0:40   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-23  0:40 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Víctor Colombo, groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> -#define VSX_MAX_MINC(name, max)                                               \
> +#define VSX_MAX_MINC(name, op, tp, fld)                                       \
>   void helper_##name(CPUPPCState *env,                                          \
>                      ppc_vsr_t *xt, ppc_vsr_t *xa, ppc_vsr_t *xb)               \
>   {                                                                             \
>       ppc_vsr_t t = { };                                                        \
>       bool vxsnan_flag = false, vex_flag = false;                               \
>                                                                                 \
> -    if (unlikely(float64_is_any_nan(xa->VsrD(0)) ||                           \
> -                 float64_is_any_nan(xb->VsrD(0)))) {                          \
> -        if (float64_is_signaling_nan(xa->VsrD(0), &env->fp_status) ||         \
> -            float64_is_signaling_nan(xb->VsrD(0), &env->fp_status)) {         \
> +    if (unlikely(tp##_is_any_nan(xa->fld) ||                                  \
> +                 tp##_is_any_nan(xb->fld))) {                                 \
> +        if (tp##_is_signaling_nan(xa->fld, &env->fp_status) ||                \
> +            tp##_is_signaling_nan(xb->fld, &env->fp_status)) {                \
>               vxsnan_flag = true;                                               \
>           }                                                                     \
> -        t.VsrD(0) = xb->VsrD(0);                                              \
> -    } else if ((max &&                                                        \
> -               !float64_lt(xa->VsrD(0), xb->VsrD(0), &env->fp_status)) ||     \
> -               (!max &&                                                       \
> -               float64_lt(xa->VsrD(0), xb->VsrD(0), &env->fp_status))) {      \
> -        t.VsrD(0) = xa->VsrD(0);                                              \
> +        t.fld = xb->fld;                                                      \
>       } else {                                                                  \
> -        t.VsrD(0) = xb->VsrD(0);                                              \
> +        t.fld = tp##_##op(xa->fld, xb->fld, &env->fp_status);                 \
>       }                                                                         \
>                                                                                 \
>       vex_flag = fpscr_ve & vxsnan_flag;                                        \

I think this would be simpler if it used the result of the comparison to handle NaNs:

     bool first;
     int flags;

     if (max) {
         first = tp##_le_quiet(xb->fld, xa->fld, status);
     } else {
         first = tp##_lt_quiet(xa->fld, xb->fld, status);
     }
     flags = get_float_exception_flags(status);
     if (first) {
         t.fld = xa->fld;
     } else {
         t.fld = xb->fld;
         if (flags & float_flag_invalid_snan) {
             float_invalid_op_vxsnan(env, retaddr);
         }
     }
     *xt = t;


r~



* Re: [PATCH v4 43/47] target/ppc: Implement xs{max,min}cqp
  2022-02-22 14:36 ` [PATCH v4 43/47] target/ppc: Implement xs{max,min}cqp matheus.ferst
@ 2022-02-23  0:41   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-23  0:41 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Víctor Colombo, groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Víctor Colombo<victor.colombo@eldorado.org.br>
> 
> Signed-off-by: Víctor Colombo<victor.colombo@eldorado.org.br>
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/fpu_helper.c             | 2 ++
>   target/ppc/helper.h                 | 2 ++
>   target/ppc/insn32.decode            | 3 +++
>   target/ppc/translate/vsx-impl.c.inc | 2 ++
>   4 files changed, 9 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 44/47] target/ppc: Implement xvcvbf16spn and xvcvspbf16 instructions
  2022-02-22 14:36 ` [PATCH v4 44/47] target/ppc: Implement xvcvbf16spn and xvcvspbf16 instructions matheus.ferst
@ 2022-02-23  3:08   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-23  3:08 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Víctor Colombo, groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Víctor Colombo <victor.colombo@eldorado.org.br>
> 
> Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/fpu_helper.c             | 21 +++++++++++++++++++
>   target/ppc/helper.h                 |  1 +
>   target/ppc/insn32.decode            | 11 +++++++---
>   target/ppc/translate/vsx-impl.c.inc | 31 ++++++++++++++++++++++++++++-
>   4 files changed, 60 insertions(+), 4 deletions(-)
> 
> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
> index 7773333bd7..d77900fff1 100644
> --- a/target/ppc/fpu_helper.c
> +++ b/target/ppc/fpu_helper.c
> @@ -2790,6 +2790,27 @@ VSX_CVT_FP_TO_FP_HP(xscvhpdp, 1, float16, float64, VsrH(3), VsrD(0), 1)
>   VSX_CVT_FP_TO_FP_HP(xvcvsphp, 4, float32, float16, VsrW(i), VsrH(2 * i  + 1), 0)
>   VSX_CVT_FP_TO_FP_HP(xvcvhpsp, 4, float16, float32, VsrH(2 * i + 1), VsrW(i), 0)
>   
> +void helper_XVCVSPBF16(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)
> +{
> +    ppc_vsr_t t = { };
> +    int i;
> +
> +    helper_reset_fpstatus(env);
> +    for (i = 0; i < 4; i++) {
> +        if (unlikely(float32_is_signaling_nan(xb->VsrW(i), &env->fp_status))) {
> +            float_invalid_op_vxsnan(env, GETPC());
> +            t.VsrH(2 * i + 1) = float32_to_bfloat16(
> +                float32_snan_to_qnan(xb->VsrW(i)), &env->fp_status);
> +        } else {
> +            t.VsrH(2 * i + 1) =
> +                float32_to_bfloat16(xb->VsrW(i), &env->fp_status);
> +        }
> +    }

Do not check for sNaN first; use float_flag_invalid_snan instead.
You can also move that check outside the loop, before the
writeback of t to *xt.
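A standalone model of that structure is sketched below: the conversion raises a sticky flag while quieting a signaling NaN, and the flag is examined once, after the loop and before the result is written back, as the review suggests. `f32_to_bf16`, `cvt4_spbf16` and `FLAG_INVALID_SNAN` are hypothetical names standing in for float32_to_bfloat16, the helper, and float_flag_invalid_snan; the rounding is round-to-nearest-even only, and inexact/overflow flags are not modelled:

```c
#include <stdint.h>

#define FLAG_INVALID_SNAN 1u   /* hypothetical sticky-flag bit */

/* float32 -> bfloat16 with round-to-nearest-even; a NaN is quieted and,
 * if it was signaling, the sticky flag is raised. */
static uint16_t f32_to_bf16(uint32_t x, unsigned *flags)
{
    if ((x & 0x7f800000u) == 0x7f800000u && (x & 0x007fffffu) != 0) {
        if (!(x & 0x00400000u)) {
            *flags |= FLAG_INVALID_SNAN;      /* quiet bit clear: sNaN */
        }
        return (uint16_t)((x >> 16) | 0x0040u);
    }
    return (uint16_t)((x + 0x7fffu + ((x >> 16) & 1u)) >> 16);
}

/* Convert four lanes into a temporary, test the flag once after the
 * loop, then write back. Returns 1 if VXSNAN would be raised. */
static int cvt4_spbf16(uint16_t out[4], const uint32_t in[4])
{
    uint16_t t[4];
    unsigned flags = 0;

    for (int i = 0; i < 4; i++) {
        t[i] = f32_to_bf16(in[i], &flags);
    }
    int vxsnan = (flags & FLAG_INVALID_SNAN) != 0;
    for (int i = 0; i < 4; i++) {
        out[i] = t[i];
    }
    return vxsnan;
}
```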


r~



* Re: [PATCH v4 45/47] target/ppc: implement plxsd/pstxsd
  2022-02-22 14:36 ` [PATCH v4 45/47] target/ppc: implement plxsd/pstxsd matheus.ferst
@ 2022-02-23  3:14   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-23  3:14 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: groug, danielhb413, Leandro Lupori, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Leandro Lupori<leandro.lupori@eldorado.org.br>
> 
> Implement the plxsd/pstxsd instructions and port lxsd/stxsd to
> decodetree.
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/insn32.decode            |  2 ++
>   target/ppc/insn64.decode            | 10 ++++++
>   target/ppc/translate.c              | 14 ++------
>   target/ppc/translate/vsx-impl.c.inc | 55 +++++++++++++++++++++++++++--
>   4 files changed, 67 insertions(+), 14 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 46/47] target/ppc: implement plxssp/pstxssp
  2022-02-22 14:36 ` [PATCH v4 46/47] target/ppc: implement plxssp/pstxssp matheus.ferst
@ 2022-02-23  3:16   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-23  3:16 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: groug, danielhb413, Leandro Lupori, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Leandro Lupori<leandro.lupori@eldorado.org.br>
> 
> Implement the plxssp/pstxssp instructions and port lxssp/stxssp to
> decodetree.
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/insn32.decode            |  2 +
>   target/ppc/insn64.decode            |  6 ++
>   target/ppc/translate.c              | 29 +++------
>   target/ppc/translate/vsx-impl.c.inc | 93 +++++++++++++++--------------
>   4 files changed, 62 insertions(+), 68 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 47/47] target/ppc: implement lxvr[bhwd]/stxvr[bhwd]x
  2022-02-22 14:36 ` [PATCH v4 47/47] target/ppc: implement lxvr[bhwd]/stxvr[bhwd]x matheus.ferst
@ 2022-02-23  3:23   ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-23  3:23 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: Lucas Coutinho, groug, danielhb413, clg, david

On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
> From: Lucas Coutinho<lucas.coutinho@eldorado.org.br>
> 
> Implement the following PowerISA v3.1 instructions:
> lxvrbx: Load VSX Vector Rightmost Byte Indexed X-form
> lxvrhx: Load VSX Vector Rightmost Halfword Indexed X-form
> lxvrwx: Load VSX Vector Rightmost Word Indexed X-form
> lxvrdx: Load VSX Vector Rightmost Doubleword Indexed X-form
> 
> stxvrbx: Store VSX Vector Rightmost Byte Indexed X-form
> stxvrhx: Store VSX Vector Rightmost Halfword Indexed X-form
> stxvrwx: Store VSX Vector Rightmost Word Indexed X-form
> stxvrdx: Store VSX Vector Rightmost Doubleword Indexed X-form
> 
> Signed-off-by: Lucas Coutinho<lucas.coutinho@eldorado.org.br>
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/insn32.decode            |  8 +++++++
>   target/ppc/translate/vsx-impl.c.inc | 35 +++++++++++++++++++++++++++++
>   2 files changed, 43 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi to decodetree
  2022-02-22 22:30   ` Richard Henderson
@ 2022-02-23 21:43     ` Matheus K. Ferst
  2022-02-23 22:19       ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Matheus K. Ferst @ 2022-02-23 21:43 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 22/02/2022 19:30, Richard Henderson wrote:
> On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
>> +static void gen_vrlnm_vec(unsigned vece, TCGv_vec vrt, TCGv_vec vra,
>> +                          TCGv_vec vrb)
>> +{
>> +    TCGv_vec mask, n = tcg_temp_new_vec_matching(vrt);
>> +
>> +    /* Create the mask */
>> +    mask = do_vrl_mask_vec(vece, vrb);
>> +
>> +    /* Extract n */
>> +    tcg_gen_dupi_vec(vece, n, (8 << vece) - 1);
>> +    tcg_gen_and_vec(vece, n, vrb, n);
>> +
>> +    /* Rotate and mask */
>> +    tcg_gen_rotlv_vec(vece, vrt, vra, n);
> 
> Note that rotlv does the masking itself:
> 
> /*
>   * Expand D = A << (B % element bits)
>   *
>   * Unlike scalar shifts, where it is easy for the target front end
>   * to include the modulo as part of the expansion.  If the target
>   * naturally includes the modulo as part of the operation, great!
>   * If the target has some other behaviour from out-of-range shifts,
>   * then it could not use this function anyway, and would need to
>   * do it's own expansion with custom functions.
>   */
> 

Using tcg_gen_rotlv_vec(vece, vrt, vra, vrb) works on PPC but fails on 
x86. It looks like a problem on the i386 backend. It's using 
VPS[RL]LV[DQ], but instead of this modulo behavior, these instructions 
write zero to the element[1]. I'm not sure how to fix that. Do we need 
an INDEX_op_shlv_vec case in i386 tcg_expand_vec_op?
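
To make the x86 behaviour concrete, here is a scalar C model of a single 32-bit lane. It assumes, as described above, that VPSLLVD/VPSRLVD produce zero for counts of 32 or more. For illustration only, it also assumes that a variable rotate is expanded as (x << n) | (x >> (32 - n)). All function names are hypothetical and none of this is QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* What vrlw needs per 32-bit lane: rotate left by (n mod 32). */
static uint32_t rotl32_ref(uint32_t x, uint32_t n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Model of one VPSLLVD / VPSRLVD lane: counts above 31 produce zero. */
static uint32_t sllv32_x86(uint32_t x, uint32_t n)
{
    return n > 31 ? 0 : x << n;
}

static uint32_t srlv32_x86(uint32_t x, uint32_t n)
{
    return n > 31 ? 0 : x >> n;
}

/* Rotate built from the two shifts with no modulo applied to n. */
static uint32_t rotl32_unmasked(uint32_t x, uint32_t n)
{
    return sllv32_x86(x, n) | srlv32_x86(x, 32 - n);
}

/* The same expansion after first reducing n modulo the element width. */
static uint32_t rotl32_masked(uint32_t x, uint32_t n)
{
    return rotl32_unmasked(x, n & 31);
}
```

With x = 0x80000001 and n = 33, `rotl32_ref` and `rotl32_masked` both return 0x00000003, whereas `rotl32_unmasked` returns 0. Reducing the count before the shifts is what the generic `tcg_gen_rotlv_mod_vec` expansion discussed later in this thread does.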

>> +static bool do_vrlnm(DisasContext *ctx, arg_VX *a, int vece)
>> +{
>> +    static const TCGOpcode vecop_list[] = {
>> +        INDEX_op_cmp_vec, INDEX_op_rotlv_vec, INDEX_op_sari_vec,
>> +        INDEX_op_shli_vec, INDEX_op_shri_vec, INDEX_op_shrv_vec, 0
>> +    };
> 
> Where is sari used?
> 

I'll remove in v5.

[1] Section 5.3 of 
https://www.intel.com/content/dam/develop/external/us/en/documents/36945

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Software Analyst
Legal Notice - Disclaimer <https://www.eldorado.org.br/disclaimer.html>


* Re: [PATCH v4 20/47] target/ppc: implement vslq
  2022-02-22 22:14   ` Richard Henderson
@ 2022-02-23 21:53     ` Matheus K. Ferst
  2022-02-23 22:12       ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Matheus K. Ferst @ 2022-02-23 21:53 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 22/02/2022 19:14, Richard Henderson wrote:
> On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>> ---
>> v4:
>>   -  New in v4.
>> ---
>>   target/ppc/insn32.decode            |  1 +
>>   target/ppc/translate/vmx-impl.c.inc | 40 +++++++++++++++++++++++++++++
>>   2 files changed, 41 insertions(+)
>>
>> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
>> index 88baebe35e..3799065508 100644
>> --- a/target/ppc/insn32.decode
>> +++ b/target/ppc/insn32.decode
>> @@ -473,6 +473,7 @@ VSLB            000100 ..... ..... ..... 00100000100    @VX
>>   VSLH            000100 ..... ..... ..... 00101000100    @VX
>>   VSLW            000100 ..... ..... ..... 00110000100    @VX
>>   VSLD            000100 ..... ..... ..... 10111000100    @VX
>> +VSLQ            000100 ..... ..... ..... 00100000101    @VX
>>
>>   VSRB            000100 ..... ..... ..... 01000000100    @VX
>>   VSRH            000100 ..... ..... ..... 01001000100    @VX
>> diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
>> index ec4f0e7654..ca98a545ef 100644
>> --- a/target/ppc/translate/vmx-impl.c.inc
>> +++ b/target/ppc/translate/vmx-impl.c.inc
>> @@ -834,6 +834,46 @@ TRANS_FLAGS(ALTIVEC, VSRAH, do_vector_gvec3_VX, MO_16, tcg_gen_gvec_sarv);
>>   TRANS_FLAGS(ALTIVEC, VSRAW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_sarv);
>>   TRANS_FLAGS2(ALTIVEC_207, VSRAD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_sarv);
>>
>> +static bool trans_VSLQ(DisasContext *ctx, arg_VX *a)
>> +{
>> +    TCGv_i64 hi, lo, tmp, n, sf = tcg_constant_i64(64);
>> +
>> +    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
>> +    REQUIRE_VECTOR(ctx);
>> +
>> +    n = tcg_temp_new_i64();
>> +    hi = tcg_temp_new_i64();
>> +    lo = tcg_temp_new_i64();
>> +    tmp = tcg_const_i64(0);
>> +
>> +    get_avr64(lo, a->vra, false);
>> +    get_avr64(hi, a->vra, true);
>> +
>> +    get_avr64(n, a->vrb, true);
>> +    tcg_gen_andi_i64(n, n, 0x7F);
>> +
>> +    tcg_gen_movcond_i64(TCG_COND_GE, hi, n, sf, lo, hi);
>> +    tcg_gen_movcond_i64(TCG_COND_GE, lo, n, sf, tmp, lo);
> 
> Since you have to mask twice anyway, better use (n & 64) != 0.
> 

Hmm, I'm not sure if I understood. To check != 0 we'll need a temp to 
hold n&64. We could use tmp here, but we'll need another one in patch 
22. Is that right?

Thanks,
Matheus K. Ferst


* Re: [PATCH v4 20/47] target/ppc: implement vslq
  2022-02-23 21:53     ` Matheus K. Ferst
@ 2022-02-23 22:12       ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-23 22:12 UTC (permalink / raw)
  To: Matheus K. Ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/23/22 11:53, Matheus K. Ferst wrote:
> On 22/02/2022 19:14, Richard Henderson wrote:
>> On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
>>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>
>>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>> ---
>>> v4:
>>>   -  New in v4.
>>> ---
>>>   target/ppc/insn32.decode            |  1 +
>>>   target/ppc/translate/vmx-impl.c.inc | 40 +++++++++++++++++++++++++++++
>>>   2 files changed, 41 insertions(+)
>>>
>>> diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
>>> index 88baebe35e..3799065508 100644
>>> --- a/target/ppc/insn32.decode
>>> +++ b/target/ppc/insn32.decode
>>> @@ -473,6 +473,7 @@ VSLB            000100 ..... ..... ..... 00100000100    @VX
>>>   VSLH            000100 ..... ..... ..... 00101000100    @VX
>>>   VSLW            000100 ..... ..... ..... 00110000100    @VX
>>>   VSLD            000100 ..... ..... ..... 10111000100    @VX
>>> +VSLQ            000100 ..... ..... ..... 00100000101    @VX
>>>
>>>   VSRB            000100 ..... ..... ..... 01000000100    @VX
>>>   VSRH            000100 ..... ..... ..... 01001000100    @VX
>>> diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
>>> index ec4f0e7654..ca98a545ef 100644
>>> --- a/target/ppc/translate/vmx-impl.c.inc
>>> +++ b/target/ppc/translate/vmx-impl.c.inc
>>> @@ -834,6 +834,46 @@ TRANS_FLAGS(ALTIVEC, VSRAH, do_vector_gvec3_VX, MO_16, 
>>> tcg_gen_gvec_sarv);
>>>   TRANS_FLAGS(ALTIVEC, VSRAW, do_vector_gvec3_VX, MO_32, tcg_gen_gvec_sarv);
>>>   TRANS_FLAGS2(ALTIVEC_207, VSRAD, do_vector_gvec3_VX, MO_64, tcg_gen_gvec_sarv);
>>>
>>> +static bool trans_VSLQ(DisasContext *ctx, arg_VX *a)
>>> +{
>>> +    TCGv_i64 hi, lo, tmp, n, sf = tcg_constant_i64(64);
>>> +
>>> +    REQUIRE_INSNS_FLAGS2(ctx, ISA310);
>>> +    REQUIRE_VECTOR(ctx);
>>> +
>>> +    n = tcg_temp_new_i64();
>>> +    hi = tcg_temp_new_i64();
>>> +    lo = tcg_temp_new_i64();
>>> +    tmp = tcg_const_i64(0);
>>> +
>>> +    get_avr64(lo, a->vra, false);
>>> +    get_avr64(hi, a->vra, true);
>>> +
>>> +    get_avr64(n, a->vrb, true);
>>> +    tcg_gen_andi_i64(n, n, 0x7F);
>>> +
>>> +    tcg_gen_movcond_i64(TCG_COND_GE, hi, n, sf, lo, hi);
>>> +    tcg_gen_movcond_i64(TCG_COND_GE, lo, n, sf, tmp, lo);
>>
>> Since you have to mask twice anyway, better use (n & 64) != 0.
>>
> 
> Hmm, I'm not sure if I understood. To check != 0 we'll need a temp to hold n&64. We could 
> use tmp here, but we'll need another one in patch 22. Is that right?

Yes.
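
Because n is already reduced to 0..127, testing bit 6 ((n & 64) != 0) selects exactly the same cases as the n >= 64 movcond. A minimal scalar model of the whole vslq operation, with the 128-bit value split into two doublewords, is sketched below. `vslq_model` is a hypothetical helper for checking expected values, not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Shift the 128-bit value hi:lo left by (n mod 128) bits. */
static void vslq_model(uint64_t *hi, uint64_t *lo, uint64_t n)
{
    uint64_t h = *hi, l = *lo;
    unsigned sh;

    n &= 0x7F;
    /* (n & 64) != 0: the low doubleword moves into the high one. */
    if (n & 64) {
        h = l;
        l = 0;
    }
    /* The remaining 0..63 bits are a two-doubleword funnel shift. */
    sh = n & 63;
    if (sh != 0) {
        h = (h << sh) | (l >> (64 - sh));
        l <<= sh;
    }
    *hi = h;
    *lo = l;
}
```

The sh != 0 guard avoids a shift by 64, which is undefined for a 64-bit operand in C.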

r~



* Re: [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi to decodetree
  2022-02-23 21:43     ` Matheus K. Ferst
@ 2022-02-23 22:19       ` Richard Henderson
  2022-02-24 20:23         ` Matheus K. Ferst
  0 siblings, 1 reply; 97+ messages in thread
From: Richard Henderson @ 2022-02-23 22:19 UTC (permalink / raw)
  To: Matheus K. Ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/23/22 11:43, Matheus K. Ferst wrote:
>> Note that rotlv does the masking itself:
>>
>> /*
>>   * Expand D = A << (B % element bits)
>>   *
>>   * Unlike scalar shifts, where it is easy for the target front end
>>   * to include the modulo as part of the expansion.  If the target
>>   * naturally includes the modulo as part of the operation, great!
>>   * If the target has some other behaviour from out-of-range shifts,
>>   * then it could not use this function anyway, and would need to
>>   * do it's own expansion with custom functions.
>>   */
>>
> 
> Using tcg_gen_rotlv_vec(vece, vrt, vra, vrb) works on PPC but fails on x86. It looks like 
> a problem on the i386 backend. It's using VPS[RL]LV[DQ], but instead of this modulo 
> behavior, these instructions write zero to the element[1]. I'm not sure how to fix that. 

You don't want to use tcg_gen_rotlv_vec directly, but tcg_gen_rotlv_vec.

The generic modulo is being applied here:

static void tcg_gen_rotlv_mod_vec(unsigned vece, TCGv_vec d,
                                   TCGv_vec a, TCGv_vec b)
{
     TCGv_vec t = tcg_temp_new_vec_matching(d);
     TCGv_vec m = tcg_constant_vec_matching(d, vece, (8 << vece) - 1);

     tcg_gen_and_vec(vece, t, b, m);
     tcg_gen_rotlv_vec(vece, d, a, t);
     tcg_temp_free_vec(t);
}


r~



* Re: [PATCH v4 38/47] target/ppc: Refactor VSX_SCALAR_CMP_DP
  2022-02-23  0:20   ` Richard Henderson
@ 2022-02-24 19:16     ` Víctor Colombo
  2022-02-24 21:24       ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Víctor Colombo @ 2022-02-24 19:16 UTC (permalink / raw)
  To: Richard Henderson, matheus.ferst, qemu-devel, qemu-ppc
  Cc: groug, danielhb413, clg, david

On 22/02/2022 21:20, Richard Henderson wrote:
> On 2/22/22 04:36, matheus.ferst@eldorado.org.br wrote:
>> From: Víctor Colombo <victor.colombo@eldorado.org.br>
>>
>> Refactor VSX_SCALAR_CMP_DP, changing its name to VSX_SCALAR_CMP and
>> prepare the helper to be used for quadword comparisons.
>>
>> Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>> ---
>>   target/ppc/fpu_helper.c | 31 ++++++++++++++-----------------
>>   1 file changed, 14 insertions(+), 17 deletions(-)
>>
>> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
>> index 9b034d1fe4..5ebbcfe3b7 100644
>> --- a/target/ppc/fpu_helper.c
>> +++ b/target/ppc/fpu_helper.c
>> @@ -2265,28 +2265,30 @@ VSX_MADDQ(XSNMSUBQP, NMSUB_FLGS, 0)
>>   VSX_MADDQ(XSNMSUBQPO, NMSUB_FLGS, 0)
>>
>>   /*
>> - * VSX_SCALAR_CMP_DP - VSX scalar floating point compare double precision
>> + * VSX_SCALAR_CMP - VSX scalar floating point compare
>>    *   op    - instruction mnemonic
>> + *   tp    - type
>>    *   cmp   - comparison operation
>>    *   exp   - expected result of comparison
>> + *   fld   - vsr_t field
>>    *   svxvc - set VXVC bit
>>    */
>> -#define VSX_SCALAR_CMP_DP(op, cmp, exp, svxvc)                                \
>> +#define VSX_SCALAR_CMP(op, tp, cmp, fld, exp, svxvc)                          \
>>   void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
>>                    ppc_vsr_t *xa, ppc_vsr_t *xb)                                \
>>   {                                                                             \
>> -    ppc_vsr_t t = *xt;                                                        \
>> +    ppc_vsr_t t = { };                                                        \
>>       bool vxsnan_flag = false, vxvc_flag = false, vex_flag = false;            \
>>                                                                                 \
>> -    if (float64_is_signaling_nan(xa->VsrD(0), &env->fp_status) ||             \
>> -        float64_is_signaling_nan(xb->VsrD(0), &env->fp_status)) {             \
>> +    if (tp##_is_signaling_nan(xa->fld, &env->fp_status) ||                    \
>> +        tp##_is_signaling_nan(xb->fld, &env->fp_status)) {                    \
>>           vxsnan_flag = true;                                                   \
>>           if (fpscr_ve == 0 && svxvc) {                                         \
>>               vxvc_flag = true;                                                 \
>>           }                                                                     \
>>       } else if (svxvc) {                                                       \
>> -        vxvc_flag = float64_is_quiet_nan(xa->VsrD(0), &env->fp_status) ||     \
>> -            float64_is_quiet_nan(xb->VsrD(0), &env->fp_status);               \
>> +        vxvc_flag = tp##_is_quiet_nan(xa->fld, &env->fp_status) ||            \
>> +            tp##_is_quiet_nan(xb->fld, &env->fp_status);                      \
>>       }
> 
> Note that this can be simplified further, using the full FloatRelation result and
> float_flag_invalid_snan.
> 
> Note that do_scalar_cmp gets half-way there, only checking for NaNs once we have
> float_relation_unordered as a comparison result.  But it could go further and check
> float_flag_invalid_snan and drop all of the other checks vs snan and qnan.
> 
> 
> r~

Hello Richard! Thanks for your review.

Could you please elaborate on how you think using
float*_compare and its FloatRelation result would work here?
I noticed do_scalar_cmp modifies CR and sets the FPCC flag, which
is not what VSX_SCALAR_CMP does. Using that function would require a
rework.

One option I thought of would be to bring the necessary parts
into VSX_SCALAR_CMP, something like this:

#define VSX_SCALAR_CMP(op, tp, cmp, fld, svxvc, expr)
       ...
     r = tp##_compare(xa->fld, xb->fld, &env->fp_status);                   \
     if (expr) {                                                            \
         memset(&t.fld, 0xFF, sizeof(t.fld));                               \
     } else if (r == float_relation_unordered) {                            \
         if (env->fp_status.float_exception_flags & float_flag_invalid_snan) { \
             float_invalid_op_vxsnan(env, GETPC());                         \
             if (fpscr_ve == 0 && svxvc) {                                  \
                 float_invalid_op_vxvc(env, 0, GETPC());                    \
             }                                                              \
         } else if (svxvc) {                                                \
             if (tp##_is_quiet_nan(xa->fld, &env->fp_status) ||             \
                 tp##_is_quiet_nan(xb->fld, &env->fp_status)) {             \
                     float_invalid_op_vxvc(env, 0, GETPC());                \
                 }                                                          \
         }                                                                  \
     }                                                                      \
...
VSX_SCALAR_CMP(XSCMPEQDP, float64, eq, VsrD(0), 0, r == float_relation_equal)
VSX_SCALAR_CMP(XSCMPGEDP, float64, le, VsrD(0), 1, \
     r == float_relation_equal || r == float_relation_greater)
VSX_SCALAR_CMP(XSCMPGTDP, float64, lt, VsrD(0), 1, r == float_relation_greater)
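
The relation-based predicates in this proposal can be sanity-checked with a small host-side model. This sketch uses C doubles in place of float64 and a hypothetical `rel_t` enum standing in for FloatRelation (the enumerator values are chosen for illustration). It also shows how a memset with -r turns a boolean into the all-ones or all-zeros doubleword the xscmp* instructions store. None of these names are QEMU APIs.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for softfloat's FloatRelation. */
typedef enum {
    REL_LESS = -1,
    REL_EQUAL = 0,
    REL_GREATER = 1,
    REL_UNORDERED = 2,
} rel_t;

/* Host-double model of tp##_compare(xa, xb): any NaN makes it unordered. */
static rel_t compare_model(double a, double b)
{
    if (isnan(a) || isnan(b)) {
        return REL_UNORDERED;
    }
    if (a < b) {
        return REL_LESS;
    }
    return a == b ? REL_EQUAL : REL_GREATER;
}

/* All ones when the predicate holds, all zeros otherwise. */
static uint64_t result_mask(bool r)
{
    uint64_t m;

    memset(&m, -(int)r, sizeof(m));   /* -1 fills 0xFF bytes, 0 fills 0x00 */
    return m;
}

/* xscmpgedp-style predicate: equal or greater; unordered yields zero. */
static uint64_t cmpge_model(double a, double b)
{
    rel_t r = compare_model(a, b);

    return result_mask(r == REL_EQUAL || r == REL_GREATER);
}
```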

But this still looks convoluted. Another option I came up with would be:

     ppc_vsr_t t = { };                                                      \
                                                                             \
     helper_reset_fpstatus(env);                                             \
                                                                             \
     if (tp##_##cmp##_quiet(xb->fld, xa->fld, &env->fp_status)) {            \
         memset(&t.fld, 0xFF, sizeof(t.fld));                                \
     }                                                                       \
                                                                             \
     if (env->fp_status.float_exception_flags & float_flag_invalid_snan) {   \
         float_invalid_op_vxsnan(env, GETPC());                              \
         if (fpscr_ve == 0 && svxvc) {                                       \
             float_invalid_op_vxvc(env, 0, GETPC());                         \
         }                                                                   \
     } else if (svxvc) {                                                     \
         if (tp##_is_quiet_nan(xa->fld, &env->fp_status) ||                  \
             tp##_is_quiet_nan(xb->fld, &env->fp_status)) {                  \
                 float_invalid_op_vxvc(env, 0, GETPC());                     \
             }                                                               \
     }                                                                       \

Is this close to what you were thinking?

Thank you very much!

-- Víctor



* Re: [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi to decodetree
  2022-02-23 22:19       ` Richard Henderson
@ 2022-02-24 20:23         ` Matheus K. Ferst
  2022-02-24 21:26           ` Richard Henderson
  0 siblings, 1 reply; 97+ messages in thread
From: Matheus K. Ferst @ 2022-02-24 20:23 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 23/02/2022 19:19, Richard Henderson wrote:
> On 2/23/22 11:43, Matheus K. Ferst wrote:
>>> Note that rotlv does the masking itself:
>>>
>>> /*
>>>   * Expand D = A << (B % element bits)
>>>   *
>>>   * Unlike scalar shifts, where it is easy for the target front end
>>>   * to include the modulo as part of the expansion.  If the target
>>>   * naturally includes the modulo as part of the operation, great!
>>>   * If the target has some other behaviour from out-of-range shifts,
>>>   * then it could not use this function anyway, and would need to
>>>   * do it's own expansion with custom functions.
>>>   */
>>>
>>
>> Using tcg_gen_rotlv_vec(vece, vrt, vra, vrb) works on PPC but fails on 
>> x86. It looks like
>> a problem on the i386 backend. It's using VPS[RL]LV[DQ], but instead 
>> of this modulo
>> behavior, these instructions write zero to the element[1]. I'm not 
>> sure how to fix that.
> 
> You don't want to use tcg_gen_rotlv_vec directly, but tcg_gen_rotlv_vec.
> 

I guess there is a typo here. Did you mean tcg_gen_gvec_rotlv? Or 
tcg_gen_rotlv_mod_vec?

> The generic modulo is being applied here:
> 
> static void tcg_gen_rotlv_mod_vec(unsigned vece, TCGv_vec d,
>                                    TCGv_vec a, TCGv_vec b)
> {
>      TCGv_vec t = tcg_temp_new_vec_matching(d);
>      TCGv_vec m = tcg_constant_vec_matching(d, vece, (8 << vece) - 1);
> 
>      tcg_gen_and_vec(vece, t, b, m);
>      tcg_gen_rotlv_vec(vece, d, a, t);
>      tcg_temp_free_vec(t);
> }

I can see that this method is called when we use tcg_gen_gvec_rotlv to 
implement vrl[bhwd], and they are working as expected. For vrl[wd]nm and 
vrl[wd]mi, however, we can't call tcg_gen_rotlv_mod_vec directly in the 
.fniv implementation because it is not exposed in tcg-op.h. Is there any 
other way to use this method? Should we add it to the header file?
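
For reference, here is a scalar model of one 32-bit element of vrlwnm/vrlwmi, written from the PowerISA description: the fields come from the VRB word, the rotate is modulo 32, and the result is ANDed with MASK(mb, me). It is only a sketch for checking expected values. The function names are hypothetical, and this is not how the TCG .fniv expansion is written. The rotate reduces its count first, the same effect `tcg_gen_rotlv_mod_vec` gets from its `tcg_gen_and_vec`.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left; the count is reduced modulo 32 first. */
static uint32_t rotl32(uint32_t x, uint32_t n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/*
 * PowerISA MASK(mb, me) for a 32-bit element in IBM bit numbering
 * (bit 0 is the most significant). If mb > me the mask wraps around.
 */
static uint32_t ppc_mask32(unsigned mb, unsigned me)
{
    uint32_t from_mb, to_me;

    assert(mb < 32 && me < 32);
    from_mb = UINT32_MAX >> mb;          /* IBM bits mb..31 set */
    to_me = UINT32_MAX << (31 - me);     /* IBM bits 0..me set */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* vrlwnm element: rotate VRA by n, then AND with MASK(mb, me). */
static uint32_t vrlwnm_elem(uint32_t vra, uint32_t vrb)
{
    unsigned n  = vrb & 31;           /* IBM bits 27:31 */
    unsigned me = (vrb >> 8) & 31;    /* IBM bits 19:23 */
    unsigned mb = (vrb >> 16) & 31;   /* IBM bits 11:15 */

    return rotl32(vra, n) & ppc_mask32(mb, me);
}

/* vrlwmi element: same rotate and mask, inserted into the old VRT value. */
static uint32_t vrlwmi_elem(uint32_t vrt, uint32_t vra, uint32_t vrb)
{
    uint32_t m = ppc_mask32((vrb >> 16) & 31, (vrb >> 8) & 31);

    return (rotl32(vra, vrb) & m) | (vrt & ~m);
}
```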

Thanks,
Matheus K. Ferst


* Re: [PATCH v4 38/47] target/ppc: Refactor VSX_SCALAR_CMP_DP
  2022-02-24 19:16     ` Víctor Colombo
@ 2022-02-24 21:24       ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-24 21:24 UTC (permalink / raw)
  To: Víctor Colombo, matheus.ferst, qemu-devel, qemu-ppc
  Cc: groug, danielhb413, clg, david

On 2/24/22 09:16, Víctor Colombo wrote:
> Could you please elaborate on how you think using
> float*_compare and its FloatRelation result would work here?
> I noticed do_scalar_cmp modifies CR and sets the FPCC flag, which
> is not what VSX_SCALAR_CMP does. Using that function would require a
> rework.
> 
> One option I thought of would be to bring the necessary parts
> into VSX_SCALAR_CMP, something like this:
> 
> #define VSX_SCALAR_CMP(op, tp, cmp, fld, svxvc, expr)       ...
>      r = tp##_compare(xa->fld, xb->fld, &env->fp_status);        \
>      if (expr) {        \
>          memset(&t.fld, 0xFF, sizeof(t.fld));        \
>      } else if (r == float_relation_unordered) {        \
>          if (env->fp_status.float_exception_flags & float_flag_invalid_snan) { \
>              float_invalid_op_vxsnan(env, GETPC());        \
>              if (fpscr_ve == 0 && svxvc) {        \
>                  float_invalid_op_vxvc(env, 0, GETPC());        \
>              }        \
>          } else if (svxvc) {        \
>              if (tp##_is_quiet_nan(xa->fld, &env->fp_status) ||        \
>                  tp##_is_quiet_nan(xb->fld, &env->fp_status)) {        \
>                      float_invalid_op_vxvc(env, 0, GETPC());        \
>                  }        \
>          }        \
>      }        \
> ...
> VSX_SCALAR_CMP(XSCMPEQDP, float64, eq, VsrD(0), 0, r == float_relation_equal)
> VSX_SCALAR_CMP(XSCMPGEDP, float64, le, VsrD(0), 1, \
>      r == float_relation_equal || r == float_relation_greater)
> VSX_SCALAR_CMP(XSCMPGTDP, float64, lt, VsrD(0), 1, r == float_relation_greater)

I was thinking along the lines of:

     bool r;
     int flags;

     helper_reset_fpstatus(env);
     if (svxvc) {
         r = tp##_##cmp(...);
     } else {
         r = tp##_##cmp##_quiet(...);
     }

     flags = get_float_exception_flags(&env->fp_status);
     if (unlikely(flags & float_flag_invalid)) {
         bool vxvc = svxvc;
         if (flags & float_flag_invalid_snan) {
             float_invalid_op_vxsnan(...);
             vxvc &= fpscr_ve == 0;
         }
         if (vxvc) {
             float_invalid_op_vxvc(...);
         }
     }

     memset(xt, 0, sizeof(*xt));
     memset(&xt->fld, -r, sizeof(xt->fld));
     do_float_check_status(...);
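
A small host-side model can check that deciding only from the exception flags, as in the sketch above, raises the same VXSNAN/VXVC exceptions as the existing macro's explicit NaN tests. It assumes the usual IEEE 754 behaviour: a signaling compare (used when svxvc is set) flags "invalid" for any NaN operand, while a quiet compare flags it only for a signaling NaN, which also sets the snan flag. The flag constants, struct and functions are hypothetical stand-ins, not QEMU definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for float_flag_invalid / float_flag_invalid_snan. */
enum {
    FLAG_INVALID      = 1 << 0,
    FLAG_INVALID_SNAN = 1 << 1,
};

/* Which PPC exceptions the helper must raise. */
struct cmp_exc {
    bool vxsnan;
    bool vxvc;
};

/* Assumed flags left by the comparison (signaling if svxvc, else quiet). */
static int compare_flags(bool any_snan, bool any_qnan, bool svxvc)
{
    if (any_snan) {
        return FLAG_INVALID | FLAG_INVALID_SNAN;
    }
    return (any_qnan && svxvc) ? FLAG_INVALID : 0;
}

/* Decision taken only from the exception flags. */
static struct cmp_exc exc_from_flags(int flags, bool svxvc, bool fpscr_ve)
{
    struct cmp_exc e = { false, false };

    if (flags & FLAG_INVALID) {
        bool vxvc = svxvc;
        if (flags & FLAG_INVALID_SNAN) {
            e.vxsnan = true;
            vxvc = vxvc && !fpscr_ve;
        }
        e.vxvc = vxvc;
    }
    return e;
}

/* Decision taken by the existing macro from explicit NaN tests. */
static struct cmp_exc exc_from_nan_tests(bool any_snan, bool any_qnan,
                                         bool svxvc, bool fpscr_ve)
{
    struct cmp_exc e = { false, false };

    if (any_snan) {
        e.vxsnan = true;
        e.vxvc = !fpscr_ve && svxvc;
    } else if (svxvc) {
        e.vxvc = any_qnan;
    }
    return e;
}
```

Enumerating all 16 combinations of (SNaN present, QNaN present, svxvc, VE) gives identical results for both functions under these assumptions.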


r~



* Re: [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi to decodetree
  2022-02-24 20:23         ` Matheus K. Ferst
@ 2022-02-24 21:26           ` Richard Henderson
  0 siblings, 0 replies; 97+ messages in thread
From: Richard Henderson @ 2022-02-24 21:26 UTC (permalink / raw)
  To: Matheus K. Ferst, qemu-devel, qemu-ppc; +Cc: groug, danielhb413, clg, david

On 2/24/22 10:23, Matheus K. Ferst wrote:
>> You don't want to use tcg_gen_rotlv_vec directly, but tcg_gen_rotlv_vec.
>>
> 
> I guess there is a typo here. Did you mean tcg_gen_gvec_rotlv? Or tcg_gen_rotlv_mod_vec?

Dangit.  Copy-paste error.  The first: tcg_gen_gvec_rotlv.


r~




Thread overview: 97+ messages
2022-02-22 14:35 [PATCH v4 00/47] target/ppc: PowerISA Vector/VSX instruction batch matheus.ferst
2022-02-22 14:35 ` [PATCH v4 01/47] target/ppc: Introduce TRANS*FLAGS macros matheus.ferst
2022-02-22 14:36 ` [PATCH v4 02/47] target/ppc: moved vector even and odd multiplication to decodetree matheus.ferst
2022-02-22 18:19   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 03/47] target/ppc: Moved vector multiply high and low " matheus.ferst
2022-02-22 18:19   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 04/47] target/ppc: vmulh* instructions without helpers matheus.ferst
2022-02-22 18:23   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 05/47] target/ppc: Implement vmsumcud instruction matheus.ferst
2022-02-22 18:28   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 06/47] target/ppc: Implement vmsumudm instruction matheus.ferst
2022-02-22 14:36 ` [PATCH v4 07/47] target/ppc: Move vexts[bhw]2[wd] to decodetree matheus.ferst
2022-02-22 18:34   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 08/47] target/ppc: Implement vextsd2q matheus.ferst
2022-02-22 14:36 ` [PATCH v4 09/47] target/ppc: Move Vector Compare Equal/Not Equal/Greater Than to decodetree matheus.ferst
2022-02-22 18:37   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 10/47] target/ppc: Move Vector Compare Not Equal or Zero " matheus.ferst
2022-02-22 19:04   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 11/47] target/ppc: Implement Vector Compare Equal Quadword matheus.ferst
2022-02-22 19:05   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 12/47] target/ppc: Implement Vector Compare Greater Than Quadword matheus.ferst
2022-02-22 19:07   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 13/47] target/ppc: Implement Vector Compare Quadword matheus.ferst
2022-02-22 14:36 ` [PATCH v4 14/47] target/ppc: implement vstri[bh][lr] matheus.ferst
2022-02-22 19:13   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 15/47] target/ppc: implement vclrlb matheus.ferst
2022-02-22 19:15   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 16/47] target/ppc: implement vclrrb matheus.ferst
2022-02-22 19:17   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 17/47] target/ppc: implement vcntmb[bhwd] matheus.ferst
2022-02-22 14:36 ` [PATCH v4 18/47] target/ppc: implement vgnb matheus.ferst
2022-02-22 21:58   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 19/47] target/ppc: move vs[lr][a][bhwd] to decodetree matheus.ferst
2022-02-22 22:01   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 20/47] target/ppc: implement vslq matheus.ferst
2022-02-22 22:14   ` Richard Henderson
2022-02-23 21:53     ` Matheus K. Ferst
2022-02-23 22:12       ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 21/47] target/ppc: implement vsrq matheus.ferst
2022-02-22 22:15   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 22/47] target/ppc: implement vsraq matheus.ferst
2022-02-22 22:19   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 23/47] target/ppc: move vrl[bhwd] to decodetree matheus.ferst
2022-02-22 22:20   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 24/47] target/ppc: move vrl[bhwd]nm/vrl[bhwd]mi " matheus.ferst
2022-02-22 22:30   ` Richard Henderson
2022-02-23 21:43     ` Matheus K. Ferst
2022-02-23 22:19       ` Richard Henderson
2022-02-24 20:23         ` Matheus K. Ferst
2022-02-24 21:26           ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 25/47] target/ppc: implement vrlq matheus.ferst
2022-02-22 22:33   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 26/47] target/ppc: Move vsel and vperm/vpermr to decodetree matheus.ferst
2022-02-22 22:37   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 27/47] target/ppc: Move xxsel " matheus.ferst
2022-02-22 22:38   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 28/47] target/ppc: move xxperm/xxpermr " matheus.ferst
2022-02-22 22:40   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 29/47] target/ppc: Move xxpermdi " matheus.ferst
2022-02-22 22:42   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 30/47] target/ppc: Implement xxpermx instruction matheus.ferst
2022-02-22 22:46   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 31/47] tcg/tcg-op-gvec.c: Introduce tcg_gen_gvec_4i matheus.ferst
2022-02-22 23:04   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 32/47] target/ppc: Implement xxeval matheus.ferst
2022-02-22 23:43   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 33/47] target/ppc: Implement xxgenpcv[bhwd]m instruction matheus.ferst
2022-02-22 23:48   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 34/47] target/ppc: move xs[n]madd[am][ds]p/xs[n]msub[am][ds]p to decodetree matheus.ferst
2022-02-22 23:52   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 35/47] target/ppc: implement xs[n]maddqp[o]/xs[n]msubqp[o] matheus.ferst
2022-02-22 23:56   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 36/47] target/ppc: Implement xvtlsbb instruction matheus.ferst
2022-02-23  0:07   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 37/47] target/ppc: Remove xscmpnedp instruction matheus.ferst
2022-02-22 14:36 ` [PATCH v4 38/47] target/ppc: Refactor VSX_SCALAR_CMP_DP matheus.ferst
2022-02-23  0:20   ` Richard Henderson
2022-02-24 19:16     ` Víctor Colombo
2022-02-24 21:24       ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 39/47] target/ppc: Implement xscmp{eq,ge,gt}qp matheus.ferst
2022-02-23  0:21   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 40/47] target/ppc: Move xscmp{eq,ge,gt}dp to decodetree matheus.ferst
2022-02-23  0:22   ` [PATCH v4 40/47] target/ppc: Move xscmp{eq, ge, gt}dp " Richard Henderson
2022-02-22 14:36 ` [PATCH v4 41/47] target/ppc: Move xs{max, min}[cj]dp to use do_helper_XX3 matheus.ferst
2022-02-23  0:23   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 42/47] target/ppc: Refactor VSX_MAX_MINC helper matheus.ferst
2022-02-23  0:40   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 43/47] target/ppc: Implement xs{max,min}cqp matheus.ferst
2022-02-23  0:41   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 44/47] target/ppc: Implement xvcvbf16spn and xvcvspbf16 instructions matheus.ferst
2022-02-23  3:08   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 45/47] target/ppc: implement plxsd/pstxsd matheus.ferst
2022-02-23  3:14   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 46/47] target/ppc: implement plxssp/pstxssp matheus.ferst
2022-02-23  3:16   ` Richard Henderson
2022-02-22 14:36 ` [PATCH v4 47/47] target/ppc: implement lxvr[bhwd]/stxvr[bhwd]x matheus.ferst
2022-02-23  3:23   ` Richard Henderson
