All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree
@ 2020-05-12 16:38 Peter Maydell
  2020-05-12 16:38 ` [PATCH v2 01/17] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH " Peter Maydell
                   ` (17 more replies)
  0 siblings, 18 replies; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

This patchset is v2 of the Neon decodetree conversion. The first
half of v1 is in master already, so we're left just with patches
converting the rest of the 3-reg-same Neon dp insn group.

Based-on: <20200508152200.6547-1-richard.henderson@linaro.org>
("[PATCH v3 00/16] target/arm: partial vector cleanup")
Strictly speaking, based on that with a fixup for the VRSRA bug,
but I think patchew should be placated by that based-on tag.

Git tree available at:
 https://git.linaro.org/people/peter.maydell/qemu-arm.git neon-decodetree
with the whole patchstack including RTH's series.

Changes from v1:
 * the first 19 or so patches have been upstreamed
 * patch 1 (VQRDMLAH/VQRDMLSH) uses do_3same() now
 * patch 3 (64-bit elt 3-reg-same): shifts now use a format
   which swaps Vn and Vm in decode, so we don't need to special
   case them in the C code. We also use the gvec interface
   rather than hand-rolling a for-each-pass loop.
 * patch 4: make DO_3SAME_32() handle just one trans fn rather
   than doing both _U and _S in one macro invocation.
   Use gvec rather than hand-rolling the for-each-pass loop.
   patch 5: (vaba/vabd): new, since rth's patchset rewrote how these
   were handled and they can now just be handled via DO_3SAME_NO_SZ_3()
 * patch 7: saturating shift handling rewritten to use the gvec APIs.
 * patch 10 (vqdmulh/vqrdmulh): rewritten to use gvec
 * patch 11 (vadd/vsub): rewritten to use gvec; vabd now in an earlier
   patch with vaba
 * patch 13 (vmul/vmla/vmls): vmul uses gvec; do_3same_fp() and
   DO_3S_FP() now added in this patch as it is the first user
 * new patch 15 making recps_f32 and rsqrts_f32 easier to use with
   common gvec APIs and macros by moving the 'env' argument to the front
 * patch 16: updated VRECPS/VRSQRTS code to use gvec

NB: I have not attempted to merge VQSHL_S and VQSHL_S64 into
one pattern in patch 7 (as suggested in review of v1) -- adding
yet another DO_3SAME_FOO for the case of "64 bit and 32 bit
can be done with same trans/gen fn" didn't seem worthwhile.

thanks
-- PMM

Peter Maydell (17):
  target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree
  target/arm: Convert Neon 3-reg-same SHA to decodetree
  target/arm: Convert Neon 64-bit element 3-reg-same insns
  target/arm: Convert Neon VHADD 3-reg-same insns
  target/arm: Convert Neon VABA/VABD 3-reg-same to decodetree
  target/arm: Convert Neon VRHADD, VHSUB 3-reg-same insns to decodetree
  target/arm: Convert Neon VQSHL, VRSHL, VQRSHL 3-reg-same insns to
    decodetree
  target/arm: Convert Neon VPMAX/VPMIN 3-reg-same insns to decodetree
  target/arm: Convert Neon VPADD 3-reg-same insns to decodetree
  target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same to decodetree
  target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns to
    decodetree
  target/arm: Convert Neon VPMIN/VPMAX/VPADD float 3-reg-same insns to
    decodetree
  target/arm: Convert Neon fp VMUL, VMLA, VMLS 3-reg-same insns to
    decodetree
  target/arm: Convert Neon 3-reg-same compare insns to decodetree
  target/arm: Move 'env' argument of recps_f32 and rsqrts_f32 helpers to
    usual place
  target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to
    decodetree
  target/arm: Convert NEON VFMA, VFMS 3-reg-same insns to decodetree

 target/arm/helper.h             |   7 +-
 target/arm/neon-dp.decode       | 113 +++++-
 target/arm/neon_helper.c        |   7 -
 target/arm/translate-neon.inc.c | 639 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 495 +------------------------
 target/arm/vec_helper.c         |   7 +
 target/arm/vfp_helper.c         |   4 +-
 7 files changed, 767 insertions(+), 505 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 01/17] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-13 16:39   ` Richard Henderson
  2020-05-12 16:38 ` [PATCH v2 02/17] target/arm: Convert Neon 3-reg-same SHA " Peter Maydell
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VQRDMLAH and VQRDMLSH insns in the 3-reg-same group
to decodetree.  These don't use do_3same() because they want to
operate on VFP double registers, whose offsets are different from the
neon_reg_offset() calculations do_3same does.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
v1->v2: use do_3same() rather than custom do_vqrdlmah() function
---
 target/arm/neon-dp.decode       |  3 +++
 target/arm/translate-neon.inc.c | 15 +++++++++++++++
 target/arm/translate.c          | 14 ++------------
 3 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 6b0b6566d60..04aff438410 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -89,3 +89,6 @@ VMLS_3s          1111 001 1 0 . .. .... .... 1001 . . . 0 .... @3same
 
 VMUL_3s          1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
 VMUL_p_3s        1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
+
+VQRDMLAH_3s      1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
+VQRDMLSH_3s      1111 001 1 0 . .. .... .... 1100 ... 1 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 099491b16f3..661b5fc4cf2 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -671,3 +671,18 @@ static bool trans_VMUL_p_3s(DisasContext *s, arg_3same *a)
     }
     return do_3same(s, a, gen_VMUL_p_3s);
 }
+
+#define DO_VQRDMLAH(INSN, FUNC)                                         \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        if (!dc_isar_feature(aa32_rdm, s)) {                            \
+            return false;                                               \
+        }                                                               \
+        if (a->size != 1 && a->size != 2) {                             \
+            return false;                                               \
+        }                                                               \
+        return do_3same(s, a, FUNC);                                    \
+    }
+
+DO_VQRDMLAH(VQRDMLAH, gen_gvec_sqrdmlah_qc)
+DO_VQRDMLAH(VQRDMLSH, gen_gvec_sqrdmlsh_qc)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index e3d37ef2e99..1f06cb5a87f 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5450,12 +5450,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             if (!u) {
                 break;  /* VPADD */
             }
-            /* VQRDMLAH */
-            if (dc_isar_feature(aa32_rdm, s) && (size == 1 || size == 2)) {
-                gen_gvec_sqrdmlah_qc(size, rd_ofs, rn_ofs, rm_ofs,
-                                     vec_size, vec_size);
-                return 0;
-            }
+            /* VQRDMLAH : handled by decodetree */
             return 1;
 
         case NEON_3R_VFM_VQRDMLSH:
@@ -5466,12 +5461,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 }
                 break;
             }
-            /* VQRDMLSH */
-            if (dc_isar_feature(aa32_rdm, s) && (size == 1 || size == 2)) {
-                gen_gvec_sqrdmlsh_qc(size, rd_ofs, rn_ofs, rm_ofs,
-                                     vec_size, vec_size);
-                return 0;
-            }
+            /* VQRDMLSH : handled by decodetree */
             return 1;
 
         case NEON_3R_VABD:
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 02/17] target/arm: Convert Neon 3-reg-same SHA to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
  2020-05-12 16:38 ` [PATCH v2 01/17] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH " Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-12 16:38 ` [PATCH v2 03/17] target/arm: Convert Neon 64-bit element 3-reg-same insns Peter Maydell
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon SHA instructions in the 3-reg-same group
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  10 +++
 target/arm/translate-neon.inc.c | 139 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  46 +----------
 3 files changed, 151 insertions(+), 44 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 04aff438410..68b0a44793f 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -91,4 +91,14 @@ VMUL_3s          1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
 VMUL_p_3s        1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
 
 VQRDMLAH_3s      1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
+
+SHA1_3s          1111 001 0 0 . optype:2 .... .... 1100 . 1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+SHA256H_3s       1111 001 1 0 . 00 .... .... 1100 . 1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+SHA256H2_3s      1111 001 1 0 . 01 .... .... 1100 . 1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+SHA256SU1_3s     1111 001 1 0 . 10 .... .... 1100 . 1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
 VQRDMLSH_3s      1111 001 1 0 . .. .... .... 1100 ... 1 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 661b5fc4cf2..03b3337e460 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -686,3 +686,142 @@ static bool trans_VMUL_p_3s(DisasContext *s, arg_3same *a)
 
 DO_VQRDMLAH(VQRDMLAH, gen_gvec_sqrdmlah_qc)
 DO_VQRDMLAH(VQRDMLSH, gen_gvec_sqrdmlsh_qc)
+
+static bool trans_SHA1_3s(DisasContext *s, arg_SHA1_3s *a)
+{
+    TCGv_ptr ptr1, ptr2, ptr3;
+    TCGv_i32 tmp;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
+        !dc_isar_feature(aa32_sha1, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    ptr1 = vfp_reg_ptr(true, a->vd);
+    ptr2 = vfp_reg_ptr(true, a->vn);
+    ptr3 = vfp_reg_ptr(true, a->vm);
+    tmp = tcg_const_i32(a->optype);
+    gen_helper_crypto_sha1_3reg(ptr1, ptr2, ptr3, tmp);
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_ptr(ptr1);
+    tcg_temp_free_ptr(ptr2);
+    tcg_temp_free_ptr(ptr3);
+
+    return true;
+}
+
+static bool trans_SHA256H_3s(DisasContext *s, arg_SHA256H_3s *a)
+{
+    TCGv_ptr ptr1, ptr2, ptr3;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
+        !dc_isar_feature(aa32_sha2, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    ptr1 = vfp_reg_ptr(true, a->vd);
+    ptr2 = vfp_reg_ptr(true, a->vn);
+    ptr3 = vfp_reg_ptr(true, a->vm);
+    gen_helper_crypto_sha256h(ptr1, ptr2, ptr3);
+    tcg_temp_free_ptr(ptr1);
+    tcg_temp_free_ptr(ptr2);
+    tcg_temp_free_ptr(ptr3);
+
+    return true;
+}
+
+static bool trans_SHA256H2_3s(DisasContext *s, arg_SHA256H2_3s *a)
+{
+    TCGv_ptr ptr1, ptr2, ptr3;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
+        !dc_isar_feature(aa32_sha2, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    ptr1 = vfp_reg_ptr(true, a->vd);
+    ptr2 = vfp_reg_ptr(true, a->vn);
+    ptr3 = vfp_reg_ptr(true, a->vm);
+    gen_helper_crypto_sha256h2(ptr1, ptr2, ptr3);
+    tcg_temp_free_ptr(ptr1);
+    tcg_temp_free_ptr(ptr2);
+    tcg_temp_free_ptr(ptr3);
+
+    return true;
+}
+
+static bool trans_SHA256SU1_3s(DisasContext *s, arg_SHA256SU1_3s *a)
+{
+    TCGv_ptr ptr1, ptr2, ptr3;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
+        !dc_isar_feature(aa32_sha2, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    ptr1 = vfp_reg_ptr(true, a->vd);
+    ptr2 = vfp_reg_ptr(true, a->vn);
+    ptr3 = vfp_reg_ptr(true, a->vm);
+    gen_helper_crypto_sha256su1(ptr1, ptr2, ptr3);
+    tcg_temp_free_ptr(ptr1);
+    tcg_temp_free_ptr(ptr2);
+    tcg_temp_free_ptr(ptr3);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 1f06cb5a87f..ee2b8d6f6e3 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5359,7 +5359,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int vec_size;
     uint32_t imm;
     TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
-    TCGv_ptr ptr1, ptr2, ptr3;
+    TCGv_ptr ptr1, ptr2;
     TCGv_i64 tmp64;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -5403,49 +5403,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         }
         switch (op) {
-        case NEON_3R_SHA:
-            /* The SHA-1/SHA-256 3-register instructions require special
-             * treatment here, as their size field is overloaded as an
-             * op type selector, and they all consume their input in a
-             * single pass.
-             */
-            if (!q) {
-                return 1;
-            }
-            if (!u) { /* SHA-1 */
-                if (!dc_isar_feature(aa32_sha1, s)) {
-                    return 1;
-                }
-                ptr1 = vfp_reg_ptr(true, rd);
-                ptr2 = vfp_reg_ptr(true, rn);
-                ptr3 = vfp_reg_ptr(true, rm);
-                tmp4 = tcg_const_i32(size);
-                gen_helper_crypto_sha1_3reg(ptr1, ptr2, ptr3, tmp4);
-                tcg_temp_free_i32(tmp4);
-            } else { /* SHA-256 */
-                if (!dc_isar_feature(aa32_sha2, s) || size == 3) {
-                    return 1;
-                }
-                ptr1 = vfp_reg_ptr(true, rd);
-                ptr2 = vfp_reg_ptr(true, rn);
-                ptr3 = vfp_reg_ptr(true, rm);
-                switch (size) {
-                case 0:
-                    gen_helper_crypto_sha256h(ptr1, ptr2, ptr3);
-                    break;
-                case 1:
-                    gen_helper_crypto_sha256h2(ptr1, ptr2, ptr3);
-                    break;
-                case 2:
-                    gen_helper_crypto_sha256su1(ptr1, ptr2, ptr3);
-                    break;
-                }
-            }
-            tcg_temp_free_ptr(ptr1);
-            tcg_temp_free_ptr(ptr2);
-            tcg_temp_free_ptr(ptr3);
-            return 0;
-
         case NEON_3R_VPADD_VQRDMLAH:
             if (!u) {
                 break;  /* VPADD */
@@ -5496,6 +5453,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VMUL:
         case NEON_3R_VML:
         case NEON_3R_VSHL:
+        case NEON_3R_SHA:
             /* Already handled by decodetree */
             return 1;
         }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 03/17] target/arm: Convert Neon 64-bit element 3-reg-same insns
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
  2020-05-12 16:38 ` [PATCH v2 01/17] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH " Peter Maydell
  2020-05-12 16:38 ` [PATCH v2 02/17] target/arm: Convert Neon 3-reg-same SHA " Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-13 20:27   ` Richard Henderson
  2020-05-12 16:38 ` [PATCH v2 04/17] target/arm: Convert Neon VHADD " Peter Maydell
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the 64-bit element insns in the 3-reg-same group
to decodetree. This covers VQSHL, VRSHL and VQRSHL where
size==0b11.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       | 21 ++++++++++++++++++
 target/arm/translate-neon.inc.c | 24 +++++++++++++++++++++
 target/arm/translate.c          | 38 ++-------------------------------
 3 files changed, 47 insertions(+), 36 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 68b0a44793f..fe649038547 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -73,6 +73,27 @@ VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
 VSHL_S_3s        1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same_rev
 VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same_rev
 
+# Insns operating on 64-bit elements (size!=0b11 handled elsewhere)
+# The _rev suffix indicates that Vn and Vm are reversed. This is
+# the case for shifts. In the Arm ARM these insns are documented
+# with the Vm and Vn fields in their usual places, but in the
+# assembly the operands are listed "backwards", ie in the order
+# Dd, Dm, Dn where other insns use Dd, Dn, Dm. For QEMU we choose
+# to consider Vm and Vm as being in different fields in the insn,
+# which allows us to avoid special-casing shifts in the trans_
+# function code (where we would otherwise need to manually swap
+# the operands over to call Neon helper functions that are shared
+# with AArch64 which does not have this odd reversed-operand situation).
+@3same_64_rev    .... ... . . . 11 .... .... .... . q:1 . . .... \
+                 &3same vm=%vn_dp vn=%vm_dp vd=%vd_dp size=3
+
+VQSHL_S64_3s     1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_64_rev
+VQSHL_U64_3s     1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_64_rev
+VRSHL_S64_3s     1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_64_rev
+VRSHL_U64_3s     1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_64_rev
+VQRSHL_S64_3s    1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_64_rev
+VQRSHL_U64_3s    1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_64_rev
+
 VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
 VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
 VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 03b3337e460..05c6dcdc9b9 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -825,3 +825,27 @@ static bool trans_SHA256SU1_3s(DisasContext *s, arg_SHA256SU1_3s *a)
 
     return true;
 }
+
+#define DO_3SAME_64(INSN, FUNC)                                         \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        static const GVecGen3 op = { .fni8 = FUNC };                    \
+        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &op);      \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+#define DO_3SAME_64_ENV(INSN, FUNC)                                     \
+    static void gen_##INSN##_elt(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m)    \
+    {                                                                   \
+        FUNC(d, cpu_env, n, m);                                         \
+    }                                                                   \
+    DO_3SAME_64(INSN, gen_##INSN##_elt)
+
+DO_3SAME_64(VRSHL_S64, gen_helper_neon_rshl_s64)
+DO_3SAME_64(VRSHL_U64, gen_helper_neon_rshl_u64)
+DO_3SAME_64_ENV(VQSHL_S64, gen_helper_neon_qshl_s64)
+DO_3SAME_64_ENV(VQSHL_U64, gen_helper_neon_qshl_u64)
+DO_3SAME_64_ENV(VQRSHL_S64, gen_helper_neon_qrshl_s64)
+DO_3SAME_64_ENV(VQRSHL_U64, gen_helper_neon_qrshl_u64)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ee2b8d6f6e3..1ce3e182867 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5459,42 +5459,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
 
         if (size == 3) {
-            /* 64-bit element instructions. */
-            for (pass = 0; pass < (q ? 2 : 1); pass++) {
-                neon_load_reg64(cpu_V0, rn + pass);
-                neon_load_reg64(cpu_V1, rm + pass);
-                switch (op) {
-                case NEON_3R_VQSHL:
-                    if (u) {
-                        gen_helper_neon_qshl_u64(cpu_V0, cpu_env,
-                                                 cpu_V1, cpu_V0);
-                    } else {
-                        gen_helper_neon_qshl_s64(cpu_V0, cpu_env,
-                                                 cpu_V1, cpu_V0);
-                    }
-                    break;
-                case NEON_3R_VRSHL:
-                    if (u) {
-                        gen_helper_neon_rshl_u64(cpu_V0, cpu_V1, cpu_V0);
-                    } else {
-                        gen_helper_neon_rshl_s64(cpu_V0, cpu_V1, cpu_V0);
-                    }
-                    break;
-                case NEON_3R_VQRSHL:
-                    if (u) {
-                        gen_helper_neon_qrshl_u64(cpu_V0, cpu_env,
-                                                  cpu_V1, cpu_V0);
-                    } else {
-                        gen_helper_neon_qrshl_s64(cpu_V0, cpu_env,
-                                                  cpu_V1, cpu_V0);
-                    }
-                    break;
-                default:
-                    abort();
-                }
-                neon_store_reg64(cpu_V0, rd + pass);
-            }
-            return 0;
+            /* 64-bit element instructions: handled by decodetree */
+            return 1;
         }
         pairwise = 0;
         switch (op) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 04/17] target/arm: Convert Neon VHADD 3-reg-same insns
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (2 preceding siblings ...)
  2020-05-12 16:38 ` [PATCH v2 03/17] target/arm: Convert Neon 64-bit element 3-reg-same insns Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-13 16:47   ` Richard Henderson
  2020-05-12 16:38 ` [PATCH v2 05/17] target/arm: Convert Neon VABA/VABD 3-reg-same to decodetree Peter Maydell
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VHADD insns in the 3-reg-same group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  2 ++
 target/arm/translate-neon.inc.c | 24 ++++++++++++++++++++++++
 target/arm/translate.c          |  4 +---
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index fe649038547..3432aacfe90 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -42,6 +42,8 @@
 @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+VHADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 0 .... @3same
+VHADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 0 .... @3same
 VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
 VQADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 1 .... @3same
 
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 05c6dcdc9b9..0418a84a7de 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -849,3 +849,27 @@ DO_3SAME_64_ENV(VQSHL_S64, gen_helper_neon_qshl_s64)
 DO_3SAME_64_ENV(VQSHL_U64, gen_helper_neon_qshl_u64)
 DO_3SAME_64_ENV(VQRSHL_S64, gen_helper_neon_qrshl_s64)
 DO_3SAME_64_ENV(VQRSHL_U64, gen_helper_neon_qrshl_u64)
+
+#define DO_3SAME_32(INSN, FUNC)                                         \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        static const GVecGen3 ops[4] = {                                \
+            { .fni4 = gen_helper_neon_##FUNC##8 },                      \
+            { .fni4 = gen_helper_neon_##FUNC##16 },                     \
+            { .fni4 = gen_helper_neon_##FUNC##32 },                     \
+            { 0 },                                                      \
+        };                                                              \
+        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops[vece]); \
+    }                                                                   \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        if (a->size > 2) {                                              \
+            return false;                                               \
+        }                                                               \
+        return do_3same(s, a, gen_##INSN##_3s);                         \
+    }
+
+DO_3SAME_32(VHADD_S, hadd_s)
+DO_3SAME_32(VHADD_U, hadd_u)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 1ce3e182867..8d856ccfe96 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5454,6 +5454,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VML:
         case NEON_3R_VSHL:
         case NEON_3R_SHA:
+        case NEON_3R_VHADD:
             /* Already handled by decodetree */
             return 1;
         }
@@ -5534,9 +5535,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tmp2 = neon_load_reg(rm, pass);
         }
         switch (op) {
-        case NEON_3R_VHADD:
-            GEN_NEON_INTEGER_OP(hadd);
-            break;
         case NEON_3R_VRHADD:
             GEN_NEON_INTEGER_OP(rhadd);
             break;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 05/17] target/arm: Convert Neon VABA/VABD 3-reg-same to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (3 preceding siblings ...)
  2020-05-12 16:38 ` [PATCH v2 04/17] target/arm: Convert Neon VHADD " Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-13 16:48   ` Richard Henderson
  2020-05-12 16:38 ` [PATCH v2 06/17] target/arm: Convert Neon VRHADD, VHSUB 3-reg-same insns " Peter Maydell
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VABA and VABD insns in the 3-reg-same group to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  6 ++++++
 target/arm/translate-neon.inc.c |  4 ++++
 target/arm/translate.c          | 22 ++--------------------
 3 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 3432aacfe90..25c08a4170d 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -101,6 +101,12 @@ VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
 VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
 VMIN_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
 
+VABD_S_3s        1111 001 0 0 . .. .... .... 0111 . . . 0 .... @3same
+VABD_U_3s        1111 001 1 0 . .. .... .... 0111 . . . 0 .... @3same
+
+VABA_S_3s        1111 001 0 0 . .. .... .... 0111 . . . 1 .... @3same
+VABA_U_3s        1111 001 1 0 . .. .... .... 0111 . . . 1 .... @3same
+
 VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
 VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
 
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 0418a84a7de..20f0f2c8d83 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -641,6 +641,10 @@ DO_3SAME_NO_SZ_3(VMUL, tcg_gen_gvec_mul)
 DO_3SAME_NO_SZ_3(VMLA, gen_gvec_mla)
 DO_3SAME_NO_SZ_3(VMLS, gen_gvec_mls)
 DO_3SAME_NO_SZ_3(VTST, gen_gvec_cmtst)
+DO_3SAME_NO_SZ_3(VABD_S, gen_gvec_sabd)
+DO_3SAME_NO_SZ_3(VABA_S, gen_gvec_saba)
+DO_3SAME_NO_SZ_3(VABD_U, gen_gvec_uabd)
+DO_3SAME_NO_SZ_3(VABA_U, gen_gvec_uaba)
 
 #define DO_3SAME_CMP(INSN, COND)                                        \
     static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 8d856ccfe96..2c842df4451 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5421,26 +5421,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             /* VQRDMLSH : handled by decodetree */
             return 1;
 
-        case NEON_3R_VABD:
-            if (u) {
-                gen_gvec_uabd(size, rd_ofs, rn_ofs, rm_ofs,
-                              vec_size, vec_size);
-            } else {
-                gen_gvec_sabd(size, rd_ofs, rn_ofs, rm_ofs,
-                              vec_size, vec_size);
-            }
-            return 0;
-
-        case NEON_3R_VABA:
-            if (u) {
-                gen_gvec_uaba(size, rd_ofs, rn_ofs, rm_ofs,
-                              vec_size, vec_size);
-            } else {
-                gen_gvec_saba(size, rd_ofs, rn_ofs, rm_ofs,
-                              vec_size, vec_size);
-            }
-            return 0;
-
         case NEON_3R_VADD_VSUB:
         case NEON_3R_LOGIC:
         case NEON_3R_VMAX:
@@ -5455,6 +5435,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VSHL:
         case NEON_3R_SHA:
         case NEON_3R_VHADD:
+        case NEON_3R_VABD:
+        case NEON_3R_VABA:
             /* Already handled by decodetree */
             return 1;
         }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 06/17] target/arm: Convert Neon VRHADD, VHSUB 3-reg-same insns to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (4 preceding siblings ...)
  2020-05-12 16:38 ` [PATCH v2 05/17] target/arm: Convert Neon VABA/VABD 3-reg-same to decodetree Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-13 16:50   ` Richard Henderson
  2020-05-12 16:38 ` [PATCH v2 07/17] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL " Peter Maydell
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VRHADD and VHSUB 3-reg-same insns to decodetree.
(These are all the other insns in 3-reg-same which were using
GEN_NEON_INTEGER_OP() and which are not pairwise or
reversed-operands.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       | 6 ++++++
 target/arm/translate-neon.inc.c | 4 ++++
 target/arm/translate.c          | 8 ++------
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 25c08a4170d..353cb1f4992 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -47,6 +47,9 @@ VHADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 0 .... @3same
 VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
 VQADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 1 .... @3same
 
+VRHADD_S_3s      1111 001 0 0 . .. .... .... 0001 . . . 0 .... @3same
+VRHADD_U_3s      1111 001 1 0 . .. .... .... 0001 . . . 0 .... @3same
+
 @3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
 
@@ -59,6 +62,9 @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
 VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
 
+VHSUB_S_3s       1111 001 0 0 . .. .... .... 0010 . . . 0 .... @3same
+VHSUB_U_3s       1111 001 1 0 . .. .... .... 0010 . . . 0 .... @3same
+
 VQSUB_S_3s       1111 001 0 0 . .. .... .... 0010 . . . 1 .... @3same
 VQSUB_U_3s       1111 001 1 0 . .. .... .... 0010 . . . 1 .... @3same
 
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 20f0f2c8d83..e9da47171c6 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -877,3 +877,7 @@ DO_3SAME_64_ENV(VQRSHL_U64, gen_helper_neon_qrshl_u64)
 
 DO_3SAME_32(VHADD_S, hadd_s)
 DO_3SAME_32(VHADD_U, hadd_u)
+DO_3SAME_32(VHSUB_S, hsub_s)
+DO_3SAME_32(VHSUB_U, hsub_u)
+DO_3SAME_32(VRHADD_S, rhadd_s)
+DO_3SAME_32(VRHADD_U, rhadd_u)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 2c842df4451..ebb899d846e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5435,6 +5435,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VSHL:
         case NEON_3R_SHA:
         case NEON_3R_VHADD:
+        case NEON_3R_VRHADD:
+        case NEON_3R_VHSUB:
         case NEON_3R_VABD:
         case NEON_3R_VABA:
             /* Already handled by decodetree */
@@ -5517,12 +5519,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tmp2 = neon_load_reg(rm, pass);
         }
         switch (op) {
-        case NEON_3R_VRHADD:
-            GEN_NEON_INTEGER_OP(rhadd);
-            break;
-        case NEON_3R_VHSUB:
-            GEN_NEON_INTEGER_OP(hsub);
-            break;
         case NEON_3R_VQSHL:
             GEN_NEON_INTEGER_OP_ENV(qshl);
             break;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 07/17] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL 3-reg-same insns to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (5 preceding siblings ...)
  2020-05-12 16:38 ` [PATCH v2 06/17] target/arm: Convert Neon VRHADD, VHSUB 3-reg-same insns " Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-13 16:55   ` Richard Henderson
  2020-05-12 16:38 ` [PATCH v2 08/17] target/arm: Convert Neon VPMAX/VPMIN " Peter Maydell
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VQSHL, VRSHL and VQRSHL insns in the 3-reg-same
group to decodetree. We have already implemented the size==0b11
case of these insns; this commit handles the remaining sizes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       | 47 +++++++++++++++++++++++----------
 target/arm/translate-neon.inc.c | 43 ++++++++++++++++++++++++++++++
 target/arm/translate.c          | 22 +++------------
 3 files changed, 79 insertions(+), 33 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 353cb1f4992..e0da5a39fc2 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -74,14 +74,6 @@ VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
 VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
 
 # The shift operations are of the form Vd = Vm << Vn.
-# By reversing the names of the fields here, we can use standard expanders.
-@3same_rev       .... ... . . . size:2 .... .... .... . q:1 . . .... \
-                 &3same vn=%vm_dp vm=%vn_dp vd=%vd_dp
-
-VSHL_S_3s        1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same_rev
-VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same_rev
-
-# Insns operating on 64-bit elements (size!=0b11 handled elsewhere)
 # The _rev suffix indicates that Vn and Vm are reversed. This is
 # the case for shifts. In the Arm ARM these insns are documented
 # with the Vm and Vn fields in their usual places, but in the
@@ -92,15 +84,42 @@ VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same_rev
 # function code (where we would otherwise need to manually swap
 # the operands over to call Neon helper functions that are shared
 # with AArch64 which does not have this odd reversed-operand situation).
+@3same_rev       .... ... . . . size:2 .... .... .... . q:1 . . .... \
+                 &3same vn=%vm_dp vm=%vn_dp vd=%vd_dp
+
+VSHL_S_3s        1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same_rev
+VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same_rev
+
+# Insns operating on 64-bit elements (size!=0b11 handled elsewhere)
+# The _rev suffix indicates that Vn and Vm are reversed (as explained
+# by the comment for the @3same_rev format).
 @3same_64_rev    .... ... . . . 11 .... .... .... . q:1 . . .... \
                  &3same vm=%vn_dp vn=%vm_dp vd=%vd_dp size=3
 
-VQSHL_S64_3s     1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_64_rev
-VQSHL_U64_3s     1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_64_rev
-VRSHL_S64_3s     1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_64_rev
-VRSHL_U64_3s     1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_64_rev
-VQRSHL_S64_3s    1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_64_rev
-VQRSHL_U64_3s    1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_64_rev
+{
+  VQSHL_S64_3s   1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_64_rev
+  VQSHL_S_3s     1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_rev
+}
+{
+  VQSHL_U64_3s   1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_64_rev
+  VQSHL_U_3s     1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_rev
+}
+{
+  VRSHL_S64_3s   1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_64_rev
+  VRSHL_S_3s     1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_rev
+}
+{
+  VRSHL_U64_3s   1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_64_rev
+  VRSHL_U_3s     1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_rev
+}
+{
+  VQRSHL_S64_3s  1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_64_rev
+  VQRSHL_S_3s    1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_rev
+}
+{
+  VQRSHL_U64_3s  1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_64_rev
+  VQRSHL_U_3s    1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_rev
+}
 
 VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
 VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index e9da47171c6..7097c18f334 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -875,9 +875,52 @@ DO_3SAME_64_ENV(VQRSHL_U64, gen_helper_neon_qrshl_u64)
         return do_3same(s, a, gen_##INSN##_3s);                         \
     }
 
+/*
+ * Some helper functions need to be passed the cpu_env. In order
+ * to use those with the gvec APIs like tcg_gen_gvec_3() we need
+ * to create wrapper functions whose prototype is a NeonGenTwoOpFn()
+ * and which call a NeonGenTwoOpEnvFn().
+ */
+#define WRAP_ENV_FN(WRAPNAME, FUNC)                                     \
+    static void WRAPNAME(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m)            \
+    {                                                                   \
+        FUNC(d, cpu_env, n, m);                                         \
+    }
+
+#define DO_3SAME_32_ENV(INSN, FUNC)                                     \
+    WRAP_ENV_FN(gen_##INSN##_tramp8, gen_helper_neon_##FUNC##8);        \
+    WRAP_ENV_FN(gen_##INSN##_tramp16, gen_helper_neon_##FUNC##16);      \
+    WRAP_ENV_FN(gen_##INSN##_tramp32, gen_helper_neon_##FUNC##32);      \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        static const GVecGen3 ops[4] = {                                \
+            { .fni4 = gen_##INSN##_tramp8 },                            \
+            { .fni4 = gen_##INSN##_tramp16 },                           \
+            { .fni4 = gen_##INSN##_tramp32 },                           \
+            { 0 },                                                      \
+        };                                                              \
+        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops[vece]); \
+    }                                                                   \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        if (a->size > 2) {                                              \
+            return false;                                               \
+        }                                                               \
+        return do_3same(s, a, gen_##INSN##_3s);                         \
+    }
+
 DO_3SAME_32(VHADD_S, hadd_s)
 DO_3SAME_32(VHADD_U, hadd_u)
 DO_3SAME_32(VHSUB_S, hsub_s)
 DO_3SAME_32(VHSUB_U, hsub_u)
 DO_3SAME_32(VRHADD_S, rhadd_s)
 DO_3SAME_32(VRHADD_U, rhadd_u)
+DO_3SAME_32(VRSHL_S, rshl_s)
+DO_3SAME_32(VRSHL_U, rshl_u)
+
+DO_3SAME_32_ENV(VQSHL_S, qshl_s)
+DO_3SAME_32_ENV(VQSHL_U, qshl_u)
+DO_3SAME_32_ENV(VQRSHL_S, qrshl_s)
+DO_3SAME_32_ENV(VQRSHL_U, qrshl_u)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ebb899d846e..3aabb18720b 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5439,6 +5439,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VHSUB:
         case NEON_3R_VABD:
         case NEON_3R_VABA:
+        case NEON_3R_VQSHL:
+        case NEON_3R_VRSHL:
+        case NEON_3R_VQRSHL:
             /* Already handled by decodetree */
             return 1;
         }
@@ -5449,17 +5452,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         pairwise = 0;
         switch (op) {
-        case NEON_3R_VQSHL:
-        case NEON_3R_VRSHL:
-        case NEON_3R_VQRSHL:
-            {
-                int rtmp;
-                /* Shift instruction operands are reversed.  */
-                rtmp = rn;
-                rn = rm;
-                rm = rtmp;
-            }
-            break;
         case NEON_3R_VPADD_VQRDMLAH:
         case NEON_3R_VPMAX:
         case NEON_3R_VPMIN:
@@ -5519,14 +5511,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tmp2 = neon_load_reg(rm, pass);
         }
         switch (op) {
-        case NEON_3R_VQSHL:
-            GEN_NEON_INTEGER_OP_ENV(qshl);
-            break;
-        case NEON_3R_VRSHL:
-            GEN_NEON_INTEGER_OP(rshl);
-            break;
-        case NEON_3R_VQRSHL:
-            GEN_NEON_INTEGER_OP_ENV(qrshl);
             break;
         case NEON_3R_VPMAX:
             GEN_NEON_INTEGER_OP(pmax);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 08/17] target/arm: Convert Neon VPMAX/VPMIN 3-reg-same insns to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (6 preceding siblings ...)
  2020-05-12 16:38 ` [PATCH v2 07/17] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL " Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-12 16:38 ` [PATCH v2 09/17] target/arm: Convert Neon VPADD " Peter Maydell
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon integer VPMAX and VPMIN 3-reg-same insns to
decodetree. These are 'pairwise' operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  9 +++++
 target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 17 +-------
 3 files changed, 82 insertions(+), 15 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index e0da5a39fc2..03f39a0b44b 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -42,6 +42,9 @@
 @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+@3same_q0        .... ... . . . size:2 .... .... .... . 0 . . .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
+
 VHADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 0 .... @3same
 VHADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 0 .... @3same
 VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
@@ -144,6 +147,12 @@ VMLS_3s          1111 001 1 0 . .. .... .... 1001 . . . 0 .... @3same
 VMUL_3s          1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
 VMUL_p_3s        1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
 
+VPMAX_S_3s       1111 001 0 0 . .. .... .... 1010 . . . 0 .... @3same_q0
+VPMAX_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 0 .... @3same_q0
+
+VPMIN_S_3s       1111 001 0 0 . .. .... .... 1010 . . . 1 .... @3same_q0
+VPMIN_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 1 .... @3same_q0
+
 VQRDMLAH_3s      1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
 
 SHA1_3s          1111 001 0 0 . optype:2 .... .... 1100 . 1 . 0 .... \
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 7097c18f334..7db6b856598 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -924,3 +924,74 @@ DO_3SAME_32_ENV(VQSHL_S, qshl_s)
 DO_3SAME_32_ENV(VQSHL_U, qshl_u)
 DO_3SAME_32_ENV(VQRSHL_S, qrshl_s)
 DO_3SAME_32_ENV(VQRSHL_U, qrshl_u)
+
+static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
+{
+    /* Operations handled pairwise 32 bits at a time */
+    TCGv_i32 tmp, tmp2, tmp3;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (a->size == 3) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    assert(a->q == 0); /* enforced by decode patterns */
+
+    /*
+     * Note that we have to be careful not to clobber the source operands
+     * in the "vm == vd" case by storing the result of the first pass too
+     * early. Since Q is 0 there are always just two passes, so instead
+     * of a complicated loop over each pass we just unroll.
+     */
+    tmp = neon_load_reg(a->vn, 0);
+    tmp2 = neon_load_reg(a->vn, 1);
+    fn(tmp, tmp, tmp2);
+    tcg_temp_free_i32(tmp2);
+
+    tmp3 = neon_load_reg(a->vm, 0);
+    tmp2 = neon_load_reg(a->vm, 1);
+    fn(tmp3, tmp3, tmp2);
+    tcg_temp_free_i32(tmp2);
+
+    neon_store_reg(a->vd, 0, tmp);
+    neon_store_reg(a->vd, 1, tmp3);
+    return true;
+}
+
+#define DO_3SAME_PAIR(INSN, func)                                       \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        static NeonGenTwoOpFn * const fns[] = {                         \
+            gen_helper_neon_##func##8,                                  \
+            gen_helper_neon_##func##16,                                 \
+            gen_helper_neon_##func##32,                                 \
+        };                                                              \
+        if (a->size > 2) {                                              \
+            return false;                                               \
+        }                                                               \
+        return do_3same_pair(s, a, fns[a->size]);                       \
+    }
+
+/* 32-bit pairwise ops end up the same as the elementwise versions.  */
+#define gen_helper_neon_pmax_s32  tcg_gen_smax_i32
+#define gen_helper_neon_pmax_u32  tcg_gen_umax_i32
+#define gen_helper_neon_pmin_s32  tcg_gen_smin_i32
+#define gen_helper_neon_pmin_u32  tcg_gen_umin_i32
+
+DO_3SAME_PAIR(VPMAX_S, pmax_s)
+DO_3SAME_PAIR(VPMIN_S, pmin_s)
+DO_3SAME_PAIR(VPMAX_U, pmax_u)
+DO_3SAME_PAIR(VPMIN_U, pmin_u)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 3aabb18720b..82be4d40282 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3011,12 +3011,6 @@ static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
     }
 }
 
-/* 32-bit pairwise ops end up the same as the elementwise versions.  */
-#define gen_helper_neon_pmax_s32  tcg_gen_smax_i32
-#define gen_helper_neon_pmax_u32  tcg_gen_umax_i32
-#define gen_helper_neon_pmin_s32  tcg_gen_smin_i32
-#define gen_helper_neon_pmin_u32  tcg_gen_umin_i32
-
 #define GEN_NEON_INTEGER_OP_ENV(name) do { \
     switch ((size << 1) | u) { \
     case 0: \
@@ -5442,6 +5436,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VQSHL:
         case NEON_3R_VRSHL:
         case NEON_3R_VQRSHL:
+        case NEON_3R_VPMAX:
+        case NEON_3R_VPMIN:
             /* Already handled by decodetree */
             return 1;
         }
@@ -5453,8 +5449,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         pairwise = 0;
         switch (op) {
         case NEON_3R_VPADD_VQRDMLAH:
-        case NEON_3R_VPMAX:
-        case NEON_3R_VPMIN:
             pairwise = 1;
             break;
         case NEON_3R_FLOAT_ARITH:
@@ -5511,13 +5505,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tmp2 = neon_load_reg(rm, pass);
         }
         switch (op) {
-            break;
-        case NEON_3R_VPMAX:
-            GEN_NEON_INTEGER_OP(pmax);
-            break;
-        case NEON_3R_VPMIN:
-            GEN_NEON_INTEGER_OP(pmin);
-            break;
         case NEON_3R_VQDMULH_VQRDMULH: /* Multiply high.  */
             if (!u) { /* VQDMULH */
                 switch (size) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 09/17] target/arm: Convert Neon VPADD 3-reg-same insns to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (7 preceding siblings ...)
  2020-05-12 16:38 ` [PATCH v2 08/17] target/arm: Convert Neon VPMAX/VPMIN " Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-12 16:38 ` [PATCH v2 10/17] target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same " Peter Maydell
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon integer VPADD 3-reg-same insns to decodetree.  These
are 'pairwise' operations.  (Note that VQRDMLAH, which shares the
same primary opcode but has U=1, has already been converted.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  2 ++
 target/arm/translate-neon.inc.c |  2 ++
 target/arm/translate.c          | 19 +------------------
 3 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 03f39a0b44b..9bbb2dd77e0 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -153,6 +153,8 @@ VPMAX_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 0 .... @3same_q0
 VPMIN_S_3s       1111 001 0 0 . .. .... .... 1010 . . . 1 .... @3same_q0
 VPMIN_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 1 .... @3same_q0
 
+VPADD_3s         1111 001 0 0 . .. .... .... 1011 . . . 1 .... @3same_q0
+
 VQRDMLAH_3s      1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
 
 SHA1_3s          1111 001 0 0 . optype:2 .... .... 1100 . 1 . 0 .... \
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 7db6b856598..e0137364075 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -990,8 +990,10 @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
 #define gen_helper_neon_pmax_u32  tcg_gen_umax_i32
 #define gen_helper_neon_pmin_s32  tcg_gen_smin_i32
 #define gen_helper_neon_pmin_u32  tcg_gen_umin_i32
+#define gen_helper_neon_padd_u32  tcg_gen_add_i32
 
 DO_3SAME_PAIR(VPMAX_S, pmax_s)
 DO_3SAME_PAIR(VPMIN_S, pmin_s)
 DO_3SAME_PAIR(VPMAX_U, pmax_u)
 DO_3SAME_PAIR(VPMIN_U, pmin_u)
+DO_3SAME_PAIR(VPADD, padd_u)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 82be4d40282..ce30417014d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5397,13 +5397,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         }
         switch (op) {
-        case NEON_3R_VPADD_VQRDMLAH:
-            if (!u) {
-                break;  /* VPADD */
-            }
-            /* VQRDMLAH : handled by decodetree */
-            return 1;
-
         case NEON_3R_VFM_VQRDMLSH:
             if (!u) {
                 /* VFM, VFMS */
@@ -5438,6 +5431,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VQRSHL:
         case NEON_3R_VPMAX:
         case NEON_3R_VPMIN:
+        case NEON_3R_VPADD_VQRDMLAH:
             /* Already handled by decodetree */
             return 1;
         }
@@ -5448,9 +5442,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         pairwise = 0;
         switch (op) {
-        case NEON_3R_VPADD_VQRDMLAH:
-            pairwise = 1;
-            break;
         case NEON_3R_FLOAT_ARITH:
             pairwise = (u && size < 2); /* if VPADD (float) */
             break;
@@ -5528,14 +5519,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 }
             }
             break;
-        case NEON_3R_VPADD_VQRDMLAH:
-            switch (size) {
-            case 0: gen_helper_neon_padd_u8(tmp, tmp, tmp2); break;
-            case 1: gen_helper_neon_padd_u16(tmp, tmp, tmp2); break;
-            case 2: tcg_gen_add_i32(tmp, tmp, tmp2); break;
-            default: abort();
-            }
-            break;
         case NEON_3R_FLOAT_ARITH: /* Floating point arithmetic. */
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 10/17] target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (8 preceding siblings ...)
  2020-05-12 16:38 ` [PATCH v2 09/17] target/arm: Convert Neon VPADD " Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-13 16:58   ` Richard Henderson
  2020-05-12 16:38 ` [PATCH v2 11/17] target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns " Peter Maydell
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VQDMULH and VQRDMULH 3-reg-same insns to
decodetree. These are the last integer operations in the
3-reg-same group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  3 +++
 target/arm/translate-neon.inc.c | 24 ++++++++++++++++++++++++
 target/arm/translate.c          | 24 +-----------------------
 3 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 9bbb2dd77e0..28e8333335d 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -153,6 +153,9 @@ VPMAX_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 0 .... @3same_q0
 VPMIN_S_3s       1111 001 0 0 . .. .... .... 1010 . . . 1 .... @3same_q0
 VPMIN_U_3s       1111 001 1 0 . .. .... .... 1010 . . . 1 .... @3same_q0
 
+VQDMULH_3s       1111 001 0 0 . .. .... .... 1011 . . . 0 .... @3same
+VQRDMULH_3s      1111 001 1 0 . .. .... .... 1011 . . . 0 .... @3same
+
 VPADD_3s         1111 001 0 0 . .. .... .... 1011 . . . 1 .... @3same_q0
 
 VQRDMLAH_3s      1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index e0137364075..f52302f42b1 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -997,3 +997,27 @@ DO_3SAME_PAIR(VPMIN_S, pmin_s)
 DO_3SAME_PAIR(VPMAX_U, pmax_u)
 DO_3SAME_PAIR(VPMIN_U, pmin_u)
 DO_3SAME_PAIR(VPADD, padd_u)
+
+#define DO_3SAME_VQDMULH(INSN, FUNC)                                    \
+    WRAP_ENV_FN(gen_##INSN##_tramp16, gen_helper_neon_##FUNC##_s16);    \
+    WRAP_ENV_FN(gen_##INSN##_tramp32, gen_helper_neon_##FUNC##_s32);    \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        static const GVecGen3 ops[2] = {                                \
+            { .fni4 = gen_##INSN##_tramp16 },                           \
+            { .fni4 = gen_##INSN##_tramp32 },                           \
+        };                                                              \
+        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops[vece - 1]); \
+    }                                                                   \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        if (a->size != 1 && a->size != 2) {                             \
+            return false;                                               \
+        }                                                               \
+        return do_3same(s, a, gen_##INSN##_3s);                         \
+    }
+
+DO_3SAME_VQDMULH(VQDMULH, qdmulh)
+DO_3SAME_VQDMULH(VQRDMULH, qrdmulh)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ce30417014d..561cb67286d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5432,6 +5432,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VPMAX:
         case NEON_3R_VPMIN:
         case NEON_3R_VPADD_VQRDMLAH:
+        case NEON_3R_VQDMULH_VQRDMULH:
             /* Already handled by decodetree */
             return 1;
         }
@@ -5496,29 +5497,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tmp2 = neon_load_reg(rm, pass);
         }
         switch (op) {
-        case NEON_3R_VQDMULH_VQRDMULH: /* Multiply high.  */
-            if (!u) { /* VQDMULH */
-                switch (size) {
-                case 1:
-                    gen_helper_neon_qdmulh_s16(tmp, cpu_env, tmp, tmp2);
-                    break;
-                case 2:
-                    gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                    break;
-                default: abort();
-                }
-            } else { /* VQRDMULH */
-                switch (size) {
-                case 1:
-                    gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
-                    break;
-                case 2:
-                    gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                    break;
-                default: abort();
-                }
-            }
-            break;
         case NEON_3R_FLOAT_ARITH: /* Floating point arithmetic. */
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 11/17] target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (9 preceding siblings ...)
  2020-05-12 16:38 ` [PATCH v2 10/17] target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same " Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-13 20:06   ` Richard Henderson
  2020-05-12 16:38 ` [PATCH v2 12/17] target/arm: Convert Neon VPMIN/VPMAX/VPADD float " Peter Maydell
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VADD, VSUB, VABD 3-reg-same insns to decodetree.
We already have gvec helpers for addition and subtraction, but must
add one for fabd.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h             |  3 ++-
 target/arm/neon-dp.decode       |  8 ++++++++
 target/arm/neon_helper.c        |  7 -------
 target/arm/translate-neon.inc.c | 28 ++++++++++++++++++++++++++++
 target/arm/translate.c          | 10 +++-------
 target/arm/vec_helper.c         |  7 +++++++
 6 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 1857f4ee46a..6e9629c87b0 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -396,7 +396,6 @@ DEF_HELPER_FLAGS_2(neon_qneg_s16, TCG_CALL_NO_RWG, i32, env, i32)
 DEF_HELPER_FLAGS_2(neon_qneg_s32, TCG_CALL_NO_RWG, i32, env, i32)
 DEF_HELPER_FLAGS_2(neon_qneg_s64, TCG_CALL_NO_RWG, i64, env, i64)
 
-DEF_HELPER_3(neon_abd_f32, i32, i32, i32, ptr)
 DEF_HELPER_3(neon_ceq_f32, i32, i32, i32, ptr)
 DEF_HELPER_3(neon_cge_f32, i32, i32, i32, ptr)
 DEF_HELPER_3(neon_cgt_f32, i32, i32, i32, ptr)
@@ -595,6 +594,8 @@ DEF_HELPER_FLAGS_5(gvec_fmul_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fmul_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fmul_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 28e8333335d..06fb8b96aad 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -45,6 +45,10 @@
 @3same_q0        .... ... . . . size:2 .... .... .... . 0 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
 
+# For FP insns the high bit of 'size' is used as part of opcode decode
+@3same_fp        .... ... . . . . size:1 .... .... .... . q:1 . . .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
 VHADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 0 .... @3same
 VHADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 0 .... @3same
 VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
@@ -170,3 +174,7 @@ SHA256SU1_3s     1111 001 1 0 . 10 .... .... 1100 . 1 . 0 .... \
                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
 VQRDMLSH_3s      1111 001 1 0 . .. .... .... 1100 ... 1 .... @3same
+
+VADD_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 0 .... @3same_fp
+VSUB_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
+VABD_fp_3s       1111 001 1 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index 2ef75e04c83..b637265691a 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -1825,13 +1825,6 @@ uint64_t HELPER(neon_qneg_s64)(CPUARMState *env, uint64_t x)
 }
 
 /* NEON Float helpers.  */
-uint32_t HELPER(neon_abd_f32)(uint32_t a, uint32_t b, void *fpstp)
-{
-    float_status *fpst = fpstp;
-    float32 f0 = make_float32(a);
-    float32 f1 = make_float32(b);
-    return float32_val(float32_abs(float32_sub(f0, f1, fpst)));
-}
 
 /* Floating point comparisons produce an integer result.
  * Note that EQ doesn't signal InvalidOp for QNaNs but GE and GT do.
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index f52302f42b1..540720f5e0d 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1021,3 +1021,31 @@ DO_3SAME_PAIR(VPADD, padd_u)
 
 DO_3SAME_VQDMULH(VQDMULH, qdmulh)
 DO_3SAME_VQDMULH(VQRDMULH, qrdmulh)
+
+/*
+ * For all the functions using this macro, size == 1 means fp16,
+ * which is an architecture extension we don't implement yet.
+ */
+#define DO_3S_FP_GVEC(INSN,FUNC)                                        \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        TCGv_ptr fpst = get_fpstatus_ptr(1);                            \
+        tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpst,                \
+                           oprsz, maxsz, 0, FUNC);                      \
+        tcg_temp_free_ptr(fpst);                                        \
+    }                                                                   \
+    static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a)     \
+    {                                                                   \
+        if (a->size != 0) {                                             \
+            /* TODO fp16 support */                                     \
+            return false;                                               \
+        }                                                               \
+        return do_3same(s, a, gen_##INSN##_3s);                         \
+    }
+
+
+DO_3S_FP_GVEC(VADD, gen_helper_gvec_fadd_s)
+DO_3S_FP_GVEC(VSUB, gen_helper_gvec_fsub_s)
+DO_3S_FP_GVEC(VABD, gen_helper_gvec_fabd_s)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 561cb67286d..8a94856cd28 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5445,6 +5445,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         switch (op) {
         case NEON_3R_FLOAT_ARITH:
             pairwise = (u && size < 2); /* if VPADD (float) */
+            if (!pairwise) {
+                return 1; /* handled by decodetree */
+            }
             break;
         case NEON_3R_FLOAT_MINMAX:
             pairwise = u; /* if VPMIN/VPMAX (float) */
@@ -5501,16 +5504,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
             switch ((u << 2) | size) {
-            case 0: /* VADD */
             case 4: /* VPADD */
                 gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
                 break;
-            case 2: /* VSUB */
-                gen_helper_vfp_subs(tmp, tmp, tmp2, fpstatus);
-                break;
-            case 6: /* VABD */
-                gen_helper_neon_abd_f32(tmp, tmp, tmp2, fpstatus);
-                break;
             default:
                 abort();
             }
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index fa33df859e5..50a499299fd 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -691,6 +691,11 @@ static float64 float64_ftsmul(float64 op1, uint64_t op2, float_status *stat)
     return result;
 }
 
+static float32 float32_abd(float32 op1, float32 op2, float_status *stat)
+{
+    return float32_abs(float32_sub(op1, op2, stat));
+}
+
 #define DO_3OP(NAME, FUNC, TYPE) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
 {                                                                          \
@@ -718,6 +723,8 @@ DO_3OP(gvec_ftsmul_h, float16_ftsmul, float16)
 DO_3OP(gvec_ftsmul_s, float32_ftsmul, float32)
 DO_3OP(gvec_ftsmul_d, float64_ftsmul, float64)
 
+DO_3OP(gvec_fabd_s, float32_abd, float32)
+
 #ifdef TARGET_AARCH64
 
 DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 12/17] target/arm: Convert Neon VPMIN/VPMAX/VPADD float 3-reg-same insns to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (10 preceding siblings ...)
  2020-05-12 16:38 ` [PATCH v2 11/17] target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns " Peter Maydell
@ 2020-05-12 16:38 ` Peter Maydell
  2020-05-12 16:39 ` [PATCH v2 13/17] target/arm: Convert Neon fp VMUL, VMLA, VMLS " Peter Maydell
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:38 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon float VPMIN, VPMAX and VPADD 3-reg-same insns to
decodetree. These are the only remaining 'pairwise' operations,
so we can delete the pairwise-specific bits of the old decoder's
for-each-element loop now.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  5 +++
 target/arm/translate-neon.inc.c | 63 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 63 +++++----------------------------
 3 files changed, 76 insertions(+), 55 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 06fb8b96aad..1f289750dcd 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -48,6 +48,8 @@
 # For FP insns the high bit of 'size' is used as part of opcode decode
 @3same_fp        .... ... . . . . size:1 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
+@3same_fp_q0     .... ... . . . . size:1 .... .... .... . 0 . . .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
 
 VHADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 0 .... @3same
 VHADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 0 .... @3same
@@ -177,4 +179,7 @@ VQRDMLSH_3s      1111 001 1 0 . .. .... .... 1100 ... 1 .... @3same
 
 VADD_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 0 .... @3same_fp
 VSUB_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
+VPADD_fp_3s      1111 001 1 0 . 0 . .... .... 1101 ... 0 .... @3same_fp_q0
 VABD_fp_3s       1111 001 1 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
+VPMAX_fp_3s      1111 001 1 0 . 0 . .... .... 1111 ... 0 .... @3same_fp_q0
+VPMIN_fp_3s      1111 001 1 0 . 1 . .... .... 1111 ... 0 .... @3same_fp_q0
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 540720f5e0d..7bdf1e3fee8 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1049,3 +1049,66 @@ DO_3SAME_VQDMULH(VQRDMULH, qrdmulh)
 DO_3S_FP_GVEC(VADD, gen_helper_gvec_fadd_s)
 DO_3S_FP_GVEC(VSUB, gen_helper_gvec_fsub_s)
 DO_3S_FP_GVEC(VABD, gen_helper_gvec_fabd_s)
+
+static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
+{
+    /* FP operations handled pairwise 32 bits at a time */
+    TCGv_i32 tmp, tmp2, tmp3;
+    TCGv_ptr fpstatus;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    assert(a->q == 0); /* enforced by decode patterns */
+
+    /*
+     * Note that we have to be careful not to clobber the source operands
+     * in the "vm == vd" case by storing the result of the first pass too
+     * early. Since Q is 0 there are always just two passes, so instead
+     * of a complicated loop over each pass we just unroll.
+     */
+    fpstatus = get_fpstatus_ptr(1);
+    tmp = neon_load_reg(a->vn, 0);
+    tmp2 = neon_load_reg(a->vn, 1);
+    fn(tmp, tmp, tmp2, fpstatus);
+    tcg_temp_free_i32(tmp2);
+
+    tmp3 = neon_load_reg(a->vm, 0);
+    tmp2 = neon_load_reg(a->vm, 1);
+    fn(tmp3, tmp3, tmp2, fpstatus);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_ptr(fpstatus);
+
+    neon_store_reg(a->vd, 0, tmp);
+    neon_store_reg(a->vd, 1, tmp3);
+    return true;
+}
+
+/*
+ * For all the functions using this macro, size == 1 means fp16,
+ * which is an architecture extension we don't implement yet.
+ */
+#define DO_3S_FP_PAIR(INSN,FUNC)                                    \
+    static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
+    {                                                               \
+        if (a->size != 0) {                                         \
+            /* TODO fp16 support */                                 \
+            return false;                                           \
+        }                                                           \
+        return do_3same_fp_pair(s, a, FUNC);                        \
+    }
+
+DO_3S_FP_PAIR(VPADD, gen_helper_vfp_adds)
+DO_3S_FP_PAIR(VPMAX, gen_helper_vfp_maxs)
+DO_3S_FP_PAIR(VPMIN, gen_helper_vfp_mins)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 8a94856cd28..ca6ed09ec34 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5348,7 +5348,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int shift;
     int pass;
     int count;
-    int pairwise;
     int u;
     int vec_size;
     uint32_t imm;
@@ -5433,6 +5432,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VPMIN:
         case NEON_3R_VPADD_VQRDMLAH:
         case NEON_3R_VQDMULH_VQRDMULH:
+        case NEON_3R_FLOAT_ARITH:
             /* Already handled by decodetree */
             return 1;
         }
@@ -5441,16 +5441,11 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             /* 64-bit element instructions: handled by decodetree */
             return 1;
         }
-        pairwise = 0;
         switch (op) {
-        case NEON_3R_FLOAT_ARITH:
-            pairwise = (u && size < 2); /* if VPADD (float) */
-            if (!pairwise) {
-                return 1; /* handled by decodetree */
-            }
-            break;
         case NEON_3R_FLOAT_MINMAX:
-            pairwise = u; /* if VPMIN/VPMAX (float) */
+            if (u) {
+                return 1; /* VPMIN/VPMAX handled by decodetree */
+            }
             break;
         case NEON_3R_FLOAT_CMP:
             if (!u && size) {
@@ -5478,41 +5473,12 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             break;
         }
 
-        if (pairwise && q) {
-            /* All the pairwise insns UNDEF if Q is set */
-            return 1;
-        }
-
         for (pass = 0; pass < (q ? 4 : 2); pass++) {
 
-        if (pairwise) {
-            /* Pairwise.  */
-            if (pass < 1) {
-                tmp = neon_load_reg(rn, 0);
-                tmp2 = neon_load_reg(rn, 1);
-            } else {
-                tmp = neon_load_reg(rm, 0);
-                tmp2 = neon_load_reg(rm, 1);
-            }
-        } else {
-            /* Elementwise.  */
-            tmp = neon_load_reg(rn, pass);
-            tmp2 = neon_load_reg(rm, pass);
-        }
+        /* Elementwise.  */
+        tmp = neon_load_reg(rn, pass);
+        tmp2 = neon_load_reg(rm, pass);
         switch (op) {
-        case NEON_3R_FLOAT_ARITH: /* Floating point arithmetic. */
-        {
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            switch ((u << 2) | size) {
-            case 4: /* VPADD */
-                gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
-                break;
-            default:
-                abort();
-            }
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
         case NEON_3R_FLOAT_MULTIPLY:
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
@@ -5603,22 +5569,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         tcg_temp_free_i32(tmp2);
 
-        /* Save the result.  For elementwise operations we can put it
-           straight into the destination register.  For pairwise operations
-           we have to be careful to avoid clobbering the source operands.  */
-        if (pairwise && rd == rm) {
-            neon_store_scratch(pass, tmp);
-        } else {
-            neon_store_reg(rd, pass, tmp);
-        }
+        neon_store_reg(rd, pass, tmp);
 
         } /* for pass */
-        if (pairwise && rd == rm) {
-            for (pass = 0; pass < (q ? 4 : 2); pass++) {
-                tmp = neon_load_scratch(pass);
-                neon_store_reg(rd, pass, tmp);
-            }
-        }
         /* End of 3 register same size operations.  */
     } else if (insn & (1 << 4)) {
         if ((insn & 0x00380080) != 0) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 13/17] target/arm: Convert Neon fp VMUL, VMLA, VMLS 3-reg-same insns to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (11 preceding siblings ...)
  2020-05-12 16:38 ` [PATCH v2 12/17] target/arm: Convert Neon VPMIN/VPMAX/VPADD float " Peter Maydell
@ 2020-05-12 16:39 ` Peter Maydell
  2020-05-13 20:28   ` Richard Henderson
  2020-05-12 16:39 ` [PATCH v2 14/17] target/arm: Convert Neon 3-reg-same compare " Peter Maydell
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:39 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon integer VMUL, VMLA, and VMLS 3-reg-same inssn to
decodetree.

We don't have a gvec helper for multiply-accumulate, so VMLA and VMLS
need a loop function do_3same_fp().  This takes a reads_vd parameter
to do_3same_fp() which tells it to load the old value into vd before
calling the callback function, in the same way that the do_vfp_3op_sp()
and do_vfp_3op_dp() functions in translate-vfp.inc.c work. (The
only uses in this patch pass reads_vd == true, but later commits
will use reads_vd == false.)

This conversion fixes in passing an underdecoding for VMUL
(originally reported by Fredrik Strupe <fredrik@strupe.net>): bit 1
of the 'size' field must be 0.  The old decoder didn't enforce this,
but the decodetree pattern does.

The gen_VMLA_fp_reg() function performs the addition operation
with the operands in the opposite order to the old decoder:
since Neon sets 'default NaN mode' float32_add operations are
commutative so there is no behaviour difference, but putting
them this way around matches the Arm ARM pseudocode and the
required operation order for the subtraction in gen_VMLS_fp_reg().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 81 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 17 +------
 3 files changed, 85 insertions(+), 16 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 1f289750dcd..0f74b31dcd1 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -181,5 +181,8 @@ VADD_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 0 .... @3same_fp
 VSUB_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
 VPADD_fp_3s      1111 001 1 0 . 0 . .... .... 1101 ... 0 .... @3same_fp_q0
 VABD_fp_3s       1111 001 1 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
+VMLA_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 1 .... @3same_fp
+VMLS_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 1 .... @3same_fp
+VMUL_fp_3s       1111 001 1 0 . 0 . .... .... 1101 ... 1 .... @3same_fp
 VPMAX_fp_3s      1111 001 1 0 . 0 . .... .... 1111 ... 0 .... @3same_fp_q0
 VPMIN_fp_3s      1111 001 1 0 . 1 . .... .... 1111 ... 0 .... @3same_fp_q0
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 7bdf1e3fee8..18896598bb4 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1022,6 +1022,55 @@ DO_3SAME_PAIR(VPADD, padd_u)
 DO_3SAME_VQDMULH(VQDMULH, qdmulh)
 DO_3SAME_VQDMULH(VQRDMULH, qrdmulh)
 
+static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn,
+                        bool reads_vd)
+{
+    /*
+     * FP operations handled elementwise 32 bits at a time.
+     * If reads_vd is true then the old value of Vd will be
+     * loaded before calling the callback function. This is
+     * used for multiply-accumulate type operations.
+     */
+    TCGv_i32 tmp, tmp2;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    TCGv_ptr fpstatus = get_fpstatus_ptr(1);
+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
+        tmp = neon_load_reg(a->vn, pass);
+        tmp2 = neon_load_reg(a->vm, pass);
+        if (reads_vd) {
+            TCGv_i32 tmp_rd = neon_load_reg(a->vd, pass);
+            fn(tmp_rd, tmp, tmp2, fpstatus);
+            neon_store_reg(a->vd, pass, tmp_rd);
+            tcg_temp_free_i32(tmp);
+        } else {
+            fn(tmp, tmp, tmp2, fpstatus);
+            neon_store_reg(a->vd, pass, tmp);
+        }
+        tcg_temp_free_i32(tmp2);
+    }
+    tcg_temp_free_ptr(fpstatus);
+    return true;
+}
+
 /*
  * For all the functions using this macro, size == 1 means fp16,
  * which is an architecture extension we don't implement yet.
@@ -1049,6 +1098,38 @@ DO_3SAME_VQDMULH(VQRDMULH, qrdmulh)
 DO_3S_FP_GVEC(VADD, gen_helper_gvec_fadd_s)
 DO_3S_FP_GVEC(VSUB, gen_helper_gvec_fsub_s)
 DO_3S_FP_GVEC(VABD, gen_helper_gvec_fabd_s)
+DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s)
+
+/*
+ * For all the functions using this macro, size == 1 means fp16,
+ * which is an architecture extension we don't implement yet.
+ */
+#define DO_3S_FP(INSN,FUNC,READS_VD)                                \
+    static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
+    {                                                               \
+        if (a->size != 0) {                                         \
+            /* TODO fp16 support */                                 \
+            return false;                                           \
+        }                                                           \
+        return do_3same_fp(s, a, FUNC, READS_VD);                   \
+    }
+
+static void gen_VMLA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
+                            TCGv_ptr fpstatus)
+{
+    gen_helper_vfp_muls(vn, vn, vm, fpstatus);
+    gen_helper_vfp_adds(vd, vd, vn, fpstatus);
+}
+
+static void gen_VMLS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
+                            TCGv_ptr fpstatus)
+{
+    gen_helper_vfp_muls(vn, vn, vm, fpstatus);
+    gen_helper_vfp_subs(vd, vd, vn, fpstatus);
+}
+
+DO_3S_FP(VMLA, gen_VMLA_fp_3s, true)
+DO_3S_FP(VMLS, gen_VMLS_fp_3s, true)
 
 static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
 {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ca6ed09ec34..06b6925d31e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5433,6 +5433,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VPADD_VQRDMLAH:
         case NEON_3R_VQDMULH_VQRDMULH:
         case NEON_3R_FLOAT_ARITH:
+        case NEON_3R_FLOAT_MULTIPLY:
             /* Already handled by decodetree */
             return 1;
         }
@@ -5479,22 +5480,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         tmp = neon_load_reg(rn, pass);
         tmp2 = neon_load_reg(rm, pass);
         switch (op) {
-        case NEON_3R_FLOAT_MULTIPLY:
-        {
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
-            if (!u) {
-                tcg_temp_free_i32(tmp2);
-                tmp2 = neon_load_reg(rd, pass);
-                if (size == 0) {
-                    gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
-                } else {
-                    gen_helper_vfp_subs(tmp, tmp2, tmp, fpstatus);
-                }
-            }
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
         case NEON_3R_FLOAT_CMP:
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 14/17] target/arm: Convert Neon 3-reg-same compare insns to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (12 preceding siblings ...)
  2020-05-12 16:39 ` [PATCH v2 13/17] target/arm: Convert Neon fp VMUL, VMLA, VMLS " Peter Maydell
@ 2020-05-12 16:39 ` Peter Maydell
  2020-05-12 16:39 ` [PATCH v2 15/17] target/arm: Move 'env' argument of recps_f32 and rsqrts_f32 helpers to usual place Peter Maydell
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:39 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon integer 3-reg-same compare insns VCGE, VCGT,
VCEQ, VACGE and VACGT to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  5 +++++
 target/arm/translate-neon.inc.c |  6 +++++
 target/arm/translate.c          | 39 ++-------------------------------
 3 files changed, 13 insertions(+), 37 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 0f74b31dcd1..bc5a3e0295c 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -184,5 +184,10 @@ VABD_fp_3s       1111 001 1 0 . 1 . .... .... 1101 ... 0 .... @3same_fp
 VMLA_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 1 .... @3same_fp
 VMLS_fp_3s       1111 001 0 0 . 1 . .... .... 1101 ... 1 .... @3same_fp
 VMUL_fp_3s       1111 001 1 0 . 0 . .... .... 1101 ... 1 .... @3same_fp
+VCEQ_fp_3s       1111 001 0 0 . 0 . .... .... 1110 ... 0 .... @3same_fp
+VCGE_fp_3s       1111 001 1 0 . 0 . .... .... 1110 ... 0 .... @3same_fp
+VACGE_fp_3s      1111 001 1 0 . 0 . .... .... 1110 ... 1 .... @3same_fp
+VCGT_fp_3s       1111 001 1 0 . 1 . .... .... 1110 ... 0 .... @3same_fp
+VACGT_fp_3s      1111 001 1 0 . 1 . .... .... 1110 ... 1 .... @3same_fp
 VPMAX_fp_3s      1111 001 1 0 . 0 . .... .... 1111 ... 0 .... @3same_fp_q0
 VPMIN_fp_3s      1111 001 1 0 . 1 . .... .... 1111 ... 0 .... @3same_fp_q0
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 18896598bb4..eeea71e3bec 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1114,6 +1114,12 @@ DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s)
         return do_3same_fp(s, a, FUNC, READS_VD);                   \
     }
 
+DO_3S_FP(VCEQ, gen_helper_neon_ceq_f32, false)
+DO_3S_FP(VCGE, gen_helper_neon_cge_f32, false)
+DO_3S_FP(VCGT, gen_helper_neon_cgt_f32, false)
+DO_3S_FP(VACGE, gen_helper_neon_acge_f32, false)
+DO_3S_FP(VACGT, gen_helper_neon_acgt_f32, false)
+
 static void gen_VMLA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
                             TCGv_ptr fpstatus)
 {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 06b6925d31e..b9fcbbcbcb5 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5434,6 +5434,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VQDMULH_VQRDMULH:
         case NEON_3R_FLOAT_ARITH:
         case NEON_3R_FLOAT_MULTIPLY:
+        case NEON_3R_FLOAT_CMP:
+        case NEON_3R_FLOAT_ACMP:
             /* Already handled by decodetree */
             return 1;
         }
@@ -5448,17 +5450,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 return 1; /* VPMIN/VPMAX handled by decodetree */
             }
             break;
-        case NEON_3R_FLOAT_CMP:
-            if (!u && size) {
-                /* no encoding for U=0 C=1x */
-                return 1;
-            }
-            break;
-        case NEON_3R_FLOAT_ACMP:
-            if (!u) {
-                return 1;
-            }
-            break;
         case NEON_3R_FLOAT_MISC:
             /* VMAXNM/VMINNM in ARMv8 */
             if (u && !arm_dc_feature(s, ARM_FEATURE_V8)) {
@@ -5480,32 +5471,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         tmp = neon_load_reg(rn, pass);
         tmp2 = neon_load_reg(rm, pass);
         switch (op) {
-        case NEON_3R_FLOAT_CMP:
-        {
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            if (!u) {
-                gen_helper_neon_ceq_f32(tmp, tmp, tmp2, fpstatus);
-            } else {
-                if (size == 0) {
-                    gen_helper_neon_cge_f32(tmp, tmp, tmp2, fpstatus);
-                } else {
-                    gen_helper_neon_cgt_f32(tmp, tmp, tmp2, fpstatus);
-                }
-            }
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
-        case NEON_3R_FLOAT_ACMP:
-        {
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            if (size == 0) {
-                gen_helper_neon_acge_f32(tmp, tmp, tmp2, fpstatus);
-            } else {
-                gen_helper_neon_acgt_f32(tmp, tmp, tmp2, fpstatus);
-            }
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
         case NEON_3R_FLOAT_MINMAX:
         {
             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 15/17] target/arm: Move 'env' argument of recps_f32 and rsqrts_f32 helpers to usual place
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (13 preceding siblings ...)
  2020-05-12 16:39 ` [PATCH v2 14/17] target/arm: Convert Neon 3-reg-same compare " Peter Maydell
@ 2020-05-12 16:39 ` Peter Maydell
  2020-05-13 20:16   ` Richard Henderson
  2020-05-12 16:39 ` [PATCH v2 16/17] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree Peter Maydell
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:39 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

The usual location for the env argument in the argument list of a TCG helper
is immediately after the return-value argument. recps_f32 and rsqrts_f32
differ in that they put it at the end.

Move the env argument to its usual place; this will allow us to
more easily use these helper functions with the gvec APIs.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h     | 4 ++--
 target/arm/translate.c  | 4 ++--
 target/arm/vfp_helper.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 6e9629c87b0..49336dc432c 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -203,8 +203,8 @@ DEF_HELPER_FLAGS_3(vfp_fcvt_f64_to_f16, TCG_CALL_NO_RWG, f16, f64, ptr, i32)
 DEF_HELPER_4(vfp_muladdd, f64, f64, f64, f64, ptr)
 DEF_HELPER_4(vfp_muladds, f32, f32, f32, f32, ptr)
 
-DEF_HELPER_3(recps_f32, f32, f32, f32, env)
-DEF_HELPER_3(rsqrts_f32, f32, f32, f32, env)
+DEF_HELPER_3(recps_f32, f32, env, f32, f32)
+DEF_HELPER_3(rsqrts_f32, f32, env, f32, f32)
 DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, ptr)
 DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, ptr)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index b9fcbbcbcb5..23e3705172b 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5494,9 +5494,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 tcg_temp_free_ptr(fpstatus);
             } else {
                 if (size == 0) {
-                    gen_helper_recps_f32(tmp, tmp, tmp2, cpu_env);
+                    gen_helper_recps_f32(tmp, cpu_env, tmp, tmp2);
                 } else {
-                    gen_helper_rsqrts_f32(tmp, tmp, tmp2, cpu_env);
+                    gen_helper_rsqrts_f32(tmp, cpu_env, tmp, tmp2);
               }
             }
             break;
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index a7926611665..f5ecaab2d4a 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -581,7 +581,7 @@ uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
 #define float32_three make_float32(0x40400000)
 #define float32_one_point_five make_float32(0x3fc00000)
 
-float32 HELPER(recps_f32)(float32 a, float32 b, CPUARMState *env)
+float32 HELPER(recps_f32)(CPUARMState *env, float32 a, float32 b)
 {
     float_status *s = &env->vfp.standard_fp_status;
     if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
@@ -594,7 +594,7 @@ float32 HELPER(recps_f32)(float32 a, float32 b, CPUARMState *env)
     return float32_sub(float32_two, float32_mul(a, b, s), s);
 }
 
-float32 HELPER(rsqrts_f32)(float32 a, float32 b, CPUARMState *env)
+float32 HELPER(rsqrts_f32)(CPUARMState *env, float32 a, float32 b)
 {
     float_status *s = &env->vfp.standard_fp_status;
     float32 product;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 16/17] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (14 preceding siblings ...)
  2020-05-12 16:39 ` [PATCH v2 15/17] target/arm: Move 'env' argument of recps_f32 and rsqrts_f32 helpers to usual place Peter Maydell
@ 2020-05-12 16:39 ` Peter Maydell
  2020-05-13 20:20   ` Richard Henderson
  2020-05-12 16:39 ` [PATCH v2 17/17] target/arm: Convert NEON VFMA, VFMS 3-reg-same insns " Peter Maydell
  2020-05-12 23:44 ` [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same " no-reply
  17 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:39 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS 3-reg-same
insns to decodetree. (These are all the remaining non-accumulation
instructions in this group.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 70 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 42 +-------------------
 3 files changed, 78 insertions(+), 40 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index bc5a3e0295c..c276faf8cce 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -189,5 +189,11 @@ VCGE_fp_3s       1111 001 1 0 . 0 . .... .... 1110 ... 0 .... @3same_fp
 VACGE_fp_3s      1111 001 1 0 . 0 . .... .... 1110 ... 1 .... @3same_fp
 VCGT_fp_3s       1111 001 1 0 . 1 . .... .... 1110 ... 0 .... @3same_fp
 VACGT_fp_3s      1111 001 1 0 . 1 . .... .... 1110 ... 1 .... @3same_fp
+VMAX_fp_3s       1111 001 0 0 . 0 . .... .... 1111 ... 0 .... @3same_fp
+VMIN_fp_3s       1111 001 0 0 . 1 . .... .... 1111 ... 0 .... @3same_fp
 VPMAX_fp_3s      1111 001 1 0 . 0 . .... .... 1111 ... 0 .... @3same_fp_q0
 VPMIN_fp_3s      1111 001 1 0 . 1 . .... .... 1111 ... 0 .... @3same_fp_q0
+VRECPS_fp_3s     1111 001 0 0 . 0 . .... .... 1111 ... 1 .... @3same_fp
+VRSQRTS_fp_3s    1111 001 0 0 . 1 . .... .... 1111 ... 1 .... @3same_fp
+VMAXNM_fp_3s     1111 001 1 0 . 0 . .... .... 1111 ... 1 .... @3same_fp
+VMINNM_fp_3s     1111 001 1 0 . 1 . .... .... 1111 ... 1 .... @3same_fp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index eeea71e3bec..263d2b47132 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1119,6 +1119,8 @@ DO_3S_FP(VCGE, gen_helper_neon_cge_f32, false)
 DO_3S_FP(VCGT, gen_helper_neon_cgt_f32, false)
 DO_3S_FP(VACGE, gen_helper_neon_acge_f32, false)
 DO_3S_FP(VACGT, gen_helper_neon_acgt_f32, false)
+DO_3S_FP(VMAX, gen_helper_vfp_maxs, false)
+DO_3S_FP(VMIN, gen_helper_vfp_mins, false)
 
 static void gen_VMLA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
                             TCGv_ptr fpstatus)
@@ -1137,6 +1139,74 @@ static void gen_VMLS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
 DO_3S_FP(VMLA, gen_VMLA_fp_3s, true)
 DO_3S_FP(VMLS, gen_VMLS_fp_3s, true)
 
+static bool trans_VMAXNM_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
+            return false;
+    }
+
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same_fp(s, a, gen_helper_vfp_maxnums, false);
+}
+
+static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
+            return false;
+    }
+
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same_fp(s, a, gen_helper_vfp_minnums, false);
+}
+
+WRAP_ENV_FN(gen_VRECPS_tramp, gen_helper_recps_f32)
+
+static void gen_VRECPS_fp_3s(unsigned vece, uint32_t rd_ofs,
+                             uint32_t rn_ofs, uint32_t rm_ofs,
+                             uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 ops = { .fni4 = gen_VRECPS_tramp };
+    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops);
+}
+
+static bool trans_VRECPS_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same(s, a, gen_VRECPS_fp_3s);
+}
+
+WRAP_ENV_FN(gen_VRSQRTS_tramp, gen_helper_rsqrts_f32)
+
+static void gen_VRSQRTS_fp_3s(unsigned vece, uint32_t rd_ofs,
+                              uint32_t rn_ofs, uint32_t rm_ofs,
+                              uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 ops = { .fni4 = gen_VRSQRTS_tramp };
+    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops);
+}
+
+static bool trans_VRSQRTS_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same(s, a, gen_VRSQRTS_fp_3s);
+}
+
 static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
 {
     /* FP operations handled pairwise 32 bits at a time */
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 23e3705172b..c1d4fab8e80 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5436,6 +5436,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_FLOAT_MULTIPLY:
         case NEON_3R_FLOAT_CMP:
         case NEON_3R_FLOAT_ACMP:
+        case NEON_3R_FLOAT_MINMAX:
+        case NEON_3R_FLOAT_MISC:
             /* Already handled by decodetree */
             return 1;
         }
@@ -5445,17 +5447,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         }
         switch (op) {
-        case NEON_3R_FLOAT_MINMAX:
-            if (u) {
-                return 1; /* VPMIN/VPMAX handled by decodetree */
-            }
-            break;
-        case NEON_3R_FLOAT_MISC:
-            /* VMAXNM/VMINNM in ARMv8 */
-            if (u && !arm_dc_feature(s, ARM_FEATURE_V8)) {
-                return 1;
-            }
-            break;
         case NEON_3R_VFM_VQRDMLSH:
             if (!dc_isar_feature(aa32_simdfmac, s)) {
                 return 1;
@@ -5471,35 +5462,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         tmp = neon_load_reg(rn, pass);
         tmp2 = neon_load_reg(rm, pass);
         switch (op) {
-        case NEON_3R_FLOAT_MINMAX:
-        {
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            if (size == 0) {
-                gen_helper_vfp_maxs(tmp, tmp, tmp2, fpstatus);
-            } else {
-                gen_helper_vfp_mins(tmp, tmp, tmp2, fpstatus);
-            }
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
-        case NEON_3R_FLOAT_MISC:
-            if (u) {
-                /* VMAXNM/VMINNM */
-                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-                if (size == 0) {
-                    gen_helper_vfp_maxnums(tmp, tmp, tmp2, fpstatus);
-                } else {
-                    gen_helper_vfp_minnums(tmp, tmp, tmp2, fpstatus);
-                }
-                tcg_temp_free_ptr(fpstatus);
-            } else {
-                if (size == 0) {
-                    gen_helper_recps_f32(tmp, cpu_env, tmp, tmp2);
-                } else {
-                    gen_helper_rsqrts_f32(tmp, cpu_env, tmp, tmp2);
-              }
-            }
-            break;
         case NEON_3R_VFM_VQRDMLSH:
         {
             /* VFMA, VFMS: fused multiply-add */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 17/17] target/arm: Convert NEON VFMA, VFMS 3-reg-same insns to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (15 preceding siblings ...)
  2020-05-12 16:39 ` [PATCH v2 16/17] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree Peter Maydell
@ 2020-05-12 16:39 ` Peter Maydell
  2020-05-12 23:44 ` [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same " no-reply
  17 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2020-05-12 16:39 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon floating point VFMA and VFMS insn to decodetree.
These are the last insns in the 3-reg-same group so we can
remove all the support/loop code from the old decoder.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |   3 +
 target/arm/translate-neon.inc.c |  41 ++++++++
 target/arm/translate.c          | 176 +-------------------------------
 3 files changed, 46 insertions(+), 174 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index c276faf8cce..f53bf8a4494 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -175,6 +175,9 @@ SHA256H2_3s      1111 001 1 0 . 01 .... .... 1100 . 1 . 0 .... \
 SHA256SU1_3s     1111 001 1 0 . 10 .... .... 1100 . 1 . 0 .... \
                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+VFMA_fp_3s       1111 001 0 0 . 0 . .... .... 1100 ... 1 .... @3same_fp
+VFMS_fp_3s       1111 001 0 0 . 1 . .... .... 1100 ... 1 .... @3same_fp
+
 VQRDMLSH_3s      1111 001 1 0 . .. .... .... 1100 ... 1 .... @3same
 
 VADD_fp_3s       1111 001 0 0 . 0 . .... .... 1101 ... 0 .... @3same_fp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 263d2b47132..7212876e8f0 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1207,6 +1207,47 @@ static bool trans_VRSQRTS_fp_3s(DisasContext *s, arg_3same *a)
     return do_3same(s, a, gen_VRSQRTS_fp_3s);
 }
 
+static void gen_VFMA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
+                            TCGv_ptr fpstatus)
+{
+    gen_helper_vfp_muladds(vd, vn, vm, vd, fpstatus);
+}
+
+static bool trans_VFMA_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (!dc_isar_feature(aa32_simdfmac, s)) {
+        return false;
+    }
+
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same_fp(s, a, gen_VFMA_fp_3s, true);
+}
+
+static void gen_VFMS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
+                            TCGv_ptr fpstatus)
+{
+    gen_helper_vfp_negs(vn, vn);
+    gen_helper_vfp_muladds(vd, vn, vm, vd, fpstatus);
+}
+
+static bool trans_VFMS_fp_3s(DisasContext *s, arg_3same *a)
+{
+    if (!dc_isar_feature(aa32_simdfmac, s)) {
+        return false;
+    }
+
+    if (a->size != 0) {
+        /* TODO fp16 support */
+        return false;
+    }
+
+    return do_3same_fp(s, a, gen_VFMS_fp_3s, true);
+}
+
 static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
 {
     /* FP operations handled pairwise 32 bits at a time */
diff --git a/target/arm/translate.c b/target/arm/translate.c
index c1d4fab8e80..4c9bb8b5ac0 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3391,78 +3391,6 @@ static void gen_neon_narrow_op(int op, int u, int size,
     }
 }
 
-/* Symbolic constants for op fields for Neon 3-register same-length.
- * The values correspond to bits [11:8,4]; see the ARM ARM DDI0406B
- * table A7-9.
- */
-#define NEON_3R_VHADD 0
-#define NEON_3R_VQADD 1
-#define NEON_3R_VRHADD 2
-#define NEON_3R_LOGIC 3 /* VAND,VBIC,VORR,VMOV,VORN,VEOR,VBIF,VBIT,VBSL */
-#define NEON_3R_VHSUB 4
-#define NEON_3R_VQSUB 5
-#define NEON_3R_VCGT 6
-#define NEON_3R_VCGE 7
-#define NEON_3R_VSHL 8
-#define NEON_3R_VQSHL 9
-#define NEON_3R_VRSHL 10
-#define NEON_3R_VQRSHL 11
-#define NEON_3R_VMAX 12
-#define NEON_3R_VMIN 13
-#define NEON_3R_VABD 14
-#define NEON_3R_VABA 15
-#define NEON_3R_VADD_VSUB 16
-#define NEON_3R_VTST_VCEQ 17
-#define NEON_3R_VML 18 /* VMLA, VMLS */
-#define NEON_3R_VMUL 19
-#define NEON_3R_VPMAX 20
-#define NEON_3R_VPMIN 21
-#define NEON_3R_VQDMULH_VQRDMULH 22
-#define NEON_3R_VPADD_VQRDMLAH 23
-#define NEON_3R_SHA 24 /* SHA1C,SHA1P,SHA1M,SHA1SU0,SHA256H{2},SHA256SU1 */
-#define NEON_3R_VFM_VQRDMLSH 25 /* VFMA, VFMS, VQRDMLSH */
-#define NEON_3R_FLOAT_ARITH 26 /* float VADD, VSUB, VPADD, VABD */
-#define NEON_3R_FLOAT_MULTIPLY 27 /* float VMLA, VMLS, VMUL */
-#define NEON_3R_FLOAT_CMP 28 /* float VCEQ, VCGE, VCGT */
-#define NEON_3R_FLOAT_ACMP 29 /* float VACGE, VACGT, VACLE, VACLT */
-#define NEON_3R_FLOAT_MINMAX 30 /* float VMIN, VMAX */
-#define NEON_3R_FLOAT_MISC 31 /* float VRECPS, VRSQRTS, VMAXNM/MINNM */
-
-static const uint8_t neon_3r_sizes[] = {
-    [NEON_3R_VHADD] = 0x7,
-    [NEON_3R_VQADD] = 0xf,
-    [NEON_3R_VRHADD] = 0x7,
-    [NEON_3R_LOGIC] = 0xf, /* size field encodes op type */
-    [NEON_3R_VHSUB] = 0x7,
-    [NEON_3R_VQSUB] = 0xf,
-    [NEON_3R_VCGT] = 0x7,
-    [NEON_3R_VCGE] = 0x7,
-    [NEON_3R_VSHL] = 0xf,
-    [NEON_3R_VQSHL] = 0xf,
-    [NEON_3R_VRSHL] = 0xf,
-    [NEON_3R_VQRSHL] = 0xf,
-    [NEON_3R_VMAX] = 0x7,
-    [NEON_3R_VMIN] = 0x7,
-    [NEON_3R_VABD] = 0x7,
-    [NEON_3R_VABA] = 0x7,
-    [NEON_3R_VADD_VSUB] = 0xf,
-    [NEON_3R_VTST_VCEQ] = 0x7,
-    [NEON_3R_VML] = 0x7,
-    [NEON_3R_VMUL] = 0x7,
-    [NEON_3R_VPMAX] = 0x7,
-    [NEON_3R_VPMIN] = 0x7,
-    [NEON_3R_VQDMULH_VQRDMULH] = 0x6,
-    [NEON_3R_VPADD_VQRDMLAH] = 0x7,
-    [NEON_3R_SHA] = 0xf, /* size field encodes op type */
-    [NEON_3R_VFM_VQRDMLSH] = 0x7, /* For VFM, size bit 1 encodes op */
-    [NEON_3R_FLOAT_ARITH] = 0x5, /* size bit 1 encodes op */
-    [NEON_3R_FLOAT_MULTIPLY] = 0x5, /* size bit 1 encodes op */
-    [NEON_3R_FLOAT_CMP] = 0x5, /* size bit 1 encodes op */
-    [NEON_3R_FLOAT_ACMP] = 0x5, /* size bit 1 encodes op */
-    [NEON_3R_FLOAT_MINMAX] = 0x5, /* size bit 1 encodes op */
-    [NEON_3R_FLOAT_MISC] = 0x5, /* size bit 1 encodes op */
-};
-
 /* Symbolic constants for op fields for Neon 2-register miscellaneous.
  * The values correspond to bits [17:16,10:7]; see the ARM ARM DDI0406B
  * table A7-13.
@@ -5383,108 +5311,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     rm_ofs = neon_reg_offset(rm, 0);
 
     if ((insn & (1 << 23)) == 0) {
-        /* Three register same length.  */
-        op = ((insn >> 7) & 0x1e) | ((insn >> 4) & 1);
-        /* Catch invalid op and bad size combinations: UNDEF */
-        if ((neon_3r_sizes[op] & (1 << size)) == 0) {
-            return 1;
-        }
-        /* All insns of this form UNDEF for either this condition or the
-         * superset of cases "Q==1"; we catch the latter later.
-         */
-        if (q && ((rd | rn | rm) & 1)) {
-            return 1;
-        }
-        switch (op) {
-        case NEON_3R_VFM_VQRDMLSH:
-            if (!u) {
-                /* VFM, VFMS */
-                if (size == 1) {
-                    return 1;
-                }
-                break;
-            }
-            /* VQRDMLSH : handled by decodetree */
-            return 1;
-
-        case NEON_3R_VADD_VSUB:
-        case NEON_3R_LOGIC:
-        case NEON_3R_VMAX:
-        case NEON_3R_VMIN:
-        case NEON_3R_VTST_VCEQ:
-        case NEON_3R_VCGT:
-        case NEON_3R_VCGE:
-        case NEON_3R_VQADD:
-        case NEON_3R_VQSUB:
-        case NEON_3R_VMUL:
-        case NEON_3R_VML:
-        case NEON_3R_VSHL:
-        case NEON_3R_SHA:
-        case NEON_3R_VHADD:
-        case NEON_3R_VRHADD:
-        case NEON_3R_VHSUB:
-        case NEON_3R_VABD:
-        case NEON_3R_VABA:
-        case NEON_3R_VQSHL:
-        case NEON_3R_VRSHL:
-        case NEON_3R_VQRSHL:
-        case NEON_3R_VPMAX:
-        case NEON_3R_VPMIN:
-        case NEON_3R_VPADD_VQRDMLAH:
-        case NEON_3R_VQDMULH_VQRDMULH:
-        case NEON_3R_FLOAT_ARITH:
-        case NEON_3R_FLOAT_MULTIPLY:
-        case NEON_3R_FLOAT_CMP:
-        case NEON_3R_FLOAT_ACMP:
-        case NEON_3R_FLOAT_MINMAX:
-        case NEON_3R_FLOAT_MISC:
-            /* Already handled by decodetree */
-            return 1;
-        }
-
-        if (size == 3) {
-            /* 64-bit element instructions: handled by decodetree */
-            return 1;
-        }
-        switch (op) {
-        case NEON_3R_VFM_VQRDMLSH:
-            if (!dc_isar_feature(aa32_simdfmac, s)) {
-                return 1;
-            }
-            break;
-        default:
-            break;
-        }
-
-        for (pass = 0; pass < (q ? 4 : 2); pass++) {
-
-        /* Elementwise.  */
-        tmp = neon_load_reg(rn, pass);
-        tmp2 = neon_load_reg(rm, pass);
-        switch (op) {
-        case NEON_3R_VFM_VQRDMLSH:
-        {
-            /* VFMA, VFMS: fused multiply-add */
-            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-            TCGv_i32 tmp3 = neon_load_reg(rd, pass);
-            if (size) {
-                /* VFMS */
-                gen_helper_vfp_negs(tmp, tmp);
-            }
-            gen_helper_vfp_muladds(tmp, tmp, tmp2, tmp3, fpstatus);
-            tcg_temp_free_i32(tmp3);
-            tcg_temp_free_ptr(fpstatus);
-            break;
-        }
-        default:
-            abort();
-        }
-        tcg_temp_free_i32(tmp2);
-
-        neon_store_reg(rd, pass, tmp);
-
-        } /* for pass */
-        /* End of 3 register same size operations.  */
+        /* Three register same length: handled by decodetree */
+        return 1;
     } else if (insn & (1 << 4)) {
         if ((insn & 0x00380080) != 0) {
             /* Two registers and shift.  */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree
  2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
                   ` (16 preceding siblings ...)
  2020-05-12 16:39 ` [PATCH v2 17/17] target/arm: Convert NEON VFMA, VFMS 3-reg-same insns " Peter Maydell
@ 2020-05-12 23:44 ` no-reply
  17 siblings, 0 replies; 30+ messages in thread
From: no-reply @ 2020-05-12 23:44 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200512163904.10918-1-peter.maydell@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200512163904.10918-1-peter.maydell@linaro.org
Subject: [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
baa5469 target/arm: Convert NEON VFMA, VFMS 3-reg-same insns to decodetree
5b54865 target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree
d72bf92 target/arm: Move 'env' argument of recps_f32 and rsqrts_f32 helpers to usual place
87ce7b2 target/arm: Convert Neon 3-reg-same compare insns to decodetree
3613a84 target/arm: Convert Neon fp VMUL, VMLA, VMLS 3-reg-same insns to decodetree
1c588e7 target/arm: Convert Neon VPMIN/VPMAX/VPADD float 3-reg-same insns to decodetree
0cc39b7 target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns to decodetree
b2fa781 target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same to decodetree
3015d59 target/arm: Convert Neon VPADD 3-reg-same insns to decodetree
a4747ac target/arm: Convert Neon VPMAX/VPMIN 3-reg-same insns to decodetree
6886825 target/arm: Convert Neon VQSHL, VRSHL, VQRSHL 3-reg-same insns to decodetree
a52dbe7 target/arm: Convert Neon VRHADD, VHSUB 3-reg-same insns to decodetree
f2f6dd6 target/arm: Convert Neon VABA/VABD 3-reg-same to decodetree
d4559f0 target/arm: Convert Neon VHADD 3-reg-same insns
876d7c9 target/arm: Convert Neon 64-bit element 3-reg-same insns
acac746 target/arm: Convert Neon 3-reg-same SHA to decodetree
7c693fa target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree

=== OUTPUT BEGIN ===
1/17 Checking commit 7c693fa75b77 (target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree)
ERROR: spaces required around that '*' (ctx:WxV)
#37: FILE: target/arm/translate-neon.inc.c:676:
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
                                                              ^

total: 1 errors, 0 warnings, 50 lines checked

Patch 1/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

2/17 Checking commit acac7465d7c4 (target/arm: Convert Neon 3-reg-same SHA to decodetree)
ERROR: spaces required around that '*' (ctx:WxV)
#43: FILE: target/arm/translate-neon.inc.c:690:
+static bool trans_SHA1_3s(DisasContext *s, arg_SHA1_3s *a)
                                                        ^

total: 1 errors, 0 warnings, 220 lines checked

Patch 2/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

3/17 Checking commit 876d7c9fec85 (target/arm: Convert Neon 64-bit element 3-reg-same insns)
4/17 Checking commit d4559f0c2349 (target/arm: Convert Neon VHADD 3-reg-same insns)
ERROR: spaces required around that '*' (ctx:WxV)
#48: FILE: target/arm/translate-neon.inc.c:866:
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
                                                              ^

total: 1 errors, 0 warnings, 51 lines checked

Patch 4/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

5/17 Checking commit f2f6dd6b5d9c (target/arm: Convert Neon VABA/VABD 3-reg-same to decodetree)
6/17 Checking commit a52dbe7662e7 (target/arm: Convert Neon VRHADD, VHSUB 3-reg-same insns to decodetree)
7/17 Checking commit 6886825aa2b5 (target/arm: Convert Neon VQSHL, VRSHL, VQRSHL 3-reg-same insns to decodetree)
ERROR: Macros with multiple statements should be enclosed in a do - while loop
#103: FILE: target/arm/translate-neon.inc.c:890:
+#define DO_3SAME_32_ENV(INSN, FUNC)                                     \
+    WRAP_ENV_FN(gen_##INSN##_tramp8, gen_helper_neon_##FUNC##8);        \
+    WRAP_ENV_FN(gen_##INSN##_tramp16, gen_helper_neon_##FUNC##16);      \
+    WRAP_ENV_FN(gen_##INSN##_tramp32, gen_helper_neon_##FUNC##32);      \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        static const GVecGen3 ops[4] = {                                \
+            { .fni4 = gen_##INSN##_tramp8 },                            \
+            { .fni4 = gen_##INSN##_tramp16 },                           \
+            { .fni4 = gen_##INSN##_tramp32 },                           \
+            { 0 },                                                      \
+        };                                                              \
+        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops[vece]); \
+    }                                                                   \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        if (a->size > 2) {                                              \
+            return false;                                               \
+        }                                                               \
+        return do_3same(s, a, gen_##INSN##_3s);                         \
+    }

ERROR: spaces required around that '*' (ctx:WxV)
#119: FILE: target/arm/translate-neon.inc.c:906:
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
                                                              ^

total: 2 errors, 0 warnings, 154 lines checked

Patch 7/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

8/17 Checking commit a4747ac6caf6 (target/arm: Convert Neon VPMAX/VPMIN 3-reg-same insns to decodetree)
ERROR: spaces required around that '*' (ctx:WxV)
#51: FILE: target/arm/translate-neon.inc.c:928:
+static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
                                                      ^

ERROR: spaces required around that '*' (ctx:WxV)
#98: FILE: target/arm/translate-neon.inc.c:975:
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
                                                              ^

total: 2 errors, 0 warnings, 136 lines checked

Patch 8/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

9/17 Checking commit 3015d59a7178 (target/arm: Convert Neon VPADD 3-reg-same insns to decodetree)
10/17 Checking commit b2fa781c9e60 (target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same to decodetree)
ERROR: Macros with multiple statements should be enclosed in a do - while loop
#38: FILE: target/arm/translate-neon.inc.c:1001:
+#define DO_3SAME_VQDMULH(INSN, FUNC)                                    \
+    WRAP_ENV_FN(gen_##INSN##_tramp16, gen_helper_neon_##FUNC##_s16);    \
+    WRAP_ENV_FN(gen_##INSN##_tramp32, gen_helper_neon_##FUNC##_s32);    \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        static const GVecGen3 ops[2] = {                                \
+            { .fni4 = gen_##INSN##_tramp16 },                           \
+            { .fni4 = gen_##INSN##_tramp32 },                           \
+        };                                                              \
+        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops[vece - 1]); \
+    }                                                                   \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        if (a->size != 1 && a->size != 2) {                             \
+            return false;                                               \
+        }                                                               \
+        return do_3same(s, a, gen_##INSN##_3s);                         \
+    }

ERROR: spaces required around that '*' (ctx:WxV)
#51: FILE: target/arm/translate-neon.inc.c:1014:
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
                                                              ^

total: 2 errors, 0 warnings, 72 lines checked

Patch 10/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

11/17 Checking commit 0cc39b799ad7 (target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns to decodetree)
ERROR: space required after that ',' (ctx:VxV)
#90: FILE: target/arm/translate-neon.inc.c:1029:
+#define DO_3S_FP_GVEC(INSN,FUNC)                                        \
                           ^

ERROR: spaces required around that '*' (ctx:WxV)
#100: FILE: target/arm/translate-neon.inc.c:1039:
+    static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a)     \
                                                                 ^

WARNING: Block comments use a leading /* on a separate line
#103: FILE: target/arm/translate-neon.inc.c:1042:
+            /* TODO fp16 support */                                     \

total: 2 errors, 1 warnings, 120 lines checked

Patch 11/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

12/17 Checking commit 1c588e7110e6 (target/arm: Convert Neon VPMIN/VPMAX/VPADD float 3-reg-same insns to decodetree)
ERROR: spaces required around that '*' (ctx:WxV)
#47: FILE: target/arm/translate-neon.inc.c:1053:
+static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
                                                         ^

ERROR: space required after that ',' (ctx:VxV)
#96: FILE: target/arm/translate-neon.inc.c:1102:
+#define DO_3S_FP_PAIR(INSN,FUNC)                                    \
                           ^

ERROR: spaces required around that '*' (ctx:WxV)
#97: FILE: target/arm/translate-neon.inc.c:1103:
+    static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
                                                                 ^

WARNING: Block comments use a leading /* on a separate line
#100: FILE: target/arm/translate-neon.inc.c:1106:
+            /* TODO fp16 support */                                 \

ERROR: suspect code indent for conditional statements (8, 8)
#158: FILE: target/arm/translate.c:5476:
         for (pass = 0; pass < (q ? 4 : 2); pass++) {
[...]
+        /* Elementwise.  */

total: 4 errors, 1 warnings, 181 lines checked

Patch 12/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

13/17 Checking commit 3613a840e48e (target/arm: Convert Neon fp VMUL, VMLA, VMLS 3-reg-same insns to decodetree)
ERROR: spaces required around that '*' (ctx:WxV)
#55: FILE: target/arm/translate-neon.inc.c:1025:
+static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn,
                                                    ^

ERROR: space required after that ',' (ctx:VxV)
#117: FILE: target/arm/translate-neon.inc.c:1107:
+#define DO_3S_FP(INSN,FUNC,READS_VD)                                \
                      ^

ERROR: space required after that ',' (ctx:VxV)
#117: FILE: target/arm/translate-neon.inc.c:1107:
+#define DO_3S_FP(INSN,FUNC,READS_VD)                                \
                           ^

ERROR: spaces required around that '*' (ctx:WxV)
#118: FILE: target/arm/translate-neon.inc.c:1108:
+    static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
                                                                 ^

WARNING: Block comments use a leading /* on a separate line
#121: FILE: target/arm/translate-neon.inc.c:1111:
+            /* TODO fp16 support */                                 \

total: 4 errors, 1 warnings, 130 lines checked

Patch 13/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

14/17 Checking commit 87ce7b2990a3 (target/arm: Convert Neon 3-reg-same compare insns to decodetree)
15/17 Checking commit d72bf92f9fba (target/arm: Move 'env' argument of recps_f32 and rsqrts_f32 helpers to usual place)
16/17 Checking commit 5b54865d4a45 (target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree)
ERROR: spaces required around that '*' (ctx:WxV)
#48: FILE: target/arm/translate-neon.inc.c:1142:
+static bool trans_VMAXNM_fp_3s(DisasContext *s, arg_3same *a)
                                                           ^

total: 1 errors, 0 warnings, 153 lines checked

Patch 16/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

17/17 Checking commit baa5469d3e6d (target/arm: Convert NEON VFMA, VFMS 3-reg-same insns to decodetree)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200512163904.10918-1-peter.maydell@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 01/17] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH to decodetree
  2020-05-12 16:38 ` [PATCH v2 01/17] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH " Peter Maydell
@ 2020-05-13 16:39   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-05-13 16:39 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/12/20 9:38 AM, Peter Maydell wrote:
> Convert the Neon VQRDMLAH and VQRDMLSH insns in the 3-reg-same group
> to decodetree.  These don't use do_3same() because they want to
> operate on VFP double registers, whose offsets are different from the
> neon_reg_offset() calculations do_3same does.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> v1->v2: use do_3same() rather than custom do_vqrdlmah() function
> ---
>  target/arm/neon-dp.decode       |  3 +++
>  target/arm/translate-neon.inc.c | 15 +++++++++++++++
>  target/arm/translate.c          | 14 ++------------
>  3 files changed, 20 insertions(+), 12 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 04/17] target/arm: Convert Neon VHADD 3-reg-same insns
  2020-05-12 16:38 ` [PATCH v2 04/17] target/arm: Convert Neon VHADD " Peter Maydell
@ 2020-05-13 16:47   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-05-13 16:47 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/12/20 9:38 AM, Peter Maydell wrote:
> Convert the Neon VHADD insns in the 3-reg-same group to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  2 ++
>  target/arm/translate-neon.inc.c | 24 ++++++++++++++++++++++++
>  target/arm/translate.c          |  4 +---
>  3 files changed, 27 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/17] target/arm: Convert Neon VABA/VABD 3-reg-same to decodetree
  2020-05-12 16:38 ` [PATCH v2 05/17] target/arm: Convert Neon VABA/VABD 3-reg-same to decodetree Peter Maydell
@ 2020-05-13 16:48   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-05-13 16:48 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/12/20 9:38 AM, Peter Maydell wrote:
> Convert the Neon VABA and VABD insns in the 3-reg-same group to
> decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  6 ++++++
>  target/arm/translate-neon.inc.c |  4 ++++
>  target/arm/translate.c          | 22 ++--------------------
>  3 files changed, 12 insertions(+), 20 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 06/17] target/arm: Convert Neon VRHADD, VHSUB 3-reg-same insns to decodetree
  2020-05-12 16:38 ` [PATCH v2 06/17] target/arm: Convert Neon VRHADD, VHSUB 3-reg-same insns " Peter Maydell
@ 2020-05-13 16:50   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-05-13 16:50 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/12/20 9:38 AM, Peter Maydell wrote:
> Convert the Neon VRHADD and VHSUB 3-reg-same insns to decodetree.
> (These are all the other insns in 3-reg-same which were using
> GEN_NEON_INTEGER_OP() and which are not pairwise or
> reversed-operands.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       | 6 ++++++
>  target/arm/translate-neon.inc.c | 4 ++++
>  target/arm/translate.c          | 8 ++------
>  3 files changed, 12 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 07/17] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL 3-reg-same insns to decodetree
  2020-05-12 16:38 ` [PATCH v2 07/17] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL " Peter Maydell
@ 2020-05-13 16:55   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-05-13 16:55 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/12/20 9:38 AM, Peter Maydell wrote:
> Convert the VQSHL, VRSHL and VQRSHL insns in the 3-reg-same
> group to decodetree. We have already implemented the size==0b11
> case of these insns; this commit handles the remaining sizes.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       | 47 +++++++++++++++++++++++----------
>  target/arm/translate-neon.inc.c | 43 ++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 22 +++------------
>  3 files changed, 79 insertions(+), 33 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

I think I'll come back and clean up this split 64/32-bit decode, as well as
converting to pointer-to-qc, but this works for now.


r~


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/17] target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same to decodetree
  2020-05-12 16:38 ` [PATCH v2 10/17] target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same " Peter Maydell
@ 2020-05-13 16:58   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-05-13 16:58 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/12/20 9:38 AM, Peter Maydell wrote:
> Convert the Neon VQDMULH and VQRDMULH 3-reg-same insns to
> decodetree. These are the last integer operations in the
> 3-reg-same group.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  3 +++
>  target/arm/translate-neon.inc.c | 24 ++++++++++++++++++++++++
>  target/arm/translate.c          | 24 +-----------------------
>  3 files changed, 28 insertions(+), 23 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 11/17] target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns to decodetree
  2020-05-12 16:38 ` [PATCH v2 11/17] target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns " Peter Maydell
@ 2020-05-13 20:06   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-05-13 20:06 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/12/20 9:38 AM, Peter Maydell wrote:
> Convert the Neon VADD, VSUB, VABD 3-reg-same insns to decodetree.
> We already have gvec helpers for addition and subtraction, but must
> add one for fabd.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/helper.h             |  3 ++-
>  target/arm/neon-dp.decode       |  8 ++++++++
>  target/arm/neon_helper.c        |  7 -------
>  target/arm/translate-neon.inc.c | 28 ++++++++++++++++++++++++++++
>  target/arm/translate.c          | 10 +++-------
>  target/arm/vec_helper.c         |  7 +++++++
>  6 files changed, 48 insertions(+), 15 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 15/17] target/arm: Move 'env' argument of recps_f32 and rsqrts_f32 helpers to usual place
  2020-05-12 16:39 ` [PATCH v2 15/17] target/arm: Move 'env' argument of recps_f32 and rsqrts_f32 helpers to usual place Peter Maydell
@ 2020-05-13 20:16   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-05-13 20:16 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/12/20 9:39 AM, Peter Maydell wrote:
> The usual location for the env argument in the argument list of a TCG helper
> is immediately after the return-value argument. recps_f32 and rsqrts_f32
> differ in that they put it at the end.
> 
> Move the env argument to its usual place; this will allow us to
> more easily use these helper functions with the gvec APIs.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/helper.h     | 4 ++--
>  target/arm/translate.c  | 4 ++--
>  target/arm/vfp_helper.c | 4 ++--
>  3 files changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 16/17] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree
  2020-05-12 16:39 ` [PATCH v2 16/17] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree Peter Maydell
@ 2020-05-13 20:20   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-05-13 20:20 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/12/20 9:39 AM, Peter Maydell wrote:
> Convert the Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS 3-reg-same
> insns to decodetree. (These are all the remaining non-accumulation
> instructions in this group.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  6 +++
>  target/arm/translate-neon.inc.c | 70 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 42 +-------------------
>  3 files changed, 78 insertions(+), 40 deletions(-)


Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

> +static bool trans_VMAXNM_fp_3s(DisasContext *s, arg_3same *a)
> +{
> +    if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
> +            return false;
> +    }

Indentation is off.

> +static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
> +{
> +    if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
> +            return false;
> +    }

Likewise.


r~


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 03/17] target/arm: Convert Neon 64-bit element 3-reg-same insns
  2020-05-12 16:38 ` [PATCH v2 03/17] target/arm: Convert Neon 64-bit element 3-reg-same insns Peter Maydell
@ 2020-05-13 20:27   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-05-13 20:27 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/12/20 9:38 AM, Peter Maydell wrote:
> Convert the 64-bit element insns in the 3-reg-same group
> to decodetree. This covers VQSHL, VRSHL and VQRSHL where
> size==0b11.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       | 21 ++++++++++++++++++
>  target/arm/translate-neon.inc.c | 24 +++++++++++++++++++++
>  target/arm/translate.c          | 38 ++-------------------------------
>  3 files changed, 47 insertions(+), 36 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 13/17] target/arm: Convert Neon fp VMUL, VMLA, VMLS 3-reg-same insns to decodetree
  2020-05-12 16:39 ` [PATCH v2 13/17] target/arm: Convert Neon fp VMUL, VMLA, VMLS " Peter Maydell
@ 2020-05-13 20:28   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2020-05-13 20:28 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/12/20 9:39 AM, Peter Maydell wrote:
> Convert the Neon integer VMUL, VMLA, and VMLS 3-reg-same inssn to
> decodetree.
> 
> We don't have a gvec helper for multiply-accumulate, so VMLA and VMLS
> need a loop function do_3same_fp().  This takes a reads_vd parameter
> to do_3same_fp() which tells it to load the old value into vd before
> calling the callback function, in the same way that the do_vfp_3op_sp()
> and do_vfp_3op_dp() functions in translate-vfp.inc.c work. (The
> only uses in this patch pass reads_vd == true, but later commits
> will use reads_vd == false.)
> 
> This conversion fixes in passing an underdecoding for VMUL
> (originally reported by Fredrik Strupe <fredrik@strupe.net>): bit 1
> of the 'size' field must be 0.  The old decoder didn't enforce this,
> but the decodetree pattern does.
> 
> The gen_VMLA_fp_reg() function performs the addition operation
> with the operands in the opposite order to the old decoder:
> since Neon sets 'default NaN mode' float32_add operations are
> commutative so there is no behaviour difference, but putting
> them this way around matches the Arm ARM pseudocode and the
> required operation order for the subtraction in gen_VMLS_fp_reg().
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  3 ++
>  target/arm/translate-neon.inc.c | 81 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 17 +------
>  3 files changed, 85 insertions(+), 16 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2020-05-13 20:30 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-12 16:38 [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same to decodetree Peter Maydell
2020-05-12 16:38 ` [PATCH v2 01/17] target/arm: Convert Neon 3-reg-same VQRDMLAH/VQRDMLSH " Peter Maydell
2020-05-13 16:39   ` Richard Henderson
2020-05-12 16:38 ` [PATCH v2 02/17] target/arm: Convert Neon 3-reg-same SHA " Peter Maydell
2020-05-12 16:38 ` [PATCH v2 03/17] target/arm: Convert Neon 64-bit element 3-reg-same insns Peter Maydell
2020-05-13 20:27   ` Richard Henderson
2020-05-12 16:38 ` [PATCH v2 04/17] target/arm: Convert Neon VHADD " Peter Maydell
2020-05-13 16:47   ` Richard Henderson
2020-05-12 16:38 ` [PATCH v2 05/17] target/arm: Convert Neon VABA/VABD 3-reg-same to decodetree Peter Maydell
2020-05-13 16:48   ` Richard Henderson
2020-05-12 16:38 ` [PATCH v2 06/17] target/arm: Convert Neon VRHADD, VHSUB 3-reg-same insns " Peter Maydell
2020-05-13 16:50   ` Richard Henderson
2020-05-12 16:38 ` [PATCH v2 07/17] target/arm: Convert Neon VQSHL, VRSHL, VQRSHL " Peter Maydell
2020-05-13 16:55   ` Richard Henderson
2020-05-12 16:38 ` [PATCH v2 08/17] target/arm: Convert Neon VPMAX/VPMIN " Peter Maydell
2020-05-12 16:38 ` [PATCH v2 09/17] target/arm: Convert Neon VPADD " Peter Maydell
2020-05-12 16:38 ` [PATCH v2 10/17] target/arm: Convert Neon VQDMULH/VQRDMULH 3-reg-same " Peter Maydell
2020-05-13 16:58   ` Richard Henderson
2020-05-12 16:38 ` [PATCH v2 11/17] target/arm: Convert Neon VADD, VSUB, VABD 3-reg-same insns " Peter Maydell
2020-05-13 20:06   ` Richard Henderson
2020-05-12 16:38 ` [PATCH v2 12/17] target/arm: Convert Neon VPMIN/VPMAX/VPADD float " Peter Maydell
2020-05-12 16:39 ` [PATCH v2 13/17] target/arm: Convert Neon fp VMUL, VMLA, VMLS " Peter Maydell
2020-05-13 20:28   ` Richard Henderson
2020-05-12 16:39 ` [PATCH v2 14/17] target/arm: Convert Neon 3-reg-same compare " Peter Maydell
2020-05-12 16:39 ` [PATCH v2 15/17] target/arm: Move 'env' argument of recps_f32 and rsqrts_f32 helpers to usual place Peter Maydell
2020-05-13 20:16   ` Richard Henderson
2020-05-12 16:39 ` [PATCH v2 16/17] target/arm: Convert Neon fp VMAX/VMIN/VMAXNM/VMINNM/VRECPS/VRSQRTS to decodetree Peter Maydell
2020-05-13 20:20   ` Richard Henderson
2020-05-12 16:39 ` [PATCH v2 17/17] target/arm: Convert NEON VFMA, VFMS 3-reg-same insns " Peter Maydell
2020-05-12 23:44 ` [PATCH v2 00/17] target/arm: Convert rest of Neon 3-reg-same " no-reply

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.