QEMU-Devel Archive on lore.kernel.org
* [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree
@ 2020-05-15 14:20 Peter Maydell
  2020-05-15 14:20 ` [PATCH 01/10] target/arm: Remove unused GEN_NEON_INTEGER_OP macro Peter Maydell
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: Peter Maydell @ 2020-05-15 14:20 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

This patchset converts the Neon insns in the 2-register-and-shift-amount
and 1-register-and-modified-immediate groups to decodetree.

Patch 1 is a trivial dead-macro-deletion that got missed in
the 3-reg-same conversion.

thanks
-- PMM

Peter Maydell (10):
  target/arm: Remove unused GEN_NEON_INTEGER_OP macro
  target/arm: Convert Neon VSHL and VSLI 2-reg-shift insn to decodetree
  target/arm: Convert Neon VSHR 2-reg-shift insns to decodetree
  target/arm: Convert Neon VSRA, VSRI, VRSHR, VRSRA 2-reg-shift insns to
    decodetree
  target/arm: Convert VQSHLU, VQSHL 2-reg-shift insns to decodetree
  target/arm: Convert Neon narrowing shifts with op==8 to decodetree
  target/arm: Convert Neon narrowing shifts with op==9 to decodetree
  target/arm: Convert Neon VSHLL, VMOVL to decodetree
  target/arm: Convert VCVT fixed-point ops to decodetree
  target/arm: Convert Neon one-register-and-immediate insns to
    decodetree

 target/arm/neon-dp.decode       | 280 ++++++++++++++
 target/arm/translate-neon.inc.c | 662 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 511 +-----------------------
 3 files changed, 944 insertions(+), 509 deletions(-)

-- 
2.20.1




* [PATCH 01/10] target/arm: Remove unused GEN_NEON_INTEGER_OP macro
  2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
@ 2020-05-15 14:20 ` Peter Maydell
  2020-05-15 22:07   ` Richard Henderson
  2020-05-15 14:20 ` [PATCH 02/10] target/arm: Convert Neon VSHL and VSLI 2-reg-shift insn to decodetree Peter Maydell
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Maydell @ 2020-05-15 14:20 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

The GEN_NEON_INTEGER_OP macro is no longer used; remove it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
Between Richard's cleanup and mine we deleted all the uses of this,
but since neither series on its own was sufficient to delete all
of them we failed to remove the macro definition when it finally
became unused.
---
 target/arm/translate.c | 23 -----------------------
 1 file changed, 23 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 4c9bb8b5ac0..c8296116d4b 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3034,29 +3034,6 @@ static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
     default: return 1; \
     }} while (0)
 
-#define GEN_NEON_INTEGER_OP(name) do { \
-    switch ((size << 1) | u) { \
-    case 0: \
-        gen_helper_neon_##name##_s8(tmp, tmp, tmp2); \
-        break; \
-    case 1: \
-        gen_helper_neon_##name##_u8(tmp, tmp, tmp2); \
-        break; \
-    case 2: \
-        gen_helper_neon_##name##_s16(tmp, tmp, tmp2); \
-        break; \
-    case 3: \
-        gen_helper_neon_##name##_u16(tmp, tmp, tmp2); \
-        break; \
-    case 4: \
-        gen_helper_neon_##name##_s32(tmp, tmp, tmp2); \
-        break; \
-    case 5: \
-        gen_helper_neon_##name##_u32(tmp, tmp, tmp2); \
-        break; \
-    default: return 1; \
-    }} while (0)
-
 static TCGv_i32 neon_load_scratch(int scratch)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
-- 
2.20.1




* [PATCH 02/10] target/arm: Convert Neon VSHL and VSLI 2-reg-shift insn to decodetree
  2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
  2020-05-15 14:20 ` [PATCH 01/10] target/arm: Remove unused GEN_NEON_INTEGER_OP macro Peter Maydell
@ 2020-05-15 14:20 ` Peter Maydell
  2020-05-15 22:16   ` Richard Henderson
  2020-05-15 14:20 ` [PATCH 03/10] target/arm: Convert Neon VSHR 2-reg-shift insns " Peter Maydell
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Maydell @ 2020-05-15 14:20 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VSHL and VSLI insns from the Neon 2-registers-and-a-shift
group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
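For readers following the decode patterns in this patch: a minimal
standalone sketch of how the L bit and imm6 field select the element
size and left-shift count. The struct and function names here are
hypothetical and exist only for this illustration; they are not QEMU
code, and the generated decoder does this work from the patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a QEMU structure. */
typedef struct {
    int size;   /* log2 of the element size in bytes: 0..3 */
    int shift;  /* left shift count: 0 .. (8 << size) - 1 */
} LeftShiftFields;

/*
 * Model of the four VSHL_2sh/VSLI_2sh patterns: L is insn bit 7 and
 * imm6 is insn bits [21:16]. The position of the leading set bit in
 * L:imm6 gives the element size; the bits below it are the shift.
 * Returns false for L:imm6 == 0:000xxx, which none of the patterns match.
 */
static bool decode_left_shift(uint32_t insn, LeftShiftFields *out)
{
    unsigned l = (insn >> 7) & 1;
    unsigned imm6 = (insn >> 16) & 0x3f;

    assert(out != NULL);

    if (l) {                    /* 64-bit elements, shift:6 */
        out->size = 3;
        out->shift = (int)imm6;
    } else if (imm6 & 0x20) {   /* 1xxxxx: 32-bit elements, shift:5 */
        out->size = 2;
        out->shift = (int)(imm6 & 0x1f);
    } else if (imm6 & 0x10) {   /* 01xxxx: 16-bit elements, shift:4 */
        out->size = 1;
        out->shift = (int)(imm6 & 0x0f);
    } else if (imm6 & 0x08) {   /* 001xxx: 8-bit elements, shift:3 */
        out->size = 0;
        out->shift = (int)(imm6 & 0x07);
    } else {
        return false;
    }
    return true;
}
```

Only bits 7 and [21:16] are examined, so the sketch says nothing about
the other fixed bits that the real patterns also check.
---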
 target/arm/neon-dp.decode       | 27 +++++++++++++++++++++++
 target/arm/translate-neon.inc.c | 38 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 18 +++++++---------
 3 files changed, 73 insertions(+), 10 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 8beb1db768b..df7b4798a5a 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -199,3 +199,30 @@ VRECPS_fp_3s     1111 001 0 0 . 0 . .... .... 1111 ... 1 .... @3same_fp
 VRSQRTS_fp_3s    1111 001 0 0 . 1 . .... .... 1111 ... 1 .... @3same_fp
 VMAXNM_fp_3s     1111 001 1 0 . 0 . .... .... 1111 ... 1 .... @3same_fp
 VMINNM_fp_3s     1111 001 1 0 . 1 . .... .... 1111 ... 1 .... @3same_fp
+
+######################################################################
+# 2-reg-and-shift grouping:
+# 1111 001 U 1 D immH:3 immL:3 Vd:4 opc:4 L Q M 1 Vm:4
+######################################################################
+&2reg_shift vm vd q shift size
+
+@2reg_shift      .... ... . . . ...... .... .... . q:1 . . .... \
+                 &2reg_shift vm=%vm_dp vd=%vd_dp
+
+VSHL_2sh         1111 001 0 1 . shift:6     .... 0101 1 . . 1 .... \
+                 @2reg_shift size=3
+VSHL_2sh         1111 001 0 1 . 1 shift:5   .... 0101 0 . . 1 .... \
+                 @2reg_shift size=2
+VSHL_2sh         1111 001 0 1 . 01 shift:4  .... 0101 0 . . 1 .... \
+                 @2reg_shift size=1
+VSHL_2sh         1111 001 0 1 . 001 shift:3 .... 0101 0 . . 1 .... \
+                 @2reg_shift size=0
+
+VSLI_2sh         1111 001 1 1 . shift:6     .... 0101 1 . . 1 .... \
+                 @2reg_shift size=3
+VSLI_2sh         1111 001 1 1 . 1 shift:5   .... 0101 0 . . 1 .... \
+                 @2reg_shift size=2
+VSLI_2sh         1111 001 1 1 . 01 shift:4  .... 0101 0 . . 1 .... \
+                 @2reg_shift size=1
+VSLI_2sh         1111 001 1 1 . 001 shift:3 .... 0101 0 . . 1 .... \
+                 @2reg_shift size=0
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 3fe65a0b080..305213fe6d9 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1310,3 +1310,41 @@ static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
 DO_3S_FP_PAIR(VPADD, gen_helper_vfp_adds)
 DO_3S_FP_PAIR(VPMAX, gen_helper_vfp_maxs)
 DO_3S_FP_PAIR(VPMIN, gen_helper_vfp_mins)
+
+static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
+{
+    /* Handle a 2-reg-shift insn which can be vectorized. */
+    int vec_size = a->q ? 16 : 8;
+    int rd_ofs = neon_reg_offset(a->vd, 0);
+    int rm_ofs = neon_reg_offset(a->vm, 0);
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fn(a->size, rd_ofs, rm_ofs, a->shift, vec_size, vec_size);
+    return true;
+}
+
+#define DO_2SH(INSN, FUNC)                                              \
+    static bool trans_##INSN##_2sh(DisasContext *s, arg_2reg_shift *a)  \
+    {                                                                   \
+        return do_vector_2sh(s, a, FUNC);                               \
+    }                                                                   \
+
+DO_2SH(VSHL, tcg_gen_gvec_shli)
+DO_2SH(VSLI, gen_gvec_sli)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index c8296116d4b..d0a4a08f6d9 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5294,6 +5294,14 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         if ((insn & 0x00380080) != 0) {
             /* Two registers and shift.  */
             op = (insn >> 8) & 0xf;
+
+            switch (op) {
+            case 5: /* VSHL, VSLI */
+                return 1; /* handled by decodetree */
+            default:
+                break;
+            }
+
             if (insn & (1 << 7)) {
                 /* 64-bit shift. */
                 if (op > 7) {
@@ -5387,16 +5395,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     gen_gvec_sri(size, rd_ofs, rm_ofs, shift,
                                  vec_size, vec_size);
                     return 0;
-
-                case 5: /* VSHL, VSLI */
-                    if (u) { /* VSLI */
-                        gen_gvec_sli(size, rd_ofs, rm_ofs, shift,
-                                     vec_size, vec_size);
-                    } else { /* VSHL */
-                        tcg_gen_gvec_shli(size, rd_ofs, rm_ofs, shift,
-                                          vec_size, vec_size);
-                    }
-                    return 0;
                 }
 
                 if (size == 3) {
-- 
2.20.1




* [PATCH 03/10] target/arm: Convert Neon VSHR 2-reg-shift insns to decodetree
  2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
  2020-05-15 14:20 ` [PATCH 01/10] target/arm: Remove unused GEN_NEON_INTEGER_OP macro Peter Maydell
  2020-05-15 14:20 ` [PATCH 02/10] target/arm: Convert Neon VSHL and VSLI 2-reg-shift insn to decodetree Peter Maydell
@ 2020-05-15 14:20 ` Peter Maydell
  2020-05-15 22:33   ` Richard Henderson
  2020-05-15 22:48   ` Richard Henderson
  2020-05-15 14:20 ` [PATCH 04/10] target/arm: Convert Neon VSRA, VSRI, VRSHR, VRSRA " Peter Maydell
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 24+ messages in thread
From: Peter Maydell @ 2020-05-15 14:20 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VSHR 2-reg-shift insns to decodetree.

Note that unlike the legacy decoder, we present the right shift
amount to the trans_ function as a positive integer.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
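To illustrate the right-shift encoding: a minimal standalone sketch,
assuming the immediate field holds (element size in bits) minus the
shift, so the decoded shift ranges from 1 up to the full element
width. The function names are hypothetical models written for this
note, not QEMU functions. The two 8-bit element models show why a
shift equal to the element width needs special handling.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode a right-shift count from imm6 for a given size (0..3).
 * The field is the low (3 + size) bits of imm6 and encodes
 * esize - shift, so the result is in the range 1..esize.
 */
static int decode_right_shift(int size, unsigned imm6)
{
    int esize;

    assert(size >= 0 && size <= 3);
    esize = 8 << size;
    return esize - (int)(imm6 & (unsigned)(esize - 1));
}

/* Unsigned VSHR on one 8-bit element: a shift of 8 yields zero. */
static uint8_t vshr_u8(uint8_t x, int shift)
{
    return shift >= 8 ? 0 : (uint8_t)(x >> shift);
}

/*
 * Signed VSHR on one 8-bit element: a shift of 8 behaves like a shift
 * of 7, leaving every bit equal to the sign bit. The negative case is
 * computed on the complement to avoid relying on how the C
 * implementation right-shifts negative values.
 */
static int8_t vshr_s8(int8_t x, int shift)
{
    int v = x;

    if (shift > 7) {
        shift = 7;
    }
    return (int8_t)(v < 0 ? ~(~v >> shift) : v >> shift);
}
```

A field value of zero therefore decodes to the maximum shift, which is
the case the MIN() clamp and the zeroing path in this patch exist for.
---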
 target/arm/neon-dp.decode       | 24 +++++++++++++++++++
 target/arm/translate-neon.inc.c | 41 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 21 +----------------
 3 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index df7b4798a5a..648812395f1 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -209,6 +209,30 @@ VMINNM_fp_3s     1111 001 1 0 . 1 . .... .... 1111 ... 1 .... @3same_fp
 @2reg_shift      .... ... . . . ...... .... .... . q:1 . . .... \
                  &2reg_shift vm=%vm_dp vd=%vd_dp
 
+# Right shifts are encoded as N - shift, where N is the element size in bits.
+%neon_rshift_i6  16:6 !function=rsub_64
+%neon_rshift_i5  16:5 !function=rsub_32
+%neon_rshift_i4  16:4 !function=rsub_16
+%neon_rshift_i3  16:3 !function=rsub_8
+
+VSHR_S_2sh       1111 001 0 1 .  ......     .... 0000 1 . . 1 .... \
+                 @2reg_shift size=3 shift=%neon_rshift_i6
+VSHR_S_2sh       1111 001 0 1 . 1 .....     .... 0000 0 . . 1 .... \
+                 @2reg_shift size=2 shift=%neon_rshift_i5
+VSHR_S_2sh       1111 001 0 1 . 01 ....     .... 0000 0 . . 1 .... \
+                 @2reg_shift size=1 shift=%neon_rshift_i4
+VSHR_S_2sh       1111 001 0 1 . 001 ...     .... 0000 0 . . 1 .... \
+                 @2reg_shift size=0 shift=%neon_rshift_i3
+
+VSHR_U_2sh       1111 001 1 1 .  ......     .... 0000 1 . . 1 .... \
+                 @2reg_shift size=3 shift=%neon_rshift_i6
+VSHR_U_2sh       1111 001 1 1 . 1 .....     .... 0000 0 . . 1 .... \
+                 @2reg_shift size=2 shift=%neon_rshift_i5
+VSHR_U_2sh       1111 001 1 1 . 01 ....     .... 0000 0 . . 1 .... \
+                 @2reg_shift size=1 shift=%neon_rshift_i4
+VSHR_U_2sh       1111 001 1 1 . 001 ...     .... 0000 0 . . 1 .... \
+                 @2reg_shift size=0 shift=%neon_rshift_i3
+
 VSHL_2sh         1111 001 0 1 . shift:6     .... 0101 1 . . 1 .... \
                  @2reg_shift size=3
 VSHL_2sh         1111 001 0 1 . 1 shift:5   .... 0101 0 . . 1 .... \
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 305213fe6d9..0475696835f 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -31,6 +31,24 @@ static inline int plus1(DisasContext *s, int x)
     return x + 1;
 }
 
+static inline int rsub_64(DisasContext *s, int x)
+{
+    return 64 - x;
+}
+
+static inline int rsub_32(DisasContext *s, int x)
+{
+    return 32 - x;
+}
+static inline int rsub_16(DisasContext *s, int x)
+{
+    return 16 - x;
+}
+static inline int rsub_8(DisasContext *s, int x)
+{
+    return 8 - x;
+}
+
 /* Include the generated Neon decoder */
 #include "decode-neon-dp.inc.c"
 #include "decode-neon-ls.inc.c"
@@ -1348,3 +1366,26 @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
 
 DO_2SH(VSHL, tcg_gen_gvec_shli)
 DO_2SH(VSLI, gen_gvec_sli)
+
+static bool trans_VSHR_S_2sh(DisasContext *s, arg_2reg_shift *a)
+{
+    /* Signed shift out of range results in all-sign-bits */
+    a->shift = MIN(a->shift, (8 << a->size) - 1);
+    return do_vector_2sh(s, a, tcg_gen_gvec_sari);
+}
+
+static void gen_zero_rd_2sh(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
+                            int64_t shift, uint32_t oprsz, uint32_t maxsz)
+{
+    tcg_gen_gvec_dup_imm(vece, rd_ofs, oprsz, maxsz, 0);
+}
+
+static bool trans_VSHR_U_2sh(DisasContext *s, arg_2reg_shift *a)
+{
+    /* Shift out of range is architecturally valid and results in zero. */
+    if (a->shift >= (8 << a->size)) {
+        return do_vector_2sh(s, a, gen_zero_rd_2sh);
+    } else {
+        return do_vector_2sh(s, a, tcg_gen_gvec_shri);
+    }
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d0a4a08f6d9..f2ccab1b21c 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5296,6 +5296,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             op = (insn >> 8) & 0xf;
 
             switch (op) {
+            case 0: /* VSHR */
             case 5: /* VSHL, VSLI */
                 return 1; /* handled by decodetree */
             default:
@@ -5330,26 +5331,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 }
 
                 switch (op) {
-                case 0:  /* VSHR */
-                    /* Right shift comes here negative.  */
-                    shift = -shift;
-                    /* Shifts larger than the element size are architecturally
-                     * valid.  Unsigned results in all zeros; signed results
-                     * in all sign bits.
-                     */
-                    if (!u) {
-                        tcg_gen_gvec_sari(size, rd_ofs, rm_ofs,
-                                          MIN(shift, (8 << size) - 1),
-                                          vec_size, vec_size);
-                    } else if (shift >= 8 << size) {
-                        tcg_gen_gvec_dup_imm(MO_8, rd_ofs, vec_size,
-                                             vec_size, 0);
-                    } else {
-                        tcg_gen_gvec_shri(size, rd_ofs, rm_ofs, shift,
-                                          vec_size, vec_size);
-                    }
-                    return 0;
-
                 case 1:  /* VSRA */
                     /* Right shift comes here negative.  */
                     shift = -shift;
-- 
2.20.1




* [PATCH 04/10] target/arm: Convert Neon VSRA, VSRI, VRSHR, VRSRA 2-reg-shift insns to decodetree
  2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
                   ` (2 preceding siblings ...)
  2020-05-15 14:20 ` [PATCH 03/10] target/arm: Convert Neon VSHR 2-reg-shift insns " Peter Maydell
@ 2020-05-15 14:20 ` Peter Maydell
  2020-05-15 22:50   ` Richard Henderson
  2020-05-15 14:20 ` [PATCH 05/10] target/arm: Convert VQSHLU, VQSHL " Peter Maydell
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Maydell @ 2020-05-15 14:20 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VSRA, VSRI, VRSHR, VRSRA 2-reg-shift insns to decodetree.
(These are the last instructions in the group that are vectorized;
the rest all require looping over each element.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
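For reference, a minimal sketch of what three of these operations
compute on a single unsigned 8-bit element, based on my reading of the
architectural behaviour. The function names are hypothetical models
for this note only; the gen_gvec_* expanders used by the patch operate
on whole vectors and are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Unsigned rounding shift right (VRSHR.U8 style), shift in 1..8.
 * Half of the last bit shifted out is added before shifting, using a
 * wider intermediate so the addition cannot overflow.
 */
static uint8_t urshr8(uint8_t x, int shift)
{
    unsigned wide = x;

    assert(shift >= 1 && shift <= 8);
    return (uint8_t)((wide + (1u << (shift - 1))) >> shift);
}

/*
 * Unsigned shift right and accumulate (VSRA.U8 style): the shifted
 * value is added to the destination element, wrapping modulo 256.
 */
static uint8_t usra8(uint8_t d, uint8_t x, int shift)
{
    uint8_t shifted;

    assert(shift >= 1 && shift <= 8);
    shifted = shift >= 8 ? 0 : (uint8_t)(x >> shift);
    return (uint8_t)(d + shifted);
}

/*
 * Shift right and insert (VSRI.8 style): the shifted source replaces
 * the low (8 - shift) bits of the destination, and the top 'shift'
 * bits of the destination are preserved.
 */
static uint8_t sri8(uint8_t d, uint8_t x, int shift)
{
    uint8_t mask;

    assert(shift >= 1 && shift <= 8);
    mask = (uint8_t)(0xffu >> shift);
    return (uint8_t)((d & ~mask) | ((x >> shift) & mask));
}
```

With a shift equal to the element width, sri8() leaves the destination
unchanged and urshr8() can still produce 1 because of the rounding
term.
---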
 target/arm/neon-dp.decode       | 63 +++++++++++++++++++++++++++++++++
 target/arm/translate-neon.inc.c |  7 ++++
 target/arm/translate.c          | 52 +++------------------------
 3 files changed, 74 insertions(+), 48 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 648812395f1..3ed10d1524e 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -233,6 +233,69 @@ VSHR_U_2sh       1111 001 1 1 . 01 ....     .... 0000 0 . . 1 .... \
 VSHR_U_2sh       1111 001 1 1 . 001 ...     .... 0000 0 . . 1 .... \
                  @2reg_shift size=0 shift=%neon_rshift_i3
 
+VSRA_S_2sh       1111 001 0 1 .  ......     .... 0001 1 . . 1 .... \
+                 @2reg_shift size=3 shift=%neon_rshift_i6
+VSRA_S_2sh       1111 001 0 1 . 1 .....     .... 0001 0 . . 1 .... \
+                 @2reg_shift size=2 shift=%neon_rshift_i5
+VSRA_S_2sh       1111 001 0 1 . 01 ....     .... 0001 0 . . 1 .... \
+                 @2reg_shift size=1 shift=%neon_rshift_i4
+VSRA_S_2sh       1111 001 0 1 . 001 ...     .... 0001 0 . . 1 .... \
+                 @2reg_shift size=0 shift=%neon_rshift_i3
+
+VSRA_U_2sh       1111 001 1 1 .  ......     .... 0001 1 . . 1 .... \
+                 @2reg_shift size=3 shift=%neon_rshift_i6
+VSRA_U_2sh       1111 001 1 1 . 1 .....     .... 0001 0 . . 1 .... \
+                 @2reg_shift size=2 shift=%neon_rshift_i5
+VSRA_U_2sh       1111 001 1 1 . 01 ....     .... 0001 0 . . 1 .... \
+                 @2reg_shift size=1 shift=%neon_rshift_i4
+VSRA_U_2sh       1111 001 1 1 . 001 ...     .... 0001 0 . . 1 .... \
+                 @2reg_shift size=0 shift=%neon_rshift_i3
+
+VRSHR_S_2sh      1111 001 0 1 .  ......     .... 0010 1 . . 1 .... \
+                 @2reg_shift size=3 shift=%neon_rshift_i6
+VRSHR_S_2sh      1111 001 0 1 . 1 .....     .... 0010 0 . . 1 .... \
+                 @2reg_shift size=2 shift=%neon_rshift_i5
+VRSHR_S_2sh      1111 001 0 1 . 01 ....     .... 0010 0 . . 1 .... \
+                 @2reg_shift size=1 shift=%neon_rshift_i4
+VRSHR_S_2sh      1111 001 0 1 . 001 ...     .... 0010 0 . . 1 .... \
+                 @2reg_shift size=0 shift=%neon_rshift_i3
+
+VRSHR_U_2sh      1111 001 1 1 .  ......     .... 0010 1 . . 1 .... \
+                 @2reg_shift size=3 shift=%neon_rshift_i6
+VRSHR_U_2sh      1111 001 1 1 . 1 .....     .... 0010 0 . . 1 .... \
+                 @2reg_shift size=2 shift=%neon_rshift_i5
+VRSHR_U_2sh      1111 001 1 1 . 01 ....     .... 0010 0 . . 1 .... \
+                 @2reg_shift size=1 shift=%neon_rshift_i4
+VRSHR_U_2sh      1111 001 1 1 . 001 ...     .... 0010 0 . . 1 .... \
+                 @2reg_shift size=0 shift=%neon_rshift_i3
+
+VRSRA_S_2sh      1111 001 0 1 .  ......     .... 0011 1 . . 1 .... \
+                 @2reg_shift size=3 shift=%neon_rshift_i6
+VRSRA_S_2sh      1111 001 0 1 . 1 .....     .... 0011 0 . . 1 .... \
+                 @2reg_shift size=2 shift=%neon_rshift_i5
+VRSRA_S_2sh      1111 001 0 1 . 01 ....     .... 0011 0 . . 1 .... \
+                 @2reg_shift size=1 shift=%neon_rshift_i4
+VRSRA_S_2sh      1111 001 0 1 . 001 ...     .... 0011 0 . . 1 .... \
+                 @2reg_shift size=0 shift=%neon_rshift_i3
+
+VRSRA_U_2sh      1111 001 1 1 .  ......     .... 0011 1 . . 1 .... \
+                 @2reg_shift size=3 shift=%neon_rshift_i6
+VRSRA_U_2sh      1111 001 1 1 . 1 .....     .... 0011 0 . . 1 .... \
+                 @2reg_shift size=2 shift=%neon_rshift_i5
+VRSRA_U_2sh      1111 001 1 1 . 01 ....     .... 0011 0 . . 1 .... \
+                 @2reg_shift size=1 shift=%neon_rshift_i4
+VRSRA_U_2sh      1111 001 1 1 . 001 ...     .... 0011 0 . . 1 .... \
+                 @2reg_shift size=0 shift=%neon_rshift_i3
+
+VSRI_2sh         1111 001 1 1 .  ......     .... 0100 1 . . 1 .... \
+                 @2reg_shift size=3 shift=%neon_rshift_i6
+VSRI_2sh         1111 001 1 1 . 1 .....     .... 0100 0 . . 1 .... \
+                 @2reg_shift size=2 shift=%neon_rshift_i5
+VSRI_2sh         1111 001 1 1 . 01 ....     .... 0100 0 . . 1 .... \
+                 @2reg_shift size=1 shift=%neon_rshift_i4
+VSRI_2sh         1111 001 1 1 . 001 ...     .... 0100 0 . . 1 .... \
+                 @2reg_shift size=0 shift=%neon_rshift_i3
+
 VSHL_2sh         1111 001 0 1 . shift:6     .... 0101 1 . . 1 .... \
                  @2reg_shift size=3
 VSHL_2sh         1111 001 0 1 . 1 shift:5   .... 0101 0 . . 1 .... \
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 0475696835f..f4d42683aea 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1366,6 +1366,13 @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
 
 DO_2SH(VSHL, tcg_gen_gvec_shli)
 DO_2SH(VSLI, gen_gvec_sli)
+DO_2SH(VSRI, gen_gvec_sri)
+DO_2SH(VSRA_S, gen_gvec_ssra)
+DO_2SH(VSRA_U, gen_gvec_usra)
+DO_2SH(VRSHR_S, gen_gvec_srshr)
+DO_2SH(VRSHR_U, gen_gvec_urshr)
+DO_2SH(VRSRA_S, gen_gvec_srsra)
+DO_2SH(VRSRA_U, gen_gvec_ursra)
 
 static bool trans_VSHR_S_2sh(DisasContext *s, arg_2reg_shift *a)
 {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index f2ccab1b21c..4a55986aad9 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5297,6 +5297,10 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
 
             switch (op) {
             case 0: /* VSHR */
+            case 1: /* VSRA */
+            case 2: /* VRSHR */
+            case 3: /* VRSRA */
+            case 4: /* VSRI */
             case 5: /* VSHL, VSLI */
                 return 1; /* handled by decodetree */
             default:
@@ -5330,54 +5334,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     shift = shift - (1 << (size + 3));
                 }
 
-                switch (op) {
-                case 1:  /* VSRA */
-                    /* Right shift comes here negative.  */
-                    shift = -shift;
-                    if (u) {
-                        gen_gvec_usra(size, rd_ofs, rm_ofs, shift,
-                                      vec_size, vec_size);
-                    } else {
-                        gen_gvec_ssra(size, rd_ofs, rm_ofs, shift,
-                                      vec_size, vec_size);
-                    }
-                    return 0;
-
-                case 2: /* VRSHR */
-                    /* Right shift comes here negative.  */
-                    shift = -shift;
-                    if (u) {
-                        gen_gvec_urshr(size, rd_ofs, rm_ofs, shift,
-                                       vec_size, vec_size);
-                    } else {
-                        gen_gvec_srshr(size, rd_ofs, rm_ofs, shift,
-                                       vec_size, vec_size);
-                    }
-                    return 0;
-
-                case 3: /* VRSRA */
-                    /* Right shift comes here negative.  */
-                    shift = -shift;
-                    if (u) {
-                        gen_gvec_ursra(size, rd_ofs, rm_ofs, shift,
-                                       vec_size, vec_size);
-                    } else {
-                        gen_gvec_srsra(size, rd_ofs, rm_ofs, shift,
-                                       vec_size, vec_size);
-                    }
-                    return 0;
-
-                case 4: /* VSRI */
-                    if (!u) {
-                        return 1;
-                    }
-                    /* Right shift comes here negative.  */
-                    shift = -shift;
-                    gen_gvec_sri(size, rd_ofs, rm_ofs, shift,
-                                 vec_size, vec_size);
-                    return 0;
-                }
-
                 if (size == 3) {
                     count = q + 1;
                 } else {
-- 
2.20.1




* [PATCH 05/10] target/arm: Convert VQSHLU, VQSHL 2-reg-shift insns to decodetree
  2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
                   ` (3 preceding siblings ...)
  2020-05-15 14:20 ` [PATCH 04/10] target/arm: Convert Neon VSRA, VSRI, VRSHR, VRSRA " Peter Maydell
@ 2020-05-15 14:20 ` Peter Maydell
  2020-05-15 22:55   ` Richard Henderson
  2020-05-15 14:20 ` [PATCH 06/10] target/arm: Convert Neon narrowing shifts with op==8 " Peter Maydell
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Maydell @ 2020-05-15 14:20 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VQSHLU and VQSHL 2-reg-shift insns to decodetree.
These are the last of the simple shift-by-immediate insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
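A minimal sketch of the two ideas this patch relies on: replicating
the immediate shift count into every lane so that a variable-shift
helper can be reused, and saturating on overflow. The names
replicate_imm, uqshl8 and sqshlu8 are hypothetical models written for
this note and are not the QEMU helpers. The 'sat' flag stands in for
the sticky saturation state, which I believe is why the real helpers
need cpu_env.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Replicate the low element of 'c' across a 64-bit value, for
 * size 0 (8-bit), 1 (16-bit), 2 (32-bit) or 3 (64-bit) elements.
 */
static uint64_t replicate_imm(unsigned size, uint64_t c)
{
    switch (size) {
    case 0:
        return 0x0101010101010101ull * (uint8_t)c;
    case 1:
        return 0x0001000100010001ull * (uint16_t)c;
    case 2:
        return 0x0000000100000001ull * (uint32_t)c;
    default:
        return c;
    }
}

/* Unsigned saturating shift left of one 8-bit element (VQSHL.U8 style). */
static uint8_t uqshl8(uint8_t x, int shift, bool *sat)
{
    unsigned wide;

    assert(sat != NULL && shift >= 0 && shift <= 7);
    wide = (unsigned)x << shift;
    if (wide > 0xff) {
        *sat = true;        /* result did not fit: clamp and record it */
        return 0xff;
    }
    return (uint8_t)wide;
}

/* Signed input, unsigned saturated result (VQSHLU.S8 style). */
static uint8_t sqshlu8(int8_t x, int shift, bool *sat)
{
    assert(sat != NULL && shift >= 0 && shift <= 7);
    if (x < 0) {
        *sat = true;        /* negative values saturate to zero */
        return 0;
    }
    return uqshl8((uint8_t)x, shift, sat);
}
```

For 8-bit elements and a shift of 3, replicate_imm() gives 0x03 in
every byte, so each lane of the variable-shift operation sees the same
count.
---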
 target/arm/neon-dp.decode       |  27 ++++++++
 target/arm/translate-neon.inc.c | 108 +++++++++++++++++++++++++++++++
 target/arm/translate.c          | 110 +-------------------------------
 3 files changed, 138 insertions(+), 107 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 3ed10d1524e..6456b53a690 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -313,3 +313,30 @@ VSLI_2sh         1111 001 1 1 . 01 shift:4  .... 0101 0 . . 1 .... \
                  @2reg_shift size=1
 VSLI_2sh         1111 001 1 1 . 001 shift:3 .... 0101 0 . . 1 .... \
                  @2reg_shift size=0
+
+VQSHLU_64_2sh    1111 001 1 1 . shift:6     .... 0110 1 . . 1 .... \
+                 @2reg_shift size=3
+VQSHLU_2sh       1111 001 1 1 . 1 shift:5   .... 0110 0 . . 1 .... \
+                 @2reg_shift size=2
+VQSHLU_2sh       1111 001 1 1 . 01 shift:4  .... 0110 0 . . 1 .... \
+                 @2reg_shift size=1
+VQSHLU_2sh       1111 001 1 1 . 001 shift:3 .... 0110 0 . . 1 .... \
+                 @2reg_shift size=0
+
+VQSHL_S_64_2sh   1111 001 0 1 . shift:6     .... 0111 1 . . 1 .... \
+                 @2reg_shift size=3
+VQSHL_S_2sh      1111 001 0 1 . 1 shift:5   .... 0111 0 . . 1 .... \
+                 @2reg_shift size=2
+VQSHL_S_2sh      1111 001 0 1 . 01 shift:4  .... 0111 0 . . 1 .... \
+                 @2reg_shift size=1
+VQSHL_S_2sh      1111 001 0 1 . 001 shift:3 .... 0111 0 . . 1 .... \
+                 @2reg_shift size=0
+
+VQSHL_U_64_2sh   1111 001 1 1 . shift:6     .... 0111 1 . . 1 .... \
+                 @2reg_shift size=3
+VQSHL_U_2sh      1111 001 1 1 . 1 shift:5   .... 0111 0 . . 1 .... \
+                 @2reg_shift size=2
+VQSHL_U_2sh      1111 001 1 1 . 01 shift:4  .... 0111 0 . . 1 .... \
+                 @2reg_shift size=1
+VQSHL_U_2sh      1111 001 1 1 . 001 shift:3 .... 0111 0 . . 1 .... \
+                 @2reg_shift size=0
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index f4d42683aea..396db55565f 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1396,3 +1396,111 @@ static bool trans_VSHR_U_2sh(DisasContext *s, arg_2reg_shift *a)
         return do_vector_2sh(s, a, tcg_gen_gvec_shri);
     }
 }
+
+static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
+                             NeonGenTwo64OpEnvFn *fn)
+{
+    /*
+     * 2-reg-and-shift operations, size == 3 case, where the
+     * function needs to be passed cpu_env.
+     */
+    TCGv_i64 constimm;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * To avoid excessive duplication of ops we implement shift
+     * by immediate using the variable shift operations.
+     */
+    constimm = tcg_const_i64(dup_const(a->size, a->shift));
+
+    for (pass = 0; pass < a->q + 1; pass++) {
+        TCGv_i64 tmp = tcg_temp_new_i64();
+
+        neon_load_reg64(tmp, a->vm + pass);
+        fn(tmp, cpu_env, tmp, constimm);
+        neon_store_reg64(tmp, a->vd + pass);
+    }
+    tcg_temp_free_i64(constimm);
+    return true;
+}
+
+static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
+                             NeonGenTwoOpEnvFn *fn)
+{
+    /*
+     * 2-reg-and-shift operations, size < 3 case, where the
+     * helper needs to be passed cpu_env.
+     */
+    TCGv_i32 constimm;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * To avoid excessive duplication of ops we implement shift
+     * by immediate using the variable shift operations.
+     */
+    constimm = tcg_const_i32(dup_const(a->size, a->shift));
+
+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
+        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        fn(tmp, cpu_env, tmp, constimm);
+        neon_store_reg(a->vd, pass, tmp);
+    }
+    tcg_temp_free_i32(constimm);
+    return true;
+}
+
+#define DO_2SHIFT_ENV(INSN, FUNC)                                       \
+    static bool trans_##INSN##_64_2sh(DisasContext *s, arg_2reg_shift *a) \
+    {                                                                   \
+        return do_2shift_env_64(s, a, gen_helper_neon_##FUNC##64);      \
+    }                                                                   \
+    static bool trans_##INSN##_2sh(DisasContext *s, arg_2reg_shift *a)  \
+    {                                                                   \
+        static NeonGenTwoOpEnvFn * const fns[] = {                      \
+            gen_helper_neon_##FUNC##8,                                  \
+            gen_helper_neon_##FUNC##16,                                 \
+            gen_helper_neon_##FUNC##32,                                 \
+        };                                                              \
+        assert(a->size < ARRAY_SIZE(fns));                              \
+        return do_2shift_env_32(s, a, fns[a->size]);                    \
+    }
+
+DO_2SHIFT_ENV(VQSHLU, qshlu_s)
+DO_2SHIFT_ENV(VQSHL_U, qshl_u)
+DO_2SHIFT_ENV(VQSHL_S, qshl_s)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 4a55986aad9..d711d39eb9d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3011,29 +3011,6 @@ static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
     }
 }
 
-#define GEN_NEON_INTEGER_OP_ENV(name) do { \
-    switch ((size << 1) | u) { \
-    case 0: \
-        gen_helper_neon_##name##_s8(tmp, cpu_env, tmp, tmp2); \
-        break; \
-    case 1: \
-        gen_helper_neon_##name##_u8(tmp, cpu_env, tmp, tmp2); \
-        break; \
-    case 2: \
-        gen_helper_neon_##name##_s16(tmp, cpu_env, tmp, tmp2); \
-        break; \
-    case 3: \
-        gen_helper_neon_##name##_u16(tmp, cpu_env, tmp, tmp2); \
-        break; \
-    case 4: \
-        gen_helper_neon_##name##_s32(tmp, cpu_env, tmp, tmp2); \
-        break; \
-    case 5: \
-        gen_helper_neon_##name##_u32(tmp, cpu_env, tmp, tmp2); \
-        break; \
-    default: return 1; \
-    }} while (0)
-
 static TCGv_i32 neon_load_scratch(int scratch)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
@@ -5252,7 +5229,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int size;
     int shift;
     int pass;
-    int count;
     int u;
     int vec_size;
     uint32_t imm;
@@ -5302,6 +5278,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             case 3: /* VRSRA */
             case 4: /* VSRI */
             case 5: /* VSHL, VSLI */
+            case 6: /* VQSHLU */
+            case 7: /* VQSHL */
                 return 1; /* handled by decodetree */
             default:
                 break;
@@ -5319,89 +5297,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     size--;
             }
             shift = (insn >> 16) & ((1 << (3 + size)) - 1);
-            if (op < 8) {
-                /* Shift by immediate:
-                   VSHR, VSRA, VRSHR, VRSRA, VSRI, VSHL, VQSHL, VQSHLU.  */
-                if (q && ((rd | rm) & 1)) {
-                    return 1;
-                }
-                if (!u && (op == 4 || op == 6)) {
-                    return 1;
-                }
-                /* Right shifts are encoded as N - shift, where N is the
-                   element size in bits.  */
-                if (op <= 4) {
-                    shift = shift - (1 << (size + 3));
-                }
-
-                if (size == 3) {
-                    count = q + 1;
-                } else {
-                    count = q ? 4: 2;
-                }
-
-                /* To avoid excessive duplication of ops we implement shift
-                 * by immediate using the variable shift operations.
-                  */
-                imm = dup_const(size, shift);
-
-                for (pass = 0; pass < count; pass++) {
-                    if (size == 3) {
-                        neon_load_reg64(cpu_V0, rm + pass);
-                        tcg_gen_movi_i64(cpu_V1, imm);
-                        switch (op) {
-                        case 6: /* VQSHLU */
-                            gen_helper_neon_qshlu_s64(cpu_V0, cpu_env,
-                                                      cpu_V0, cpu_V1);
-                            break;
-                        case 7: /* VQSHL */
-                            if (u) {
-                                gen_helper_neon_qshl_u64(cpu_V0, cpu_env,
-                                                         cpu_V0, cpu_V1);
-                            } else {
-                                gen_helper_neon_qshl_s64(cpu_V0, cpu_env,
-                                                         cpu_V0, cpu_V1);
-                            }
-                            break;
-                        default:
-                            g_assert_not_reached();
-                        }
-                        neon_store_reg64(cpu_V0, rd + pass);
-                    } else { /* size < 3 */
-                        /* Operands in T0 and T1.  */
-                        tmp = neon_load_reg(rm, pass);
-                        tmp2 = tcg_temp_new_i32();
-                        tcg_gen_movi_i32(tmp2, imm);
-                        switch (op) {
-                        case 6: /* VQSHLU */
-                            switch (size) {
-                            case 0:
-                                gen_helper_neon_qshlu_s8(tmp, cpu_env,
-                                                         tmp, tmp2);
-                                break;
-                            case 1:
-                                gen_helper_neon_qshlu_s16(tmp, cpu_env,
-                                                          tmp, tmp2);
-                                break;
-                            case 2:
-                                gen_helper_neon_qshlu_s32(tmp, cpu_env,
-                                                          tmp, tmp2);
-                                break;
-                            default:
-                                abort();
-                            }
-                            break;
-                        case 7: /* VQSHL */
-                            GEN_NEON_INTEGER_OP_ENV(qshl);
-                            break;
-                        default:
-                            g_assert_not_reached();
-                        }
-                        tcg_temp_free_i32(tmp2);
-                        neon_store_reg(rd, pass, tmp);
-                    }
-                } /* for pass */
-            } else if (op < 10) {
+            if (op < 10) {
                 /* Shift by immediate and narrow:
                    VSHRN, VRSHRN, VQSHRN, VQRSHRN.  */
                 int input_unsigned = (op == 8) ? !u : u;
-- 
2.20.1
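
Note on the "shift by immediate using the variable shift operations"
comment in do_2shift_env_32(): the immediate has to be replicated into
every lane of the 32-bit operand before the per-lane helper sees it.
The sketch below is a standalone model of that replication, not QEMU
code. model_dup_const() and model_shift_imm32() are hypothetical names,
and the lane layout is an assumption based on how dup_const() is used
in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for dup_const(): replicate the low
 * (8 << size) bits of c into every lane of a 64-bit value.
 * size: 0 = 8-bit lanes, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit.
 */
uint64_t model_dup_const(unsigned size, uint64_t c)
{
    switch (size) {
    case 0:
        return 0x0101010101010101ull * (uint8_t)c;
    case 1:
        return 0x0001000100010001ull * (uint16_t)c;
    case 2:
        return 0x0000000100000001ull * (uint32_t)c;
    default:
        return c;
    }
}

/* The 32-bit passes only consume the low half of the replicated value. */
uint32_t model_shift_imm32(unsigned size, int shift)
{
    return (uint32_t)model_dup_const(size, (uint64_t)(int64_t)shift);
}

/* Small self-check; call it from main() to exercise the model. */
void check_shift_imm(void)
{
    assert(model_shift_imm32(0, 3) == 0x03030303u);
    assert(model_shift_imm32(1, 5) == 0x00050005u);
    assert(model_shift_imm32(2, 7) == 0x00000007u);
}
```

With 8-bit lanes and a shift of 3, each byte of the operand carries its
own copy of the count, so a helper that works lane by lane can be reused
unchanged for the immediate form.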




* [PATCH 06/10] target/arm: Convert Neon narrowing shifts with op==8 to decodetree
  2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
                   ` (4 preceding siblings ...)
  2020-05-15 14:20 ` [PATCH 05/10] target/arm: Convert VQSHLU, VQSHL " Peter Maydell
@ 2020-05-15 14:20 ` Peter Maydell
  2020-05-16  2:01   ` Richard Henderson
  2020-05-15 14:20 ` [PATCH 07/10] target/arm: Convert Neon narrowing shifts with op==9 " Peter Maydell
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Maydell @ 2020-05-15 14:20 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon narrowing shifts where op==8 to decodetree:
 * VSHRN
 * VRSHRN
 * VQSHRUN
 * VQRSHRUN

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
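
Note (not part of the commit): the narrowing shifts are right shifts,
but the patch feeds them through left-shift helpers with a negated
count duplicated into each lane. The sketch below models that for one
32-bit word holding two 16-bit lanes. It is a simplification under
stated assumptions: the model_* names are hypothetical, and the rule
that the helper treats the low byte of each lane as a signed count is
assumed from the Neon variable-shift semantics rather than taken from
this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low byte of a shift lane as a signed count. */
int model_shift_count(uint32_t lane)
{
    int count = (int)(lane & 0xff);
    return count >= 128 ? count - 256 : count;
}

/* Unsigned variable shift on one 16-bit lane; negative counts shift right. */
uint16_t model_shl_u16(uint16_t val, uint16_t shift_lane)
{
    int count = model_shift_count(shift_lane);

    if (count >= 16 || count <= -16) {
        return 0;
    }
    if (count < 0) {
        return (uint16_t)(val >> -count);
    }
    return (uint16_t)((uint32_t)val << count);
}

/* Negated right-shift amount duplicated into both 16-bit lanes. */
uint32_t model_narrow_imm16(int shift)
{
    uint32_t imm = (uint16_t)(-shift);
    imm |= imm << 16;
    return imm;
}

/* VSHRN-like model: shift two 16-bit lanes right, keep the low byte of each. */
uint16_t model_vshrn_16(uint32_t lanes, int shift)
{
    uint32_t imm = model_narrow_imm16(shift);
    uint16_t lo = model_shl_u16((uint16_t)lanes, (uint16_t)imm);
    uint16_t hi = model_shl_u16((uint16_t)(lanes >> 16),
                                (uint16_t)(imm >> 16));

    return (uint16_t)((lo & 0xff) | ((hi & 0xff) << 8));
}

/* Small self-check; call it from main() to exercise the model. */
void check_vshrn_16(void)
{
    assert(model_narrow_imm16(4) == 0xfffcfffcu);
    assert(model_vshrn_16(0x1234abcdu, 4) == 0x23bc);
    assert(model_vshrn_16(0x1234abcdu, 8) == 0x12ab);
}
```

The real code handles two such words per output pass and concatenates
them before narrowing; the per-lane arithmetic is the same.
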
 target/arm/neon-dp.decode       |  32 ++++++
 target/arm/translate-neon.inc.c | 168 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |   1 +
 3 files changed, 201 insertions(+)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 6456b53a690..f8d19c5819c 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -208,6 +208,10 @@ VMINNM_fp_3s     1111 001 1 0 . 1 . .... .... 1111 ... 1 .... @3same_fp
 
 @2reg_shift      .... ... . . . ...... .... .... . q:1 . . .... \
                  &2reg_shift vm=%vm_dp vd=%vd_dp
+@2reg_shift_q0   .... ... . . . ...... .... .... . 0 . . .... \
+                 &2reg_shift vm=%vm_dp vd=%vd_dp q=0
+@2reg_shift_q1   .... ... . . . ...... .... .... . 1 . . .... \
+                 &2reg_shift vm=%vm_dp vd=%vd_dp q=1
 
 # Right shifts are encoded as N - shift, where N is the element size in bits.
 %neon_rshift_i6  16:6 !function=rsub_64
@@ -340,3 +344,31 @@ VQSHL_U_2sh      1111 001 1 1 . 01 shift:4  .... 0111 0 . . 1 .... \
                  @2reg_shift size=1
 VQSHL_U_2sh      1111 001 1 1 . 001 shift:3 .... 0111 0 . . 1 .... \
                  @2reg_shift size=0
+
+VSHRN_64_2sh     1111 001 0 1 . 1 .....     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q0 size=3 shift=%neon_rshift_i5
+VSHRN_32_2sh     1111 001 0 1 . 01 ....     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q0 size=2 shift=%neon_rshift_i4
+VSHRN_16_2sh     1111 001 0 1 . 001 ...     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q0 size=1 shift=%neon_rshift_i3
+
+VRSHRN_64_2sh    1111 001 0 1 . 1 .....     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q1 size=3 shift=%neon_rshift_i5
+VRSHRN_32_2sh    1111 001 0 1 . 01 ....     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q1 size=2 shift=%neon_rshift_i4
+VRSHRN_16_2sh    1111 001 0 1 . 001 ...     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q1 size=1 shift=%neon_rshift_i3
+
+VQSHRUN_64_2sh   1111 001 1 1 . 1 .....     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q0 size=3 shift=%neon_rshift_i5
+VQSHRUN_32_2sh   1111 001 1 1 . 01 ....     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q0 size=2 shift=%neon_rshift_i4
+VQSHRUN_16_2sh   1111 001 1 1 . 001 ...     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q0 size=1 shift=%neon_rshift_i3
+
+VQRSHRUN_64_2sh  1111 001 1 1 . 1 .....     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q1 size=3 shift=%neon_rshift_i5
+VQRSHRUN_32_2sh  1111 001 1 1 . 01 ....     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q1 size=2 shift=%neon_rshift_i4
+VQRSHRUN_16_2sh  1111 001 1 1 . 001 ...     .... 1000 0 . . 1 .... \
+                 @2reg_shift_q1 size=1 shift=%neon_rshift_i3
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 396db55565f..18ea7255e38 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1504,3 +1504,171 @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
 DO_2SHIFT_ENV(VQSHLU, qshlu_s)
 DO_2SHIFT_ENV(VQSHL_U, qshl_u)
 DO_2SHIFT_ENV(VQSHL_S, qshl_s)
+
+static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
+                                NeonGenTwo64OpFn *shiftfn,
+                                NeonGenNarrowEnvFn *narrowfn)
+{
+    /* 2-reg-and-shift narrowing-shift operations, size == 3 case */
+    TCGv_i64 constimm, rm1, rm2;
+    TCGv_i32 rd;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (a->vm & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * This is always a right shift, and the shiftfn is always a
+     * left-shift helper, which thus needs the negated shift count.
+     */
+    constimm = tcg_const_i64(-a->shift);
+    rm1 = tcg_temp_new_i64();
+    rm2 = tcg_temp_new_i64();
+
+    /* Load both inputs first to avoid potential overwrite if rm == rd */
+    neon_load_reg64(rm1, a->vm);
+    neon_load_reg64(rm2, a->vm + 1);
+
+    shiftfn(rm1, rm1, constimm);
+    rd = tcg_temp_new_i32();
+    narrowfn(rd, cpu_env, rm1);
+    neon_store_reg(a->vd, 0, rd);
+
+    shiftfn(rm2, rm2, constimm);
+    rd = tcg_temp_new_i32();
+    narrowfn(rd, cpu_env, rm2);
+    neon_store_reg(a->vd, 1, rd);
+
+    tcg_temp_free_i64(rm1);
+    tcg_temp_free_i64(rm2);
+    tcg_temp_free_i64(constimm);
+
+    return true;
+}
+
+static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
+                                NeonGenTwoOpFn *shiftfn,
+                                NeonGenNarrowEnvFn *narrowfn)
+{
+    /* 2-reg-and-shift narrowing-shift operations, size < 3 case */
+    TCGv_i32 constimm, rm1, rm2, rm3, rm4;
+    TCGv_i64 rtmp;
+    uint32_t imm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (a->vm & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * This is always a right shift, and the shiftfn is always a
+     * left-shift helper, which thus needs the negated shift count
+     * duplicated into each lane of the immediate value.
+     */
+    if (a->size == 1) {
+        imm = (uint16_t)(-a->shift);
+        imm |= imm << 16;
+    } else {
+        /* size == 2 */
+        imm = -a->shift;
+    }
+    constimm = tcg_const_i32(imm);
+
+    /* Load all inputs first to avoid potential overwrite */
+    rm1 = neon_load_reg(a->vm, 0);
+    rm2 = neon_load_reg(a->vm, 1);
+    rm3 = neon_load_reg(a->vm + 1, 0);
+    rm4 = neon_load_reg(a->vm + 1, 1);
+    rtmp = tcg_temp_new_i64();
+
+    /* TODO: expand out the shift-narrow and the narrow-op */
+    shiftfn(rm1, rm1, constimm);
+    shiftfn(rm2, rm2, constimm);
+
+    tcg_gen_concat_i32_i64(rtmp, rm1, rm2);
+    tcg_temp_free_i32(rm2);
+
+    narrowfn(rm1, cpu_env, rtmp);
+    neon_store_reg(a->vd, 0, rm1);
+
+    shiftfn(rm3, rm3, constimm);
+    shiftfn(rm4, rm4, constimm);
+    tcg_temp_free_i32(constimm);
+
+    tcg_gen_concat_i32_i64(rtmp, rm3, rm4);
+    tcg_temp_free_i32(rm4);
+
+    narrowfn(rm3, cpu_env, rtmp);
+    tcg_temp_free_i64(rtmp);
+    neon_store_reg(a->vd, 1, rm3);
+    return true;
+}
+
+#define DO_2SN_64(INSN, FUNC, NARROWFUNC)                               \
+    static bool trans_##INSN##_2sh(DisasContext *s, arg_2reg_shift *a)  \
+    {                                                                   \
+        return do_2shift_narrow_64(s, a, FUNC, NARROWFUNC);             \
+    }
+#define DO_2SN_32(INSN, FUNC, NARROWFUNC)                               \
+    static bool trans_##INSN##_2sh(DisasContext *s, arg_2reg_shift *a)  \
+    {                                                                   \
+        return do_2shift_narrow_32(s, a, FUNC, NARROWFUNC);             \
+    }
+
+static void gen_neon_narrow_u32(TCGv_i32 dest, TCGv_ptr env, TCGv_i64 src)
+{
+    tcg_gen_extrl_i64_i32(dest, src);
+}
+
+static void gen_neon_narrow_u16(TCGv_i32 dest, TCGv_ptr env, TCGv_i64 src)
+{
+    gen_helper_neon_narrow_u16(dest, src);
+}
+
+static void gen_neon_narrow_u8(TCGv_i32 dest, TCGv_ptr env, TCGv_i64 src)
+{
+    gen_helper_neon_narrow_u8(dest, src);
+}
+
+DO_2SN_64(VSHRN_64, gen_ushl_i64, gen_neon_narrow_u32)
+DO_2SN_32(VSHRN_32, gen_ushl_i32, gen_neon_narrow_u16)
+DO_2SN_32(VSHRN_16, gen_helper_neon_shl_u16, gen_neon_narrow_u8)
+
+DO_2SN_64(VRSHRN_64, gen_helper_neon_rshl_u64, gen_neon_narrow_u32)
+DO_2SN_32(VRSHRN_32, gen_helper_neon_rshl_u32, gen_neon_narrow_u16)
+DO_2SN_32(VRSHRN_16, gen_helper_neon_rshl_u16, gen_neon_narrow_u8)
+
+DO_2SN_64(VQSHRUN_64, gen_sshl_i64, gen_helper_neon_unarrow_sat32)
+DO_2SN_32(VQSHRUN_32, gen_sshl_i32, gen_helper_neon_unarrow_sat16)
+DO_2SN_32(VQSHRUN_16, gen_helper_neon_shl_s16, gen_helper_neon_unarrow_sat8)
+
+DO_2SN_64(VQRSHRUN_64, gen_helper_neon_rshl_s64, gen_helper_neon_unarrow_sat32)
+DO_2SN_32(VQRSHRUN_32, gen_helper_neon_rshl_s32, gen_helper_neon_unarrow_sat16)
+DO_2SN_32(VQRSHRUN_16, gen_helper_neon_rshl_s16, gen_helper_neon_unarrow_sat8)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d711d39eb9d..f884db535b4 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5280,6 +5280,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             case 5: /* VSHL, VSLI */
             case 6: /* VQSHLU */
             case 7: /* VQSHL */
+            case 8: /* VSHRN, VRSHRN, VQSHRUN, VQRSHRUN */
                 return 1; /* handled by decodetree */
             default:
                 break;
-- 
2.20.1




* [PATCH 07/10] target/arm: Convert Neon narrowing shifts with op==9 to decodetree
  2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
                   ` (5 preceding siblings ...)
  2020-05-15 14:20 ` [PATCH 06/10] target/arm: Convert Neon narrowing shifts with op==8 " Peter Maydell
@ 2020-05-15 14:20 ` Peter Maydell
  2020-05-16  2:05   ` Richard Henderson
  2020-05-15 14:20 ` [PATCH 08/10] target/arm: Convert Neon VSHLL, VMOVL " Peter Maydell
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Maydell @ 2020-05-15 14:20 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the remaining Neon narrowing shifts to decodetree:
  * VQSHRN
  * VQRSHRN

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
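
Note (not part of the commit): VQSHRN differs from VSHRN only in the
narrowing step, which saturates instead of truncating. The sketch below
models one 32-bit element narrowed to 16 bits for the signed and
unsigned variants. The model_* names are hypothetical, and the bool
out-parameter is only a stand-in for however the real helpers record
saturation through the env pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable arithmetic shift right (avoids shifting a negative value). */
int32_t model_asr32(int32_t val, unsigned shift)
{
    return val < 0 ? ~(~val >> shift) : val >> shift;
}

/* Signed input, signed saturated result. */
int16_t model_vqshrn_s32(int32_t val, unsigned shift, bool *saturated)
{
    int32_t shifted = model_asr32(val, shift);

    if (shifted > INT16_MAX) {
        *saturated = true;
        return INT16_MAX;
    }
    if (shifted < INT16_MIN) {
        *saturated = true;
        return INT16_MIN;
    }
    return (int16_t)shifted;
}

/* Unsigned input, unsigned saturated result. */
uint16_t model_vqshrn_u32(uint32_t val, unsigned shift, bool *saturated)
{
    uint32_t shifted = val >> shift;

    if (shifted > UINT16_MAX) {
        *saturated = true;
        return UINT16_MAX;
    }
    return (uint16_t)shifted;
}

/* Small self-check; call it from main() to exercise the model. */
void check_vqshrn(void)
{
    bool sat = false;

    assert(model_vqshrn_s32(0x12340, 4, &sat) == 0x1234 && !sat);
    assert(model_vqshrn_s32(-16, 4, &sat) == -1 && !sat);
    assert(model_vqshrn_s32(INT32_MAX, 8, &sat) == INT16_MAX && sat);
    sat = false;
    assert(model_vqshrn_u32(0xffff0000u, 16, &sat) == 0xffff && !sat);
    assert(model_vqshrn_u32(0xffff0000u, 8, &sat) == UINT16_MAX && sat);
}
```

The shift amounts used here stay within 1..16, which is the valid range
for a 32-bit source element.
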
 target/arm/neon-dp.decode       |  32 ++++++++++
 target/arm/translate-neon.inc.c |  15 +++++
 target/arm/translate.c          | 110 +-------------------------------
 3 files changed, 49 insertions(+), 108 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index f8d19c5819c..bf4ef8c555f 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -372,3 +372,35 @@ VQRSHRUN_32_2sh  1111 001 1 1 . 01 ....     .... 1000 0 . . 1 .... \
                  @2reg_shift_q1 size=2 shift=%neon_rshift_i4
 VQRSHRUN_16_2sh  1111 001 1 1 . 001 ...     .... 1000 0 . . 1 .... \
                  @2reg_shift_q1 size=1 shift=%neon_rshift_i3
+
+# VQSHRN with signed input
+VQSHRN_S64_2sh   1111 001 0 1 . 1 .....     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q0 size=3 shift=%neon_rshift_i5
+VQSHRN_S32_2sh   1111 001 0 1 . 01 ....     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q0 size=2 shift=%neon_rshift_i4
+VQSHRN_S16_2sh   1111 001 0 1 . 001 ...     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q0 size=1 shift=%neon_rshift_i3
+
+# VQRSHRN with signed input
+VQRSHRN_S64_2sh  1111 001 0 1 . 1 .....     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q1 size=3 shift=%neon_rshift_i5
+VQRSHRN_S32_2sh  1111 001 0 1 . 01 ....     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q1 size=2 shift=%neon_rshift_i4
+VQRSHRN_S16_2sh  1111 001 0 1 . 001 ...     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q1 size=1 shift=%neon_rshift_i3
+
+# VQSHRN with unsigned input
+VQSHRN_U64_2sh   1111 001 1 1 . 1 .....     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q0 size=3 shift=%neon_rshift_i5
+VQSHRN_U32_2sh   1111 001 1 1 . 01 ....     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q0 size=2 shift=%neon_rshift_i4
+VQSHRN_U16_2sh   1111 001 1 1 . 001 ...     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q0 size=1 shift=%neon_rshift_i3
+
+# VQRSHRN with unsigned input
+VQRSHRN_U64_2sh  1111 001 1 1 . 1 .....     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q1 size=3 shift=%neon_rshift_i5
+VQRSHRN_U32_2sh  1111 001 1 1 . 01 ....     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q1 size=2 shift=%neon_rshift_i4
+VQRSHRN_U16_2sh  1111 001 1 1 . 001 ...     .... 1001 0 . . 1 .... \
+                 @2reg_shift_q1 size=1 shift=%neon_rshift_i3
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 18ea7255e38..9a75a69a4f5 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1672,3 +1672,18 @@ DO_2SN_32(VQSHRUN_16, gen_helper_neon_shl_s16, gen_helper_neon_unarrow_sat8)
 DO_2SN_64(VQRSHRUN_64, gen_helper_neon_rshl_s64, gen_helper_neon_unarrow_sat32)
 DO_2SN_32(VQRSHRUN_32, gen_helper_neon_rshl_s32, gen_helper_neon_unarrow_sat16)
 DO_2SN_32(VQRSHRUN_16, gen_helper_neon_rshl_s16, gen_helper_neon_unarrow_sat8)
+DO_2SN_64(VQSHRN_S64, gen_sshl_i64, gen_helper_neon_narrow_sat_s32)
+DO_2SN_32(VQSHRN_S32, gen_sshl_i32, gen_helper_neon_narrow_sat_s16)
+DO_2SN_32(VQSHRN_S16, gen_helper_neon_shl_s16, gen_helper_neon_narrow_sat_s8)
+
+DO_2SN_64(VQRSHRN_S64, gen_helper_neon_rshl_s64, gen_helper_neon_narrow_sat_s32)
+DO_2SN_32(VQRSHRN_S32, gen_helper_neon_rshl_s32, gen_helper_neon_narrow_sat_s16)
+DO_2SN_32(VQRSHRN_S16, gen_helper_neon_rshl_s16, gen_helper_neon_narrow_sat_s8)
+
+DO_2SN_64(VQSHRN_U64, gen_ushl_i64, gen_helper_neon_narrow_sat_u32)
+DO_2SN_32(VQSHRN_U32, gen_ushl_i32, gen_helper_neon_narrow_sat_u16)
+DO_2SN_32(VQSHRN_U16, gen_helper_neon_shl_u16, gen_helper_neon_narrow_sat_u8)
+
+DO_2SN_64(VQRSHRN_U64, gen_helper_neon_rshl_u64, gen_helper_neon_narrow_sat_u32)
+DO_2SN_32(VQRSHRN_U32, gen_helper_neon_rshl_u32, gen_helper_neon_narrow_sat_u16)
+DO_2SN_32(VQRSHRN_U16, gen_helper_neon_rshl_u16, gen_helper_neon_narrow_sat_u8)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index f884db535b4..f728231b198 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3201,40 +3201,6 @@ static inline void gen_neon_unarrow_sats(int size, TCGv_i32 dest, TCGv_i64 src)
     }
 }
 
-static inline void gen_neon_shift_narrow(int size, TCGv_i32 var, TCGv_i32 shift,
-                                         int q, int u)
-{
-    if (q) {
-        if (u) {
-            switch (size) {
-            case 1: gen_helper_neon_rshl_u16(var, var, shift); break;
-            case 2: gen_helper_neon_rshl_u32(var, var, shift); break;
-            default: abort();
-            }
-        } else {
-            switch (size) {
-            case 1: gen_helper_neon_rshl_s16(var, var, shift); break;
-            case 2: gen_helper_neon_rshl_s32(var, var, shift); break;
-            default: abort();
-            }
-        }
-    } else {
-        if (u) {
-            switch (size) {
-            case 1: gen_helper_neon_shl_u16(var, var, shift); break;
-            case 2: gen_ushl_i32(var, var, shift); break;
-            default: abort();
-            }
-        } else {
-            switch (size) {
-            case 1: gen_helper_neon_shl_s16(var, var, shift); break;
-            case 2: gen_sshl_i32(var, var, shift); break;
-            default: abort();
-            }
-        }
-    }
-}
-
 static inline void gen_neon_widen(TCGv_i64 dest, TCGv_i32 src, int size, int u)
 {
     if (u) {
@@ -5281,6 +5247,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             case 6: /* VQSHLU */
             case 7: /* VQSHL */
             case 8: /* VSHRN, VRSHRN, VQSHRUN, VQRSHRUN */
+            case 9: /* VQSHRN, VQRSHRN */
                 return 1; /* handled by decodetree */
             default:
                 break;
@@ -5298,80 +5265,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     size--;
             }
             shift = (insn >> 16) & ((1 << (3 + size)) - 1);
-            if (op < 10) {
-                /* Shift by immediate and narrow:
-                   VSHRN, VRSHRN, VQSHRN, VQRSHRN.  */
-                int input_unsigned = (op == 8) ? !u : u;
-                if (rm & 1) {
-                    return 1;
-                }
-                shift = shift - (1 << (size + 3));
-                size++;
-                if (size == 3) {
-                    tmp64 = tcg_const_i64(shift);
-                    neon_load_reg64(cpu_V0, rm);
-                    neon_load_reg64(cpu_V1, rm + 1);
-                    for (pass = 0; pass < 2; pass++) {
-                        TCGv_i64 in;
-                        if (pass == 0) {
-                            in = cpu_V0;
-                        } else {
-                            in = cpu_V1;
-                        }
-                        if (q) {
-                            if (input_unsigned) {
-                                gen_helper_neon_rshl_u64(cpu_V0, in, tmp64);
-                            } else {
-                                gen_helper_neon_rshl_s64(cpu_V0, in, tmp64);
-                            }
-                        } else {
-                            if (input_unsigned) {
-                                gen_ushl_i64(cpu_V0, in, tmp64);
-                            } else {
-                                gen_sshl_i64(cpu_V0, in, tmp64);
-                            }
-                        }
-                        tmp = tcg_temp_new_i32();
-                        gen_neon_narrow_op(op == 8, u, size - 1, tmp, cpu_V0);
-                        neon_store_reg(rd, pass, tmp);
-                    } /* for pass */
-                    tcg_temp_free_i64(tmp64);
-                } else {
-                    if (size == 1) {
-                        imm = (uint16_t)shift;
-                        imm |= imm << 16;
-                    } else {
-                        /* size == 2 */
-                        imm = (uint32_t)shift;
-                    }
-                    tmp2 = tcg_const_i32(imm);
-                    tmp4 = neon_load_reg(rm + 1, 0);
-                    tmp5 = neon_load_reg(rm + 1, 1);
-                    for (pass = 0; pass < 2; pass++) {
-                        if (pass == 0) {
-                            tmp = neon_load_reg(rm, 0);
-                        } else {
-                            tmp = tmp4;
-                        }
-                        gen_neon_shift_narrow(size, tmp, tmp2, q,
-                                              input_unsigned);
-                        if (pass == 0) {
-                            tmp3 = neon_load_reg(rm, 1);
-                        } else {
-                            tmp3 = tmp5;
-                        }
-                        gen_neon_shift_narrow(size, tmp3, tmp2, q,
-                                              input_unsigned);
-                        tcg_gen_concat_i32_i64(cpu_V0, tmp, tmp3);
-                        tcg_temp_free_i32(tmp);
-                        tcg_temp_free_i32(tmp3);
-                        tmp = tcg_temp_new_i32();
-                        gen_neon_narrow_op(op == 8, u, size - 1, tmp, cpu_V0);
-                        neon_store_reg(rd, pass, tmp);
-                    } /* for pass */
-                    tcg_temp_free_i32(tmp2);
-                }
-            } else if (op == 10) {
+            if (op == 10) {
                 /* VSHLL, VMOVL */
                 if (q || (rd & 1)) {
                     return 1;
-- 
2.20.1




* [PATCH 08/10] target/arm: Convert Neon VSHLL, VMOVL to decodetree
  2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
                   ` (6 preceding siblings ...)
  2020-05-15 14:20 ` [PATCH 07/10] target/arm: Convert Neon narrowing shifts with op==9 " Peter Maydell
@ 2020-05-15 14:20 ` Peter Maydell
  2020-05-16  2:18   ` Richard Henderson
  2020-05-15 14:20 ` [PATCH 09/10] target/arm: Convert VCVT fixed-point ops " Peter Maydell
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Maydell @ 2020-05-15 14:20 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VSHLL and VMOVL insns from the 2-reg-shift group
to decodetree. Since the loop always has two passes, we unroll
it to avoid the awkward reassignment of one TCGv to another.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
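
Note (not part of the commit): the widen_mask calculation in
do_vshll_2sh() can be checked with a small standalone model. The sketch
below covers only the signed 8-bit case: it widens four bytes to four
16-bit lanes, shifts the whole 64-bit value, clears the low bits of
each lane, and compares against a lane-by-lane reference. All function
names are hypothetical, and the lane packing is an assumption about
what the widen helper produces.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend one byte to a 16-bit lane. */
uint16_t model_sext8(uint8_t b)
{
    return (b & 0x80) ? (uint16_t)(0xff00u | b) : b;
}

/* Widen four packed 8-bit lanes into four packed 16-bit lanes. */
uint64_t model_widen_s8(uint32_t src)
{
    uint64_t out = 0;

    for (int i = 0; i < 4; i++) {
        out |= (uint64_t)model_sext8((uint8_t)(src >> (8 * i))) << (16 * i);
    }
    return out;
}

/* Low `shift` bits of every 16-bit lane; shift is in 0..7. */
uint64_t model_widen_mask8(unsigned shift)
{
    uint64_t m = 0xffu >> (8 - shift);

    return 0x0001000100010001ull * m;
}

/* Whole-register shift followed by the mask, as in the patch. */
uint64_t model_vshll_s8(uint32_t src, unsigned shift)
{
    uint64_t tmp = model_widen_s8(src);

    if (shift != 0) {
        tmp <<= shift;
        tmp &= ~model_widen_mask8(shift);
    }
    return tmp;
}

/* Reference: widen and shift each lane independently. */
uint64_t ref_vshll_s8(uint32_t src, unsigned shift)
{
    uint64_t out = 0;

    for (int i = 0; i < 4; i++) {
        uint16_t w = model_sext8((uint8_t)(src >> (8 * i)));
        out |= (uint64_t)(uint16_t)((uint32_t)w << shift) << (16 * i);
    }
    return out;
}

/* Small self-check; call it from main() to exercise the model. */
void check_vshll_s8(void)
{
    assert(model_vshll_s8(0x000000ffu, 1) == 0xfffeull);
    assert(model_vshll_s8(0x80ff7f01u, 3) == ref_vshll_s8(0x80ff7f01u, 3));
    assert(model_vshll_s8(0x80ff7f01u, 7) == ref_vshll_s8(0x80ff7f01u, 7));
    assert(model_vshll_s8(0x80ff7f01u, 0) == model_widen_s8(0x80ff7f01u));
}
```

The bits that spill across a lane boundary are the sign-extension bits
of the lower lane. With zero-extension those bits are already zero, so
in this model the mask is a no-op for unsigned input.
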
 target/arm/neon-dp.decode       | 14 ++++++
 target/arm/translate-neon.inc.c | 81 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 46 +------------------
 3 files changed, 97 insertions(+), 44 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index bf4ef8c555f..4438c1c8728 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -404,3 +404,17 @@ VQRSHRN_U32_2sh  1111 001 1 1 . 01 ....     .... 1001 0 . . 1 .... \
                  @2reg_shift_q1 size=2 shift=%neon_rshift_i4
 VQRSHRN_U16_2sh  1111 001 1 1 . 001 ...     .... 1001 0 . . 1 .... \
                  @2reg_shift_q1 size=1 shift=%neon_rshift_i3
+
+VSHLL_S_2sh      1111 001 0 1 . 1 shift:5   .... 1010 0 . . 1 .... \
+                 @2reg_shift_q0 size=2
+VSHLL_S_2sh      1111 001 0 1 . 01 shift:4  .... 1010 0 . . 1 .... \
+                 @2reg_shift_q0 size=1
+VSHLL_S_2sh      1111 001 0 1 . 001 shift:3 .... 1010 0 . . 1 .... \
+                 @2reg_shift_q0 size=0
+
+VSHLL_U_2sh      1111 001 1 1 . 1 shift:5   .... 1010 0 . . 1 .... \
+                 @2reg_shift_q0 size=2
+VSHLL_U_2sh      1111 001 1 1 . 01 shift:4  .... 1010 0 . . 1 .... \
+                 @2reg_shift_q0 size=1
+VSHLL_U_2sh      1111 001 1 1 . 001 shift:3 .... 1010 0 . . 1 .... \
+                 @2reg_shift_q0 size=0
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 9a75a69a4f5..5678bfd0d4d 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1687,3 +1687,84 @@ DO_2SN_32(VQSHRN_U16, gen_helper_neon_shl_u16, gen_helper_neon_narrow_sat_u8)
 DO_2SN_64(VQRSHRN_U64, gen_helper_neon_rshl_u64, gen_helper_neon_narrow_sat_u32)
 DO_2SN_32(VQRSHRN_U32, gen_helper_neon_rshl_u32, gen_helper_neon_narrow_sat_u16)
 DO_2SN_32(VQRSHRN_U16, gen_helper_neon_rshl_u16, gen_helper_neon_narrow_sat_u8)
+
+static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
+                         NeonGenWidenFn *widenfn, bool u)
+{
+    TCGv_i64 tmp;
+    TCGv_i32 rm0, rm1;
+    uint64_t widen_mask = 0;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * This is a widen-and-shift operation. The shift is always less
+     * than the width of the source type, so after widening the input
+     * vector we can simply shift the whole 64-bit widened register,
+     * and then clear the potential overflow bits resulting from left
+     * bits of the narrow input appearing as right bits of the left
+     * neighbour narrow input. Calculate a mask of bits to clear.
+     */
+    if ((a->shift != 0) && (a->size < 2 || u)) {
+        int esize = 8 << a->size;
+        widen_mask = MAKE_64BIT_MASK(0, esize);
+        widen_mask >>= esize - a->shift;
+        widen_mask = dup_const(a->size + 1, widen_mask);
+    }
+
+    rm0 = neon_load_reg(a->vm, 0);
+    rm1 = neon_load_reg(a->vm, 1);
+    tmp = tcg_temp_new_i64();
+
+    widenfn(tmp, rm0);
+    if (a->shift != 0) {
+        tcg_gen_shli_i64(tmp, tmp, a->shift);
+        tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
+    }
+    neon_store_reg64(tmp, a->vd);
+
+    widenfn(tmp, rm1);
+    if (a->shift != 0) {
+        tcg_gen_shli_i64(tmp, tmp, a->shift);
+        tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
+    }
+    neon_store_reg64(tmp, a->vd + 1);
+    tcg_temp_free_i64(tmp);
+    return true;
+}
+
+static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
+{
+    NeonGenWidenFn *widenfn[] = {
+        gen_helper_neon_widen_s8,
+        gen_helper_neon_widen_s16,
+        tcg_gen_ext_i32_i64,
+    };
+    return do_vshll_2sh(s, a, widenfn[a->size], false);
+}
+
+static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
+{
+    NeonGenWidenFn *widenfn[] = {
+        gen_helper_neon_widen_u8,
+        gen_helper_neon_widen_u16,
+        tcg_gen_extu_i32_i64,
+    };
+    return do_vshll_2sh(s, a, widenfn[a->size], true);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index f728231b198..ef39c89f10a 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5248,6 +5248,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             case 7: /* VQSHL */
             case 8: /* VSHRN, VRSHRN, VQSHRUN, VQRSHRUN */
             case 9: /* VQSHRN, VQRSHRN */
+            case 10: /* VSHLL, including VMOVL */
                 return 1; /* handled by decodetree */
             default:
                 break;
@@ -5265,50 +5266,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     size--;
             }
             shift = (insn >> 16) & ((1 << (3 + size)) - 1);
-            if (op == 10) {
-                /* VSHLL, VMOVL */
-                if (q || (rd & 1)) {
-                    return 1;
-                }
-                tmp = neon_load_reg(rm, 0);
-                tmp2 = neon_load_reg(rm, 1);
-                for (pass = 0; pass < 2; pass++) {
-                    if (pass == 1)
-                        tmp = tmp2;
-
-                    gen_neon_widen(cpu_V0, tmp, size, u);
-
-                    if (shift != 0) {
-                        /* The shift is less than the width of the source
-                           type, so we can just shift the whole register.  */
-                        tcg_gen_shli_i64(cpu_V0, cpu_V0, shift);
-                        /* Widen the result of shift: we need to clear
-                         * the potential overflow bits resulting from
-                         * left bits of the narrow input appearing as
-                         * right bits of left the neighbour narrow
-                         * input.  */
-                        if (size < 2 || !u) {
-                            uint64_t imm64;
-                            if (size == 0) {
-                                imm = (0xffu >> (8 - shift));
-                                imm |= imm << 16;
-                            } else if (size == 1) {
-                                imm = 0xffff >> (16 - shift);
-                            } else {
-                                /* size == 2 */
-                                imm = 0xffffffff >> (32 - shift);
-                            }
-                            if (size < 2) {
-                                imm64 = imm | (((uint64_t)imm) << 32);
-                            } else {
-                                imm64 = imm;
-                            }
-                            tcg_gen_andi_i64(cpu_V0, cpu_V0, ~imm64);
-                        }
-                    }
-                    neon_store_reg64(cpu_V0, rd + pass);
-                }
-            } else if (op >= 14) {
+            if (op >= 14) {
                 /* VCVT fixed-point.  */
                 TCGv_ptr fpst;
                 TCGv_i32 shiftv;
-- 
2.20.1




* [PATCH 09/10] target/arm: Convert VCVT fixed-point ops to decodetree
  2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
                   ` (7 preceding siblings ...)
  2020-05-15 14:20 ` [PATCH 08/10] target/arm: Convert Neon VSHLL, VMOVL " Peter Maydell
@ 2020-05-15 14:20 ` Peter Maydell
  2020-05-16  2:27   ` Richard Henderson
  2020-05-15 14:20 ` [PATCH 10/10] target/arm: Convert Neon one-register-and-immediate insns " Peter Maydell
  2020-05-15 21:32 ` [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon " no-reply
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Maydell @ 2020-05-15 14:20 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VCVT fixed-point conversion operations in the
Neon 2-regs-and-shift group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
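Illustrative note, not part of the commit: a minimal standalone sketch
of the fraction-bits arithmetic behind the "32 - a->shift" in
do_fp_2sh(). It assumes what the decode patterns in this patch imply:
the "1 shift:5" field means imm6 always has its top bit set, so
a->shift holds only imm6 & 0x1f. The helper names are hypothetical and
exist only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Fraction bits as the ARM ARM writes it: 64 - imm6. */
static int fracbits_from_imm6(uint32_t imm6)
{
    return 64 - (int)imm6;
}

/* Same quantity from the 5-bit field (imm6 without its top bit). */
static int fracbits_from_shift5(uint32_t shift5)
{
    return 32 - (int)shift5;
}

/* Check that both forms agree for every imm6 with the top bit set. */
static void check_fracbits(void)
{
    for (uint32_t imm6 = 0x20; imm6 <= 0x3f; imm6++) {
        assert(fracbits_from_imm6(imm6) ==
               fracbits_from_shift5(imm6 & 0x1f));
    }
}
```

Calling check_fracbits() exercises all 32 encodings the patterns can
match, which is the whole range the helper ever sees.
---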
 target/arm/neon-dp.decode       | 12 ++++++
 target/arm/translate-neon.inc.c | 53 +++++++++++++++++++++++
 target/arm/translate.c          | 75 +--------------------------------
 3 files changed, 67 insertions(+), 73 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 4438c1c8728..bce4043746e 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -418,3 +418,15 @@ VSHLL_U_2sh      1111 001 1 1 . 01 shift:4  .... 1010 0 . . 1 .... \
                  @2reg_shift_q0 size=1
 VSHLL_U_2sh      1111 001 1 1 . 001 shift:3 .... 1010 0 . . 1 .... \
                  @2reg_shift_q0 size=0
+
+# VCVT fixed<->float conversions
+# TODO: FP16 fixed<->float conversions are opc==0b1100 and 0b1101
+# We use size=0 for fp32 and size=1 for fp16 to match the 3-same encodings.
+VCVT_SF_2sh      1111 001 0 1 . 1 shift:5   .... 1110 0 . . 1 .... \
+                 @2reg_shift size=0
+VCVT_UF_2sh      1111 001 1 1 . 1 shift:5   .... 1110 0 . . 1 .... \
+                 @2reg_shift size=0
+VCVT_FS_2sh      1111 001 0 1 . 1 shift:5   .... 1111 0 . . 1 .... \
+                 @2reg_shift size=0
+VCVT_FU_2sh      1111 001 1 1 . 1 shift:5   .... 1111 0 . . 1 .... \
+                 @2reg_shift size=0
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 5678bfd0d4d..f27fe769f85 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1768,3 +1768,56 @@ static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
     };
     return do_vshll_2sh(s, a, widenfn[a->size], true);
 }
+
+static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
+                      NeonGenTwoSingleOPFn *fn)
+{
+    /* FP operations in 2-reg-and-shift group */
+    TCGv_i32 tmp, shiftv;
+    TCGv_ptr fpstatus;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpstatus = get_fpstatus_ptr(1);
+    /*
+     * The decode doesn't include the must-be-1 top bit of imm6 in a->shift,
+     * hence this 32-shift where the ARM ARM has 64-imm6.
+     */
+    shiftv = tcg_const_i32(32 - a->shift);
+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
+        tmp = neon_load_reg(a->vm, pass);
+        fn(tmp, tmp, shiftv, fpstatus);
+        neon_store_reg(a->vd, pass, tmp);
+    }
+    tcg_temp_free_ptr(fpstatus);
+    tcg_temp_free_i32(shiftv);
+    return true;
+}
+
+#define DO_FP_2SH(INSN, FUNC)                                           \
+    static bool trans_##INSN##_2sh(DisasContext *s, arg_2reg_shift *a)  \
+    {                                                                   \
+        return do_fp_2sh(s, a, FUNC);                                   \
+    }
+
+DO_FP_2SH(VCVT_SF, gen_helper_vfp_sltos)
+DO_FP_2SH(VCVT_UF, gen_helper_vfp_ultos)
+DO_FP_2SH(VCVT_FS, gen_helper_vfp_tosls_round_to_zero)
+DO_FP_2SH(VCVT_FU, gen_helper_vfp_touls_round_to_zero)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ef39c89f10a..9cc44e6258e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5193,7 +5193,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int q;
     int rd, rn, rm, rd_ofs, rn_ofs, rm_ofs;
     int size;
-    int shift;
     int pass;
     int u;
     int vec_size;
@@ -5234,78 +5233,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         return 1;
     } else if (insn & (1 << 4)) {
         if ((insn & 0x00380080) != 0) {
-            /* Two registers and shift.  */
-            op = (insn >> 8) & 0xf;
-
-            switch (op) {
-            case 0: /* VSHR */
-            case 1: /* VSRA */
-            case 2: /* VRSHR */
-            case 3: /* VRSRA */
-            case 4: /* VSRI */
-            case 5: /* VSHL, VSLI */
-            case 6: /* VQSHLU */
-            case 7: /* VQSHL */
-            case 8: /* VSHRN, VRSHRN, VQSHRUN, VQRSHRUN */
-            case 9: /* VQSHRN, VQRSHRN */
-            case 10: /* VSHLL, including VMOVL */
-                return 1; /* handled by decodetree */
-            default:
-                break;
-            }
-
-            if (insn & (1 << 7)) {
-                /* 64-bit shift. */
-                if (op > 7) {
-                    return 1;
-                }
-                size = 3;
-            } else {
-                size = 2;
-                while ((insn & (1 << (size + 19))) == 0)
-                    size--;
-            }
-            shift = (insn >> 16) & ((1 << (3 + size)) - 1);
-            if (op >= 14) {
-                /* VCVT fixed-point.  */
-                TCGv_ptr fpst;
-                TCGv_i32 shiftv;
-                VFPGenFixPointFn *fn;
-
-                if (!(insn & (1 << 21)) || (q && ((rd | rm) & 1))) {
-                    return 1;
-                }
-
-                if (!(op & 1)) {
-                    if (u) {
-                        fn = gen_helper_vfp_ultos;
-                    } else {
-                        fn = gen_helper_vfp_sltos;
-                    }
-                } else {
-                    if (u) {
-                        fn = gen_helper_vfp_touls_round_to_zero;
-                    } else {
-                        fn = gen_helper_vfp_tosls_round_to_zero;
-                    }
-                }
-
-                /* We have already masked out the must-be-1 top bit of imm6,
-                 * hence this 32-shift where the ARM ARM has 64-imm6.
-                 */
-                shift = 32 - shift;
-                fpst = get_fpstatus_ptr(1);
-                shiftv = tcg_const_i32(shift);
-                for (pass = 0; pass < (q ? 4 : 2); pass++) {
-                    TCGv_i32 tmpf = neon_load_reg(rm, pass);
-                    fn(tmpf, tmpf, shiftv, fpst);
-                    neon_store_reg(rd, pass, tmpf);
-                }
-                tcg_temp_free_ptr(fpst);
-                tcg_temp_free_i32(shiftv);
-            } else {
-                return 1;
-            }
+            /* Two registers and shift: handled by decodetree */
+            return 1;
         } else { /* (insn & 0x00380080) == 0 */
             int invert, reg_ofs, vec_size;
 
-- 
2.20.1




* [PATCH 10/10] target/arm: Convert Neon one-register-and-immediate insns to decodetree
  2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
                   ` (8 preceding siblings ...)
  2020-05-15 14:20 ` [PATCH 09/10] target/arm: Convert VCVT fixed-point ops " Peter Maydell
@ 2020-05-15 14:20 ` Peter Maydell
  2020-05-16  2:50   ` Richard Henderson
  2020-05-15 21:32 ` [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon " no-reply
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Maydell @ 2020-05-15 14:20 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the insns in the one-register-and-immediate group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
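Illustrative note, not part of the commit: a standalone sketch for
trying the immediate expansion outside QEMU. expand_imm() is a copy of
the asimd_imm_const() logic added below, and expand_byte_mask() mirrors
the inner loop of trans_VMVN_14_1r() for one D register. Both names are
hypothetical and used only here. Note that for cmode == 14 with op set
the value is inverted twice, so the caller gets the replicated raw
bits, which the byte-mask loop then consumes.

```c
#include <assert.h>
#include <stdint.h>

/* Standalone copy of the expansion in this patch. */
static uint32_t expand_imm(uint32_t imm, int cmode, int op)
{
    switch (cmode) {
    case 0: case 1:
        break;
    case 2: case 3:
        imm <<= 8;
        break;
    case 4: case 5:
        imm <<= 16;
        break;
    case 6: case 7:
        imm <<= 24;
        break;
    case 8: case 9:
        imm |= imm << 16;
        break;
    case 10: case 11:
        imm = (imm << 8) | (imm << 24);
        break;
    case 12:
        imm = (imm << 8) | 0xff;
        break;
    case 13:
        imm = (imm << 16) | 0xffff;
        break;
    case 14:
        imm |= (imm << 8) | (imm << 16) | (imm << 24);
        if (op) {
            /* Cancels the final inversion below. */
            imm = ~imm;
        }
        break;
    case 15:
        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
        break;
    }
    if (op) {
        imm = ~imm;
    }
    return imm;
}

/* Each set bit n of the selected byte of imm becomes an 0xff byte. */
static uint64_t expand_byte_mask(uint32_t imm, int pass)
{
    uint64_t val = 0;

    for (int n = 0; n < 8; n++) {
        if (imm & (1u << (n + pass * 8))) {
            val |= 0xffull << (n * 8);
        }
    }
    return val;
}
```

For example, imm8 = 0x70 with cmode == 15 and op == 0 expands to
0x3f800000, the single-precision encoding of 1.0, and imm8 = 0xab with
cmode == 1 and op == 1 gives 0xffffff54, which is why VBIC can be
emitted as an AND.
---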
 target/arm/neon-dp.decode       |  49 +++++++++++
 target/arm/translate-neon.inc.c | 151 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 101 +--------------------
 3 files changed, 202 insertions(+), 99 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index bce4043746e..39d2217a9c8 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -430,3 +430,52 @@ VCVT_FS_2sh      1111 001 0 1 . 1 shift:5   .... 1111 0 . . 1 .... \
                  @2reg_shift size=0
 VCVT_FU_2sh      1111 001 1 1 . 1 shift:5   .... 1111 0 . . 1 .... \
                  @2reg_shift size=0
+
+######################################################################
+# 1-reg-and-modified-immediate grouping:
+# 1111 001 i 1 D 000 imm:3 Vd:4 cmode:4 0 Q op 1 Vm:4
+######################################################################
+
+&1reg_imm        vd q imm cmode op
+
+%asimd_imm_value 24:1 16:3 0:4
+
+@1reg_imm        .... ... . . . ... ... .... .... . q:1 . . .... \
+                 &1reg_imm imm=%asimd_imm_value vd=%vd_dp
+
+{
+  # Logic operations, ie not VMOV or VMVN: (cmode & 1) && cmode < 12
+  VORR_1r        1111 001 . 1 . 000 ... .... 0001 0 . 0 1 .... \
+                 @1reg_imm cmode=1 op=0
+  VORR_1r        1111 001 . 1 . 000 ... .... 0011 0 . 0 1 .... \
+                 @1reg_imm cmode=3 op=0
+  VORR_1r        1111 001 . 1 . 000 ... .... 0101 0 . 0 1 .... \
+                 @1reg_imm cmode=5 op=0
+  VORR_1r        1111 001 . 1 . 000 ... .... 0111 0 . 0 1 .... \
+                 @1reg_imm cmode=7 op=0
+  VORR_1r        1111 001 . 1 . 000 ... .... 1001 0 . 0 1 .... \
+                 @1reg_imm cmode=9 op=0
+  VORR_1r        1111 001 . 1 . 000 ... .... 1011 0 . 0 1 .... \
+                 @1reg_imm cmode=11 op=0
+
+  VBIC_1r        1111 001 . 1 . 000 ... .... 0001 0 . 1 1 .... \
+                 @1reg_imm cmode=1 op=1
+  VBIC_1r        1111 001 . 1 . 000 ... .... 0011 0 . 1 1 .... \
+                 @1reg_imm cmode=3 op=1
+  VBIC_1r        1111 001 . 1 . 000 ... .... 0101 0 . 1 1 .... \
+                 @1reg_imm cmode=5 op=1
+  VBIC_1r        1111 001 . 1 . 000 ... .... 0111 0 . 1 1 .... \
+                 @1reg_imm cmode=7 op=1
+  VBIC_1r        1111 001 . 1 . 000 ... .... 1001 0 . 1 1 .... \
+                 @1reg_imm cmode=9 op=1
+  VBIC_1r        1111 001 . 1 . 000 ... .... 1011 0 . 1 1 .... \
+                 @1reg_imm cmode=11 op=1
+
+  # A VMVN special case: cmode == 14 op == 1
+  VMVN_14_1r     1111 001 . 1 . 000 ... .... 1110 0 . 1 1 .... \
+                 @1reg_imm cmode=14 op=1
+
+  # VMOV, VMVN: all other cmode/op combinations
+  VMOV_1r        1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... \
+                 @1reg_imm
+}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index f27fe769f85..f4eeb84541f 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1821,3 +1821,154 @@ DO_FP_2SH(VCVT_SF, gen_helper_vfp_sltos)
 DO_FP_2SH(VCVT_UF, gen_helper_vfp_ultos)
 DO_FP_2SH(VCVT_FS, gen_helper_vfp_tosls_round_to_zero)
 DO_FP_2SH(VCVT_FU, gen_helper_vfp_touls_round_to_zero)
+
+static uint32_t asimd_imm_const(uint32_t imm, int cmode, int op)
+{
+    /*
+     * Expand the encoded constant.
+     * Note that cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 is UNPREDICTABLE.
+     * We choose to not special-case this and will behave as if a
+     * valid constant encoding of 0 had been given.
+     * cmode = 15 op = 1 must UNDEF; we assume decode has handled that.
+     */
+    switch (cmode) {
+    case 0: case 1:
+        /* no-op */
+        break;
+    case 2: case 3:
+        imm <<= 8;
+        break;
+    case 4: case 5:
+        imm <<= 16;
+        break;
+    case 6: case 7:
+        imm <<= 24;
+        break;
+    case 8: case 9:
+        imm |= imm << 16;
+        break;
+    case 10: case 11:
+        imm = (imm << 8) | (imm << 24);
+        break;
+    case 12:
+        imm = (imm << 8) | 0xff;
+        break;
+    case 13:
+        imm = (imm << 16) | 0xffff;
+        break;
+    case 14:
+        imm |= (imm << 8) | (imm << 16) | (imm << 24);
+        if (op) {
+            imm = ~imm;
+        }
+        break;
+    case 15:
+        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
+            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
+        break;
+    }
+    if (op) {
+        imm = ~imm;
+    }
+    return imm;
+}
+
+static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
+                        GVecGen2iFn *fn)
+{
+    uint32_t imm;
+    int reg_ofs, vec_size;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (a->vd & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    reg_ofs = neon_reg_offset(a->vd, 0);
+    vec_size = a->q ? 16 : 8;
+    imm = asimd_imm_const(a->imm, a->cmode, a->op);
+
+    fn(MO_32, reg_ofs, reg_ofs, imm, vec_size, vec_size);
+    return true;
+}
+
+static bool trans_VORR_1r(DisasContext *s, arg_1reg_imm *a)
+{
+    return do_1reg_imm(s, a, tcg_gen_gvec_ori);
+}
+
+static bool trans_VBIC_1r(DisasContext *s, arg_1reg_imm *a)
+{
+    /* The immediate value will be inverted, so BIC becomes AND. */
+    return do_1reg_imm(s, a, tcg_gen_gvec_andi);
+}
+
+static bool trans_VMVN_14_1r(DisasContext *s, arg_1reg_imm *a)
+{
+    /* The cmode==14 op==1 special case isn't vectorized */
+    uint32_t imm;
+    TCGv_i64 t64;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (a->vd & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    imm = asimd_imm_const(a->imm, a->cmode, a->op);
+
+    t64 = tcg_temp_new_i64();
+    for (pass = 0; pass <= a->q; ++pass) {
+        uint64_t val = 0;
+        int n;
+
+        for (n = 0; n < 8; n++) {
+            if (imm & (1 << (n + pass * 8))) {
+                val |= 0xffull << (n * 8);
+            }
+        }
+        tcg_gen_movi_i64(t64, val);
+        neon_store_reg64(t64, a->vd + pass);
+    }
+    tcg_temp_free_i64(t64);
+    return true;
+}
+
+static void gen_VMOV_1r(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        int64_t c, uint32_t oprsz, uint32_t maxsz)
+{
+    tcg_gen_gvec_dup_imm(MO_32, dofs, oprsz, maxsz, c);
+}
+
+static bool trans_VMOV_1r(DisasContext *s, arg_1reg_imm *a)
+{
+    /* There is one unallocated cmode/op combination in this space */
+    if (a->cmode == 15 && a->op == 1) {
+        return false;
+    }
+    return do_1reg_imm(s, a, gen_VMOV_1r);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9cc44e6258e..20d07e99053 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5232,105 +5232,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         /* Three register same length: handled by decodetree */
         return 1;
     } else if (insn & (1 << 4)) {
-        if ((insn & 0x00380080) != 0) {
-            /* Two registers and shift: handled by decodetree */
-            return 1;
-        } else { /* (insn & 0x00380080) == 0 */
-            int invert, reg_ofs, vec_size;
-
-            if (q && (rd & 1)) {
-                return 1;
-            }
-
-            op = (insn >> 8) & 0xf;
-            /* One register and immediate.  */
-            imm = (u << 7) | ((insn >> 12) & 0x70) | (insn & 0xf);
-            invert = (insn & (1 << 5)) != 0;
-            /* Note that op = 2,3,4,5,6,7,10,11,12,13 imm=0 is UNPREDICTABLE.
-             * We choose to not special-case this and will behave as if a
-             * valid constant encoding of 0 had been given.
-             */
-            switch (op) {
-            case 0: case 1:
-                /* no-op */
-                break;
-            case 2: case 3:
-                imm <<= 8;
-                break;
-            case 4: case 5:
-                imm <<= 16;
-                break;
-            case 6: case 7:
-                imm <<= 24;
-                break;
-            case 8: case 9:
-                imm |= imm << 16;
-                break;
-            case 10: case 11:
-                imm = (imm << 8) | (imm << 24);
-                break;
-            case 12:
-                imm = (imm << 8) | 0xff;
-                break;
-            case 13:
-                imm = (imm << 16) | 0xffff;
-                break;
-            case 14:
-                imm |= (imm << 8) | (imm << 16) | (imm << 24);
-                if (invert) {
-                    imm = ~imm;
-                }
-                break;
-            case 15:
-                if (invert) {
-                    return 1;
-                }
-                imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
-                      | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
-                break;
-            }
-            if (invert) {
-                imm = ~imm;
-            }
-
-            reg_ofs = neon_reg_offset(rd, 0);
-            vec_size = q ? 16 : 8;
-
-            if (op & 1 && op < 12) {
-                if (invert) {
-                    /* The immediate value has already been inverted,
-                     * so BIC becomes AND.
-                     */
-                    tcg_gen_gvec_andi(MO_32, reg_ofs, reg_ofs, imm,
-                                      vec_size, vec_size);
-                } else {
-                    tcg_gen_gvec_ori(MO_32, reg_ofs, reg_ofs, imm,
-                                     vec_size, vec_size);
-                }
-            } else {
-                /* VMOV, VMVN.  */
-                if (op == 14 && invert) {
-                    TCGv_i64 t64 = tcg_temp_new_i64();
-
-                    for (pass = 0; pass <= q; ++pass) {
-                        uint64_t val = 0;
-                        int n;
-
-                        for (n = 0; n < 8; n++) {
-                            if (imm & (1 << (n + pass * 8))) {
-                                val |= 0xffull << (n * 8);
-                            }
-                        }
-                        tcg_gen_movi_i64(t64, val);
-                        neon_store_reg64(t64, rd + pass);
-                    }
-                    tcg_temp_free_i64(t64);
-                } else {
-                    tcg_gen_gvec_dup_imm(MO_32, reg_ofs, vec_size,
-                                         vec_size, imm);
-                }
-            }
-        }
+        /* Two registers and shift or reg and imm: handled by decodetree */
+        return 1;
     } else { /* (insn & 0x00800010 == 0x00800000) */
         if (size != 3) {
             op = (insn >> 8) & 0xf;
-- 
2.20.1




* Re: [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree
  2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
                   ` (9 preceding siblings ...)
  2020-05-15 14:20 ` [PATCH 10/10] target/arm: Convert Neon one-register-and-immediate insns " Peter Maydell
@ 2020-05-15 21:32 ` no-reply
  10 siblings, 0 replies; 24+ messages in thread
From: no-reply @ 2020-05-15 21:32 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200515142056.21346-1-peter.maydell@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200515142056.21346-1-peter.maydell@linaro.org
Subject: [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
74c8ea0 target/arm: Convert Neon one-register-and-immediate insns to decodetree
7dc1693 target/arm: Convert VCVT fixed-point ops to decodetree
dc02d37 target/arm: Convert Neon VSHLL, VMOVL to decodetree
7674a5e target/arm: Convert Neon narrowing shifts with op==9 to decodetree
6eed1f0 target/arm: Convert Neon narrowing shifts with op==8 to decodetree
b63364a target/arm: Convert VQSHLU, VQSHL 2-reg-shift insns to decodetree
1e6fb4a target/arm: Convert Neon VSRA, VSRI, VRSHR, VRSRA 2-reg-shift insns to decodetree
735501e target/arm: Convert Neon VSHR 2-reg-shift insns to decodetree
95697c1 target/arm: Convert Neon VSHL and VSLI 2-reg-shift insn to decodetree
ecf2d95 target/arm: Remove unused GEN_NEON_INTEGER_OP macro

=== OUTPUT BEGIN ===
1/10 Checking commit ecf2d95da0c2 (target/arm: Remove unused GEN_NEON_INTEGER_OP macro)
2/10 Checking commit 95697c1a03a3 (target/arm: Convert Neon VSHL and VSLI 2-reg-shift insn to decodetree)
ERROR: spaces required around that '*' (ctx:WxV)
#57: FILE: target/arm/translate-neon.inc.c:1314:
+static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
                                                           ^

ERROR: spaces required around that '*' (ctx:WxV)
#87: FILE: target/arm/translate-neon.inc.c:1344:
+    static bool trans_##INSN##_2sh(DisasContext *s, arg_2reg_shift *a)  \
                                                                    ^

total: 2 errors, 0 warnings, 101 lines checked

Patch 2/10 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

3/10 Checking commit 735501ec3402 (target/arm: Convert Neon VSHR 2-reg-shift insns to decodetree)
ERROR: spaces required around that '*' (ctx:WxV)
#84: FILE: target/arm/translate-neon.inc.c:1370:
+static bool trans_VSHR_S_2sh(DisasContext *s, arg_2reg_shift *a)
                                                              ^

total: 1 errors, 0 warnings, 113 lines checked

Patch 3/10 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/10 Checking commit 1e6fb4a901be (target/arm: Convert Neon VSRA, VSRI, VRSHR, VRSRA 2-reg-shift insns to decodetree)
5/10 Checking commit b63364afac7a (target/arm: Convert VQSHLU, VQSHL 2-reg-shift insns to decodetree)
6/10 Checking commit 6eed1f014a7c (target/arm: Convert Neon narrowing shifts with op==8 to decodetree)
ERROR: do not use C99 // comments
#175: FILE: target/arm/translate-neon.inc.c:1611:
+    // todo expand out the shift-narrow and the narrow-op

total: 1 errors, 0 warnings, 219 lines checked

Patch 6/10 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/10 Checking commit 7674a5e418f9 (target/arm: Convert Neon narrowing shifts with op==9 to decodetree)
8/10 Checking commit dc02d373d232 (target/arm: Convert Neon VSHLL, VMOVL to decodetree)
9/10 Checking commit 7dc1693ad7a8 (target/arm: Convert VCVT fixed-point ops to decodetree)
10/10 Checking commit 74c8ea05ee0f (target/arm: Convert Neon one-register-and-immediate insns to decodetree)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200515142056.21346-1-peter.maydell@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH 01/10] target/arm: Remove unused GEN_NEON_INTEGER_OP macro
  2020-05-15 14:20 ` [PATCH 01/10] target/arm: Remove unused GEN_NEON_INTEGER_OP macro Peter Maydell
@ 2020-05-15 22:07   ` Richard Henderson
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2020-05-15 22:07 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/15/20 7:20 AM, Peter Maydell wrote:
> The GEN_NEON_INTEGER_OP macro is no longer used; remove it.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> Between Richard's cleanup and mine we deleted all the uses of this,
> but since neither series on its own was sufficient to delete all
> of them we failed to remove the macro definition when it finally
> became unused.
> ---
>  target/arm/translate.c | 23 -----------------------
>  1 file changed, 23 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 02/10] target/arm: Convert Neon VSHL and VSLI 2-reg-shift insn to decodetree
  2020-05-15 14:20 ` [PATCH 02/10] target/arm: Convert Neon VSHL and VSLI 2-reg-shift insn to decodetree Peter Maydell
@ 2020-05-15 22:16   ` Richard Henderson
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2020-05-15 22:16 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/15/20 7:20 AM, Peter Maydell wrote:
> Convert the VSHL and VSLI insns from the Neon 2-registers-and-a-shift
> group to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       | 27 +++++++++++++++++++++++
>  target/arm/translate-neon.inc.c | 38 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 18 +++++++---------
>  3 files changed, 73 insertions(+), 10 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 03/10] target/arm: Convert Neon VSHR 2-reg-shift insns to decodetree
  2020-05-15 14:20 ` [PATCH 03/10] target/arm: Convert Neon VSHR 2-reg-shift insns " Peter Maydell
@ 2020-05-15 22:33   ` Richard Henderson
  2020-05-15 22:48   ` Richard Henderson
  1 sibling, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2020-05-15 22:33 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/15/20 7:20 AM, Peter Maydell wrote:
> Convert the VSHR 2-reg-shift insns to decodetree.
> 
> Note that unlike the legacy decoder, we present the right shift
> amount to the trans_ function as a positive integer.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       | 24 +++++++++++++++++++
>  target/arm/translate-neon.inc.c | 41 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 21 +----------------
>  3 files changed, 66 insertions(+), 20 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 03/10] target/arm: Convert Neon VSHR 2-reg-shift insns to decodetree
  2020-05-15 14:20 ` [PATCH 03/10] target/arm: Convert Neon VSHR 2-reg-shift insns " Peter Maydell
  2020-05-15 22:33   ` Richard Henderson
@ 2020-05-15 22:48   ` Richard Henderson
  1 sibling, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2020-05-15 22:48 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/15/20 7:20 AM, Peter Maydell wrote:
> +VSHR_S_2sh       1111 001 0 1 .  ......     .... 0000 1 . . 1 .... \
> +                 @2reg_shift size=3 shift=%neon_rshift_i6
> +VSHR_S_2sh       1111 001 0 1 . 1 .....     .... 0000 0 . . 1 .... \
> +                 @2reg_shift size=2 shift=%neon_rshift_i5
> +VSHR_S_2sh       1111 001 0 1 . 01 ....     .... 0000 0 . . 1 .... \
> +                 @2reg_shift size=1 shift=%neon_rshift_i4
> +VSHR_S_2sh       1111 001 0 1 . 001 ...     .... 0000 0 . . 1 .... \
> +                 @2reg_shift size=0 shift=%neon_rshift_i3

It would be worth creating new @formats for each of these, since there are 9
uses of each, between this patch and the next.

E.g.

@2reg_shr_b    .... .... ..00 1... .... .... . q:1 0. .... \
               &2reg_shift vm=%vm_dp vd=%vd_dp size=0 \
               shift=%neon_rshift_i3
@2reg_shr_h    .... .... ..01 .... .... .... . q:1 0. .... \
               &2reg_shift vm=%vm_dp vd=%vd_dp size=1 \
               shift=%neon_rshift_i4
@2reg_shr_s    .... .... ..1. .... .... .... . q:1 0. .... \
               &2reg_shift vm=%vm_dp vd=%vd_dp size=2 \
               shift=%neon_rshift_i5
@2reg_shr_d    .... .... .... .... .... .... . q:1 1. .... \
               &2reg_shift vm=%vm_dp vd=%vd_dp size=3 \
               shift=%neon_rshift_i6
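
For readers following along, a minimal sketch of the arithmetic the
%neon_rshift_iN fields are expected to perform. This is an assumption
based on the ARM ARM shift encodings, not code from the series: the
field definitions are not quoted in this mail, and the helper names
below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Element size in bits for size = 0..3 (8, 16, 32, 64). */
static int esize_bits(int size)
{
    return 8 << size;
}

/*
 * Right shift amount as a positive integer, computed from the low
 * (3 + size) bits of imm6, i.e. with the size marker bits removed.
 */
static int rshift_from_low_bits(int size, uint32_t imm_low)
{
    return esize_bits(size) - (int)imm_low;
}

/*
 * The same amount computed from the full imm6: 16 - imm6 for bytes,
 * 32 - imm6 for halfwords, 64 - imm6 for words and doublewords.
 */
static int rshift_from_imm6(int size, uint32_t imm6)
{
    int base = (size == 3) ? 64 : 2 * esize_bits(size);

    return base - (int)imm6;
}
```

With these definitions a byte-sized shift field of 0 means a right
shift by 8 and a field of 7 means a shift by 1, so the trans_ function
never sees zero or a negative amount.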


r~



* Re: [PATCH 04/10] target/arm: Convert Neon VSRA, VSRI, VRSHR, VRSRA 2-reg-shift insns to decodetree
  2020-05-15 14:20 ` [PATCH 04/10] target/arm: Convert Neon VSRA, VSRI, VRSHR, VRSRA " Peter Maydell
@ 2020-05-15 22:50   ` Richard Henderson
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2020-05-15 22:50 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/15/20 7:20 AM, Peter Maydell wrote:
> Convert the VSRA, VSRI, VRSHR, VRSRA 2-reg-shift insns to decodetree.
> (These are the last instructions in the group that are vectorized;
> the rest all require looping over each element.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       | 63 +++++++++++++++++++++++++++++++++
>  target/arm/translate-neon.inc.c |  7 ++++
>  target/arm/translate.c          | 52 +++------------------------
>  3 files changed, 74 insertions(+), 48 deletions(-)

Modulo the extra formats I mentioned vs the previous patch,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 05/10] target/arm: Convert VQSHLU, VQSHL 2-reg-shift insns to decodetree
  2020-05-15 14:20 ` [PATCH 05/10] target/arm: Convert VQSHLU, VQSHL " Peter Maydell
@ 2020-05-15 22:55   ` Richard Henderson
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2020-05-15 22:55 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/15/20 7:20 AM, Peter Maydell wrote:
> Convert the VQSHLU and VQSHL 2-reg-shift insns to decodetree.
> These are the last of the simple shift-by-immediate insns.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  27 ++++++++
>  target/arm/translate-neon.inc.c | 108 +++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 110 +-------------------------------
>  3 files changed, 138 insertions(+), 107 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 06/10] target/arm: Convert Neon narrowing shifts with op==8 to decodetree
  2020-05-15 14:20 ` [PATCH 06/10] target/arm: Convert Neon narrowing shifts with op==8 " Peter Maydell
@ 2020-05-16  2:01   ` Richard Henderson
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2020-05-16  2:01 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/15/20 7:20 AM, Peter Maydell wrote:
> Convert the Neon narrowing shifts where op==8 to decodetree:
>  * VSHRN
>  * VRSHRN
>  * VQSHRUN
>  * VQRSHRUN
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  32 ++++++
>  target/arm/translate-neon.inc.c | 168 ++++++++++++++++++++++++++++++++
>  target/arm/translate.c          |   1 +
>  3 files changed, 201 insertions(+)
> 
> diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
> index 6456b53a690..f8d19c5819c 100644
> --- a/target/arm/neon-dp.decode
> +++ b/target/arm/neon-dp.decode
> @@ -208,6 +208,10 @@ VMINNM_fp_3s     1111 001 1 0 . 1 . .... .... 1111 ... 1 .... @3same_fp
>  
>  @2reg_shift      .... ... . . . ...... .... .... . q:1 . . .... \
>                   &2reg_shift vm=%vm_dp vd=%vd_dp
> +@2reg_shift_q0   .... ... . . . ...... .... .... . 0 . . .... \
> +                 &2reg_shift vm=%vm_dp vd=%vd_dp q=0
> +@2reg_shift_q1   .... ... . . . ...... .... .... . 1 . . .... \
> +                 &2reg_shift vm=%vm_dp vd=%vd_dp q=1

I'm not sure this part makes sense.  Correct, you cannot leave the q field
unset and continue to use &2reg_shift, but the insn's q bit is part of the
opcode decode here.  We wind up with VSHRN having q=0 and VRSHRN having q=1,
which is a distinction without meaning.

While we could perhaps reasonably set q to a consistent constant, the only
driving reason to do so would be to share code with do_vector_2sh or
do_2shift_env_*.

But since we can't do that, due to the expansion algorithm, I think it would be
better to create a new &2reg_shift_nq that does not contain the q field.
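
To make the suggestion concrete, here is a hedged C sketch of the two argument structures side by side. The field names come from the patch; the struct layouts are an assumption about what decodetree would generate, and `shrn64_lane` is a hypothetical helper that only models one lane of a 64-to-32-bit narrowing shift.

```c
#include <stdint.h>

/* Assumed shape of the struct generated for &2reg_shift. */
typedef struct {
    int vm, vd, q, shift, size;
} arg_2reg_shift;

/* Hypothetical shape for the proposed &2reg_shift_nq: same fields, no q. */
typedef struct {
    int vm, vd, shift, size;
} arg_2reg_shift_nq;

/*
 * Illustrative model of one lane of a plain narrowing shift: shift a
 * 64-bit source element right and keep the low 32 bits.  The narrowing
 * insns always read a Q register and write a D register, so there is no
 * q value for the function to consult.  Valid for shift in 1..32.
 */
static uint32_t shrn64_lane(uint64_t elem, const arg_2reg_shift_nq *a)
{
    return (uint32_t)(elem >> a->shift);
}
```

A trans function taking `arg_2reg_shift_nq *` cannot accidentally depend on a q value that was only set to keep the decoder happy.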

The rest of the code looks good.


r~



* Re: [PATCH 07/10] target/arm: Convert Neon narrowing shifts with op==9 to decodetree
  2020-05-15 14:20 ` [PATCH 07/10] target/arm: Convert Neon narrowing shifts with op==9 " Peter Maydell
@ 2020-05-16  2:05   ` Richard Henderson
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2020-05-16  2:05 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/15/20 7:20 AM, Peter Maydell wrote:
> Convert the remaining Neon narrowing shifts to decodetree:
>   * VQSHRN
>   * VQRSHRN
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  32 ++++++++++
>  target/arm/translate-neon.inc.c |  15 +++++
>  target/arm/translate.c          | 110 +-------------------------------
>  3 files changed, 49 insertions(+), 108 deletions(-)

Modulo &2reg_shift_nq,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~




* Re: [PATCH 08/10] target/arm: Convert Neon VSHLL, VMOVL to decodetree
  2020-05-15 14:20 ` [PATCH 08/10] target/arm: Convert Neon VSHLL, VMOVL " Peter Maydell
@ 2020-05-16  2:18   ` Richard Henderson
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2020-05-16  2:18 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/15/20 7:20 AM, Peter Maydell wrote:
> Convert the VSHLL and VMOVL insns from the 2-reg-shift group
> to decodetree. Since the loop always has two passes, we unroll
> it to avoid the awkward reassignment of one TCGv to another.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       | 14 ++++++
>  target/arm/translate-neon.inc.c | 81 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 46 +------------------
>  3 files changed, 97 insertions(+), 44 deletions(-)

Modulo &2reg_shift_nq,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
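
For readers unfamiliar with the insn, the following is a rough C model of one of the two passes for the signed 8-bit case: each pass takes one 32-bit half of the D source and produces 64 bits of the Q result. `vshll_s8_pass` is a hypothetical name for illustration and does not mirror the TCG code in the patch.

```c
#include <stdint.h>

/*
 * One pass of a signed 8->16 bit widening left shift: four input bytes
 * become four 16-bit lanes.  shift == 0 is the VMOVL case.
 * Valid for shift in 0..8.
 */
static uint64_t vshll_s8_pass(uint32_t src, int shift)
{
    uint64_t out = 0;

    for (int i = 0; i < 4; i++) {
        /* Sign-extend lane i from 8 to 16 bits. */
        int16_t wide = (int8_t)(uint8_t)(src >> (8 * i));
        /* Shift as unsigned and truncate back to the 16-bit lane. */
        uint16_t lane = (uint16_t)((uint32_t)(uint16_t)wide << shift);

        out |= (uint64_t)lane << (16 * i);
    }
    return out;
}
```

Calling this once for the low half and once for the high half of the source corresponds to the two unrolled passes described in the commit message.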


r~



* Re: [PATCH 09/10] target/arm: Convert VCVT fixed-point ops to decodetree
  2020-05-15 14:20 ` [PATCH 09/10] target/arm: Convert VCVT fixed-point ops " Peter Maydell
@ 2020-05-16  2:27   ` Richard Henderson
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2020-05-16  2:27 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/15/20 7:20 AM, Peter Maydell wrote:
> +# VCVT fixed<->float conversions
> +# TODO: FP16 fixed<->float conversions are opc==0b1100 and 0b1101
> +# We use size=0 for fp32 and size=1 for fp16 to match the 3-same encodings.
> +VCVT_SF_2sh      1111 001 0 1 . 1 shift:5   .... 1110 0 . . 1 .... \
> +                 @2reg_shift size=0

Maybe use %neon_rshift_i5 so you can drop

> +    /*
> +     * The decode doesn't include the must-be-1 top bit of imm6 in a->shift,
> +     * hence this 32-shift where the ARM ARM has 64-imm6.
> +     */
> +    shiftv = tcg_const_i32(32 - a->shift);

this.  Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
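
The arithmetic behind that suggestion can be checked with a small C sketch. Both function names are hypothetical, and the second one assumes %neon_rshift_i5 evaluates to 32 minus the low five bits of imm6, which is not shown in this message.

```c
#include <stdint.h>

/* Fraction bits as the ARM ARM states them: 64 - imm6, with imm6<5> == 1. */
static int fbits_from_imm6(unsigned imm6)
{
    return 64 - (int)(imm6 & 0x3f);
}

/* The same quantity computed from the low five bits only. */
static int fbits_from_imm5(unsigned imm5)
{
    return 32 - (int)(imm5 & 0x1f);
}
```

Since imm6 is 32 + imm5 whenever its top bit is set, 64 - imm6 equals 32 - imm5, so letting the decoder do the subtraction would remove the need for the comment and the explicit `32 - a->shift` in the trans function.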


r~



* Re: [PATCH 10/10] target/arm: Convert Neon one-register-and-immediate insns to decodetree
  2020-05-15 14:20 ` [PATCH 10/10] target/arm: Convert Neon one-register-and-immediate insns " Peter Maydell
@ 2020-05-16  2:50   ` Richard Henderson
  2020-05-22 14:31     ` Peter Maydell
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2020-05-16  2:50 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 5/15/20 7:20 AM, Peter Maydell wrote:
> diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
> index f27fe769f85..f4eeb84541f 100644
> --- a/target/arm/translate-neon.inc.c
> +++ b/target/arm/translate-neon.inc.c
> @@ -1821,3 +1821,154 @@ DO_FP_2SH(VCVT_SF, gen_helper_vfp_sltos)
>  DO_FP_2SH(VCVT_UF, gen_helper_vfp_ultos)
>  DO_FP_2SH(VCVT_FS, gen_helper_vfp_tosls_round_to_zero)
>  DO_FP_2SH(VCVT_FU, gen_helper_vfp_touls_round_to_zero)
> +
> +static uint32_t asimd_imm_const(uint32_t imm, int cmode, int op)

It would be better to match AdvSIMDExpandImm and return uint64_t.

> +    case 14:
> +        imm |= (imm << 8) | (imm << 16) | (imm << 24);
> +        if (op) {
> +            imm = ~imm;
> +        }

You could then handle case 14 op == 1 properly here,

> +static bool trans_VMVN_14_1r(DisasContext *s, arg_1reg_imm *a)

and you wouldn't have to special case this at all.
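
A minimal sketch of how a 64-bit return value covers the cmode == 14 rows, following the AdvSIMDExpandImm pseudocode for those two rows only. `expand_imm_cmode14` is a hypothetical name; the other cmode values are deliberately not modelled here.

```c
#include <stdint.h>

/*
 * cmode == 14 only:
 *   op == 0: imm8 is replicated into all eight bytes.
 *   op == 1: bit n of imm8 selects 0x00 or 0xff for byte n, so the two
 *            32-bit halves of the result can differ.
 */
static uint64_t expand_imm_cmode14(uint8_t imm8, int op)
{
    uint64_t imm64 = 0;

    if (op) {
        for (int n = 0; n < 8; n++) {
            if (imm8 & (1u << n)) {
                imm64 |= 0xffULL << (n * 8);
            }
        }
    } else {
        imm64 = imm8 * 0x0101010101010101ULL;
    }
    return imm64;
}
```

Because the op == 1 result does not fit the "same 32 bits twice" shape, a uint32_t return type is what forces the separate trans function.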

> +{
> +  # Logic operations, ie not VMOV or VMVN: (cmode & 1) && cmode < 12
> +  VORR_1r        1111 001 . 1 . 000 ... .... 0001 0 . 0 1 .... \
> +                 @1reg_imm cmode=1 op=0
> +  VORR_1r        1111 001 . 1 . 000 ... .... 0011 0 . 0 1 .... \
> +                 @1reg_imm cmode=3 op=0
> +  VORR_1r        1111 001 . 1 . 000 ... .... 0101 0 . 0 1 .... \
> +                 @1reg_imm cmode=5 op=0
> +  VORR_1r        1111 001 . 1 . 000 ... .... 0111 0 . 0 1 .... \
> +                 @1reg_imm cmode=7 op=0
> +  VORR_1r        1111 001 . 1 . 000 ... .... 1001 0 . 0 1 .... \
> +                 @1reg_imm cmode=9 op=0
> +  VORR_1r        1111 001 . 1 . 000 ... .... 1011 0 . 0 1 .... \
> +                 @1reg_imm cmode=11 op=0
> +
> +  VBIC_1r        1111 001 . 1 . 000 ... .... 0001 0 . 1 1 .... \
> +                 @1reg_imm cmode=1 op=1
> +  VBIC_1r        1111 001 . 1 . 000 ... .... 0011 0 . 1 1 .... \
> +                 @1reg_imm cmode=3 op=1
> +  VBIC_1r        1111 001 . 1 . 000 ... .... 0101 0 . 1 1 .... \
> +                 @1reg_imm cmode=5 op=1
> +  VBIC_1r        1111 001 . 1 . 000 ... .... 0111 0 . 1 1 .... \
> +                 @1reg_imm cmode=7 op=1
> +  VBIC_1r        1111 001 . 1 . 000 ... .... 1001 0 . 1 1 .... \
> +                 @1reg_imm cmode=9 op=1
> +  VBIC_1r        1111 001 . 1 . 000 ... .... 1011 0 . 1 1 .... \
> +                 @1reg_imm cmode=11 op=1
> +
> +  # A VMVN special case: cmode == 14 op == 1
> +  VMVN_14_1r     1111 001 . 1 . 000 ... .... 1110 0 . 1 1 .... \
> +                 @1reg_imm cmode=14 op=1
> +
> +  # VMOV, VMVN: all other cmode/op combinations
> +  VMOV_1r        1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... \
> +                 @1reg_imm
> +}

I wonder if it's worth repeating VORR/VBIC so many times.
You can just as well do the (cmode & 1) && cmode < 12 check in the trans_ function.
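
A short C sketch showing that the single predicate selects exactly the six cmode values the patterns above spell out one by one. The function names are hypothetical.

```c
#include <stdbool.h>

/* The check from the comment in the decode file. */
static bool is_logic_cmode(int cmode)
{
    return (cmode & 1) && cmode < 12;
}

/* Bitmask of the 4-bit cmode values the predicate accepts. */
static unsigned logic_cmode_mask(void)
{
    unsigned mask = 0;

    for (int cmode = 0; cmode < 16; cmode++) {
        if (is_logic_cmode(cmode)) {
            mask |= 1u << cmode;
        }
    }
    return mask;
}
```

The mask has bits 1, 3, 5, 7, 9 and 11 set, matching the cmode= values in the VORR_1r and VBIC_1r patterns.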


r~



* Re: [PATCH 10/10] target/arm: Convert Neon one-register-and-immediate insns to decodetree
  2020-05-16  2:50   ` Richard Henderson
@ 2020-05-22 14:31     ` Peter Maydell
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Maydell @ 2020-05-22 14:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Sat, 16 May 2020 at 03:51, Richard Henderson
<richard.henderson@linaro.org> wrote:
> I wonder if it's worth repeating VORR/VBIC so many times.
> You can just as well do the (cmode & 1) && cmode < 12 check in the trans_ function.

OK; at that point we might as well just have a single
Vimm_1r pattern and distinguish VORR/VBIC/VMOV in the
trans function, rather than having three trans functions
which are all doing decode on cmode/op anyway.
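
A hedged sketch of what that single-pattern dispatch could look like, reduced to a pure classification function. The enum and function names are hypothetical, and the sketch follows the grouping in the quoted decode file without modelling any unallocated cmode/op combinations.

```c
#include <stdbool.h>

/* Hypothetical labels for the three behaviours being distinguished. */
typedef enum {
    IMM1R_VORR,
    IMM1R_VBIC,
    IMM1R_VMOV_VMVN
} Imm1rKind;

/* Decide from cmode/op which operation a one-reg-and-immediate insn is. */
static Imm1rKind classify_1reg_imm(int cmode, int op)
{
    if ((cmode & 1) && cmode < 12) {
        /* Logic operations: op selects between VORR and VBIC. */
        return op ? IMM1R_VBIC : IMM1R_VORR;
    }
    /* Everything else in the group is a VMOV or VMVN form. */
    return IMM1R_VMOV_VMVN;
}
```

A real trans function would pick a code generator instead of returning an enum, but the branching structure would be the same.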

thanks
-- PMM



Thread overview: 24+ messages
2020-05-15 14:20 [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon insns to decodetree Peter Maydell
2020-05-15 14:20 ` [PATCH 01/10] target/arm: Remove unused GEN_NEON_INTEGER_OP macro Peter Maydell
2020-05-15 22:07   ` Richard Henderson
2020-05-15 14:20 ` [PATCH 02/10] target/arm: Convert Neon VSHL and VSLI 2-reg-shift insn to decodetree Peter Maydell
2020-05-15 22:16   ` Richard Henderson
2020-05-15 14:20 ` [PATCH 03/10] target/arm: Convert Neon VSHR 2-reg-shift insns " Peter Maydell
2020-05-15 22:33   ` Richard Henderson
2020-05-15 22:48   ` Richard Henderson
2020-05-15 14:20 ` [PATCH 04/10] target/arm: Convert Neon VSRA, VSRI, VRSHR, VRSRA " Peter Maydell
2020-05-15 22:50   ` Richard Henderson
2020-05-15 14:20 ` [PATCH 05/10] target/arm: Convert VQSHLU, VQSHL " Peter Maydell
2020-05-15 22:55   ` Richard Henderson
2020-05-15 14:20 ` [PATCH 06/10] target/arm: Convert Neon narrowing shifts with op==8 " Peter Maydell
2020-05-16  2:01   ` Richard Henderson
2020-05-15 14:20 ` [PATCH 07/10] target/arm: Convert Neon narrowing shifts with op==9 " Peter Maydell
2020-05-16  2:05   ` Richard Henderson
2020-05-15 14:20 ` [PATCH 08/10] target/arm: Convert Neon VSHLL, VMOVL " Peter Maydell
2020-05-16  2:18   ` Richard Henderson
2020-05-15 14:20 ` [PATCH 09/10] target/arm: Convert VCVT fixed-point ops " Peter Maydell
2020-05-16  2:27   ` Richard Henderson
2020-05-15 14:20 ` [PATCH 10/10] target/arm: Convert Neon one-register-and-immediate insns " Peter Maydell
2020-05-16  2:50   ` Richard Henderson
2020-05-22 14:31     ` Peter Maydell
2020-05-15 21:32 ` [PATCH 00/10] target/arm: Convert 2-reg-shift and 1-reg-imm Neon " no-reply
