[PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree
@ 2020-06-11 14:45 Peter Maydell
  2020-06-11 14:45 ` [PATCH 01/10] target/arm: Add 'static' and 'const' annotations to VSHLL function arrays Peter Maydell
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 14:45 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

This patchset converts the whole 2-reg-scalar group to decodetree,
together with the VEXT, VTBL, VTBX, VDUP insns which don't fall
into any particular group. The only remaining unconverted Neon
insns are now the "2 registers misc" group.

Based-on: 20200609160209.29960-1-peter.maydell@linaro.org
("target/arm: Convert Neon 3-reg-diff to decodetree")

The first two patches fix minor bugs in earlier parts of the conversion
that made it into master.

thanks
-- PMM

Peter Maydell (10):
  target/arm: Add 'static' and 'const' annotations to VSHLL function
    arrays
  target/arm: Add missing TCG temp free in do_2shift_env_64()
  target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
  target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
  target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
  target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
  target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
  target/arm: Convert Neon VEXT to decodetree
  target/arm: Convert Neon VTBL, VTBX to decodetree
  target/arm: Convert Neon VDUP (scalar) to decodetree

 target/arm/neon-dp.decode       |  60 ++-
 target/arm/translate-neon.inc.c | 627 +++++++++++++++++++++++++++++++-
 target/arm/translate.c          | 468 +-----------------------
 3 files changed, 694 insertions(+), 461 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/10] target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
@ 2020-06-11 14:45 ` Peter Maydell
  2020-06-11 15:41   ` Richard Henderson
  2020-06-11 14:45 ` [PATCH 02/10] target/arm: Add missing TCG temp free in do_2shift_env_64() Peter Maydell
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 14:45 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
trans_VSHLL_U_2sh() as both 'static' and 'const'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index c2cc10913f8..7c4888a80c9 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1644,7 +1644,7 @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
 
 static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_s8,
         gen_helper_neon_widen_s16,
         tcg_gen_ext_i32_i64,
@@ -1654,7 +1654,7 @@ static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 
 static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_u8,
         gen_helper_neon_widen_u16,
         tcg_gen_extu_i32_i64,
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/10] target/arm: Add missing TCG temp free in do_2shift_env_64()
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
  2020-06-11 14:45 ` [PATCH 01/10] target/arm: Add 'static' and 'const' annotations to VSHLL function arrays Peter Maydell
@ 2020-06-11 14:45 ` Peter Maydell
  2020-06-11 15:43   ` Richard Henderson
  2020-06-11 14:45 ` [PATCH 03/10] target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree Peter Maydell
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 14:45 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
temporary in do_2shift_env_64(); free it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
My test setup wasn't looking for temporary-leak warnings (they are
not as easy to get at as they used to be because they only appear
if you enable qemu_log tracing for some other purpose). This is the
only one that snuck through, though.
---
 target/arm/translate-neon.inc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 7c4888a80c9..f2c241a87e9 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1329,6 +1329,7 @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
         neon_load_reg64(tmp, a->vm + pass);
         fn(tmp, cpu_env, tmp, constimm);
         neon_store_reg64(tmp, a->vd + pass);
+        tcg_temp_free_i64(tmp);
     }
     tcg_temp_free_i64(constimm);
     return true;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/10] target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
  2020-06-11 14:45 ` [PATCH 01/10] target/arm: Add 'static' and 'const' annotations to VSHLL function arrays Peter Maydell
  2020-06-11 14:45 ` [PATCH 02/10] target/arm: Add missing TCG temp free in do_2shift_env_64() Peter Maydell
@ 2020-06-11 14:45 ` Peter Maydell
  2020-06-11 16:09   ` Richard Henderson
  2020-06-11 14:45 ` [PATCH 04/10] target/arm: Convert Neon 2-reg-scalar float " Peter Maydell
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 14:45 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree.  These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.

The refactoring removes some of the oddities of the old decoder:
 * operands to the operation and accumulation were often
   reversed (taking advantage of the fact that most of these ops
   are commutative); the new code follows the pseudocode order
 * the Q bit in the insn was in a local variable 'u'; in the
   new code it is decoded into a->q

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  15 ++++
 target/arm/translate-neon.inc.c | 133 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  77 ++----------------
 3 files changed, 154 insertions(+), 71 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index ed49726abf5..983747b785f 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -467,5 +467,20 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
 
     VMULL_P_3d   1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
+
+    ##################################################################
+    # 2-regs-plus-scalar grouping:
+    # 1111 001 Q 1 D sz!=11 Vn:4 Vd:4 opc:4 N 1 M 0 Vm:4
+    ##################################################################
+    &2scalar vm vn vd size q
+
+    @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
+                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
+
+    VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
+
+    VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index f2c241a87e9..478a0dd5c1d 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2348,3 +2348,136 @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
                        16, 16, 0, fn_gvec);
     return true;
 }
+
+static void gen_neon_dup_low16(TCGv_i32 var)
+{
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    tcg_gen_ext16u_i32(var, var);
+    tcg_gen_shli_i32(tmp, var, 16);
+    tcg_gen_or_i32(var, var, tmp);
+    tcg_temp_free_i32(tmp);
+}
+
+static void gen_neon_dup_high16(TCGv_i32 var)
+{
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    tcg_gen_andi_i32(var, var, 0xffff0000);
+    tcg_gen_shri_i32(tmp, var, 16);
+    tcg_gen_or_i32(var, var, tmp);
+    tcg_temp_free_i32(tmp);
+}
+
+static inline TCGv_i32 neon_get_scalar(int size, int reg)
+{
+    TCGv_i32 tmp;
+    if (size == 1) {
+        tmp = neon_load_reg(reg & 7, reg >> 4);
+        if (reg & 8) {
+            gen_neon_dup_high16(tmp);
+        } else {
+            gen_neon_dup_low16(tmp);
+        }
+    } else {
+        tmp = neon_load_reg(reg & 15, reg >> 4);
+    }
+    return tmp;
+}
+
+static bool do_2scalar(DisasContext *s, arg_2scalar *a,
+                       NeonGenTwoOpFn *opfn, NeonGenTwoOpFn *accfn)
+{
+    /*
+     * Two registers and a scalar: perform an operation between
+     * the input elements and the scalar, and then possibly
+     * perform an accumulation operation of that result into the
+     * destination.
+     */
+    TCGv_i32 scalar;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* Bad size (including size == 3, which is a different insn group) */
+        return false;
+    }
+
+    if (a->q && ((a->vd | a->vn) & 1)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    scalar = neon_get_scalar(a->size, a->vm);
+
+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
+        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
+        opfn(tmp, tmp, scalar);
+        if (accfn) {
+            TCGv_i32 rd = neon_load_reg(a->vd, pass);
+            accfn(tmp, rd, tmp);
+            tcg_temp_free_i32(rd);
+        }
+        neon_store_reg(a->vd, pass, tmp);
+    }
+    tcg_temp_free_i32(scalar);
+    return true;
+}
+
+static bool trans_VMUL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mul_u16,
+        tcg_gen_mul_i32,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VMLA_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mul_u16,
+        tcg_gen_mul_i32,
+        NULL,
+    };
+    static NeonGenTwoOpFn * const accfn[] = {
+        NULL,
+        gen_helper_neon_add_u16,
+        tcg_gen_add_i32,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
+}
+
+static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mul_u16,
+        tcg_gen_mul_i32,
+        NULL,
+    };
+    static NeonGenTwoOpFn * const accfn[] = {
+        NULL,
+        gen_helper_neon_sub_u16,
+        tcg_gen_sub_i32,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index f459fad8646..e9cc237ef80 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -2624,24 +2624,6 @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
 #define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16,  7)
 #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
 
-static void gen_neon_dup_low16(TCGv_i32 var)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ext16u_i32(var, var);
-    tcg_gen_shli_i32(tmp, var, 16);
-    tcg_gen_or_i32(var, var, tmp);
-    tcg_temp_free_i32(tmp);
-}
-
-static void gen_neon_dup_high16(TCGv_i32 var)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_andi_i32(var, var, 0xffff0000);
-    tcg_gen_shri_i32(tmp, var, 16);
-    tcg_gen_or_i32(var, var, tmp);
-    tcg_temp_free_i32(tmp);
-}
-
 static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
 {
 #ifndef CONFIG_USER_ONLY
@@ -2991,26 +2973,6 @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
 
 #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
 
-static inline void gen_neon_add(int size, TCGv_i32 t0, TCGv_i32 t1)
-{
-    switch (size) {
-    case 0: gen_helper_neon_add_u8(t0, t0, t1); break;
-    case 1: gen_helper_neon_add_u16(t0, t0, t1); break;
-    case 2: tcg_gen_add_i32(t0, t0, t1); break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
-{
-    switch (size) {
-    case 0: gen_helper_neon_sub_u8(t0, t1, t0); break;
-    case 1: gen_helper_neon_sub_u16(t0, t1, t0); break;
-    case 2: tcg_gen_sub_i32(t0, t1, t0); break;
-    default: return;
-    }
-}
-
 static TCGv_i32 neon_load_scratch(int scratch)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
@@ -3024,22 +2986,6 @@ static void neon_store_scratch(int scratch, TCGv_i32 var)
     tcg_temp_free_i32(var);
 }
 
-static inline TCGv_i32 neon_get_scalar(int size, int reg)
-{
-    TCGv_i32 tmp;
-    if (size == 1) {
-        tmp = neon_load_reg(reg & 7, reg >> 4);
-        if (reg & 8) {
-            gen_neon_dup_high16(tmp);
-        } else {
-            gen_neon_dup_low16(tmp);
-        }
-    } else {
-        tmp = neon_load_reg(reg & 15, reg >> 4);
-    }
-    return tmp;
-}
-
 static int gen_neon_unzip(int rd, int rm, int size, int q)
 {
     TCGv_ptr pd, pm;
@@ -5238,6 +5184,11 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     return 1;
                 }
                 switch (op) {
+                case 0: /* Integer VMLA scalar */
+                case 4: /* Integer VMLS scalar */
+                case 8: /* Integer VMUL scalar */
+                    return 1; /* handled by decodetree */
+
                 case 1: /* Float VMLA scalar */
                 case 5: /* Floating point VMLS scalar */
                 case 9: /* Floating point VMUL scalar */
@@ -5245,9 +5196,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         return 1;
                     }
                     /* fall through */
-                case 0: /* Integer VMLA scalar */
-                case 4: /* Integer VMLS scalar */
-                case 8: /* Integer VMUL scalar */
                 case 12: /* VQDMULH scalar */
                 case 13: /* VQRDMULH scalar */
                     if (u && ((rd | rn) & 1)) {
@@ -5270,26 +5218,16 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             } else {
                                 gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
                             }
-                        } else if (op & 1) {
+                        } else {
                             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
                             gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
                             tcg_temp_free_ptr(fpstatus);
-                        } else {
-                            switch (size) {
-                            case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break;
-                            case 1: gen_helper_neon_mul_u16(tmp, tmp, tmp2); break;
-                            case 2: tcg_gen_mul_i32(tmp, tmp, tmp2); break;
-                            default: abort();
-                            }
                         }
                         tcg_temp_free_i32(tmp2);
                         if (op < 8) {
                             /* Accumulate.  */
                             tmp2 = neon_load_reg(rd, pass);
                             switch (op) {
-                            case 0:
-                                gen_neon_add(size, tmp, tmp2);
-                                break;
                             case 1:
                             {
                                 TCGv_ptr fpstatus = get_fpstatus_ptr(1);
@@ -5297,9 +5235,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                 tcg_temp_free_ptr(fpstatus);
                                 break;
                             }
-                            case 4:
-                                gen_neon_rsb(size, tmp, tmp2);
-                                break;
                             case 5:
                             {
                                 TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/10] target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
                   ` (2 preceding siblings ...)
  2020-06-11 14:45 ` [PATCH 03/10] target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree Peter Maydell
@ 2020-06-11 14:45 ` Peter Maydell
  2020-06-11 16:14   ` Richard Henderson
  2020-06-11 14:45 ` [PATCH 05/10] target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH " Peter Maydell
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 14:45 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the float versions of VMLA, VMLS and VMUL in the Neon
2-reg-scalar group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
As noted in the comment on the WRAP_FP_FN macro, we could have
had a do_2scalar_fp() function, but for 3 insns it seemed
simpler to just do the wrapping to get hold of the fpstatus ptr.
(These are the only fp insns in the group.)
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 65 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 37 ++-----------------
 3 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 983747b785f..cc2ee9641d6 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -478,9 +478,12 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
+    VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
 
     VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
+    VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
 
     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
+    VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 478a0dd5c1d..a5c7d60bdac 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2481,3 +2481,68 @@ static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
 
     return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 }
+
+/*
+ * Rather than have a float-specific version of do_2scalar just for
+ * three insns, we wrap a NeonGenTwoSingleOpFn to turn it into
+ * a NeonGenTwoOpFn.
+ */
+#define WRAP_FP_FN(WRAPNAME, FUNC)                              \
+    static void WRAPNAME(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm) \
+    {                                                           \
+        TCGv_ptr fpstatus = get_fpstatus_ptr(1);                \
+        FUNC(rd, rn, rm, fpstatus);                             \
+        tcg_temp_free_ptr(fpstatus);                            \
+    }
+
+WRAP_FP_FN(gen_VMUL_F_mul, gen_helper_vfp_muls)
+WRAP_FP_FN(gen_VMUL_F_add, gen_helper_vfp_adds)
+WRAP_FP_FN(gen_VMUL_F_sub, gen_helper_vfp_subs)
+
+static bool trans_VMUL_F_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        NULL, /* TODO: fp16 support */
+        gen_VMUL_F_mul,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VMLA_F_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        NULL, /* TODO: fp16 support */
+        gen_VMUL_F_mul,
+        NULL,
+    };
+    static NeonGenTwoOpFn * const accfn[] = {
+        NULL,
+        NULL, /* TODO: fp16 support */
+        gen_VMUL_F_add,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
+}
+
+static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        NULL, /* TODO: fp16 support */
+        gen_VMUL_F_mul,
+        NULL,
+    };
+    static NeonGenTwoOpFn * const accfn[] = {
+        NULL,
+        NULL, /* TODO: fp16 support */
+        gen_VMUL_F_sub,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index e9cc237ef80..e4a6a38c782 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5187,15 +5187,11 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 case 0: /* Integer VMLA scalar */
                 case 4: /* Integer VMLS scalar */
                 case 8: /* Integer VMUL scalar */
-                    return 1; /* handled by decodetree */
-
                 case 1: /* Float VMLA scalar */
                 case 5: /* Floating point VMLS scalar */
                 case 9: /* Floating point VMUL scalar */
-                    if (size == 1) {
-                        return 1;
-                    }
-                    /* fall through */
+                    return 1; /* handled by decodetree */
+
                 case 12: /* VQDMULH scalar */
                 case 13: /* VQRDMULH scalar */
                     if (u && ((rd | rn) & 1)) {
@@ -5212,41 +5208,14 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             } else {
                                 gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
                             }
-                        } else if (op == 13) {
+                        } else {
                             if (size == 1) {
                                 gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
                             } else {
                                 gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
                             }
-                        } else {
-                            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-                            gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
-                            tcg_temp_free_ptr(fpstatus);
                         }
                         tcg_temp_free_i32(tmp2);
-                        if (op < 8) {
-                            /* Accumulate.  */
-                            tmp2 = neon_load_reg(rd, pass);
-                            switch (op) {
-                            case 1:
-                            {
-                                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-                                gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
-                                tcg_temp_free_ptr(fpstatus);
-                                break;
-                            }
-                            case 5:
-                            {
-                                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-                                gen_helper_vfp_subs(tmp, tmp2, tmp, fpstatus);
-                                tcg_temp_free_ptr(fpstatus);
-                                break;
-                            }
-                            default:
-                                abort();
-                            }
-                            tcg_temp_free_i32(tmp2);
-                        }
                         neon_store_reg(rd, pass, tmp);
                     }
                     break;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/10] target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
                   ` (3 preceding siblings ...)
  2020-06-11 14:45 ` [PATCH 04/10] target/arm: Convert Neon 2-reg-scalar float " Peter Maydell
@ 2020-06-11 14:45 ` Peter Maydell
  2020-06-11 16:15   ` Richard Henderson
  2020-06-11 14:45 ` [PATCH 06/10] target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH " Peter Maydell
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 14:45 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  3 +++
 target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
 target/arm/translate.c          | 42 ++-------------------------------
 3 files changed, 34 insertions(+), 40 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index cc2ee9641d6..105cf2b2db3 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -485,5 +485,8 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
     VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
+
+    VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
+    VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index a5c7d60bdac..6301016d5cf 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2546,3 +2546,32 @@ static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
 
     return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 }
+
+WRAP_ENV_FN(gen_VQDMULH_16, gen_helper_neon_qdmulh_s16)
+WRAP_ENV_FN(gen_VQDMULH_32, gen_helper_neon_qdmulh_s32)
+WRAP_ENV_FN(gen_VQRDMULH_16, gen_helper_neon_qrdmulh_s16)
+WRAP_ENV_FN(gen_VQRDMULH_32, gen_helper_neon_qrdmulh_s32)
+
+static bool trans_VQDMULH_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        gen_VQDMULH_16,
+        gen_VQDMULH_32,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        gen_VQRDMULH_16,
+        gen_VQRDMULH_32,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], NULL);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index e4a6a38c782..5d0e00f92e0 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -2973,19 +2973,6 @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
 
 #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
 
-static TCGv_i32 neon_load_scratch(int scratch)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
-    return tmp;
-}
-
-static void neon_store_scratch(int scratch, TCGv_i32 var)
-{
-    tcg_gen_st_i32(var, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
-    tcg_temp_free_i32(var);
-}
-
 static int gen_neon_unzip(int rd, int rm, int size, int q)
 {
     TCGv_ptr pd, pm;
@@ -5190,35 +5177,10 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 case 1: /* Float VMLA scalar */
                 case 5: /* Floating point VMLS scalar */
                 case 9: /* Floating point VMUL scalar */
-                    return 1; /* handled by decodetree */
-
                 case 12: /* VQDMULH scalar */
                 case 13: /* VQRDMULH scalar */
-                    if (u && ((rd | rn) & 1)) {
-                        return 1;
-                    }
-                    tmp = neon_get_scalar(size, rm);
-                    neon_store_scratch(0, tmp);
-                    for (pass = 0; pass < (u ? 4 : 2); pass++) {
-                        tmp = neon_load_scratch(0);
-                        tmp2 = neon_load_reg(rn, pass);
-                        if (op == 12) {
-                            if (size == 1) {
-                                gen_helper_neon_qdmulh_s16(tmp, cpu_env, tmp, tmp2);
-                            } else {
-                                gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                            }
-                        } else {
-                            if (size == 1) {
-                                gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
-                            } else {
-                                gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                            }
-                        }
-                        tcg_temp_free_i32(tmp2);
-                        neon_store_reg(rd, pass, tmp);
-                    }
-                    break;
+                    return 1; /* handled by decodetree */
+
                 case 3: /* VQDMLAL scalar */
                 case 7: /* VQDMLSL scalar */
                 case 11: /* VQDMULL scalar */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/10] target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
                   ` (4 preceding siblings ...)
  2020-06-11 14:45 ` [PATCH 05/10] target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH " Peter Maydell
@ 2020-06-11 14:45 ` Peter Maydell
  2020-06-11 16:19   ` Richard Henderson
  2020-06-11 14:45 ` [PATCH 07/10] target/arm: Convert Neon 2-reg-scalar long multiplies " Peter Maydell
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 14:45 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 74 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 38 +----------------
 3 files changed, 79 insertions(+), 36 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 105cf2b2db3..7b069abe628 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -488,5 +488,8 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
     VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
+
+    VQRDMLAH_2sc 1111 001 . 1 . .. .... .... 1110 . 1 . 0 .... @2scalar
+    VQRDMLSH_2sc 1111 001 . 1 . .. .... .... 1111 . 1 . 0 .... @2scalar
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 6301016d5cf..19344ac331d 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2575,3 +2575,77 @@ static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
 
     return do_2scalar(s, a, opfn[a->size], NULL);
 }
+
+static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
+                            NeonGenThreeOpEnvFn *opfn)
+{
+    /*
+     * VQRDMLAH/VQRDMLSH: this is like do_2scalar, but the opfn
+     * performs a kind of fused op-then-accumulate using a helper
+     * function that takes all of rd, rn and the scalar at once.
+     */
+    TCGv_i32 scalar;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    if (!dc_isar_feature(aa32_rdm, s)) {
+        return 1;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* Bad size (including size == 3, which is a different insn group) */
+        return false;
+    }
+
+    if (a->q && ((a->vd | a->vn) & 1)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    scalar = neon_get_scalar(a->size, a->vm);
+
+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
+        TCGv_i32 rn = neon_load_reg(a->vn, pass);
+        TCGv_i32 rd = neon_load_reg(a->vd, pass);
+        opfn(rd, cpu_env, rn, scalar, rd);
+        tcg_temp_free_i32(rn);
+        neon_store_reg(a->vd, pass, rd);
+    }
+    tcg_temp_free_i32(scalar);
+
+    return true;
+}
+
+static bool trans_VQRDMLAH_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenThreeOpEnvFn *opfn[] = {
+        NULL,
+        gen_helper_neon_qrdmlah_s16,
+        gen_helper_neon_qrdmlah_s32,
+        NULL,
+    };
+    return do_vqrdmlah_2sc(s, a, opfn[a->size]);
+}
+
+static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenThreeOpEnvFn *opfn[] = {
+        NULL,
+        gen_helper_neon_qrdmlsh_s16,
+        gen_helper_neon_qrdmlsh_s32,
+        NULL,
+    };
+    return do_vqrdmlah_2sc(s, a, opfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 5d0e00f92e0..f0db029f66d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5179,6 +5179,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 case 9: /* Floating point VMUL scalar */
                 case 12: /* VQDMULH scalar */
                 case 13: /* VQRDMULH scalar */
+                case 14: /* VQRDMLAH scalar */
+                case 15: /* VQRDMLSH scalar */
                     return 1; /* handled by decodetree */
 
                 case 3: /* VQDMLAL scalar */
@@ -5238,42 +5240,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         neon_store_reg64(cpu_V0, rd + pass);
                     }
                     break;
-                case 14: /* VQRDMLAH scalar */
-                case 15: /* VQRDMLSH scalar */
-                    {
-                        NeonGenThreeOpEnvFn *fn;
-
-                        if (!dc_isar_feature(aa32_rdm, s)) {
-                            return 1;
-                        }
-                        if (u && ((rd | rn) & 1)) {
-                            return 1;
-                        }
-                        if (op == 14) {
-                            if (size == 1) {
-                                fn = gen_helper_neon_qrdmlah_s16;
-                            } else {
-                                fn = gen_helper_neon_qrdmlah_s32;
-                            }
-                        } else {
-                            if (size == 1) {
-                                fn = gen_helper_neon_qrdmlsh_s16;
-                            } else {
-                                fn = gen_helper_neon_qrdmlsh_s32;
-                            }
-                        }
-
-                        tmp2 = neon_get_scalar(size, rm);
-                        for (pass = 0; pass < (u ? 4 : 2); pass++) {
-                            tmp = neon_load_reg(rn, pass);
-                            tmp3 = neon_load_reg(rd, pass);
-                            fn(tmp, cpu_env, tmp, tmp2, tmp3);
-                            tcg_temp_free_i32(tmp3);
-                            neon_store_reg(rd, pass, tmp);
-                        }
-                        tcg_temp_free_i32(tmp2);
-                    }
-                    break;
                 default:
                     g_assert_not_reached();
                 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/10] target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
                   ` (5 preceding siblings ...)
  2020-06-11 14:45 ` [PATCH 06/10] target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH " Peter Maydell
@ 2020-06-11 14:45 ` Peter Maydell
  2020-06-11 16:25   ` Richard Henderson
  2020-06-11 14:45 ` [PATCH 08/10] target/arm: Convert Neon VEXT " Peter Maydell
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 14:45 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon 2-reg-scalar long multiplies to decodetree.
These are the last instructions in the group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  18 ++++
 target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
 target/arm/translate.c          | 182 ++------------------------------
 3 files changed, 187 insertions(+), 176 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 7b069abe628..9ff182f56d7 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -476,16 +476,34 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
+    # For the 'long' ops the Q bit is part of insn decode
+    @2scalar_q0  .... ... . . . size:2 .... .... .... . . . . .... \
+                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
 
     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
     VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
 
+    VMLAL_S_2sc  1111 001 0 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+    VMLAL_U_2sc  1111 001 1 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+
+    VQDMLAL_2sc  1111 001 0 1 . .. .... .... 0011 . 1 . 0 .... @2scalar_q0
+
     VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
     VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
 
+    VMLSL_S_2sc  1111 001 0 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
+    VMLSL_U_2sc  1111 001 1 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
+
+    VQDMLSL_2sc  1111 001 0 1 . .. .... .... 0111 . 1 . 0 .... @2scalar_q0
+
     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
     VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
 
+    VMULL_S_2sc  1111 001 0 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
+    VMULL_U_2sc  1111 001 1 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
+
+    VQDMULL_2sc  1111 001 0 1 . .. .... .... 1011 . 1 . 0 .... @2scalar_q0
+
     VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
     VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
 
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 19344ac331d..f054b567c8f 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2649,3 +2649,166 @@ static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
     };
     return do_vqrdmlah_2sc(s, a, opfn[a->size]);
 }
+
+static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
+                            NeonGenTwoOpWidenFn *opfn,
+                            NeonGenTwo64OpFn *accfn)
+{
+    /*
+     * Two registers and a scalar, long operations: perform an
+     * operation on the input elements and the scalar which produces
+     * a double-width result, and then possibly perform an accumulation
+     * operation of that result into the destination.
+     */
+    TCGv_i32 scalar, rn;
+    TCGv_i64 rn0_64, rn1_64;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* Bad size (including size == 3, which is a different insn group) */
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    scalar = neon_get_scalar(a->size, a->vm);
+
+    /* Load all inputs before writing any outputs, in case of overlap */
+    rn = neon_load_reg(a->vn, 0);
+    rn0_64 = tcg_temp_new_i64();
+    opfn(rn0_64, rn, scalar);
+    tcg_temp_free_i32(rn);
+
+    rn = neon_load_reg(a->vn, 1);
+    rn1_64 = tcg_temp_new_i64();
+    opfn(rn1_64, rn, scalar);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(scalar);
+
+    if (accfn) {
+        TCGv_i64 t64 = tcg_temp_new_i64();
+        neon_load_reg64(t64, a->vd);
+        accfn(t64, t64, rn0_64);
+        neon_store_reg64(t64, a->vd);
+        neon_load_reg64(t64, a->vd + 1);
+        accfn(t64, t64, rn1_64);
+        neon_store_reg64(t64, a->vd + 1);
+        tcg_temp_free_i64(t64);
+    } else {
+        neon_store_reg64(rn0_64, a->vd);
+        neon_store_reg64(rn1_64, a->vd + 1);
+    }
+    tcg_temp_free_i64(rn0_64);
+    tcg_temp_free_i64(rn1_64);
+    return true;
+}
+
+static bool trans_VMULL_S_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mull_s16,
+        gen_mull_s32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mull_u16,
+        gen_mull_u32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+#define DO_VMLAL_2SC(INSN, MULL, ACC)                                   \
+    static bool trans_##INSN##_2sc(DisasContext *s, arg_2scalar *a)     \
+    {                                                                   \
+        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
+            NULL,                                                       \
+            gen_helper_neon_##MULL##16,                                 \
+            gen_##MULL##32,                                             \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenTwo64OpFn * const accfn[] = {                     \
+            NULL,                                                       \
+            gen_helper_neon_##ACC##l_u32,                               \
+            tcg_gen_##ACC##_i64,                                        \
+            NULL,                                                       \
+        };                                                              \
+        return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);    \
+    }
+
+DO_VMLAL_2SC(VMLAL_S, mull_s, add)
+DO_VMLAL_2SC(VMLAL_U, mull_u, add)
+DO_VMLAL_2SC(VMLSL_S, mull_s, sub)
+DO_VMLAL_2SC(VMLSL_U, mull_u, sub)
+
+static bool trans_VQDMULL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VQDMLAL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLAL_acc_16,
+        gen_VQDMLAL_acc_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
+}
+
+static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLSL_acc_16,
+        gen_VQDMLSL_acc_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index f0db029f66d..4d39bbf035b 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -377,43 +377,6 @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
     tcg_gen_ext16s_i32(dest, var);
 }
 
-/* 32x32->64 multiply.  Marks inputs as dead.  */
-static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
-{
-    TCGv_i32 lo = tcg_temp_new_i32();
-    TCGv_i32 hi = tcg_temp_new_i32();
-    TCGv_i64 ret;
-
-    tcg_gen_mulu2_i32(lo, hi, a, b);
-    tcg_temp_free_i32(a);
-    tcg_temp_free_i32(b);
-
-    ret = tcg_temp_new_i64();
-    tcg_gen_concat_i32_i64(ret, lo, hi);
-    tcg_temp_free_i32(lo);
-    tcg_temp_free_i32(hi);
-
-    return ret;
-}
-
-static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
-{
-    TCGv_i32 lo = tcg_temp_new_i32();
-    TCGv_i32 hi = tcg_temp_new_i32();
-    TCGv_i64 ret;
-
-    tcg_gen_muls2_i32(lo, hi, a, b);
-    tcg_temp_free_i32(a);
-    tcg_temp_free_i32(b);
-
-    ret = tcg_temp_new_i64();
-    tcg_gen_concat_i32_i64(ret, lo, hi);
-    tcg_temp_free_i32(lo);
-    tcg_temp_free_i32(hi);
-
-    return ret;
-}
-
 /* Swap low and high halfwords.  */
 static void gen_swap_half(TCGv_i32 var)
 {
@@ -3164,58 +3127,6 @@ static inline void gen_neon_addl(int size)
     }
 }
 
-static inline void gen_neon_negl(TCGv_i64 var, int size)
-{
-    switch (size) {
-    case 0: gen_helper_neon_negl_u16(var, var); break;
-    case 1: gen_helper_neon_negl_u32(var, var); break;
-    case 2:
-        tcg_gen_neg_i64(var, var);
-        break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_addl_saturate(TCGv_i64 op0, TCGv_i64 op1, int size)
-{
-    switch (size) {
-    case 1: gen_helper_neon_addl_saturate_s32(op0, cpu_env, op0, op1); break;
-    case 2: gen_helper_neon_addl_saturate_s64(op0, cpu_env, op0, op1); break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_mull(TCGv_i64 dest, TCGv_i32 a, TCGv_i32 b,
-                                 int size, int u)
-{
-    TCGv_i64 tmp;
-
-    switch ((size << 1) | u) {
-    case 0: gen_helper_neon_mull_s8(dest, a, b); break;
-    case 1: gen_helper_neon_mull_u8(dest, a, b); break;
-    case 2: gen_helper_neon_mull_s16(dest, a, b); break;
-    case 3: gen_helper_neon_mull_u16(dest, a, b); break;
-    case 4:
-        tmp = gen_muls_i64_i32(a, b);
-        tcg_gen_mov_i64(dest, tmp);
-        tcg_temp_free_i64(tmp);
-        break;
-    case 5:
-        tmp = gen_mulu_i64_i32(a, b);
-        tcg_gen_mov_i64(dest, tmp);
-        tcg_temp_free_i64(tmp);
-        break;
-    default: abort();
-    }
-
-    /* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
-       Don't forget to clean them now.  */
-    if (size < 2) {
-        tcg_temp_free_i32(a);
-        tcg_temp_free_i32(b);
-    }
-}
-
 static void gen_neon_narrow_op(int op, int u, int size,
                                TCGv_i32 dest, TCGv_i64 src)
 {
@@ -5120,7 +5031,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int u;
     int vec_size;
     uint32_t imm;
-    TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
+    TCGv_i32 tmp, tmp2, tmp3, tmp5;
     TCGv_ptr ptr1;
     TCGv_i64 tmp64;
 
@@ -5158,92 +5069,11 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         return 1;
     } else { /* (insn & 0x00800010 == 0x00800000) */
         if (size != 3) {
-            op = (insn >> 8) & 0xf;
-            if ((insn & (1 << 6)) == 0) {
-                /* Three registers of different lengths: handled by decodetree */
-                return 1;
-            } else {
-                /* Two registers and a scalar. NB that for ops of this form
-                 * the ARM ARM labels bit 24 as Q, but it is in our variable
-                 * 'u', not 'q'.
-                 */
-                if (size == 0) {
-                    return 1;
-                }
-                switch (op) {
-                case 0: /* Integer VMLA scalar */
-                case 4: /* Integer VMLS scalar */
-                case 8: /* Integer VMUL scalar */
-                case 1: /* Float VMLA scalar */
-                case 5: /* Floating point VMLS scalar */
-                case 9: /* Floating point VMUL scalar */
-                case 12: /* VQDMULH scalar */
-                case 13: /* VQRDMULH scalar */
-                case 14: /* VQRDMLAH scalar */
-                case 15: /* VQRDMLSH scalar */
-                    return 1; /* handled by decodetree */
-
-                case 3: /* VQDMLAL scalar */
-                case 7: /* VQDMLSL scalar */
-                case 11: /* VQDMULL scalar */
-                    if (u == 1) {
-                        return 1;
-                    }
-                    /* fall through */
-                case 2: /* VMLAL sclar */
-                case 6: /* VMLSL scalar */
-                case 10: /* VMULL scalar */
-                    if (rd & 1) {
-                        return 1;
-                    }
-                    tmp2 = neon_get_scalar(size, rm);
-                    /* We need a copy of tmp2 because gen_neon_mull
-                     * deletes it during pass 0.  */
-                    tmp4 = tcg_temp_new_i32();
-                    tcg_gen_mov_i32(tmp4, tmp2);
-                    tmp3 = neon_load_reg(rn, 1);
-
-                    for (pass = 0; pass < 2; pass++) {
-                        if (pass == 0) {
-                            tmp = neon_load_reg(rn, 0);
-                        } else {
-                            tmp = tmp3;
-                            tmp2 = tmp4;
-                        }
-                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-                        if (op != 11) {
-                            neon_load_reg64(cpu_V1, rd + pass);
-                        }
-                        switch (op) {
-                        case 6:
-                            gen_neon_negl(cpu_V0, size);
-                            /* Fall through */
-                        case 2:
-                            gen_neon_addl(size);
-                            break;
-                        case 3: case 7:
-                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                            if (op == 7) {
-                                gen_neon_negl(cpu_V0, size);
-                            }
-                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
-                            break;
-                        case 10:
-                            /* no-op */
-                            break;
-                        case 11:
-                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                            break;
-                        default:
-                            abort();
-                        }
-                        neon_store_reg64(cpu_V0, rd + pass);
-                    }
-                    break;
-                default:
-                    g_assert_not_reached();
-                }
-            }
+            /*
+             * Three registers of different lengths, or two registers and
+             * a scalar: handled by decodetree
+             */
+            return 1;
         } else { /* size == 3 */
             if (!u) {
                 /* Extract.  */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/10] target/arm: Convert Neon VEXT to decodetree
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
                   ` (6 preceding siblings ...)
  2020-06-11 14:45 ` [PATCH 07/10] target/arm: Convert Neon 2-reg-scalar long multiplies " Peter Maydell
@ 2020-06-11 14:45 ` Peter Maydell
  2020-06-11 16:26   ` Richard Henderson
  2020-06-11 14:45 ` [PATCH 09/10] target/arm: Convert Neon VTBL, VTBX " Peter Maydell
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 14:45 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation which used fixed temporaries cpu_V0 and cpu_V1
and did the extraction with by-hand shift and logic ops, we use
the TCG extract2 insn.

We don't need to special case 0 or 8 immediates any more as the
optimizer is smart enough to throw away the dead code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  8 +++-
 target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 58 +------------------------
 3 files changed, 85 insertions(+), 57 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 9ff182f56d7..26d60220168 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -413,7 +413,13 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 # return false for size==3.
 ######################################################################
 {
-  # 0b11 subgroup will go here
+  [
+    ##################################################################
+    # Miscellaneous size=0b11 insns
+    ##################################################################
+    VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+  ]
 
   # Subgroup for size != 0b11
   [
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index f054b567c8f..ba0e7091e1a 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2812,3 +2812,79 @@ static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
 
     return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
 }
+
+static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (a->imm > 7 && !a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (!a->q) {
+        /* Extract 64 bits from <Vm:Vn> */
+        TCGv_i64 left, right, dest;
+
+        left = tcg_temp_new_i64();
+        right = tcg_temp_new_i64();
+        dest = tcg_temp_new_i64();
+
+        neon_load_reg64(right, a->vn);
+        neon_load_reg64(left, a->vm);
+        tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
+        neon_store_reg64(dest, a->vd);
+
+        tcg_temp_free_i64(left);
+        tcg_temp_free_i64(right);
+        tcg_temp_free_i64(dest);
+    } else {
+        /* Extract 128 bits from <Vm+1:Vm:Vn+1:Vn> */
+        TCGv_i64 left, middle, right, destleft, destright;
+
+        left = tcg_temp_new_i64();
+        middle = tcg_temp_new_i64();
+        right = tcg_temp_new_i64();
+        destleft = tcg_temp_new_i64();
+        destright = tcg_temp_new_i64();
+
+        if (a->imm < 8) {
+            neon_load_reg64(right, a->vn);
+            neon_load_reg64(middle, a->vn + 1);
+            tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
+            neon_load_reg64(left, a->vm);
+            tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
+        } else {
+            neon_load_reg64(right, a->vn + 1);
+            neon_load_reg64(middle, a->vm);
+            tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
+            neon_load_reg64(left, a->vm + 1);
+            tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
+        }
+
+        neon_store_reg64(destright, a->vd);
+        neon_store_reg64(destleft, a->vd + 1);
+
+        tcg_temp_free_i64(destright);
+        tcg_temp_free_i64(destleft);
+        tcg_temp_free_i64(right);
+        tcg_temp_free_i64(middle);
+        tcg_temp_free_i64(left);
+    }
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 4d39bbf035b..a0822dba5e2 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5030,10 +5030,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int pass;
     int u;
     int vec_size;
-    uint32_t imm;
     TCGv_i32 tmp, tmp2, tmp3, tmp5;
     TCGv_ptr ptr1;
-    TCGv_i64 tmp64;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -5076,60 +5074,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         } else { /* size == 3 */
             if (!u) {
-                /* Extract.  */
-                imm = (insn >> 8) & 0xf;
-
-                if (imm > 7 && !q)
-                    return 1;
-
-                if (q && ((rd | rn | rm) & 1)) {
-                    return 1;
-                }
-
-                if (imm == 0) {
-                    neon_load_reg64(cpu_V0, rn);
-                    if (q) {
-                        neon_load_reg64(cpu_V1, rn + 1);
-                    }
-                } else if (imm == 8) {
-                    neon_load_reg64(cpu_V0, rn + 1);
-                    if (q) {
-                        neon_load_reg64(cpu_V1, rm);
-                    }
-                } else if (q) {
-                    tmp64 = tcg_temp_new_i64();
-                    if (imm < 8) {
-                        neon_load_reg64(cpu_V0, rn);
-                        neon_load_reg64(tmp64, rn + 1);
-                    } else {
-                        neon_load_reg64(cpu_V0, rn + 1);
-                        neon_load_reg64(tmp64, rm);
-                    }
-                    tcg_gen_shri_i64(cpu_V0, cpu_V0, (imm & 7) * 8);
-                    tcg_gen_shli_i64(cpu_V1, tmp64, 64 - ((imm & 7) * 8));
-                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
-                    if (imm < 8) {
-                        neon_load_reg64(cpu_V1, rm);
-                    } else {
-                        neon_load_reg64(cpu_V1, rm + 1);
-                        imm -= 8;
-                    }
-                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
-                    tcg_gen_shri_i64(tmp64, tmp64, imm * 8);
-                    tcg_gen_or_i64(cpu_V1, cpu_V1, tmp64);
-                    tcg_temp_free_i64(tmp64);
-                } else {
-                    /* BUGFIX */
-                    neon_load_reg64(cpu_V0, rn);
-                    tcg_gen_shri_i64(cpu_V0, cpu_V0, imm * 8);
-                    neon_load_reg64(cpu_V1, rm);
-                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
-                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
-                }
-                neon_store_reg64(cpu_V0, rd);
-                if (q) {
-                    neon_store_reg64(cpu_V1, rd + 1);
-                }
+                /* Extract: handled by decodetree */
+                return 1;
             } else if ((insn & (1 << 11)) == 0) {
                 /* Two register misc.  */
                 op = ((insn >> 12) & 0x30) | ((insn >> 7) & 0xf);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/10] target/arm: Convert Neon VTBL, VTBX to decodetree
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
                   ` (7 preceding siblings ...)
  2020-06-11 14:45 ` [PATCH 08/10] target/arm: Convert Neon VEXT " Peter Maydell
@ 2020-06-11 14:45 ` Peter Maydell
  2020-06-11 16:43   ` Richard Henderson
  2020-06-11 16:45   ` Richard Henderson
  2020-06-11 14:45 ` [PATCH 10/10] target/arm: Convert Neon VDUP (scalar) " Peter Maydell
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 14:45 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VTBL, VTBX instructions to decodetree.  The actual
implementation of the insn is copied across to the new trans function
unchanged except for renaming 'tmp5' to 'tmp4'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 41 +++---------------------
 3 files changed, 63 insertions(+), 37 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 26d60220168..91bc770dfbc 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -419,6 +419,9 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     ##################################################################
     VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
   ]
 
   # Subgroup for size != 0b11
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index ba0e7091e1a..83d5c3eae31 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2888,3 +2888,59 @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
     }
     return true;
 }
+
+static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
+{
+    int n;
+    TCGv_i32 tmp, tmp2, tmp3, tmp4;
+    TCGv_ptr ptr1;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    n = a->len + 1;
+    if ((a->vn + n) > 32) {
+        /*
+         * This is UNPREDICTABLE; we choose to UNDEF to avoid the
+         * helper function running off the end of the register file.
+         */
+        return 1;
+    }
+    n <<= 3;
+    if (a->op) {
+        tmp = neon_load_reg(a->vd, 0);
+    } else {
+        tmp = tcg_temp_new_i32();
+        tcg_gen_movi_i32(tmp, 0);
+    }
+    tmp2 = neon_load_reg(a->vm, 0);
+    ptr1 = vfp_reg_ptr(true, a->vn);
+    tmp4 = tcg_const_i32(n);
+    gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp);
+    if (a->op) {
+        tmp = neon_load_reg(a->vd, 1);
+    } else {
+        tmp = tcg_temp_new_i32();
+        tcg_gen_movi_i32(tmp, 0);
+    }
+    tmp3 = neon_load_reg(a->vm, 1);
+    gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp4);
+    tcg_temp_free_ptr(ptr1);
+    neon_store_reg(a->vd, 0, tmp2);
+    neon_store_reg(a->vd, 1, tmp3);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index a0822dba5e2..0c6425928f6 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5025,13 +5025,12 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
 {
     int op;
     int q;
-    int rd, rn, rm, rd_ofs, rm_ofs;
+    int rd, rm, rd_ofs, rm_ofs;
     int size;
     int pass;
     int u;
     int vec_size;
-    TCGv_i32 tmp, tmp2, tmp3, tmp5;
-    TCGv_ptr ptr1;
+    TCGv_i32 tmp, tmp2, tmp3;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -5052,7 +5051,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     q = (insn & (1 << 6)) != 0;
     u = (insn >> 24) & 1;
     VFP_DREG_D(rd, insn);
-    VFP_DREG_N(rn, insn);
     VFP_DREG_M(rm, insn);
     size = (insn >> 20) & 3;
     vec_size = q ? 16 : 8;
@@ -5577,39 +5575,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     break;
                 }
             } else if ((insn & (1 << 10)) == 0) {
-                /* VTBL, VTBX.  */
-                int n = ((insn >> 8) & 3) + 1;
-                if ((rn + n) > 32) {
-                    /* This is UNPREDICTABLE; we choose to UNDEF to avoid the
-                     * helper function running off the end of the register file.
-                     */
-                    return 1;
-                }
-                n <<= 3;
-                if (insn & (1 << 6)) {
-                    tmp = neon_load_reg(rd, 0);
-                } else {
-                    tmp = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(tmp, 0);
-                }
-                tmp2 = neon_load_reg(rm, 0);
-                ptr1 = vfp_reg_ptr(true, rn);
-                tmp5 = tcg_const_i32(n);
-                gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
-                tcg_temp_free_i32(tmp);
-                if (insn & (1 << 6)) {
-                    tmp = neon_load_reg(rd, 1);
-                } else {
-                    tmp = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(tmp, 0);
-                }
-                tmp3 = neon_load_reg(rm, 1);
-                gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
-                tcg_temp_free_i32(tmp5);
-                tcg_temp_free_ptr(ptr1);
-                neon_store_reg(rd, 0, tmp2);
-                neon_store_reg(rd, 1, tmp3);
-                tcg_temp_free_i32(tmp);
+                /* VTBL, VTBX: handled by decodetree */
+                return 1;
             } else if ((insn & 0x380) == 0) {
                 /* VDUP */
                 int element;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/10] target/arm: Convert Neon VDUP (scalar) to decodetree
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
                   ` (8 preceding siblings ...)
  2020-06-11 14:45 ` [PATCH 09/10] target/arm: Convert Neon VTBL, VTBX " Peter Maydell
@ 2020-06-11 14:45 ` Peter Maydell
  2020-06-11 16:48   ` Richard Henderson
  2020-06-11 23:16 ` [PATCH 00/10] target/arm: Convert 2-reg-scalar " no-reply
  2020-06-11 23:42 ` no-reply
  11 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 14:45 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon VDUP (scalar) insn to decodetree.  (Note that we
can't call this just "VDUP" as we used that already in vfp.decode for
the "VDUP (general purpose register" insn.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  7 +++++++
 target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
 target/arm/translate.c          | 25 +------------------------
 3 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 91bc770dfbc..6d890b2161f 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -422,6 +422,13 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VDUP_scalar  1111 001 1 1 . 11 index:3 1 .... 11 000 q:1 . 0 .... \
+                 vm=%vm_dp vd=%vd_dp size=0
+    VDUP_scalar  1111 001 1 1 . 11 index:2 10 .... 11 000 q:1 . 0 .... \
+                 vm=%vm_dp vd=%vd_dp size=1
+    VDUP_scalar  1111 001 1 1 . 11 index:1 100 .... 11 000 q:1 . 0 .... \
+                 vm=%vm_dp vd=%vd_dp size=2
   ]
 
   # Subgroup for size != 0b11
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 83d5c3eae31..cfbc546f484 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2944,3 +2944,29 @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
     tcg_temp_free_i32(tmp);
     return true;
 }
+
+static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (a->vd & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
+                         neon_element_offset(a->vm, a->index, a->size),
+                         a->q ? 16 : 8, a->q ? 16 : 8);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 0c6425928f6..6d18892adee 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5574,31 +5574,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     }
                     break;
                 }
-            } else if ((insn & (1 << 10)) == 0) {
-                /* VTBL, VTBX: handled by decodetree */
-                return 1;
-            } else if ((insn & 0x380) == 0) {
-                /* VDUP */
-                int element;
-                MemOp size;
-
-                if ((insn & (7 << 16)) == 0 || (q && (rd & 1))) {
-                    return 1;
-                }
-                if (insn & (1 << 16)) {
-                    size = MO_8;
-                    element = (insn >> 17) & 7;
-                } else if (insn & (1 << 17)) {
-                    size = MO_16;
-                    element = (insn >> 18) & 3;
-                } else {
-                    size = MO_32;
-                    element = (insn >> 19) & 1;
-                }
-                tcg_gen_gvec_dup_mem(size, neon_reg_offset(rd, 0),
-                                     neon_element_offset(rm, element, size),
-                                     q ? 16 : 8, q ? 16 : 8);
             } else {
+                /* VTBL, VTBX, VDUP: handled by decodetree */
                 return 1;
             }
         }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 01/10] target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
  2020-06-11 14:45 ` [PATCH 01/10] target/arm: Add 'static' and 'const' annotations to VSHLL function arrays Peter Maydell
@ 2020-06-11 15:41   ` Richard Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2020-06-11 15:41 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/11/20 7:45 AM, Peter Maydell wrote:
> Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
> trans_VSHLL_U_2sh() as both 'static' and 'const'.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/10] target/arm: Add missing TCG temp free in do_2shift_env_64()
  2020-06-11 14:45 ` [PATCH 02/10] target/arm: Add missing TCG temp free in do_2shift_env_64() Peter Maydell
@ 2020-06-11 15:43   ` Richard Henderson
  2020-06-11 17:05     ` Peter Maydell
  0 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2020-06-11 15:43 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/11/20 7:45 AM, Peter Maydell wrote:
> In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
> temporary in do_2shift_env_64(); free it.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> My test setup wasn't looking for temporary-leak warnings (they are
> not as easy to get at as they used to be because they only appear
> if you enable qemu_log tracing for some other purpose). This is the
> only one that snuck through, though.
> ---
>  target/arm/translate-neon.inc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
> index 7c4888a80c9..f2c241a87e9 100644
> --- a/target/arm/translate-neon.inc.c
> +++ b/target/arm/translate-neon.inc.c
> @@ -1329,6 +1329,7 @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
>          neon_load_reg64(tmp, a->vm + pass);
>          fn(tmp, cpu_env, tmp, constimm);
>          neon_store_reg64(tmp, a->vd + pass);
> +        tcg_temp_free_i64(tmp);

Huh.  I thought all the a32 stores magically freed their inputs.  I guess
that's just the general-purpose registers.  Anyway,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 03/10] target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
  2020-06-11 14:45 ` [PATCH 03/10] target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree Peter Maydell
@ 2020-06-11 16:09   ` Richard Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2020-06-11 16:09 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/11/20 7:45 AM, Peter Maydell wrote:
> Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
> scalar" group to decodetree.  These are 32x32->32 operations where
> one of the inputs is the scalar, followed by a possible accumulate
> operation of the 32-bit result.
> 
> The refactoring removes some of the oddities of the old decoder:
>  * operands to the operation and accumulation were often
>    reversed (taking advantage of the fact that most of these ops
>    are commutative); the new code follows the pseudocode order
>  * the Q bit in the insn was in a local variable 'u'; in the
>    new code it is decoded into a->q
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


> +static void gen_neon_dup_low16(TCGv_i32 var)
> +{
> +    TCGv_i32 tmp = tcg_temp_new_i32();
> +    tcg_gen_ext16u_i32(var, var);
> +    tcg_gen_shli_i32(tmp, var, 16);
> +    tcg_gen_or_i32(var, var, tmp);
> +    tcg_temp_free_i32(tmp);
> +}
> +
> +static void gen_neon_dup_high16(TCGv_i32 var)
> +{
> +    TCGv_i32 tmp = tcg_temp_new_i32();
> +    tcg_gen_andi_i32(var, var, 0xffff0000);
> +    tcg_gen_shri_i32(tmp, var, 16);
> +    tcg_gen_or_i32(var, var, tmp);
> +    tcg_temp_free_i32(tmp);
> +}

I was going to quibble about this implementation, but see that it's a straight
move from translate.c.

The real TODO should be a conversion to tcg_gen_gvec_2s(), so that we use real
vector multiplies and adds here, with the scalar duped across the vector, not
just an i32.


r~


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
  2020-06-11 14:45 ` [PATCH 04/10] target/arm: Convert Neon 2-reg-scalar float " Peter Maydell
@ 2020-06-11 16:14   ` Richard Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2020-06-11 16:14 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/11/20 7:45 AM, Peter Maydell wrote:
> Convert the float versions of VMLA, VMLS and VMUL in the Neon
> 2-reg-scalar group to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> As noted in the comment on the WRAP_FP_FN macro, we could have
> had a do_2scalar_fp() function, but for 3 insns it seemed
> simpler to just do the wrapping to get hold of the fpstatus ptr.
> (These are the only fp insns in the group.)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 05/10] target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
  2020-06-11 14:45 ` [PATCH 05/10] target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH " Peter Maydell
@ 2020-06-11 16:15   ` Richard Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2020-06-11 16:15 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/11/20 7:45 AM, Peter Maydell wrote:
> Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
> to decodetree.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  3 +++
>  target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
>  target/arm/translate.c          | 42 ++-------------------------------
>  3 files changed, 34 insertions(+), 40 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 06/10] target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
  2020-06-11 14:45 ` [PATCH 06/10] target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH " Peter Maydell
@ 2020-06-11 16:19   ` Richard Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2020-06-11 16:19 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/11/20 7:45 AM, Peter Maydell wrote:
> Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
> group to decodetree.
...
> +static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
> +                            NeonGenThreeOpEnvFn *opfn)
> +{
> +    /*
> +     * VQRDMLAH/VQRDMLSH: this is like do_2scalar, but the opfn
> +     * performs a kind of fused op-then-accumulate using a helper
> +     * function that takes all of rd, rn and the scalar at once.
> +     */
> +    TCGv_i32 scalar;
> +    int pass;
> +
> +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
> +        return false;
> +    }
> +
> +    if (!dc_isar_feature(aa32_rdm, s)) {
> +        return 1;
> +    }

return false;

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 07/10] target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
  2020-06-11 14:45 ` [PATCH 07/10] target/arm: Convert Neon 2-reg-scalar long multiplies " Peter Maydell
@ 2020-06-11 16:25   ` Richard Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2020-06-11 16:25 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/11/20 7:45 AM, Peter Maydell wrote:
> Convert the Neon 2-reg-scalar long multiplies to decodetree.
> These are the last instructions in the group.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  18 ++++
>  target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
>  target/arm/translate.c          | 182 ++------------------------------
>  3 files changed, 187 insertions(+), 176 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 08/10] target/arm: Convert Neon VEXT to decodetree
  2020-06-11 14:45 ` [PATCH 08/10] target/arm: Convert Neon VEXT " Peter Maydell
@ 2020-06-11 16:26   ` Richard Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2020-06-11 16:26 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/11/20 7:45 AM, Peter Maydell wrote:
> Convert the Neon VEXT insn to decodetree. Rather than keeping the
> old implementation which used fixed temporaries cpu_V0 and cpu_V1
> and did the extraction with by-hand shift and logic ops, we use
> the TCG extract2 insn.
> 
> We don't need to special case 0 or 8 immediates any more as the
> optimizer is smart enough to throw away the dead code.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  8 +++-
>  target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 58 +------------------------
>  3 files changed, 85 insertions(+), 57 deletions(-)

Nice cleanup.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 09/10] target/arm: Convert Neon VTBL, VTBX to decodetree
  2020-06-11 14:45 ` [PATCH 09/10] target/arm: Convert Neon VTBL, VTBX " Peter Maydell
@ 2020-06-11 16:43   ` Richard Henderson
  2020-06-11 16:45   ` Richard Henderson
  1 sibling, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2020-06-11 16:43 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/11/20 7:45 AM, Peter Maydell wrote:
> Convert the Neon VTBL, VTBX instructions to decodetree.  The actual
> implementation of the insn is copied across to the new trans function
> unchanged except for renaming 'tmp5' to 'tmp4'.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  3 ++
>  target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 41 +++---------------------
>  3 files changed, 63 insertions(+), 37 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 09/10] target/arm: Convert Neon VTBL, VTBX to decodetree
  2020-06-11 14:45 ` [PATCH 09/10] target/arm: Convert Neon VTBL, VTBX " Peter Maydell
  2020-06-11 16:43   ` Richard Henderson
@ 2020-06-11 16:45   ` Richard Henderson
  1 sibling, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2020-06-11 16:45 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/11/20 7:45 AM, Peter Maydell wrote:
> +static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
> +{
> +    int n;
> +    TCGv_i32 tmp, tmp2, tmp3, tmp4;
> +    TCGv_ptr ptr1;
> +
> +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
> +        return false;
> +    }
> +
> +    /* UNDEF accesses to D16-D31 if they don't exist. */
> +    if (!dc_isar_feature(aa32_simd_r32, s) &&
> +        ((a->vd | a->vn | a->vm) & 0x10)) {
> +        return false;
> +    }
> +
> +    if (!vfp_access_check(s)) {
> +        return true;
> +    }
> +
> +    n = a->len + 1;
> +    if ((a->vn + n) > 32) {
> +        /*
> +         * This is UNPREDICTABLE; we choose to UNDEF to avoid the
> +         * helper function running off the end of the register file.
> +         */
> +        return 1;
> +    }

Oops, meant to point out: return false.


r~


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 10/10] target/arm: Convert Neon VDUP (scalar) to decodetree
  2020-06-11 14:45 ` [PATCH 10/10] target/arm: Convert Neon VDUP (scalar) " Peter Maydell
@ 2020-06-11 16:48   ` Richard Henderson
  0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2020-06-11 16:48 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/11/20 7:45 AM, Peter Maydell wrote:
> Convert the Neon VDUP (scalar) insn to decodetree.  (Note that we
> can't call this just "VDUP" as we used that already in vfp.decode for
> the "VDUP (general purpose register" insn.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  7 +++++++
>  target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
>  target/arm/translate.c          | 25 +------------------------
>  3 files changed, 34 insertions(+), 24 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 02/10] target/arm: Add missing TCG temp free in do_2shift_env_64()
  2020-06-11 15:43   ` Richard Henderson
@ 2020-06-11 17:05     ` Peter Maydell
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Maydell @ 2020-06-11 17:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Thu, 11 Jun 2020 at 16:43, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/11/20 7:45 AM, Peter Maydell wrote:
> > In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
> > temporary in do_2shift_env_64(); free it.
> >
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> > ---
> > My test setup wasn't looking for temporary-leak warnings (they are
> > not as easy to get at as they used to be because they only appear
> > if you enable qemu_log tracing for some other purpose). This is the
> > only one that snuck through, though.
> > ---
> >  target/arm/translate-neon.inc.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
> > index 7c4888a80c9..f2c241a87e9 100644
> > --- a/target/arm/translate-neon.inc.c
> > +++ b/target/arm/translate-neon.inc.c
> > @@ -1329,6 +1329,7 @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
> >          neon_load_reg64(tmp, a->vm + pass);
> >          fn(tmp, cpu_env, tmp, constimm);
> >          neon_store_reg64(tmp, a->vd + pass);
> > +        tcg_temp_free_i64(tmp);
>
> Huh.  I thought all the a32 stores magically freed their inputs.  I guess
> that's just the general-purpose registers.

Confusingly, neon_load_reg()/neon_store_reg() do the "create
new TCG temp for the load, free the store input" thing, but
neon_load_reg64()/neon_store_reg64()/neon_load_reg32()/neon_store_reg32()
do not. I'd kind of like to get rid of this weird proliferation
of load and store functions, but neon_load_reg()/neon_store_reg() are
also the only ones which take a (regno, pass) pair of inputs,
so it's not a completely trivial substitution.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
                   ` (9 preceding siblings ...)
  2020-06-11 14:45 ` [PATCH 10/10] target/arm: Convert Neon VDUP (scalar) " Peter Maydell
@ 2020-06-11 23:16 ` no-reply
  2020-06-11 23:42 ` no-reply
  11 siblings, 0 replies; 25+ messages in thread
From: no-reply @ 2020-06-11 23:16 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200611144529.8873-1-peter.maydell@linaro.org/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      aarch64-softmmu/target/arm/helper-a64.o
  GEN     aarch64-softmmu/target/arm/decode-sve.inc.c
  CC      aarch64-softmmu/target/arm/sve_helper.o
/tmp/qemu-test/src/target/arm/neon-dp.decode:416: error: ('definition has 0 bits',)
  CC      aarch64-softmmu/target/arm/pauth_helper.o
make[1]: *** [target/arm/decode-neon-dp.inc.c] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 665, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=652e5daeb9d74258b3b1583fbf30fe7a', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-plvski0d/src/docker-src.2020-06-11-19.13.01.2188:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=652e5daeb9d74258b3b1583fbf30fe7a
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-plvski0d/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    3m6.470s
user    0m8.159s


The full log is available at
http://patchew.org/logs/20200611144529.8873-1-peter.maydell@linaro.org/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree
  2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
                   ` (10 preceding siblings ...)
  2020-06-11 23:16 ` [PATCH 00/10] target/arm: Convert 2-reg-scalar " no-reply
@ 2020-06-11 23:42 ` no-reply
  11 siblings, 0 replies; 25+ messages in thread
From: no-reply @ 2020-06-11 23:42 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200611144529.8873-1-peter.maydell@linaro.org/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      aarch64-softmmu/target/arm/psci.o
  CC      aarch64-softmmu/target/arm/translate-a64.o
  CC      aarch64-softmmu/target/arm/helper-a64.o
/tmp/qemu-test/src/target/arm/neon-dp.decode:416: error: ('definition has 0 bits',)
make[1]: *** [/tmp/qemu-test/src/target/arm/Makefile.objs:27: target/arm/decode-neon-dp.inc.c] Error 1
make[1]: *** Waiting for unfinished jobs....
  GEN     aarch64-softmmu/target/arm/decode-sve.inc.c
  GEN     x86_64-softmmu/qemu-system-x86_64.exe
make: *** [Makefile:527: aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 665, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=a093840cf7c04ee4bf8dfca6073b2aea', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-asai5w4d/src/docker-src.2020-06-11-19.39.07.24908:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=a093840cf7c04ee4bf8dfca6073b2aea
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-asai5w4d/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    3m22.484s
user    0m8.758s


The full log is available at
http://patchew.org/logs/20200611144529.8873-1-peter.maydell@linaro.org/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-06-11 23:43 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-11 14:45 [PATCH 00/10] target/arm: Convert 2-reg-scalar to decodetree Peter Maydell
2020-06-11 14:45 ` [PATCH 01/10] target/arm: Add 'static' and 'const' annotations to VSHLL function arrays Peter Maydell
2020-06-11 15:41   ` Richard Henderson
2020-06-11 14:45 ` [PATCH 02/10] target/arm: Add missing TCG temp free in do_2shift_env_64() Peter Maydell
2020-06-11 15:43   ` Richard Henderson
2020-06-11 17:05     ` Peter Maydell
2020-06-11 14:45 ` [PATCH 03/10] target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree Peter Maydell
2020-06-11 16:09   ` Richard Henderson
2020-06-11 14:45 ` [PATCH 04/10] target/arm: Convert Neon 2-reg-scalar float " Peter Maydell
2020-06-11 16:14   ` Richard Henderson
2020-06-11 14:45 ` [PATCH 05/10] target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH " Peter Maydell
2020-06-11 16:15   ` Richard Henderson
2020-06-11 14:45 ` [PATCH 06/10] target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH " Peter Maydell
2020-06-11 16:19   ` Richard Henderson
2020-06-11 14:45 ` [PATCH 07/10] target/arm: Convert Neon 2-reg-scalar long multiplies " Peter Maydell
2020-06-11 16:25   ` Richard Henderson
2020-06-11 14:45 ` [PATCH 08/10] target/arm: Convert Neon VEXT " Peter Maydell
2020-06-11 16:26   ` Richard Henderson
2020-06-11 14:45 ` [PATCH 09/10] target/arm: Convert Neon VTBL, VTBX " Peter Maydell
2020-06-11 16:43   ` Richard Henderson
2020-06-11 16:45   ` Richard Henderson
2020-06-11 14:45 ` [PATCH 10/10] target/arm: Convert Neon VDUP (scalar) " Peter Maydell
2020-06-11 16:48   ` Richard Henderson
2020-06-11 23:16 ` [PATCH 00/10] target/arm: Convert 2-reg-scalar " no-reply
2020-06-11 23:42 ` no-reply

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.