qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree
@ 2020-06-09 16:02 Peter Maydell
  2020-06-09 16:02 ` [PATCH 1/7] target/arm: Fix missing temp frees in do_vshll_2sh Peter Maydell
                   ` (13 more replies)
  0 siblings, 14 replies; 23+ messages in thread
From: Peter Maydell @ 2020-06-09 16:02 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

This patchset converts the Neon insns in the "3 registers of different
lengths" group to decodetree. Patch 1 is a bugfix for an earlier
part of the conversion that's now in master.

I'm definitely finding that the new decodetree version of Neon
is often easier to understand because we no longer try to
accommodate multiple different kinds of widening/narrowing/etc
insns in a single multi-pass loop: expanding out the loop and
specializing it to the particular insn type helps a lot.
(Or maybe it's just that having to read the old code and write
the new version means I understand it better ;-))

Based-on: id:20200608183652.661386-1-richard.henderson@linaro.org
("[PATCH v3 0/9] decodetree: Add non-overlapping groups")
because we use the new group syntax to set up the structure
for the "size==0b11" vs "size!=0b11" decode which we'll fill
in in subsequent patchsets.

thanks
-- PMM

Peter Maydell (7):
  target/arm: Fix missing temp frees in do_vshll_2sh
  target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
  target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
  target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
  target/arm: Convert Neon 3-reg-diff long multiplies
  target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
  target/arm: Convert Neon 3-reg-diff polynomial VMULL

 target/arm/translate.h          |   1 +
 target/arm/neon-dp.decode       |  72 +++++
 target/arm/translate-neon.inc.c | 521 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 222 +-------------
 4 files changed, 597 insertions(+), 219 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/7] target/arm: Fix missing temp frees in do_vshll_2sh
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
@ 2020-06-09 16:02 ` Peter Maydell
  2020-06-09 17:09   ` Richard Henderson
  2020-06-10  7:34   ` Philippe Mathieu-Daudé
  2020-06-09 16:02 ` [PATCH 2/7] target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree Peter Maydell
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 23+ messages in thread
From: Peter Maydell @ 2020-06-09 16:02 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

The widenfn() in do_vshll_2sh() does not free the input 32-bit
TCGv, so we need to do this in the calling code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.inc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 664d3612607..299a61f067b 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1624,6 +1624,7 @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
+    tcg_temp_free_i32(rm0);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
@@ -1631,6 +1632,7 @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     neon_store_reg64(tmp, a->vd);
 
     widenfn(tmp, rm1);
+    tcg_temp_free_i32(rm1);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/7] target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
  2020-06-09 16:02 ` [PATCH 1/7] target/arm: Fix missing temp frees in do_vshll_2sh Peter Maydell
@ 2020-06-09 16:02 ` Peter Maydell
  2020-06-09 17:26   ` Richard Henderson
  2020-06-09 16:02 ` [PATCH 3/7] target/arm: Convert Neon 3-reg-diff narrowing " Peter Maydell
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2020-06-09 16:02 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.

As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  43 +++++++++++++
 target/arm/translate-neon.inc.c | 104 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  16 ++---
 3 files changed, 151 insertions(+), 12 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index bd1b0e13f7d..144a527ee65 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -397,3 +397,46 @@ VCVT_FU_2sh      1111 001 1 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
 # So we have a single decode line and check the cmode/op in the
 # trans function.
 Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+
+######################################################################
+# Within the "two registers, or three registers of different lengths"
+# grouping ([23,4]=0b10), bits [21:20] are either part of the opcode
+# decode: 0b11 for VEXT, two-reg-misc, VTBL, and duplicate-scalar;
+# or they are a size field for the three-reg-different-lengths and
+# two-reg-and-scalar insn groups (where size cannot be 0b11). This
+# is slightly awkward for decodetree: we handle it with this
+# non-exclusive group which contains within it two exclusive groups:
+# one for the size=0b11 patterns, and one for the size-not-0b11
+# patterns. This allows us to check that none of the insns within
+# each subgroup accidentally overlap each other. Note that all the
+# trans functions for the size-not-0b11 patterns must check and
+# return false for size==3.
+######################################################################
+{
+  # 0b11 subgroup will go here
+
+  # Subgroup for size != 0b11
+  [
+    ##################################################################
+    # 3-reg-different-length grouping:
+    # 1111 001 U 1 D sz!=11 Vn:4 Vd:4 opc:4 N 0 M 0 Vm:4
+    ##################################################################
+
+    &3diff vm vn vd size
+
+    @3diff       .... ... . . . size:2 .... .... .... . . . . .... \
+                 &3diff vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VADDL_S_3d   1111 001 0 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+    VADDL_U_3d   1111 001 1 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+
+    VADDW_S_3d   1111 001 0 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+    VADDW_U_3d   1111 001 1 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+
+    VSUBL_S_3d   1111 001 0 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+    VSUBL_U_3d   1111 001 1 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+
+    VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+    VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+  ]
+}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 299a61f067b..f0ec13e5a91 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1828,3 +1828,107 @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
     }
     return do_1reg_imm(s, a, fn);
 }
+
+static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+                           NeonGenWidenFn *widenfn,
+                           NeonGenTwo64OpFn *opfn,
+                           bool src1_wide)
+{
+    /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
+    TCGv_i64 rn0_64, rn1_64, rm_64;
+    TCGv_i32 rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!widenfn || !opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn0_64 = tcg_temp_new_i64();
+    rn1_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+
+    if (src1_wide) {
+        neon_load_reg64(rn0_64, a->vn);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        widenfn(rn0_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 0);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn0_64, rn0_64, rm_64);
+
+    /*
+     * Load second pass inputs before storing the first pass result, to
+     * avoid incorrect results if a narrow input overlaps with the result.
+     */
+    if (src1_wide) {
+        neon_load_reg64(rn1_64, a->vn + 1);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        widenfn(rn1_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 1);
+
+    neon_store_reg64(rn0_64, a->vd);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn1_64, rn1_64, rm_64);
+    neon_store_reg64(rn1_64, a->vd + 1);
+
+    tcg_temp_free_i64(rn0_64);
+    tcg_temp_free_i64(rn1_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+    {                                                                   \
+        NeonGenWidenFn *widenfn[] = {                                   \
+            gen_helper_neon_widen_##S##8,                               \
+            gen_helper_neon_widen_##S##16,                              \
+            tcg_gen_##EXT##_i32_i64,                                    \
+            NULL,                                                       \
+        };                                                              \
+        NeonGenTwo64OpFn *addfn[] = {                                   \
+            gen_helper_neon_##OP##l_u16,                                \
+            gen_helper_neon_##OP##l_u32,                                \
+            tcg_gen_##OP##_i64,                                         \
+            NULL,                                                       \
+        };                                                              \
+        return do_prewiden_3d(s, a, widenfn[a->size],                   \
+                              addfn[a->size], SRC1WIDE);                \
+    }
+
+DO_PREWIDEN(VADDL_S, s, ext, add, false)
+DO_PREWIDEN(VADDL_U, u, extu, add, false)
+DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
+DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
+DO_PREWIDEN(VADDW_S, s, ext, add, true)
+DO_PREWIDEN(VADDW_U, u, extu, add, true)
+DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index bcdfec34d28..93765344414 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5241,7 +5241,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 /* Three registers of different lengths.  */
                 int src1_wide;
                 int src2_wide;
-                int prewiden;
                 /* undefreq: bit 0 : UNDEF if size == 0
                  *           bit 1 : UNDEF if size == 1
                  *           bit 2 : UNDEF if size == 2
@@ -5251,10 +5250,10 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 int undefreq;
                 /* prewiden, src1_wide, src2_wide, undefreq */
                 static const int neon_3reg_wide[16][4] = {
-                    {1, 0, 0, 0}, /* VADDL */
-                    {1, 1, 0, 0}, /* VADDW */
-                    {1, 0, 0, 0}, /* VSUBL */
-                    {1, 1, 0, 0}, /* VSUBW */
+                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
+                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
+                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
+                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                     {0, 1, 1, 0}, /* VADDHN */
                     {0, 0, 0, 0}, /* VABAL */
                     {0, 1, 1, 0}, /* VSUBHN */
@@ -5269,7 +5268,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
 
-                prewiden = neon_3reg_wide[op][0];
                 src1_wide = neon_3reg_wide[op][1];
                 src2_wide = neon_3reg_wide[op][2];
                 undefreq = neon_3reg_wide[op][3];
@@ -5322,9 +5320,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         } else {
                             tmp = neon_load_reg(rn, pass);
                         }
-                        if (prewiden) {
-                            gen_neon_widen(cpu_V0, tmp, size, u);
-                        }
                     }
                     if (src2_wide) {
                         neon_load_reg64(cpu_V1, rm + pass);
@@ -5335,9 +5330,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         } else {
                             tmp2 = neon_load_reg(rm, pass);
                         }
-                        if (prewiden) {
-                            gen_neon_widen(cpu_V1, tmp2, size, u);
-                        }
                     }
                     switch (op) {
                     case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/7] target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
  2020-06-09 16:02 ` [PATCH 1/7] target/arm: Fix missing temp frees in do_vshll_2sh Peter Maydell
  2020-06-09 16:02 ` [PATCH 2/7] target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree Peter Maydell
@ 2020-06-09 16:02 ` Peter Maydell
  2020-06-09 17:40   ` Richard Henderson
  2020-06-09 16:02 ` [PATCH 4/7] target/arm: Convert Neon 3-reg-diff VABAL, VABDL " Peter Maydell
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2020-06-09 16:02 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 87 +++++++++++++++++++++++++++++++
 target/arm/translate.c          | 91 ++++-----------------------------
 3 files changed, 104 insertions(+), 80 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 144a527ee65..a2234dfa4f3 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -438,5 +438,11 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
     VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+
+    VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+    VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+
+    VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+    VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index f0ec13e5a91..0033cb7eb25 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -1932,3 +1932,90 @@ DO_PREWIDEN(VADDW_S, s, ext, add, true)
 DO_PREWIDEN(VADDW_U, u, extu, add, true)
 DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+
+static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
+                         NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
+{
+    /* 3-regs different lengths, narrowing (VADDHN/VSUBHN/VRADDHN/VRSUBHN) */
+    TCGv_i64 rn_64, rm_64;
+    TCGv_i32 rd0, rd1;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn || !narrowfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vn | a->vm) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+    rd0 = tcg_temp_new_i32();
+    rd1 = tcg_temp_new_i32();
+
+    neon_load_reg64(rn_64, a->vn);
+    neon_load_reg64(rm_64, a->vm);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd0, rn_64);
+
+    neon_load_reg64(rn_64, a->vn + 1);
+    neon_load_reg64(rm_64, a->vm + 1);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd1, rn_64);
+
+    neon_store_reg(a->vd, 0, rd0);
+    neon_store_reg(a->vd, 1, rd1);
+
+    tcg_temp_free_i64(rn_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP)                       \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+    {                                                                   \
+        NeonGenTwo64OpFn *addfn[] = {                                   \
+            gen_helper_neon_##OP##l_u16,                                \
+            gen_helper_neon_##OP##l_u32,                                \
+            tcg_gen_##OP##_i64,                                         \
+            NULL,                                                       \
+        };                                                              \
+        NeonGenNarrowFn *narrowfn[] = {                                 \
+            gen_helper_neon_##NARROWTYPE##_high_u8,                     \
+            gen_helper_neon_##NARROWTYPE##_high_u16,                    \
+            EXTOP,                                                      \
+            NULL,                                                       \
+        };                                                              \
+        return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]);   \
+    }
+
+static void gen_narrow_round_high_u32(TCGv_i32 rd, TCGv_i64 rn)
+{
+    tcg_gen_addi_i64(rn, rn, 1u << 31);
+    tcg_gen_extrh_i64_i32(rd, rn);
+}
+
+DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
+DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 93765344414..3fe39cd4f49 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -3231,16 +3231,6 @@ static inline void gen_neon_addl(int size)
     }
 }
 
-static inline void gen_neon_subl(int size)
-{
-    switch (size) {
-    case 0: gen_helper_neon_subl_u16(CPU_V001); break;
-    case 1: gen_helper_neon_subl_u32(CPU_V001); break;
-    case 2: tcg_gen_sub_i64(CPU_V001); break;
-    default: abort();
-    }
-}
-
 static inline void gen_neon_negl(TCGv_i64 var, int size)
 {
     switch (size) {
@@ -5239,8 +5229,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             op = (insn >> 8) & 0xf;
             if ((insn & (1 << 6)) == 0) {
                 /* Three registers of different lengths.  */
-                int src1_wide;
-                int src2_wide;
                 /* undefreq: bit 0 : UNDEF if size == 0
                  *           bit 1 : UNDEF if size == 1
                  *           bit 2 : UNDEF if size == 2
@@ -5254,9 +5242,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VADDW: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
-                    {0, 1, 1, 0}, /* VADDHN */
+                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
                     {0, 0, 0, 0}, /* VABAL */
-                    {0, 1, 1, 0}, /* VSUBHN */
+                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                     {0, 0, 0, 0}, /* VABDL */
                     {0, 0, 0, 0}, /* VMLAL */
                     {0, 0, 0, 9}, /* VQDMLAL */
@@ -5268,17 +5256,13 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
 
-                src1_wide = neon_3reg_wide[op][1];
-                src2_wide = neon_3reg_wide[op][2];
                 undefreq = neon_3reg_wide[op][3];
 
                 if ((undefreq & (1 << size)) ||
                     ((undefreq & 8) && u)) {
                     return 1;
                 }
-                if ((src1_wide && (rn & 1)) ||
-                    (src2_wide && (rm & 1)) ||
-                    (!src2_wide && (rd & 1))) {
+                if (rd & 1) {
                     return 1;
                 }
 
@@ -5302,42 +5286,26 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 /* Avoid overlapping operands.  Wide source operands are
                    always aligned so will never overlap with wide
                    destinations in problematic ways.  */
-                if (rd == rm && !src2_wide) {
+                if (rd == rm) {
                     tmp = neon_load_reg(rm, 1);
                     neon_store_scratch(2, tmp);
-                } else if (rd == rn && !src1_wide) {
+                } else if (rd == rn) {
                     tmp = neon_load_reg(rn, 1);
                     neon_store_scratch(2, tmp);
                 }
                 tmp3 = NULL;
                 for (pass = 0; pass < 2; pass++) {
-                    if (src1_wide) {
-                        neon_load_reg64(cpu_V0, rn + pass);
-                        tmp = NULL;
+                    if (pass == 1 && rd == rn) {
+                        tmp = neon_load_scratch(2);
                     } else {
-                        if (pass == 1 && rd == rn) {
-                            tmp = neon_load_scratch(2);
-                        } else {
-                            tmp = neon_load_reg(rn, pass);
-                        }
+                        tmp = neon_load_reg(rn, pass);
                     }
-                    if (src2_wide) {
-                        neon_load_reg64(cpu_V1, rm + pass);
-                        tmp2 = NULL;
+                    if (pass == 1 && rd == rm) {
+                        tmp2 = neon_load_scratch(2);
                     } else {
-                        if (pass == 1 && rd == rm) {
-                            tmp2 = neon_load_scratch(2);
-                        } else {
-                            tmp2 = neon_load_reg(rm, pass);
-                        }
+                        tmp2 = neon_load_reg(rm, pass);
                     }
                     switch (op) {
-                    case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-                        gen_neon_addl(size);
-                        break;
-                    case 2: case 3: case 6: /* VSUBL, VSUBW, VSUBHN, VRSUBHN */
-                        gen_neon_subl(size);
-                        break;
                     case 5: case 7: /* VABAL, VABDL */
                         switch ((size << 1) | u) {
                         case 0:
@@ -5395,43 +5363,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             abort();
                         }
                         neon_store_reg64(cpu_V0, rd + pass);
-                    } else if (op == 4 || op == 6) {
-                        /* Narrowing operation.  */
-                        tmp = tcg_temp_new_i32();
-                        if (!u) {
-                            switch (size) {
-                            case 0:
-                                gen_helper_neon_narrow_high_u8(tmp, cpu_V0);
-                                break;
-                            case 1:
-                                gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
-                                break;
-                            case 2:
-                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                                break;
-                            default: abort();
-                            }
-                        } else {
-                            switch (size) {
-                            case 0:
-                                gen_helper_neon_narrow_round_high_u8(tmp, cpu_V0);
-                                break;
-                            case 1:
-                                gen_helper_neon_narrow_round_high_u16(tmp, cpu_V0);
-                                break;
-                            case 2:
-                                tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
-                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                                break;
-                            default: abort();
-                            }
-                        }
-                        if (pass == 0) {
-                            tmp3 = tmp;
-                        } else {
-                            neon_store_reg(rd, 0, tmp3);
-                            neon_store_reg(rd, 1, tmp);
-                        }
                     } else {
                         /* Write back the result.  */
                         neon_store_reg64(cpu_V0, rd + pass);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/7] target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
                   ` (2 preceding siblings ...)
  2020-06-09 16:02 ` [PATCH 3/7] target/arm: Convert Neon 3-reg-diff narrowing " Peter Maydell
@ 2020-06-09 16:02 ` Peter Maydell
  2020-06-09 18:16   ` Richard Henderson
  2020-06-09 16:02 ` [PATCH 5/7] target/arm: Convert Neon 3-reg-diff long multiplies Peter Maydell
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2020-06-09 16:02 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
Like almost all the remaining insns in this group, these are
a combination of a two-input operation which returns a double width
result and then a possible accumulation of that double width
result into the destination.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.h          |   1 +
 target/arm/neon-dp.decode       |   6 ++
 target/arm/translate-neon.inc.c | 132 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  31 +-------
 4 files changed, 142 insertions(+), 28 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index c937dfe9bf0..62ed5c4780c 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -371,6 +371,7 @@ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
 typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
 typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
 typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+typedef void NeonGenTwoOpWidenFn(TCGv_i64, TCGv_i32, TCGv_i32);
 typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
 typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index a2234dfa4f3..4f0aaaf6bb2 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -442,7 +442,13 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
     VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
 
+    VABAL_S_3d   1111 001 0 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+    VABAL_U_3d   1111 001 1 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+
     VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
     VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+
+    VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+    VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 0033cb7eb25..fd85ff5ea50 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2019,3 +2019,135 @@ DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
 DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
+
+static bool do_long_3d(DisasContext *s, arg_3diff *a,
+                       NeonGenTwoOpWidenFn *opfn,
+                       NeonGenTwo64OpFn *accfn)
+{
+    /*
+     * 3-regs different lengths, long operations.
+     * These perform an operation on two inputs that returns a double-width
+     * result, and then possibly perform an accumulation operation of
+     * that result into the double-width destination.
+     */
+    TCGv_i64 rd0, rd1, tmp;
+    TCGv_i32 rn, rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rd0 = tcg_temp_new_i64();
+    rd1 = tcg_temp_new_i64();
+
+    rn = neon_load_reg(a->vn, 0);
+    rm = neon_load_reg(a->vm, 0);
+    opfn(rd0, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    rn = neon_load_reg(a->vn, 1);
+    rm = neon_load_reg(a->vm, 1);
+    opfn(rd1, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    /* Don't store results until after all loads: they might overlap */
+    if (accfn) {
+        tmp = tcg_temp_new_i64();
+        neon_load_reg64(tmp, a->vd);
+        accfn(tmp, tmp, rd0);
+        neon_store_reg64(tmp, a->vd);
+        neon_load_reg64(tmp, a->vd + 1);
+        accfn(tmp, tmp, rd1);
+        neon_store_reg64(tmp, a->vd + 1);
+        tcg_temp_free_i64(tmp);
+    } else {
+        neon_store_reg64(rd0, a->vd);
+        neon_store_reg64(rd1, a->vd + 1);
+    }
+
+    tcg_temp_free_i64(rd0);
+    tcg_temp_free_i64(rd1);
+
+    return true;
+}
+
+static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    NeonGenTwoOpWidenFn *opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABDL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    NeonGenTwoOpWidenFn *opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    NeonGenTwoOpWidenFn *opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+    NeonGenTwo64OpFn *addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
+
+static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    NeonGenTwoOpWidenFn *opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+    NeonGenTwo64OpFn *addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 3fe39cd4f49..37fe9d46e0b 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5243,9 +5243,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                     {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
-                    {0, 0, 0, 0}, /* VABAL */
+                    {0, 0, 0, 7}, /* VABAL */
                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
-                    {0, 0, 0, 0}, /* VABDL */
+                    {0, 0, 0, 7}, /* VABDL */
                     {0, 0, 0, 0}, /* VMLAL */
                     {0, 0, 0, 9}, /* VQDMLAL */
                     {0, 0, 0, 0}, /* VMLSL */
@@ -5306,31 +5306,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         tmp2 = neon_load_reg(rm, pass);
                     }
                     switch (op) {
-                    case 5: case 7: /* VABAL, VABDL */
-                        switch ((size << 1) | u) {
-                        case 0:
-                            gen_helper_neon_abdl_s16(cpu_V0, tmp, tmp2);
-                            break;
-                        case 1:
-                            gen_helper_neon_abdl_u16(cpu_V0, tmp, tmp2);
-                            break;
-                        case 2:
-                            gen_helper_neon_abdl_s32(cpu_V0, tmp, tmp2);
-                            break;
-                        case 3:
-                            gen_helper_neon_abdl_u32(cpu_V0, tmp, tmp2);
-                            break;
-                        case 4:
-                            gen_helper_neon_abdl_s64(cpu_V0, tmp, tmp2);
-                            break;
-                        case 5:
-                            gen_helper_neon_abdl_u64(cpu_V0, tmp, tmp2);
-                            break;
-                        default: abort();
-                        }
-                        tcg_temp_free_i32(tmp2);
-                        tcg_temp_free_i32(tmp);
-                        break;
                     case 8: case 9: case 10: case 11: case 12: case 13:
                         /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
                         gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
@@ -5349,7 +5324,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         case 10: /* VMLSL */
                             gen_neon_negl(cpu_V0, size);
                             /* Fall through */
-                        case 5: case 8: /* VABAL, VMLAL */
+                        case 8: /* VABAL, VMLAL */
                             gen_neon_addl(size);
                             break;
                         case 9: case 11: /* VQDMLAL, VQDMLSL */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/7] target/arm: Convert Neon 3-reg-diff long multiplies
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
                   ` (3 preceding siblings ...)
  2020-06-09 16:02 ` [PATCH 4/7] target/arm: Convert Neon 3-reg-diff VABAL, VABDL " Peter Maydell
@ 2020-06-09 16:02 ` Peter Maydell
  2020-06-09 18:26   ` Richard Henderson
  2020-06-09 16:02 ` [PATCH 6/7] target/arm: Convert Neon 3-reg-diff saturating doubling multiplies Peter Maydell
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2020-06-09 16:02 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.

Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  9 +++++
 target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 21 +++-------
 3 files changed, 86 insertions(+), 15 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 4f0aaaf6bb2..1da492df146 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -450,5 +450,14 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
     VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+
+    VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
+    VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
+
+    VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
+    VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
+
+    VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
+    VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index fd85ff5ea50..00a779c65a0 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2151,3 +2151,74 @@ static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
 
     return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
 }
+
+static void gen_mull_s32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    TCGv_i32 lo = tcg_temp_new_i32();
+    TCGv_i32 hi = tcg_temp_new_i32();
+
+    tcg_gen_muls2_i32(lo, hi, rn, rm);
+    tcg_gen_concat_i32_i64(rd, lo, hi);
+
+    tcg_temp_free_i32(lo);
+    tcg_temp_free_i32(hi);
+}
+
+static void gen_mull_u32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    TCGv_i32 lo = tcg_temp_new_i32();
+    TCGv_i32 hi = tcg_temp_new_i32();
+
+    tcg_gen_mulu2_i32(lo, hi, rn, rm);
+    tcg_gen_concat_i32_i64(rd, lo, hi);
+
+    tcg_temp_free_i32(lo);
+    tcg_temp_free_i32(hi);
+}
+
+static bool trans_VMULL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    NeonGenTwoOpWidenFn *opfn[] = {
+        gen_helper_neon_mull_s8,
+        gen_helper_neon_mull_s16,
+        gen_mull_s32,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VMULL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    NeonGenTwoOpWidenFn *opfn[] = {
+        gen_helper_neon_mull_u8,
+        gen_helper_neon_mull_u16,
+        gen_mull_u32,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+#define DO_VMLAL(INSN,MULL,ACC)                                         \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+    {                                                                   \
+        NeonGenTwoOpWidenFn *opfn[] = {                                 \
+            gen_helper_neon_##MULL##8,                                  \
+            gen_helper_neon_##MULL##16,                                 \
+            gen_##MULL##32,                                             \
+            NULL,                                                       \
+        };                                                              \
+        NeonGenTwo64OpFn *accfn[] = {                                   \
+            gen_helper_neon_##ACC##l_u16,                               \
+            gen_helper_neon_##ACC##l_u32,                               \
+            tcg_gen_##ACC##_i64,                                        \
+            NULL,                                                       \
+        };                                                              \
+        return do_long_3d(s, a, opfn[a->size], accfn[a->size]);         \
+    }
+
+DO_VMLAL(VMLAL_S,mull_s,add)
+DO_VMLAL(VMLAL_U,mull_u,add)
+DO_VMLAL(VMLSL_S,mull_s,sub)
+DO_VMLAL(VMLSL_U,mull_u,sub)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 37fe9d46e0b..a2c47d19f21 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5246,11 +5246,11 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VABAL */
                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                     {0, 0, 0, 7}, /* VABDL */
-                    {0, 0, 0, 0}, /* VMLAL */
+                    {0, 0, 0, 7}, /* VMLAL */
                     {0, 0, 0, 9}, /* VQDMLAL */
-                    {0, 0, 0, 0}, /* VMLSL */
+                    {0, 0, 0, 7}, /* VMLSL */
                     {0, 0, 0, 9}, /* VQDMLSL */
-                    {0, 0, 0, 0}, /* Integer VMULL */
+                    {0, 0, 0, 7}, /* Integer VMULL */
                     {0, 0, 0, 9}, /* VQDMULL */
                     {0, 0, 0, 0xa}, /* Polynomial VMULL */
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
@@ -5306,8 +5306,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         tmp2 = neon_load_reg(rm, pass);
                     }
                     switch (op) {
-                    case 8: case 9: case 10: case 11: case 12: case 13:
-                        /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
+                    case 9: case 11: case 13:
+                        /* VQDMLAL, VQDMLSL, VQDMULL */
                         gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
                         break;
                     default: /* 15 is RESERVED: caught earlier  */
@@ -5317,16 +5317,10 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         /* VQDMULL */
                         gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
                         neon_store_reg64(cpu_V0, rd + pass);
-                    } else if (op == 5 || (op >= 8 && op <= 11)) {
+                    } else {
                         /* Accumulate.  */
                         neon_load_reg64(cpu_V1, rd + pass);
                         switch (op) {
-                        case 10: /* VMLSL */
-                            gen_neon_negl(cpu_V0, size);
-                            /* Fall through */
-                        case 8: /* VABAL, VMLAL */
-                            gen_neon_addl(size);
-                            break;
                         case 9: case 11: /* VQDMLAL, VQDMLSL */
                             gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
                             if (op == 11) {
@@ -5338,9 +5332,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             abort();
                         }
                         neon_store_reg64(cpu_V0, rd + pass);
-                    } else {
-                        /* Write back the result.  */
-                        neon_store_reg64(cpu_V0, rd + pass);
                     }
                 }
             } else {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/7] target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
                   ` (4 preceding siblings ...)
  2020-06-09 16:02 ` [PATCH 5/7] target/arm: Convert Neon 3-reg-diff long multiplies Peter Maydell
@ 2020-06-09 16:02 ` Peter Maydell
  2020-06-09 18:45   ` Richard Henderson
  2020-06-09 16:02 ` [PATCH 7/7] target/arm: Convert Neon 3-reg-diff polynomial VMULL Peter Maydell
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2020-06-09 16:02 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.

These are the last insns in the group which use the pass-over-each
elements loop, so we can delete that code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 82 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 59 ++----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 1da492df146..65ea30d3edf 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -454,10 +454,16 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
     VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
 
+    VQDMLAL_3d   1111 001 0 1 . .. .... .... 1001 . 0 . 0 .... @3diff
+
     VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
     VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
 
+    VQDMLSL_3d   1111 001 0 1 . .. .... .... 1011 . 0 . 0 .... @3diff
+
     VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
     VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
+
+    VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 00a779c65a0..5965b5ed845 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2222,3 +2222,85 @@ DO_VMLAL(VMLAL_S,mull_s,add)
 DO_VMLAL(VMLAL_U,mull_u,add)
 DO_VMLAL(VMLSL_S,mull_s,sub)
 DO_VMLAL(VMLSL_U,mull_u,sub)
+
+static void gen_VQDMULL_16(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    gen_helper_neon_mull_s16(rd, rn, rm);
+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rd, rd);
+}
+
+static void gen_VQDMULL_32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    gen_mull_s32(rd, rn, rm);
+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rd, rd);
+}
+
+static bool trans_VQDMULL_3d(DisasContext *s, arg_3diff *a)
+{
+    NeonGenTwoOpWidenFn *opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static void gen_VQDMLAL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+{
+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
+}
+
+static void gen_VQDMLAL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+{
+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
+}
+
+static bool trans_VQDMLAL_3d(DisasContext *s, arg_3diff *a)
+{
+    NeonGenTwoOpWidenFn *opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    NeonGenTwo64OpFn *accfn[] = {
+        NULL,
+        gen_VQDMLAL_acc_16,
+        gen_VQDMLAL_acc_32,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
+}
+
+static void gen_VQDMLSL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+{
+    gen_helper_neon_negl_u32(rm, rm);
+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
+}
+
+static void gen_VQDMLSL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+{
+    tcg_gen_neg_i64(rm, rm);
+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
+}
+
+static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
+{
+    NeonGenTwoOpWidenFn *opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    NeonGenTwo64OpFn *accfn[] = {
+        NULL,
+        gen_VQDMLSL_acc_16,
+        gen_VQDMLSL_acc_32,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index a2c47d19f21..88e91845c02 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5247,11 +5247,11 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                     {0, 0, 0, 7}, /* VABDL */
                     {0, 0, 0, 7}, /* VMLAL */
-                    {0, 0, 0, 9}, /* VQDMLAL */
+                    {0, 0, 0, 7}, /* VQDMLAL */
                     {0, 0, 0, 7}, /* VMLSL */
-                    {0, 0, 0, 9}, /* VQDMLSL */
+                    {0, 0, 0, 7}, /* VQDMLSL */
                     {0, 0, 0, 7}, /* Integer VMULL */
-                    {0, 0, 0, 9}, /* VQDMULL */
+                    {0, 0, 0, 7}, /* VQDMULL */
                     {0, 0, 0, 0xa}, /* Polynomial VMULL */
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
@@ -5282,58 +5282,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     }
                     return 0;
                 }
-
-                /* Avoid overlapping operands.  Wide source operands are
-                   always aligned so will never overlap with wide
-                   destinations in problematic ways.  */
-                if (rd == rm) {
-                    tmp = neon_load_reg(rm, 1);
-                    neon_store_scratch(2, tmp);
-                } else if (rd == rn) {
-                    tmp = neon_load_reg(rn, 1);
-                    neon_store_scratch(2, tmp);
-                }
-                tmp3 = NULL;
-                for (pass = 0; pass < 2; pass++) {
-                    if (pass == 1 && rd == rn) {
-                        tmp = neon_load_scratch(2);
-                    } else {
-                        tmp = neon_load_reg(rn, pass);
-                    }
-                    if (pass == 1 && rd == rm) {
-                        tmp2 = neon_load_scratch(2);
-                    } else {
-                        tmp2 = neon_load_reg(rm, pass);
-                    }
-                    switch (op) {
-                    case 9: case 11: case 13:
-                        /* VQDMLAL, VQDMLSL, VQDMULL */
-                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-                        break;
-                    default: /* 15 is RESERVED: caught earlier  */
-                        abort();
-                    }
-                    if (op == 13) {
-                        /* VQDMULL */
-                        gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                        neon_store_reg64(cpu_V0, rd + pass);
-                    } else {
-                        /* Accumulate.  */
-                        neon_load_reg64(cpu_V1, rd + pass);
-                        switch (op) {
-                        case 9: case 11: /* VQDMLAL, VQDMLSL */
-                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                            if (op == 11) {
-                                gen_neon_negl(cpu_V0, size);
-                            }
-                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
-                            break;
-                        default:
-                            abort();
-                        }
-                        neon_store_reg64(cpu_V0, rd + pass);
-                    }
-                }
+                abort(); /* all others handled by decodetree */
             } else {
                 /* Two registers and a scalar. NB that for ops of this form
                  * the ARM ARM labels bit 24 as Q, but it is in our variable
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/7] target/arm: Convert Neon 3-reg-diff polynomial VMULL
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
                   ` (5 preceding siblings ...)
  2020-06-09 16:02 ` [PATCH 6/7] target/arm: Convert Neon 3-reg-diff saturating doubling multiplies Peter Maydell
@ 2020-06-09 16:02 ` Peter Maydell
  2020-06-09 18:56   ` Richard Henderson
  2020-06-09 17:01 ` [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree no-reply
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2020-06-09 16:02 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
insn in this group to be converted.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/neon-dp.decode       |  2 ++
 target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
 target/arm/translate.c          | 60 ++-------------------------------
 3 files changed, 48 insertions(+), 57 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 65ea30d3edf..ed49726abf5 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -465,5 +465,7 @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
 
     VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
+
+    VMULL_P_3d   1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 5965b5ed845..7dbbfaaac41 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -2304,3 +2304,46 @@ static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
 
     return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
 }
+
+static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
+{
+    gen_helper_gvec_3 *fn_gvec;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    switch (a->size) {
+    case 0:
+        fn_gvec = gen_helper_neon_pmull_h;
+        break;
+    case 2:
+        if (!dc_isar_feature(aa32_pmull, s)) {
+            return false;
+        }
+        fn_gvec = gen_helper_gvec_pmull_q;
+        break;
+    default:
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
+                       neon_reg_offset(a->vn, 0),
+                       neon_reg_offset(a->vm, 0),
+                       16, 16, 0, fn_gvec);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 88e91845c02..f459fad8646 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5181,7 +5181,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
 {
     int op;
     int q;
-    int rd, rn, rm, rd_ofs, rn_ofs, rm_ofs;
+    int rd, rn, rm, rd_ofs, rm_ofs;
     int size;
     int pass;
     int u;
@@ -5215,7 +5215,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     size = (insn >> 20) & 3;
     vec_size = q ? 16 : 8;
     rd_ofs = neon_reg_offset(rd, 0);
-    rn_ofs = neon_reg_offset(rn, 0);
     rm_ofs = neon_reg_offset(rm, 0);
 
     if ((insn & (1 << 23)) == 0) {
@@ -5228,61 +5227,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         if (size != 3) {
             op = (insn >> 8) & 0xf;
             if ((insn & (1 << 6)) == 0) {
-                /* Three registers of different lengths.  */
-                /* undefreq: bit 0 : UNDEF if size == 0
-                 *           bit 1 : UNDEF if size == 1
-                 *           bit 2 : UNDEF if size == 2
-                 *           bit 3 : UNDEF if U == 1
-                 * Note that [2:0] set implies 'always UNDEF'
-                 */
-                int undefreq;
-                /* prewiden, src1_wide, src2_wide, undefreq */
-                static const int neon_3reg_wide[16][4] = {
-                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
-                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
-                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
-                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
-                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
-                    {0, 0, 0, 7}, /* VABAL */
-                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
-                    {0, 0, 0, 7}, /* VABDL */
-                    {0, 0, 0, 7}, /* VMLAL */
-                    {0, 0, 0, 7}, /* VQDMLAL */
-                    {0, 0, 0, 7}, /* VMLSL */
-                    {0, 0, 0, 7}, /* VQDMLSL */
-                    {0, 0, 0, 7}, /* Integer VMULL */
-                    {0, 0, 0, 7}, /* VQDMULL */
-                    {0, 0, 0, 0xa}, /* Polynomial VMULL */
-                    {0, 0, 0, 7}, /* Reserved: always UNDEF */
-                };
-
-                undefreq = neon_3reg_wide[op][3];
-
-                if ((undefreq & (1 << size)) ||
-                    ((undefreq & 8) && u)) {
-                    return 1;
-                }
-                if (rd & 1) {
-                    return 1;
-                }
-
-                /* Handle polynomial VMULL in a single pass.  */
-                if (op == 14) {
-                    if (size == 0) {
-                        /* VMULL.P8 */
-                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
-                                           0, gen_helper_neon_pmull_h);
-                    } else {
-                        /* VMULL.P64 */
-                        if (!dc_isar_feature(aa32_pmull, s)) {
-                            return 1;
-                        }
-                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
-                                           0, gen_helper_gvec_pmull_q);
-                    }
-                    return 0;
-                }
-                abort(); /* all others handled by decodetree */
+                /* Three registers of different lengths: handled by decodetree */
+                return 1;
             } else {
                 /* Two registers and a scalar. NB that for ops of this form
                  * the ARM ARM labels bit 24 as Q, but it is in our variable
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
                   ` (6 preceding siblings ...)
  2020-06-09 16:02 ` [PATCH 7/7] target/arm: Convert Neon 3-reg-diff polynomial VMULL Peter Maydell
@ 2020-06-09 17:01 ` no-reply
  2020-06-09 17:29 ` no-reply
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: no-reply @ 2020-06-09 17:01 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200609160209.29960-1-peter.maydell@linaro.org/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      aarch64-softmmu/target/arm/pauth_helper.o
  GEN     trace/generated-helpers.c
  CC      aarch64-softmmu/trace/control-target.o
/tmp/qemu-test/src/target/arm/neon-dp.decode:419: error: ('definition has 0 bits',)
make[1]: *** [target/arm/decode-neon-dp.inc.c] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 665, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=44908a20ab39404aacf5b8640f6c781e', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-onpovaq5/src/docker-src.2020-06-09-12.57.59.30607:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=44908a20ab39404aacf5b8640f6c781e
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-onpovaq5/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    3m15.295s
user    0m8.538s


The full log is available at
http://patchew.org/logs/20200609160209.29960-1-peter.maydell@linaro.org/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] target/arm: Fix missing temp frees in do_vshll_2sh
  2020-06-09 16:02 ` [PATCH 1/7] target/arm: Fix missing temp frees in do_vshll_2sh Peter Maydell
@ 2020-06-09 17:09   ` Richard Henderson
  2020-06-10  7:34   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-06-09 17:09 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/9/20 9:02 AM, Peter Maydell wrote:
> The widenfn() in do_vshll_2sh() does not free the input 32-bit
> TCGv, so we need to do this in the calling code.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

> ---
>  target/arm/translate-neon.inc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
> index 664d3612607..299a61f067b 100644
> --- a/target/arm/translate-neon.inc.c
> +++ b/target/arm/translate-neon.inc.c
> @@ -1624,6 +1624,7 @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
>      tmp = tcg_temp_new_i64();
>  
>      widenfn(tmp, rm0);
> +    tcg_temp_free_i32(rm0);
>      if (a->shift != 0) {
>          tcg_gen_shli_i64(tmp, tmp, a->shift);
>          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
> @@ -1631,6 +1632,7 @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
>      neon_store_reg64(tmp, a->vd);
>  
>      widenfn(tmp, rm1);
> +    tcg_temp_free_i32(rm1);
>      if (a->shift != 0) {
>          tcg_gen_shli_i64(tmp, tmp, a->shift);
>          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
  2020-06-09 16:02 ` [PATCH 2/7] target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree Peter Maydell
@ 2020-06-09 17:26   ` Richard Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-06-09 17:26 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/9/20 9:02 AM, Peter Maydell wrote:
 +#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
> +    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
> +    {                                                                   \
> +        NeonGenWidenFn *widenfn[] = {                                   \
> +            gen_helper_neon_widen_##S##8,                               \
> +            gen_helper_neon_widen_##S##16,                              \
> +            tcg_gen_##EXT##_i32_i64,                                    \
> +            NULL,                                                       \
> +        };                                                              \
> +        NeonGenTwo64OpFn *addfn[] = {                                   \

Missing const here.

Actually, patch 1 made me look back and we missed it, and static, in
trans_VSHLL_*_2sh.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
                   ` (7 preceding siblings ...)
  2020-06-09 17:01 ` [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree no-reply
@ 2020-06-09 17:29 ` no-reply
  2020-06-09 17:33 ` no-reply
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: no-reply @ 2020-06-09 17:29 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200609160209.29960-1-peter.maydell@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200609160209.29960-1-peter.maydell@linaro.org
Subject: [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1591720118-7378-1-git-send-email-aleksandar.qemu.devel@gmail.com -> patchew/1591720118-7378-1-git-send-email-aleksandar.qemu.devel@gmail.com
 * [new tag]         patchew/20200609163932.1566209-1-armbru@redhat.com -> patchew/20200609163932.1566209-1-armbru@redhat.com
Switched to a new branch 'test'
10d6c26 target/arm: Convert Neon 3-reg-diff polynomial VMULL
30b00c9 target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
185eaf7 target/arm: Convert Neon 3-reg-diff long multiplies
57a3a2d target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
8e2567a target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
b4908b3 target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
764df93 target/arm: Fix missing temp frees in do_vshll_2sh

=== OUTPUT BEGIN ===
1/7 Checking commit 764df93babf9 (target/arm: Fix missing temp frees in do_vshll_2sh)
2/7 Checking commit b4908b34dc71 (target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree)
3/7 Checking commit 8e2567a86999 (target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree)
4/7 Checking commit 57a3a2dc757a (target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree)
5/7 Checking commit 185eaf7a1f0a (target/arm: Convert Neon 3-reg-diff long multiplies)
ERROR: space required after that ',' (ctx:VxV)
#93: FILE: target/arm/translate-neon.inc.c:2203:
+#define DO_VMLAL(INSN,MULL,ACC)                                         \
                      ^

ERROR: space required after that ',' (ctx:VxV)
#93: FILE: target/arm/translate-neon.inc.c:2203:
+#define DO_VMLAL(INSN,MULL,ACC)                                         \
                           ^

ERROR: space required after that ',' (ctx:VxV)
#111: FILE: target/arm/translate-neon.inc.c:2221:
+DO_VMLAL(VMLAL_S,mull_s,add)
                 ^

ERROR: space required after that ',' (ctx:VxV)
#111: FILE: target/arm/translate-neon.inc.c:2221:
+DO_VMLAL(VMLAL_S,mull_s,add)
                        ^

ERROR: space required after that ',' (ctx:VxV)
#112: FILE: target/arm/translate-neon.inc.c:2222:
+DO_VMLAL(VMLAL_U,mull_u,add)
                 ^

ERROR: space required after that ',' (ctx:VxV)
#112: FILE: target/arm/translate-neon.inc.c:2222:
+DO_VMLAL(VMLAL_U,mull_u,add)
                        ^

ERROR: space required after that ',' (ctx:VxV)
#113: FILE: target/arm/translate-neon.inc.c:2223:
+DO_VMLAL(VMLSL_S,mull_s,sub)
                 ^

ERROR: space required after that ',' (ctx:VxV)
#113: FILE: target/arm/translate-neon.inc.c:2223:
+DO_VMLAL(VMLSL_S,mull_s,sub)
                        ^

ERROR: space required after that ',' (ctx:VxV)
#114: FILE: target/arm/translate-neon.inc.c:2224:
+DO_VMLAL(VMLSL_U,mull_u,sub)
                 ^

ERROR: space required after that ',' (ctx:VxV)
#114: FILE: target/arm/translate-neon.inc.c:2224:
+DO_VMLAL(VMLSL_U,mull_u,sub)
                        ^

total: 10 errors, 0 warnings, 138 lines checked

Patch 5/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

6/7 Checking commit 30b00c96fb41 (target/arm: Convert Neon 3-reg-diff saturating doubling multiplies)
7/7 Checking commit 10d6c26e24fe (target/arm: Convert Neon 3-reg-diff polynomial VMULL)
WARNING: line over 80 characters
#157: FILE: target/arm/translate.c:5230:
+                /* Three registers of different lengths: handled by decodetree */

total: 0 errors, 1 warnings, 131 lines checked

Patch 7/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200609160209.29960-1-peter.maydell@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
                   ` (8 preceding siblings ...)
  2020-06-09 17:29 ` no-reply
@ 2020-06-09 17:33 ` no-reply
  2020-06-09 20:12 ` no-reply
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: no-reply @ 2020-06-09 17:33 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200609160209.29960-1-peter.maydell@linaro.org/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      aarch64-softmmu/target/arm/psci.o
  CC      aarch64-softmmu/target/arm/translate-a64.o
  CC      aarch64-softmmu/target/arm/helper-a64.o
/tmp/qemu-test/src/target/arm/neon-dp.decode:419: error: ('definition has 0 bits',)
make[1]: *** [/tmp/qemu-test/src/target/arm/Makefile.objs:27: target/arm/decode-neon-dp.inc.c] Error 1
make[1]: *** Waiting for unfinished jobs....
  GEN     x86_64-softmmu/qemu-system-x86_64.exe
make: *** [Makefile:527: aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 665, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=69eb65b4813d4082974e1945e31adc22', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-okk7r4jy/src/docker-src.2020-06-09-13.30.20.21045:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=69eb65b4813d4082974e1945e31adc22
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-okk7r4jy/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    3m25.223s
user    0m9.106s


The full log is available at
http://patchew.org/logs/20200609160209.29960-1-peter.maydell@linaro.org/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/7] target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
  2020-06-09 16:02 ` [PATCH 3/7] target/arm: Convert Neon 3-reg-diff narrowing " Peter Maydell
@ 2020-06-09 17:40   ` Richard Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-06-09 17:40 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/9/20 9:02 AM, Peter Maydell wrote:
> +#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP)                       \
> +    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
> +    {                                                                   \
> +        NeonGenTwo64OpFn *addfn[] = {                                   \
> +            gen_helper_neon_##OP##l_u16,                                \
> +            gen_helper_neon_##OP##l_u32,                                \
> +            tcg_gen_##OP##_i64,                                         \
> +            NULL,                                                       \
> +        };                                                              \
> +        NeonGenNarrowFn *narrowfn[] = {                                 \
> +            gen_helper_neon_##NARROWTYPE##_high_u8,                     \
> +            gen_helper_neon_##NARROWTYPE##_high_u16,                    \
> +            EXTOP,                                                      \
> +            NULL,                                                       \
> +        };                                                              \
> +        return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]);   \
> +    }

Missing const again.  Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
  2020-06-09 16:02 ` [PATCH 4/7] target/arm: Convert Neon 3-reg-diff VABAL, VABDL " Peter Maydell
@ 2020-06-09 18:16   ` Richard Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-06-09 18:16 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/9/20 9:02 AM, Peter Maydell wrote:
> +static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
> +{
> +    NeonGenTwoOpWidenFn *opfn[] = {
> +        gen_helper_neon_abdl_s16,
> +        gen_helper_neon_abdl_s32,
> +        gen_helper_neon_abdl_s64,
> +        NULL,
> +    };

Missing static const. Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/7] target/arm: Convert Neon 3-reg-diff long multiplies
  2020-06-09 16:02 ` [PATCH 5/7] target/arm: Convert Neon 3-reg-diff long multiplies Peter Maydell
@ 2020-06-09 18:26   ` Richard Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-06-09 18:26 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/9/20 9:02 AM, Peter Maydell wrote:
> +static bool trans_VMULL_S_3d(DisasContext *s, arg_3diff *a)
> +{
> +    NeonGenTwoOpWidenFn *opfn[] = {

static const again, 4 instances.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/7] target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
  2020-06-09 16:02 ` [PATCH 6/7] target/arm: Convert Neon 3-reg-diff saturating doubling multiplies Peter Maydell
@ 2020-06-09 18:45   ` Richard Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-06-09 18:45 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/9/20 9:02 AM, Peter Maydell wrote:
> +static bool trans_VQDMULL_3d(DisasContext *s, arg_3diff *a)
> +{
> +    NeonGenTwoOpWidenFn *opfn[] = {

5 more static const.  Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] target/arm: Convert Neon 3-reg-diff polynomial VMULL
  2020-06-09 16:02 ` [PATCH 7/7] target/arm: Convert Neon 3-reg-diff polynomial VMULL Peter Maydell
@ 2020-06-09 18:56   ` Richard Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-06-09 18:56 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/9/20 9:02 AM, Peter Maydell wrote:
> Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
> insn in this group to be converted.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/neon-dp.decode       |  2 ++
>  target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
>  target/arm/translate.c          | 60 ++-------------------------------
>  3 files changed, 48 insertions(+), 57 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
                   ` (9 preceding siblings ...)
  2020-06-09 17:33 ` no-reply
@ 2020-06-09 20:12 ` no-reply
  2020-06-09 21:07 ` no-reply
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: no-reply @ 2020-06-09 20:12 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200609160209.29960-1-peter.maydell@linaro.org/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      aarch64-softmmu/target/arm/psci.o
  CC      aarch64-softmmu/target/arm/helper-a64.o
  GEN     aarch64-softmmu/target/arm/decode-sve.inc.c
/tmp/qemu-test/src/target/arm/neon-dp.decode:419: error: ('definition has 0 bits',)
make[1]: *** [target/arm/decode-neon-dp.inc.c] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [aarch64-softmmu/all] Error 2
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 665, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=fd2ea42b79504fd5bdbdd4ead15f2c62', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-w7za9v94/src/docker-src.2020-06-09-16.09.24.2468:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=fd2ea42b79504fd5bdbdd4ead15f2c62
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-w7za9v94/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    3m4.489s
user    0m8.005s


The full log is available at
http://patchew.org/logs/20200609160209.29960-1-peter.maydell@linaro.org/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
                   ` (10 preceding siblings ...)
  2020-06-09 20:12 ` no-reply
@ 2020-06-09 21:07 ` no-reply
  2020-06-09 21:10 ` no-reply
  2020-06-10  7:41 ` Philippe Mathieu-Daudé
  13 siblings, 0 replies; 23+ messages in thread
From: no-reply @ 2020-06-09 21:07 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200609160209.29960-1-peter.maydell@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20200609160209.29960-1-peter.maydell@linaro.org
Subject: [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
   49ee115..31d321c  master     -> master
Switched to a new branch 'test'
fb9cd52 target/arm: Convert Neon 3-reg-diff polynomial VMULL
08c5030 target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
df05eae target/arm: Convert Neon 3-reg-diff long multiplies
42bfcfc target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
53025bc target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
bc32157 target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
f6dc16f target/arm: Fix missing temp frees in do_vshll_2sh

=== OUTPUT BEGIN ===
1/7 Checking commit f6dc16f463ee (target/arm: Fix missing temp frees in do_vshll_2sh)
2/7 Checking commit bc3215754d8f (target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree)
3/7 Checking commit 53025bc0a61e (target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree)
4/7 Checking commit 42bfcfcc08ac (target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree)
5/7 Checking commit df05eae86282 (target/arm: Convert Neon 3-reg-diff long multiplies)
ERROR: space required after that ',' (ctx:VxV)
#93: FILE: target/arm/translate-neon.inc.c:2203:
+#define DO_VMLAL(INSN,MULL,ACC)                                         \
                      ^

ERROR: space required after that ',' (ctx:VxV)
#93: FILE: target/arm/translate-neon.inc.c:2203:
+#define DO_VMLAL(INSN,MULL,ACC)                                         \
                           ^

ERROR: space required after that ',' (ctx:VxV)
#111: FILE: target/arm/translate-neon.inc.c:2221:
+DO_VMLAL(VMLAL_S,mull_s,add)
                 ^

ERROR: space required after that ',' (ctx:VxV)
#111: FILE: target/arm/translate-neon.inc.c:2221:
+DO_VMLAL(VMLAL_S,mull_s,add)
                        ^

ERROR: space required after that ',' (ctx:VxV)
#112: FILE: target/arm/translate-neon.inc.c:2222:
+DO_VMLAL(VMLAL_U,mull_u,add)
                 ^

ERROR: space required after that ',' (ctx:VxV)
#112: FILE: target/arm/translate-neon.inc.c:2222:
+DO_VMLAL(VMLAL_U,mull_u,add)
                        ^

ERROR: space required after that ',' (ctx:VxV)
#113: FILE: target/arm/translate-neon.inc.c:2223:
+DO_VMLAL(VMLSL_S,mull_s,sub)
                 ^

ERROR: space required after that ',' (ctx:VxV)
#113: FILE: target/arm/translate-neon.inc.c:2223:
+DO_VMLAL(VMLSL_S,mull_s,sub)
                        ^

ERROR: space required after that ',' (ctx:VxV)
#114: FILE: target/arm/translate-neon.inc.c:2224:
+DO_VMLAL(VMLSL_U,mull_u,sub)
                 ^

ERROR: space required after that ',' (ctx:VxV)
#114: FILE: target/arm/translate-neon.inc.c:2224:
+DO_VMLAL(VMLSL_U,mull_u,sub)
                        ^

total: 10 errors, 0 warnings, 138 lines checked

Patch 5/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

6/7 Checking commit 08c503092761 (target/arm: Convert Neon 3-reg-diff saturating doubling multiplies)
7/7 Checking commit fb9cd52b2791 (target/arm: Convert Neon 3-reg-diff polynomial VMULL)
WARNING: line over 80 characters
#157: FILE: target/arm/translate.c:5230:
+                /* Three registers of different lengths: handled by decodetree */

total: 0 errors, 1 warnings, 131 lines checked

Patch 7/7 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200609160209.29960-1-peter.maydell@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
                   ` (11 preceding siblings ...)
  2020-06-09 21:07 ` no-reply
@ 2020-06-09 21:10 ` no-reply
  2020-06-10  7:41 ` Philippe Mathieu-Daudé
  13 siblings, 0 replies; 23+ messages in thread
From: no-reply @ 2020-06-09 21:10 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200609160209.29960-1-peter.maydell@linaro.org/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      aarch64-softmmu/target/arm/translate-a64.o
  GEN     aarch64-softmmu/target/arm/decode-sve.inc.c
  CC      aarch64-softmmu/target/arm/helper-a64.o
/tmp/qemu-test/src/target/arm/neon-dp.decode:419: error: ('definition has 0 bits',)
  CC      aarch64-softmmu/target/arm/sve_helper.o
make[1]: *** [/tmp/qemu-test/src/target/arm/Makefile.objs:27: target/arm/decode-neon-dp.inc.c] Error 1
make[1]: *** Waiting for unfinished jobs....
  CC      aarch64-softmmu/target/arm/pauth_helper.o
  GEN     x86_64-softmmu/qemu-system-x86_64.exe
make: *** [Makefile:527: aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 665, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=a76ec30164464962b992161c1d94e390', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-evguu8xo/src/docker-src.2020-06-09-17.07.30.29444:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=a76ec30164464962b992161c1d94e390
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-evguu8xo/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    2m40.106s
user    0m8.477s


The full log is available at
http://patchew.org/logs/20200609160209.29960-1-peter.maydell@linaro.org/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] target/arm: Fix missing temp frees in do_vshll_2sh
  2020-06-09 16:02 ` [PATCH 1/7] target/arm: Fix missing temp frees in do_vshll_2sh Peter Maydell
  2020-06-09 17:09   ` Richard Henderson
@ 2020-06-10  7:34   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-06-10  7:34 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel; +Cc: Richard Henderson

On 6/9/20 6:02 PM, Peter Maydell wrote:
> The widenfn() in do_vshll_2sh() does not free the input 32-bit
> TCGv, so we need to do this in the calling code.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target/arm/translate-neon.inc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
> index 664d3612607..299a61f067b 100644
> --- a/target/arm/translate-neon.inc.c
> +++ b/target/arm/translate-neon.inc.c
> @@ -1624,6 +1624,7 @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
>      tmp = tcg_temp_new_i64();
>  
>      widenfn(tmp, rm0);
> +    tcg_temp_free_i32(rm0);
>      if (a->shift != 0) {
>          tcg_gen_shli_i64(tmp, tmp, a->shift);
>          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
> @@ -1631,6 +1632,7 @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
>      neon_store_reg64(tmp, a->vd);
>  
>      widenfn(tmp, rm1);
> +    tcg_temp_free_i32(rm1);
>      if (a->shift != 0) {
>          tcg_gen_shli_i64(tmp, tmp, a->shift);
>          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
> 

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree
  2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
                   ` (12 preceding siblings ...)
  2020-06-09 21:10 ` no-reply
@ 2020-06-10  7:41 ` Philippe Mathieu-Daudé
  13 siblings, 0 replies; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-06-10  7:41 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel; +Cc: Richard Henderson

On 6/9/20 6:02 PM, Peter Maydell wrote:
> This patchset converts the Neon insns in the "3 registers of different
> lengths" group to decodetree. Patch 1 is a bugfix for an earlier
> part of the conversion that's now in master.
> 
> I'm definitely finding that the new decodetree version of Neon
> is often easier to understand because we no longer try to
> accommodate multiple different kinds of widening/narrowing/etc
> insns in a single multi-pass loop: expanding out the loop and
> specializing it to the particular insn type helps a lot.

I agree. The TCG ARM code is well documented, but the decodetree
view makes it easier to review. Kinda obvious when you compare
with the TCG code in older QEMU architectures.
Personally I also find it easier to set breakpoints.

> (Or maybe it's just that having to read the old code and write
> the new version means I understand it better ;-))
> 
> Based-on: id:20200608183652.661386-1-richard.henderson@linaro.org
> ("[PATCH v3 0/9] decodetree: Add non-overlapping groups")
> because we use the new group syntax to set up the structure
> for the "size==0b11" vs "size!=0b11" decode which we'll fill
> in in subsequent patchsets.
> 
> thanks
> -- PMM
> 
> Peter Maydell (7):
>   target/arm: Fix missing temp frees in do_vshll_2sh
>   target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
>   target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
>   target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
>   target/arm: Convert Neon 3-reg-diff long multiplies
>   target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
>   target/arm: Convert Neon 3-reg-diff polynomial VMULL
> 
>  target/arm/translate.h          |   1 +
>  target/arm/neon-dp.decode       |  72 +++++
>  target/arm/translate-neon.inc.c | 521 ++++++++++++++++++++++++++++++++
>  target/arm/translate.c          | 222 +-------------
>  4 files changed, 597 insertions(+), 219 deletions(-)
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-06-10  7:42 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-09 16:02 [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree Peter Maydell
2020-06-09 16:02 ` [PATCH 1/7] target/arm: Fix missing temp frees in do_vshll_2sh Peter Maydell
2020-06-09 17:09   ` Richard Henderson
2020-06-10  7:34   ` Philippe Mathieu-Daudé
2020-06-09 16:02 ` [PATCH 2/7] target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree Peter Maydell
2020-06-09 17:26   ` Richard Henderson
2020-06-09 16:02 ` [PATCH 3/7] target/arm: Convert Neon 3-reg-diff narrowing " Peter Maydell
2020-06-09 17:40   ` Richard Henderson
2020-06-09 16:02 ` [PATCH 4/7] target/arm: Convert Neon 3-reg-diff VABAL, VABDL " Peter Maydell
2020-06-09 18:16   ` Richard Henderson
2020-06-09 16:02 ` [PATCH 5/7] target/arm: Convert Neon 3-reg-diff long multiplies Peter Maydell
2020-06-09 18:26   ` Richard Henderson
2020-06-09 16:02 ` [PATCH 6/7] target/arm: Convert Neon 3-reg-diff saturating doubling multiplies Peter Maydell
2020-06-09 18:45   ` Richard Henderson
2020-06-09 16:02 ` [PATCH 7/7] target/arm: Convert Neon 3-reg-diff polynomial VMULL Peter Maydell
2020-06-09 18:56   ` Richard Henderson
2020-06-09 17:01 ` [PATCH 0/7] target/arm: Convert Neon 3-reg-diff to decodetree no-reply
2020-06-09 17:29 ` no-reply
2020-06-09 17:33 ` no-reply
2020-06-09 20:12 ` no-reply
2020-06-09 21:07 ` no-reply
2020-06-09 21:10 ` no-reply
2020-06-10  7:41 ` Philippe Mathieu-Daudé

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).