[Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups
@ 2019-02-09  3:38 Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 01/12] target/arm: Rely on optimization within tcg_gen_gvec_or Richard Henderson
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Changes since v2:
  * Fix some representational issues with FPSCR.
  * Use host vector saturation for SQADD/UQADD.
    This requires changing the internal representation of FPSR.QC.
  * Fix a latent vector bug, noticed during the rest.

Correct RISU results depend on Mark C-A's patch from today,
"tcg/i386: fix unsigned vector saturating arithmetic",
which will be in my next tcg pull.


r~


Richard Henderson (12):
  target/arm: Rely on optimization within tcg_gen_gvec_or
  target/arm: Use vector minmax expanders for aarch64
  target/arm: Use vector minmax expanders for aarch32
  target/arm: Use tcg integer min/max primitives for neon
  target/arm: Remove neon min/max helpers
  target/arm: Fix vfp_gdb_get/set_reg vs FPSCR
  target/arm: Fix arm_cpu_dump_state vs FPSCR
  target/arm: Split out flags setting from vfp compares
  target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR]
  target/arm: Split out FPSCR.QC to a vector field
  target/arm: Use vector operations for saturation
  target/arm: Add missing clear_tail calls

 target/arm/cpu.h           |   5 +-
 target/arm/helper.h        |  45 ++++++--
 target/arm/translate.h     |   4 +
 target/arm/helper.c        |  81 +++++++++-----
 target/arm/neon_helper.c   |  14 +--
 target/arm/translate-a64.c |  77 ++++++-------
 target/arm/translate-sve.c |   6 +-
 target/arm/translate.c     | 219 +++++++++++++++++++++++++++++--------
 target/arm/vec_helper.c    | 134 ++++++++++++++++++++++-
 9 files changed, 433 insertions(+), 152 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 01/12] target/arm: Rely on optimization within tcg_gen_gvec_or
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 02/12] target/arm: Use vector minmax expanders for aarch64 Richard Henderson
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Since we're now handling a == b generically, we no longer need
to do it by hand within target/arm/.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c |  6 +-----
 target/arm/translate-sve.c |  6 +-----
 target/arm/translate.c     | 12 +++---------
 3 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index e002251ac6..a12bfac719 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -10648,11 +10648,7 @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
         gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_andc, 0);
         return;
     case 2: /* ORR */
-        if (rn == rm) { /* MOV */
-            gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_mov, 0);
-        } else {
-            gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_or, 0);
-        }
+        gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_or, 0);
         return;
     case 3: /* ORN */
         gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_orc, 0);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b15b615ceb..3a2eb51566 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -280,11 +280,7 @@ static bool trans_AND_zzz(DisasContext *s, arg_rrr_esz *a)
 
 static bool trans_ORR_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    if (a->rn == a->rm) { /* MOV */
-        return do_mov_z(s, a->rd, a->rn);
-    } else {
-        return do_vector3_z(s, tcg_gen_gvec_or, 0, a->rd, a->rn, a->rm);
-    }
+    return do_vector3_z(s, tcg_gen_gvec_or, 0, a->rd, a->rn, a->rm);
 }
 
 static bool trans_EOR_zzz(DisasContext *s, arg_rrr_esz *a)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 66cf28c8cb..9d2dba7ed2 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6294,15 +6294,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs,
                                   vec_size, vec_size);
                 break;
-            case 2:
-                if (rn == rm) {
-                    /* VMOV */
-                    tcg_gen_gvec_mov(0, rd_ofs, rn_ofs, vec_size, vec_size);
-                } else {
-                    /* VORR */
-                    tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
-                                    vec_size, vec_size);
-                }
+            case 2: /* VORR */
+                tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
+                                vec_size, vec_size);
                 break;
             case 3: /* VORN */
                 tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 02/12] target/arm: Use vector minmax expanders for aarch64
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 01/12] target/arm: Rely on optimization within tcg_gen_gvec_or Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 03/12] target/arm: Use vector minmax expanders for aarch32 Richard Henderson
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 35 ++++++++++++++---------------------
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index a12bfac719..fd5ceb6613 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -10948,6 +10948,20 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
     }
 
     switch (opcode) {
+    case 0x0c: /* SMAX, UMAX */
+        if (u) {
+            gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
+        } else {
+            gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_smax, size);
+        }
+        return;
+    case 0x0d: /* SMIN, UMIN */
+        if (u) {
+            gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umin, size);
+        } else {
+            gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_smin, size);
+        }
+        return;
     case 0x10: /* ADD, SUB */
         if (u) {
             gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_sub, size);
@@ -11109,27 +11123,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genenvfn = fns[size][u];
                 break;
             }
-            case 0xc: /* SMAX, UMAX */
-            {
-                static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_max_s8, gen_helper_neon_max_u8 },
-                    { gen_helper_neon_max_s16, gen_helper_neon_max_u16 },
-                    { tcg_gen_smax_i32, tcg_gen_umax_i32 },
-                };
-                genfn = fns[size][u];
-                break;
-            }
-
-            case 0xd: /* SMIN, UMIN */
-            {
-                static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_min_s8, gen_helper_neon_min_u8 },
-                    { gen_helper_neon_min_s16, gen_helper_neon_min_u16 },
-                    { tcg_gen_smin_i32, tcg_gen_umin_i32 },
-                };
-                genfn = fns[size][u];
-                break;
-            }
             case 0xe: /* SABD, UABD */
             case 0xf: /* SABA, UABA */
             {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 03/12] target/arm: Use vector minmax expanders for aarch32
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 01/12] target/arm: Rely on optimization within tcg_gen_gvec_or Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 02/12] target/arm: Use vector minmax expanders for aarch64 Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 04/12] target/arm: Use tcg integer min/max primitives for neon Richard Henderson
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9d2dba7ed2..df1cd3fa3e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6368,6 +6368,25 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tcg_gen_gvec_cmp(u ? TCG_COND_GEU : TCG_COND_GE, size,
                              rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
             return 0;
+
+        case NEON_3R_VMAX:
+            if (u) {
+                tcg_gen_gvec_umax(size, rd_ofs, rn_ofs, rm_ofs,
+                                  vec_size, vec_size);
+            } else {
+                tcg_gen_gvec_smax(size, rd_ofs, rn_ofs, rm_ofs,
+                                  vec_size, vec_size);
+            }
+            return 0;
+        case NEON_3R_VMIN:
+            if (u) {
+                tcg_gen_gvec_umin(size, rd_ofs, rn_ofs, rm_ofs,
+                                  vec_size, vec_size);
+            } else {
+                tcg_gen_gvec_smin(size, rd_ofs, rn_ofs, rm_ofs,
+                                  vec_size, vec_size);
+            }
+            return 0;
         }
 
         if (size == 3) {
@@ -6533,12 +6552,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VQRSHL:
             GEN_NEON_INTEGER_OP_ENV(qrshl);
             break;
-        case NEON_3R_VMAX:
-            GEN_NEON_INTEGER_OP(max);
-            break;
-        case NEON_3R_VMIN:
-            GEN_NEON_INTEGER_OP(min);
-            break;
         case NEON_3R_VABD:
             GEN_NEON_INTEGER_OP(abd);
             break;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 04/12] target/arm: Use tcg integer min/max primitives for neon
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (2 preceding siblings ...)
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 03/12] target/arm: Use vector minmax expanders for aarch32 Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 05/12] target/arm: Remove neon min/max helpers Richard Henderson
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The 32-bit PMIN/PMAX has been decomposed to scalars,
and so can be trivially expanded inline.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index df1cd3fa3e..f0101d2788 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4760,10 +4760,10 @@ static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
 }
 
 /* 32-bit pairwise ops end up the same as the elementwise versions.  */
-#define gen_helper_neon_pmax_s32  gen_helper_neon_max_s32
-#define gen_helper_neon_pmax_u32  gen_helper_neon_max_u32
-#define gen_helper_neon_pmin_s32  gen_helper_neon_min_s32
-#define gen_helper_neon_pmin_u32  gen_helper_neon_min_u32
+#define gen_helper_neon_pmax_s32  tcg_gen_smax_i32
+#define gen_helper_neon_pmax_u32  tcg_gen_umax_i32
+#define gen_helper_neon_pmin_s32  tcg_gen_smin_i32
+#define gen_helper_neon_pmin_u32  tcg_gen_umin_i32
 
 #define GEN_NEON_INTEGER_OP_ENV(name) do { \
     switch ((size << 1) | u) { \
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 05/12] target/arm: Remove neon min/max helpers
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (3 preceding siblings ...)
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 04/12] target/arm: Use tcg integer min/max primitives for neon Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 06/12] target/arm: Fix vfp_gdb_get/set_reg vs FPSCR Richard Henderson
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

These are now unused.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h      | 12 ------------
 target/arm/neon_helper.c | 12 ------------
 2 files changed, 24 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 53a38188c6..9874c35ea9 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -276,18 +276,6 @@ DEF_HELPER_2(neon_cge_s16, i32, i32, i32)
 DEF_HELPER_2(neon_cge_u32, i32, i32, i32)
 DEF_HELPER_2(neon_cge_s32, i32, i32, i32)
 
-DEF_HELPER_2(neon_min_u8, i32, i32, i32)
-DEF_HELPER_2(neon_min_s8, i32, i32, i32)
-DEF_HELPER_2(neon_min_u16, i32, i32, i32)
-DEF_HELPER_2(neon_min_s16, i32, i32, i32)
-DEF_HELPER_2(neon_min_u32, i32, i32, i32)
-DEF_HELPER_2(neon_min_s32, i32, i32, i32)
-DEF_HELPER_2(neon_max_u8, i32, i32, i32)
-DEF_HELPER_2(neon_max_s8, i32, i32, i32)
-DEF_HELPER_2(neon_max_u16, i32, i32, i32)
-DEF_HELPER_2(neon_max_s16, i32, i32, i32)
-DEF_HELPER_2(neon_max_u32, i32, i32, i32)
-DEF_HELPER_2(neon_max_s32, i32, i32, i32)
 DEF_HELPER_2(neon_pmin_u8, i32, i32, i32)
 DEF_HELPER_2(neon_pmin_s8, i32, i32, i32)
 DEF_HELPER_2(neon_pmin_u16, i32, i32, i32)
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index c2c6491a83..3249005b62 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -581,12 +581,6 @@ NEON_VOP(cge_u32, neon_u32, 1)
 #undef NEON_FN
 
 #define NEON_FN(dest, src1, src2) dest = (src1 < src2) ? src1 : src2
-NEON_VOP(min_s8, neon_s8, 4)
-NEON_VOP(min_u8, neon_u8, 4)
-NEON_VOP(min_s16, neon_s16, 2)
-NEON_VOP(min_u16, neon_u16, 2)
-NEON_VOP(min_s32, neon_s32, 1)
-NEON_VOP(min_u32, neon_u32, 1)
 NEON_POP(pmin_s8, neon_s8, 4)
 NEON_POP(pmin_u8, neon_u8, 4)
 NEON_POP(pmin_s16, neon_s16, 2)
@@ -594,12 +588,6 @@ NEON_POP(pmin_u16, neon_u16, 2)
 #undef NEON_FN
 
 #define NEON_FN(dest, src1, src2) dest = (src1 > src2) ? src1 : src2
-NEON_VOP(max_s8, neon_s8, 4)
-NEON_VOP(max_u8, neon_u8, 4)
-NEON_VOP(max_s16, neon_s16, 2)
-NEON_VOP(max_u16, neon_u16, 2)
-NEON_VOP(max_s32, neon_s32, 1)
-NEON_VOP(max_u32, neon_u32, 1)
 NEON_POP(pmax_s8, neon_s8, 4)
 NEON_POP(pmax_u8, neon_u8, 4)
 NEON_POP(pmax_s16, neon_s16, 2)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 06/12] target/arm: Fix vfp_gdb_get/set_reg vs FPSCR
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (4 preceding siblings ...)
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 05/12] target/arm: Remove neon min/max helpers Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 07/12] target/arm: Fix arm_cpu_dump_state " Richard Henderson
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The components of this register is stored in several
different locations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 520ceea7a4..6ac81c2ca2 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -81,7 +81,7 @@ static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
     }
     switch (reg - nregs) {
     case 0: stl_p(buf, env->vfp.xregs[ARM_VFP_FPSID]); return 4;
-    case 1: stl_p(buf, env->vfp.xregs[ARM_VFP_FPSCR]); return 4;
+    case 1: stl_p(buf, vfp_get_fpscr(env)); return 4;
     case 2: stl_p(buf, env->vfp.xregs[ARM_VFP_FPEXC]); return 4;
     }
     return 0;
@@ -107,7 +107,7 @@ static int vfp_gdb_set_reg(CPUARMState *env, uint8_t *buf, int reg)
     }
     switch (reg - nregs) {
     case 0: env->vfp.xregs[ARM_VFP_FPSID] = ldl_p(buf); return 4;
-    case 1: env->vfp.xregs[ARM_VFP_FPSCR] = ldl_p(buf); return 4;
+    case 1: vfp_set_fpscr(env, ldl_p(buf)); return 4;
     case 2: env->vfp.xregs[ARM_VFP_FPEXC] = ldl_p(buf) & (1 << 30); return 4;
     }
     return 0;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 07/12] target/arm: Fix arm_cpu_dump_state vs FPSCR
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (5 preceding siblings ...)
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 06/12] target/arm: Fix vfp_gdb_get/set_reg vs FPSCR Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 08/12] target/arm: Split out flags setting from vfp compares Richard Henderson
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index f0101d2788..9b426f4271 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -13641,7 +13641,7 @@ void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
                         i * 2 + 1, (uint32_t)(v >> 32),
                         i, v);
         }
-        cpu_fprintf(f, "FPSCR: %08x\n", (int)env->vfp.xregs[ARM_VFP_FPSCR]);
+        cpu_fprintf(f, "FPSCR: %08x\n", vfp_get_fpscr(env));
     }
 }
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 08/12] target/arm: Split out flags setting from vfp compares
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (6 preceding siblings ...)
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 07/12] target/arm: Fix arm_cpu_dump_state " Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 09/12] target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR] Richard Henderson
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Minimize the code within a macro by splitting out a helper function.
Use deposit32 instead of manual bit manipulation.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 45 +++++++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 6ac81c2ca2..51be3fa16f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12752,31 +12752,40 @@ float64 VFP_HELPER(sqrt, d)(float64 a, CPUARMState *env)
     return float64_sqrt(a, &env->vfp.fp_status);
 }
 
+static void softfloat_to_vfp_compare(CPUARMState *env, int cmp)
+{
+    uint32_t flags;
+    switch (cmp) {
+    case float_relation_equal:
+        flags = 0x6;
+        break;
+    case float_relation_less:
+        flags = 0x8;
+        break;
+    case float_relation_greater:
+        flags = 0x2;
+        break;
+    case float_relation_unordered:
+        flags = 0x3;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    env->vfp.xregs[ARM_VFP_FPSCR] =
+        deposit32(env->vfp.xregs[ARM_VFP_FPSCR], 28, 4, flags);
+}
+
 /* XXX: check quiet/signaling case */
 #define DO_VFP_cmp(p, type) \
 void VFP_HELPER(cmp, p)(type a, type b, CPUARMState *env)  \
 { \
-    uint32_t flags; \
-    switch(type ## _compare_quiet(a, b, &env->vfp.fp_status)) { \
-    case 0: flags = 0x6; break; \
-    case -1: flags = 0x8; break; \
-    case 1: flags = 0x2; break; \
-    default: case 2: flags = 0x3; break; \
-    } \
-    env->vfp.xregs[ARM_VFP_FPSCR] = (flags << 28) \
-        | (env->vfp.xregs[ARM_VFP_FPSCR] & 0x0fffffff); \
+    softfloat_to_vfp_compare(env, \
+        type ## _compare_quiet(a, b, &env->vfp.fp_status)); \
 } \
 void VFP_HELPER(cmpe, p)(type a, type b, CPUARMState *env) \
 { \
-    uint32_t flags; \
-    switch(type ## _compare(a, b, &env->vfp.fp_status)) { \
-    case 0: flags = 0x6; break; \
-    case -1: flags = 0x8; break; \
-    case 1: flags = 0x2; break; \
-    default: case 2: flags = 0x3; break; \
-    } \
-    env->vfp.xregs[ARM_VFP_FPSCR] = (flags << 28) \
-        | (env->vfp.xregs[ARM_VFP_FPSCR] & 0x0fffffff); \
+    softfloat_to_vfp_compare(env, \
+        type ## _compare(a, b, &env->vfp.fp_status)); \
 }
 DO_VFP_cmp(s, float32)
 DO_VFP_cmp(d, float64)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 09/12] target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR]
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (7 preceding siblings ...)
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 08/12] target/arm: Split out flags setting from vfp compares Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 10/12] target/arm: Split out FPSCR.QC to a vector field Richard Henderson
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Given that we mask bits properly on set, there is no reason
to mask them again on get.  We failed to clear the exception
status bits, 0x9f, which means that the wrong value would be
returned on get.  Except in the (probably normal) case in which
the set clears all of the bits.

Simplify the code in set to also clear the RES0 bits.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 51be3fa16f..af22274bd9 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12588,7 +12588,7 @@ uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
     int i;
     uint32_t fpscr;
 
-    fpscr = (env->vfp.xregs[ARM_VFP_FPSCR] & 0xffc8ffff)
+    fpscr = env->vfp.xregs[ARM_VFP_FPSCR]
             | (env->vfp.vec_len << 16)
             | (env->vfp.vec_stride << 20);
 
@@ -12630,7 +12630,7 @@ static inline int vfp_exceptbits_to_host(int target_bits)
 void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
 {
     int i;
-    uint32_t changed;
+    uint32_t changed = env->vfp.xregs[ARM_VFP_FPSCR];
 
     /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
     if (!cpu_isar_feature(aa64_fp16, arm_env_get_cpu(env))) {
@@ -12639,12 +12639,13 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
 
     /*
      * We don't implement trapped exception handling, so the
-     * trap enable bits are all RAZ/WI (not RES0!)
+     * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
+     *
+     * If we exclude the exception flags, IOC|DZC|OFC|UFC|IXC|IDC
+     * (which are stored in fp_status), and the other RES0 bits
+     * in between, then we clear all of the low 16 bits.
      */
-    val &= ~(FPCR_IDE | FPCR_IXE | FPCR_UFE | FPCR_OFE | FPCR_DZE | FPCR_IOE);
-
-    changed = env->vfp.xregs[ARM_VFP_FPSCR];
-    env->vfp.xregs[ARM_VFP_FPSCR] = (val & 0xffc8ffff);
+    env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xffc80000;
     env->vfp.vec_len = (val >> 16) & 7;
     env->vfp.vec_stride = (val >> 20) & 3;
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 10/12] target/arm: Split out FPSCR.QC to a vector field
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (8 preceding siblings ...)
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 09/12] target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR] Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 11/12] target/arm: Use vector operations for saturation Richard Henderson
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Change the representation of this field such that it is easy
to set from vector code.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h         |  5 ++++-
 target/arm/helper.c      | 19 +++++++++++++++----
 target/arm/neon_helper.c |  2 +-
 target/arm/vec_helper.c  |  2 +-
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 47238e4245..b96463e8f1 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -577,11 +577,13 @@ typedef struct CPUARMState {
         ARMPredicateReg preg_tmp;
 #endif
 
-        uint32_t xregs[16];
         /* We store these fpcsr fields separately for convenience.  */
+        uint32_t qc[4] QEMU_ALIGNED(16);
         int vec_len;
         int vec_stride;
 
+        uint32_t xregs[16];
+
         /* Scratch space for aa32 neon expansion.  */
         uint32_t scratch[8];
 
@@ -1427,6 +1429,7 @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
 #define FPCR_FZ16   (1 << 19)   /* ARMv8.2+, FP16 flush-to-zero */
 #define FPCR_FZ     (1 << 24)   /* Flush-to-zero enable bit */
 #define FPCR_DN     (1 << 25)   /* Default NaN enable bit */
+#define FPCR_QC     (1 << 27)   /* Cumulative saturation bit */
 
 static inline uint32_t vfp_get_fpsr(CPUARMState *env)
 {
diff --git a/target/arm/helper.c b/target/arm/helper.c
index af22274bd9..7ed9933663 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12585,8 +12585,7 @@ static inline int vfp_exceptbits_from_host(int host_bits)
 
 uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
 {
-    int i;
-    uint32_t fpscr;
+    uint32_t i, fpscr;
 
     fpscr = env->vfp.xregs[ARM_VFP_FPSCR]
             | (env->vfp.vec_len << 16)
@@ -12597,8 +12596,11 @@ uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
     /* FZ16 does not generate an input denormal exception.  */
     i |= (get_float_exception_flags(&env->vfp.fp_status_f16)
           & ~float_flag_input_denormal);
-
     fpscr |= vfp_exceptbits_from_host(i);
+
+    i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
+    fpscr |= i ? FPCR_QC : 0;
+
     return fpscr;
 }
 
@@ -12645,10 +12647,19 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
      * (which are stored in fp_status), and the other RES0 bits
      * in between, then we clear all of the low 16 bits.
      */
-    env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xffc80000;
+    env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
     env->vfp.vec_len = (val >> 16) & 7;
     env->vfp.vec_stride = (val >> 20) & 3;
 
+    /*
+     * The bit we set within fpscr_q is arbitrary; the register as a
+     * whole being zero/non-zero is what counts.
+     */
+    env->vfp.qc[0] = val & FPCR_QC;
+    env->vfp.qc[1] = 0;
+    env->vfp.qc[2] = 0;
+    env->vfp.qc[3] = 0;
+
     changed ^= val;
     if (changed & (3 << 22)) {
         i = (val >> 22) & 3;
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index 3249005b62..ed1c6fc41c 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -15,7 +15,7 @@
 #define SIGNBIT (uint32_t)0x80000000
 #define SIGNBIT64 ((uint64_t)1 << 63)
 
-#define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q
+#define SET_QC() env->vfp.qc[0] = 1
 
 #define NEON_TYPE1(name, type) \
 typedef struct \
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 37f338732e..65a18af4e0 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -36,7 +36,7 @@
 #define H4(x)  (x)
 #endif
 
-#define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q
+#define SET_QC() env->vfp.qc[0] = 1
 
 static void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
 {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 11/12] target/arm: Use vector operations for saturation
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (9 preceding siblings ...)
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 10/12] target/arm: Split out FPSCR.QC to a vector field Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 12/12] target/arm: Add missing clear_tail calls Richard Henderson
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

For same-sign saturation, we have tcg vector operations.  We can
compute the QC bit by comparing the saturated value against the
unsaturated value.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h        |  33 +++++++
 target/arm/translate.h     |   4 +
 target/arm/translate-a64.c |  36 ++++----
 target/arm/translate.c     | 172 +++++++++++++++++++++++++++++++------
 target/arm/vec_helper.c    | 130 ++++++++++++++++++++++++++++
 5 files changed, 331 insertions(+), 44 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 9874c35ea9..923e8e1525 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -641,6 +641,39 @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uqadd_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uqadd_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uqadd_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqadd_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqadd_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqadd_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqadd_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uqsub_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uqsub_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uqsub_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uqsub_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqsub_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqsub_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqsub_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sqsub_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 17748ddfb9..f25fe75685 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -214,6 +214,10 @@ extern const GVecGen2i ssra_op[4];
 extern const GVecGen2i usra_op[4];
 extern const GVecGen2i sri_op[4];
 extern const GVecGen2i sli_op[4];
+extern const GVecGen4 uqadd_op[4];
+extern const GVecGen4 sqadd_op[4];
+extern const GVecGen4 uqsub_op[4];
+extern const GVecGen4 sqsub_op[4];
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 /*
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index fd5ceb6613..af8e4fd4be 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -10948,6 +10948,22 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
     }
 
     switch (opcode) {
+    case 0x01: /* SQADD, UQADD */
+        tcg_gen_gvec_4(vec_full_reg_offset(s, rd),
+                       offsetof(CPUARMState, vfp.qc),
+                       vec_full_reg_offset(s, rn),
+                       vec_full_reg_offset(s, rm),
+                       is_q ? 16 : 8, vec_full_reg_size(s),
+                       (u ? uqadd_op : sqadd_op) + size);
+        return;
+    case 0x05: /* SQSUB, UQSUB */
+        tcg_gen_gvec_4(vec_full_reg_offset(s, rd),
+                       offsetof(CPUARMState, vfp.qc),
+                       vec_full_reg_offset(s, rn),
+                       vec_full_reg_offset(s, rm),
+                       is_q ? 16 : 8, vec_full_reg_size(s),
+                       (u ? uqsub_op : sqsub_op) + size);
+        return;
     case 0x0c: /* SMAX, UMAX */
         if (u) {
             gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -11043,16 +11059,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genfn = fns[size][u];
                 break;
             }
-            case 0x1: /* SQADD, UQADD */
-            {
-                static NeonGenTwoOpEnvFn * const fns[3][2] = {
-                    { gen_helper_neon_qadd_s8, gen_helper_neon_qadd_u8 },
-                    { gen_helper_neon_qadd_s16, gen_helper_neon_qadd_u16 },
-                    { gen_helper_neon_qadd_s32, gen_helper_neon_qadd_u32 },
-                };
-                genenvfn = fns[size][u];
-                break;
-            }
             case 0x2: /* SRHADD, URHADD */
             {
                 static NeonGenTwoOpFn * const fns[3][2] = {
@@ -11073,16 +11079,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genfn = fns[size][u];
                 break;
             }
-            case 0x5: /* SQSUB, UQSUB */
-            {
-                static NeonGenTwoOpEnvFn * const fns[3][2] = {
-                    { gen_helper_neon_qsub_s8, gen_helper_neon_qsub_u8 },
-                    { gen_helper_neon_qsub_s16, gen_helper_neon_qsub_u16 },
-                    { gen_helper_neon_qsub_s32, gen_helper_neon_qsub_u32 },
-                };
-                genenvfn = fns[size][u];
-                break;
-            }
             case 0x8: /* SSHL, USHL */
             {
                 static NeonGenTwoOpFn * const fns[3][2] = {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9b426f4271..dac737f6ca 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6148,6 +6148,142 @@ const GVecGen3 cmtst_op[4] = {
       .vece = MO_64 },
 };
 
+static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+                          TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec x = tcg_temp_new_vec_matching(t);
+    tcg_gen_add_vec(vece, x, a, b);
+    tcg_gen_usadd_vec(vece, t, a, b);
+    tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
+    tcg_gen_or_vec(vece, sat, sat, x);
+    tcg_temp_free_vec(x);
+}
+
+const GVecGen4 uqadd_op[4] = {
+    { .fniv = gen_uqadd_vec,
+      .fno = gen_helper_gvec_uqadd_b,
+      .opc = INDEX_op_usadd_vec,
+      .write_aofs = true,
+      .vece = MO_8 },
+    { .fniv = gen_uqadd_vec,
+      .fno = gen_helper_gvec_uqadd_h,
+      .opc = INDEX_op_usadd_vec,
+      .write_aofs = true,
+      .vece = MO_16 },
+    { .fniv = gen_uqadd_vec,
+      .fno = gen_helper_gvec_uqadd_s,
+      .opc = INDEX_op_usadd_vec,
+      .write_aofs = true,
+      .vece = MO_32 },
+    { .fniv = gen_uqadd_vec,
+      .fno = gen_helper_gvec_uqadd_d,
+      .opc = INDEX_op_usadd_vec,
+      .write_aofs = true,
+      .vece = MO_64 },
+};
+
+static void gen_sqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+                          TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec x = tcg_temp_new_vec_matching(t);
+    tcg_gen_add_vec(vece, x, a, b);
+    tcg_gen_ssadd_vec(vece, t, a, b);
+    tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
+    tcg_gen_or_vec(vece, sat, sat, x);
+    tcg_temp_free_vec(x);
+}
+
+const GVecGen4 sqadd_op[4] = {
+    { .fniv = gen_sqadd_vec,
+      .fno = gen_helper_gvec_sqadd_b,
+      .opc = INDEX_op_ssadd_vec,
+      .write_aofs = true,
+      .vece = MO_8 },
+    { .fniv = gen_sqadd_vec,
+      .fno = gen_helper_gvec_sqadd_h,
+      .opc = INDEX_op_ssadd_vec,
+      .write_aofs = true,
+      .vece = MO_16 },
+    { .fniv = gen_sqadd_vec,
+      .fno = gen_helper_gvec_sqadd_s,
+      .opc = INDEX_op_ssadd_vec,
+      .write_aofs = true,
+      .vece = MO_32 },
+    { .fniv = gen_sqadd_vec,
+      .fno = gen_helper_gvec_sqadd_d,
+      .opc = INDEX_op_ssadd_vec,
+      .write_aofs = true,
+      .vece = MO_64 },
+};
+
+static void gen_uqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+                          TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec x = tcg_temp_new_vec_matching(t);
+    tcg_gen_sub_vec(vece, x, a, b);
+    tcg_gen_ussub_vec(vece, t, a, b);
+    tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
+    tcg_gen_or_vec(vece, sat, sat, x);
+    tcg_temp_free_vec(x);
+}
+
+const GVecGen4 uqsub_op[4] = {
+    { .fniv = gen_uqsub_vec,
+      .fno = gen_helper_gvec_uqsub_b,
+      .opc = INDEX_op_ussub_vec,
+      .write_aofs = true,
+      .vece = MO_8 },
+    { .fniv = gen_uqsub_vec,
+      .fno = gen_helper_gvec_uqsub_h,
+      .opc = INDEX_op_ussub_vec,
+      .write_aofs = true,
+      .vece = MO_16 },
+    { .fniv = gen_uqsub_vec,
+      .fno = gen_helper_gvec_uqsub_s,
+      .opc = INDEX_op_ussub_vec,
+      .write_aofs = true,
+      .vece = MO_32 },
+    { .fniv = gen_uqsub_vec,
+      .fno = gen_helper_gvec_uqsub_d,
+      .opc = INDEX_op_ussub_vec,
+      .write_aofs = true,
+      .vece = MO_64 },
+};
+
+static void gen_sqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+                          TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec x = tcg_temp_new_vec_matching(t);
+    tcg_gen_sub_vec(vece, x, a, b);
+    tcg_gen_sssub_vec(vece, t, a, b);
+    tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
+    tcg_gen_or_vec(vece, sat, sat, x);
+    tcg_temp_free_vec(x);
+}
+
+const GVecGen4 sqsub_op[4] = {
+    { .fniv = gen_sqsub_vec,
+      .fno = gen_helper_gvec_sqsub_b,
+      .opc = INDEX_op_sssub_vec,
+      .write_aofs = true,
+      .vece = MO_8 },
+    { .fniv = gen_sqsub_vec,
+      .fno = gen_helper_gvec_sqsub_h,
+      .opc = INDEX_op_sssub_vec,
+      .write_aofs = true,
+      .vece = MO_16 },
+    { .fniv = gen_sqsub_vec,
+      .fno = gen_helper_gvec_sqsub_s,
+      .opc = INDEX_op_sssub_vec,
+      .write_aofs = true,
+      .vece = MO_32 },
+    { .fniv = gen_sqsub_vec,
+      .fno = gen_helper_gvec_sqsub_d,
+      .opc = INDEX_op_sssub_vec,
+      .write_aofs = true,
+      .vece = MO_64 },
+};
+
 /* Translate a NEON data processing instruction.  Return nonzero if the
    instruction is invalid.
    We process data in a mixture of 32-bit and 64-bit chunks.
@@ -6331,6 +6467,18 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 0;
 
+        case NEON_3R_VQADD:
+            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
+                           rn_ofs, rm_ofs, vec_size, vec_size,
+                           (u ? uqadd_op : sqadd_op) + size);
+            break;
+
+        case NEON_3R_VQSUB:
+            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
+                           rn_ofs, rm_ofs, vec_size, vec_size,
+                           (u ? uqsub_op : sqsub_op) + size);
+            break;
+
         case NEON_3R_VMUL: /* VMUL */
             if (u) {
                 /* Polynomial case allows only P8 and is handled below.  */
@@ -6395,24 +6543,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 neon_load_reg64(cpu_V0, rn + pass);
                 neon_load_reg64(cpu_V1, rm + pass);
                 switch (op) {
-                case NEON_3R_VQADD:
-                    if (u) {
-                        gen_helper_neon_qadd_u64(cpu_V0, cpu_env,
-                                                 cpu_V0, cpu_V1);
-                    } else {
-                        gen_helper_neon_qadd_s64(cpu_V0, cpu_env,
-                                                 cpu_V0, cpu_V1);
-                    }
-                    break;
-                case NEON_3R_VQSUB:
-                    if (u) {
-                        gen_helper_neon_qsub_u64(cpu_V0, cpu_env,
-                                                 cpu_V0, cpu_V1);
-                    } else {
-                        gen_helper_neon_qsub_s64(cpu_V0, cpu_env,
-                                                 cpu_V0, cpu_V1);
-                    }
-                    break;
                 case NEON_3R_VSHL:
                     if (u) {
                         gen_helper_neon_shl_u64(cpu_V0, cpu_V1, cpu_V0);
@@ -6528,18 +6658,12 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VHADD:
             GEN_NEON_INTEGER_OP(hadd);
             break;
-        case NEON_3R_VQADD:
-            GEN_NEON_INTEGER_OP_ENV(qadd);
-            break;
         case NEON_3R_VRHADD:
             GEN_NEON_INTEGER_OP(rhadd);
             break;
         case NEON_3R_VHSUB:
             GEN_NEON_INTEGER_OP(hsub);
             break;
-        case NEON_3R_VQSUB:
-            GEN_NEON_INTEGER_OP_ENV(qsub);
-            break;
         case NEON_3R_VSHL:
             GEN_NEON_INTEGER_OP(shl);
             break;
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 65a18af4e0..10f17e4b5c 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -766,3 +766,133 @@ DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
 DO_FMLA_IDX(gvec_fmla_idx_d, float64, )
 
 #undef DO_FMLA_IDX
+
+#define DO_SAT(NAME, WTYPE, TYPEN, TYPEM, OP, MIN, MAX) \
+void HELPER(NAME)(void *vd, void *vq, void *vn, void *vm, uint32_t desc)   \
+{                                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc);                                  \
+    TYPEN *d = vd, *n = vn; TYPEM *m = vm;                                 \
+    bool q = false;                                                        \
+    for (i = 0; i < oprsz / sizeof(TYPEN); i++) {                          \
+        WTYPE dd = (WTYPE)n[i] OP m[i];                                    \
+        if (dd < MIN) {                                                    \
+            dd = MIN;                                                      \
+            q = true;                                                      \
+        } else if (dd > MAX) {                                             \
+            dd = MAX;                                                      \
+            q = true;                                                      \
+        }                                                                  \
+        d[i] = dd;                                                         \
+    }                                                                      \
+    if (q) {                                                               \
+        uint32_t *qc = vq;                                                 \
+        qc[0] = 1;                                                         \
+    }                                                                      \
+    clear_tail(d, oprsz, simd_maxsz(desc));                                \
+}
+
+DO_SAT(gvec_uqadd_b, int, uint8_t, uint8_t, +, 0, UINT8_MAX)
+DO_SAT(gvec_uqadd_h, int, uint16_t, uint16_t, +, 0, UINT16_MAX)
+DO_SAT(gvec_uqadd_s, int64_t, uint32_t, uint32_t, +, 0, UINT32_MAX)
+
+DO_SAT(gvec_sqadd_b, int, int8_t, int8_t, +, INT8_MIN, INT8_MAX)
+DO_SAT(gvec_sqadd_h, int, int16_t, int16_t, +, INT16_MIN, INT16_MAX)
+DO_SAT(gvec_sqadd_s, int64_t, int32_t, int32_t, +, INT32_MIN, INT32_MAX)
+
+DO_SAT(gvec_uqsub_b, int, uint8_t, uint8_t, -, 0, UINT8_MAX)
+DO_SAT(gvec_uqsub_h, int, uint16_t, uint16_t, -, 0, UINT16_MAX)
+DO_SAT(gvec_uqsub_s, int64_t, uint32_t, uint32_t, -, 0, UINT32_MAX)
+
+DO_SAT(gvec_sqsub_b, int, int8_t, int8_t, -, INT8_MIN, INT8_MAX)
+DO_SAT(gvec_sqsub_h, int, int16_t, int16_t, -, INT16_MIN, INT16_MAX)
+DO_SAT(gvec_sqsub_s, int64_t, int32_t, int32_t, -, INT32_MIN, INT32_MAX)
+
+#undef DO_SAT
+
+void HELPER(gvec_uqadd_d)(void *vd, void *vq, void *vn,
+                          void *vm, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    bool q = false;
+
+    for (i = 0; i < oprsz / 8; i++) {
+        uint64_t nn = n[i], mm = m[i], dd = nn + mm;
+        if (dd < nn) {
+            dd = UINT64_MAX;
+            q = true;
+        }
+        d[i] = dd;
+    }
+    if (q) {
+        uint32_t *qc = vq;
+        qc[0] = 1;
+    }
+    clear_tail(d, oprsz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_uqsub_d)(void *vd, void *vq, void *vn,
+                          void *vm, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    bool q = false;
+
+    for (i = 0; i < oprsz / 8; i++) {
+        uint64_t nn = n[i], mm = m[i], dd = nn - mm;
+        if (nn < mm) {
+            dd = 0;
+            q = true;
+        }
+        d[i] = dd;
+    }
+    if (q) {
+        uint32_t *qc = vq;
+        qc[0] = 1;
+    }
+    clear_tail(d, oprsz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sqadd_d)(void *vd, void *vq, void *vn,
+                          void *vm, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+    int64_t *d = vd, *n = vn, *m = vm;
+    bool q = false;
+
+    for (i = 0; i < oprsz / 8; i++) {
+        int64_t nn = n[i], mm = m[i], dd = nn + mm;
+        if (((dd ^ nn) & ~(nn ^ mm)) & INT64_MIN) {
+            dd = (nn >> 63) ^ ~INT64_MIN;
+            q = true;
+        }
+        d[i] = dd;
+    }
+    if (q) {
+        uint32_t *qc = vq;
+        qc[0] = 1;
+    }
+    clear_tail(d, oprsz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sqsub_d)(void *vd, void *vq, void *vn,
+                          void *vm, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+    int64_t *d = vd, *n = vn, *m = vm;
+    bool q = false;
+
+    for (i = 0; i < oprsz / 8; i++) {
+        int64_t nn = n[i], mm = m[i], dd = nn - mm;
+        if (((dd ^ nn) & (nn ^ mm)) & INT64_MIN) {
+            dd = (nn >> 63) ^ ~INT64_MIN;
+            q = true;
+        }
+        d[i] = dd;
+    }
+    if (q) {
+        uint32_t *qc = vq;
+        qc[0] = 1;
+    }
+    clear_tail(d, oprsz, simd_maxsz(desc));
+}
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v3 12/12] target/arm: Add missing clear_tail calls
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (10 preceding siblings ...)
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 11/12] target/arm: Use vector operations for saturation Richard Henderson
@ 2019-02-09  3:38 ` Richard Henderson
  2019-02-09  3:56 ` [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups no-reply
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2019-02-09  3:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Fortunately, the functions affected are so far only called from SVE,
so there is no tail to be cleared.  But as we convert more of AdvSIMD
to gvec, this will matter.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/vec_helper.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 10f17e4b5c..dfc635cf9a 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -638,6 +638,7 @@ void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc)  \
     for (i = 0; i < oprsz / sizeof(TYPE); i++) {                  \
         d[i] = FUNC(n[i], stat);                                  \
     }                                                             \
+    clear_tail(d, oprsz, simd_maxsz(desc));                       \
 }
 
 DO_2OP(gvec_frecpe_h, helper_recpe_f16, float16)
@@ -688,6 +689,7 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
     for (i = 0; i < oprsz / sizeof(TYPE); i++) {                           \
         d[i] = FUNC(n[i], m[i], stat);                                     \
     }                                                                      \
+    clear_tail(d, oprsz, simd_maxsz(desc));                                \
 }
 
 DO_3OP(gvec_fadd_h, float16_add, float16)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (11 preceding siblings ...)
  2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 12/12] target/arm: Add missing clear_tail calls Richard Henderson
@ 2019-02-09  3:56 ` no-reply
  2019-02-09  4:00 ` no-reply
  2019-02-14 16:12 ` Peter Maydell
  14 siblings, 0 replies; 16+ messages in thread
From: no-reply @ 2019-02-09  3:56 UTC (permalink / raw)
  To: richard.henderson; +Cc: fam, qemu-devel, peter.maydell

Patchew URL: https://patchew.org/QEMU/20190209033847.9014-1-richard.henderson@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups
Type: series
Message-id: 20190209033847.9014-1-richard.henderson@linaro.org

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20190209033847.9014-1-richard.henderson@linaro.org -> patchew/20190209033847.9014-1-richard.henderson@linaro.org
Switched to a new branch 'test'
14ab3e4b2d target/arm: Add missing clear_tail calls
6e1ff3edb0 target/arm: Use vector operations for saturation
0e4e921a93 target/arm: Split out FPSCR.QC to a vector field
40b552d504 target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR]
3ee1da71e1 target/arm: Split out flags setting from vfp compares
41e8694369 target/arm: Fix arm_cpu_dump_state vs FPSCR
23628c03ad target/arm: Fix vfp_gdb_get/set_reg vs FPSCR
780dc13c0e target/arm: Remove neon min/max helpers
fc636d7cdc target/arm: Use tcg integer min/max primitives for neon
8ff6492f62 target/arm: Use vector minmax expanders for aarch32
d86eaa3793 target/arm: Use vector minmax expanders for aarch64
fc0b82c4ab target/arm: Rely on optimization within tcg_gen_gvec_or

=== OUTPUT BEGIN ===
1/12 Checking commit fc0b82c4ab74 (target/arm: Rely on optimization within tcg_gen_gvec_or)
2/12 Checking commit d86eaa379343 (target/arm: Use vector minmax expanders for aarch64)
3/12 Checking commit 8ff6492f628b (target/arm: Use vector minmax expanders for aarch32)
4/12 Checking commit fc636d7cdc94 (target/arm: Use tcg integer min/max primitives for neon)
5/12 Checking commit 780dc13c0ef9 (target/arm: Remove neon min/max helpers)
6/12 Checking commit 23628c03ad1b (target/arm: Fix vfp_gdb_get/set_reg vs FPSCR)
ERROR: trailing statements should be on next line
#22: FILE: target/arm/helper.c:84:
+    case 1: stl_p(buf, vfp_get_fpscr(env)); return 4;

ERROR: trailing statements should be on next line
#31: FILE: target/arm/helper.c:110:
+    case 1: vfp_set_fpscr(env, ldl_p(buf)); return 4;

total: 2 errors, 0 warnings, 16 lines checked

Patch 6/12 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/12 Checking commit 41e8694369f0 (target/arm: Fix arm_cpu_dump_state vs FPSCR)
8/12 Checking commit 3ee1da71e10e (target/arm: Split out flags setting from vfp compares)
9/12 Checking commit 40b552d5049b (target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR])
10/12 Checking commit 0e4e921a93e3 (target/arm: Split out FPSCR.QC to a vector field)
11/12 Checking commit 6e1ff3edb001 (target/arm: Use vector operations for saturation)
ERROR: spaces required around that '*' (ctx:WxV)
#357: FILE: target/arm/vec_helper.c:774:
+    TYPEN *d = vd, *n = vn; TYPEM *m = vm;                                 \
                                   ^

total: 1 errors, 0 warnings, 438 lines checked

Patch 11/12 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

12/12 Checking commit 14ab3e4b2dd0 (target/arm: Add missing clear_tail calls)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190209033847.9014-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (12 preceding siblings ...)
  2019-02-09  3:56 ` [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups no-reply
@ 2019-02-09  4:00 ` no-reply
  2019-02-14 16:12 ` Peter Maydell
  14 siblings, 0 replies; 16+ messages in thread
From: no-reply @ 2019-02-09  4:00 UTC (permalink / raw)
  To: richard.henderson; +Cc: fam, qemu-devel, peter.maydell

Patchew URL: https://patchew.org/QEMU/20190209033847.9014-1-richard.henderson@linaro.org/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups
Message-id: 20190209033847.9014-1-richard.henderson@linaro.org
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20190209033847.9014-1-richard.henderson@linaro.org -> patchew/20190209033847.9014-1-richard.henderson@linaro.org
Submodule 'capstone' (https://git.qemu.org/git/capstone.git) registered for path 'capstone'
Submodule 'dtc' (https://git.qemu.org/git/dtc.git) registered for path 'dtc'
Submodule 'roms/QemuMacDrivers' (https://git.qemu.org/git/QemuMacDrivers.git) registered for path 'roms/QemuMacDrivers'
Submodule 'roms/SLOF' (https://git.qemu.org/git/SLOF.git) registered for path 'roms/SLOF'
Submodule 'roms/ipxe' (https://git.qemu.org/git/ipxe.git) registered for path 'roms/ipxe'
Submodule 'roms/openbios' (https://git.qemu.org/git/openbios.git) registered for path 'roms/openbios'
Submodule 'roms/openhackware' (https://git.qemu.org/git/openhackware.git) registered for path 'roms/openhackware'
Submodule 'roms/qemu-palcode' (https://git.qemu.org/git/qemu-palcode.git) registered for path 'roms/qemu-palcode'
Submodule 'roms/seabios' (https://git.qemu.org/git/seabios.git/) registered for path 'roms/seabios'
Submodule 'roms/seabios-hppa' (https://github.com/hdeller/seabios-hppa.git) registered for path 'roms/seabios-hppa'
Submodule 'roms/sgabios' (https://git.qemu.org/git/sgabios.git) registered for path 'roms/sgabios'
Submodule 'roms/skiboot' (https://git.qemu.org/git/skiboot.git) registered for path 'roms/skiboot'
Submodule 'roms/u-boot' (https://git.qemu.org/git/u-boot.git) registered for path 'roms/u-boot'
Submodule 'roms/u-boot-sam460ex' (https://git.qemu.org/git/u-boot-sam460ex.git) registered for path 'roms/u-boot-sam460ex'
Submodule 'tests/fp/berkeley-softfloat-3' (https://github.com/cota/berkeley-softfloat-3) registered for path 'tests/fp/berkeley-softfloat-3'
Submodule 'tests/fp/berkeley-testfloat-3' (https://github.com/cota/berkeley-testfloat-3) registered for path 'tests/fp/berkeley-testfloat-3'
Submodule 'ui/keycodemapdb' (https://git.qemu.org/git/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into 'capstone'...
Submodule path 'capstone': checked out '22ead3e0bfdb87516656453336160e0a37b066bf'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '88f18909db731a627456f26d779445f84e449536'
Cloning into 'roms/QemuMacDrivers'...
Submodule path 'roms/QemuMacDrivers': checked out '90c488d5f4a407342247b9ea869df1c2d9c8e266'
Cloning into 'roms/SLOF'...
Submodule path 'roms/SLOF': checked out 'a5b428e1c1eae703bdd62a3f527223c291ee3fdc'
Cloning into 'roms/ipxe'...
Submodule path 'roms/ipxe': checked out 'de4565cbe76ea9f7913a01f331be3ee901bb6e17'
Cloning into 'roms/openbios'...
Submodule path 'roms/openbios': checked out '441a84d3a642a10b948369c63f32367e8ff6395b'
Cloning into 'roms/openhackware'...
Submodule path 'roms/openhackware': checked out 'c559da7c8eec5e45ef1f67978827af6f0b9546f5'
Cloning into 'roms/qemu-palcode'...
Submodule path 'roms/qemu-palcode': checked out '51c237d7e20d05100eacadee2f61abc17e6bc097'
Cloning into 'roms/seabios'...
Submodule path 'roms/seabios': checked out 'a698c8995ffb2838296ec284fe3c4ad33dfca307'
Cloning into 'roms/seabios-hppa'...
Submodule path 'roms/seabios-hppa': checked out '1ef99a01572c2581c30e16e6fe69e9ea2ef92ce0'
Cloning into 'roms/sgabios'...
Submodule path 'roms/sgabios': checked out 'cbaee52287e5f32373181cff50a00b6c4ac9015a'
Cloning into 'roms/skiboot'...
Submodule path 'roms/skiboot': checked out 'e0ee24c27a172bcf482f6f2bc905e6211c134bcc'
Cloning into 'roms/u-boot'...
Submodule path 'roms/u-boot': checked out 'd85ca029f257b53a96da6c2fb421e78a003a9943'
Cloning into 'roms/u-boot-sam460ex'...
Submodule path 'roms/u-boot-sam460ex': checked out '60b3916f33e617a815973c5a6df77055b2e3a588'
Cloning into 'tests/fp/berkeley-softfloat-3'...
Submodule path 'tests/fp/berkeley-softfloat-3': checked out 'b64af41c3276f97f0e181920400ee056b9c88037'
Cloning into 'tests/fp/berkeley-testfloat-3'...
Submodule path 'tests/fp/berkeley-testfloat-3': checked out '5a59dcec19327396a011a17fd924aed4fec416b3'
Cloning into 'ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
Switched to a new branch 'test'
14ab3e4 target/arm: Add missing clear_tail calls
6e1ff3e target/arm: Use vector operations for saturation
0e4e921 target/arm: Split out FPSCR.QC to a vector field
40b552d target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR]
3ee1da7 target/arm: Split out flags setting from vfp compares
41e8694 target/arm: Fix arm_cpu_dump_state vs FPSCR
23628c0 target/arm: Fix vfp_gdb_get/set_reg vs FPSCR
780dc13 target/arm: Remove neon min/max helpers
fc636d7 target/arm: Use tcg integer min/max primitives for neon
8ff6492 target/arm: Use vector minmax expanders for aarch32
d86eaa3 target/arm: Use vector minmax expanders for aarch64
fc0b82c target/arm: Rely on optimization within tcg_gen_gvec_or

=== OUTPUT BEGIN ===
1/12 Checking commit fc0b82c4ab74 (target/arm: Rely on optimization within tcg_gen_gvec_or)
2/12 Checking commit d86eaa379343 (target/arm: Use vector minmax expanders for aarch64)
3/12 Checking commit 8ff6492f628b (target/arm: Use vector minmax expanders for aarch32)
4/12 Checking commit fc636d7cdc94 (target/arm: Use tcg integer min/max primitives for neon)
5/12 Checking commit 780dc13c0ef9 (target/arm: Remove neon min/max helpers)
6/12 Checking commit 23628c03ad1b (target/arm: Fix vfp_gdb_get/set_reg vs FPSCR)
ERROR: trailing statements should be on next line
#22: FILE: target/arm/helper.c:84:
+    case 1: stl_p(buf, vfp_get_fpscr(env)); return 4;

ERROR: trailing statements should be on next line
#31: FILE: target/arm/helper.c:110:
+    case 1: vfp_set_fpscr(env, ldl_p(buf)); return 4;

total: 2 errors, 0 warnings, 16 lines checked

Patch 6/12 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/12 Checking commit 41e8694369f0 (target/arm: Fix arm_cpu_dump_state vs FPSCR)
8/12 Checking commit 3ee1da71e10e (target/arm: Split out flags setting from vfp compares)
9/12 Checking commit 40b552d5049b (target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR])
10/12 Checking commit 0e4e921a93e3 (target/arm: Split out FPSCR.QC to a vector field)
11/12 Checking commit 6e1ff3edb001 (target/arm: Use vector operations for saturation)
ERROR: spaces required around that '*' (ctx:WxV)
#357: FILE: target/arm/vec_helper.c:774:
+    TYPEN *d = vd, *n = vn; TYPEM *m = vm;                                 \
                                   ^

total: 1 errors, 0 warnings, 438 lines checked

Patch 11/12 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

12/12 Checking commit 14ab3e4b2dd0 (target/arm: Add missing clear_tail calls)
=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/20190209033847.9014-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups
  2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
                   ` (13 preceding siblings ...)
  2019-02-09  4:00 ` no-reply
@ 2019-02-14 16:12 ` Peter Maydell
  14 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2019-02-14 16:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 9 Feb 2019 at 03:38, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Changes since v2:
>   * Fix some representational issues with FPSCR.
>   * Use host vector saturation for SQADD/UQADD.
>     This requires changing the internal representation of FPSR.QC.
>   * Fix a latent vector bug, noticed during the rest.
>
> Correct RISU results depend on Mark C-A's patch from today,
> "tcg/i386: fix unsigned vector saturating arithmetic",
> which will be in my next tcg pull.
>

Applied to target-arm.next, thanks. (That tcg fix is in
master now.)

-- PMM

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-02-14 16:12 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-09  3:38 [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 01/12] target/arm: Rely on optimization within tcg_gen_gvec_or Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 02/12] target/arm: Use vector minmax expanders for aarch64 Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 03/12] target/arm: Use vector minmax expanders for aarch32 Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 04/12] target/arm: Use tcg integer min/max primitives for neon Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 05/12] target/arm: Remove neon min/max helpers Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 06/12] target/arm: Fix vfp_gdb_get/set_reg vs FPSCR Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 07/12] target/arm: Fix arm_cpu_dump_state " Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 08/12] target/arm: Split out flags setting from vfp compares Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 09/12] target/arm: Fix set of bits kept in xregs[ARM_VFP_FPSCR] Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 10/12] target/arm: Split out FPSCR.QC to a vector field Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 11/12] target/arm: Use vector operations for saturation Richard Henderson
2019-02-09  3:38 ` [Qemu-devel] [PATCH v3 12/12] target/arm: Add missing clear_tail calls Richard Henderson
2019-02-09  3:56 ` [Qemu-devel] [PATCH v3 00/12] target/arm: tcg vector cleanups no-reply
2019-02-09  4:00 ` no-reply
2019-02-14 16:12 ` Peter Maydell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.