* [PATCH 00/20] target/arm: SVE2 preparatory patches
@ 2020-08-15  1:31 Richard Henderson
  2020-08-15  1:31 ` [PATCH 01/20] qemu/int128: Add int128_lshift Richard Henderson
                   ` (21 more replies)
  0 siblings, 22 replies; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC
  To: qemu-devel; +Cc: peter.maydell

This is a collection of cleanups and changes that are required by
SVE2, but do not directly implement it.  The final 3 patches are
relevant to Peter's aa32 neon work.


r~


Richard Henderson (20):
  qemu/int128: Add int128_lshift
  target/arm: Split out gen_gvec_fn_zz
  target/arm: Split out gen_gvec_fn_zzz, do_zzz_fn
  target/arm: Rearrange {sve,fp}_check_access assert
  target/arm: Merge do_vector2_p into do_mov_p
  target/arm: Clean up 4-operand predicate expansion
  target/arm: Use tcg_gen_gvec_bitsel for trans_SEL_pppp
  target/arm: Split out gen_gvec_ool_zzzp
  target/arm: Merge helper_sve_clr_* and helper_sve_movz_*
  target/arm: Split out gen_gvec_ool_zzp
  target/arm: Split out gen_gvec_ool_zzz
  target/arm: Split out gen_gvec_ool_zz
  target/arm: Tidy SVE tszimm shift formats
  target/arm: Generalize inl_qrdmlah_* helper functions
  target/arm: Fix sve_uzp_p vs odd vector lengths
  target/arm: Fix sve_zip_p vs odd vector lengths
  target/arm: Fix sve_punpk_p vs odd vector lengths
  target/arm: Convert integer multiply (indexed) to gvec for aa64
    advsimd
  target/arm: Convert integer multiply-add (indexed) to gvec for aa64
    advsimd
  target/arm: Convert sq{,r}dmulh to gvec for aa64 advsimd

 include/qemu/int128.h      |  16 ++
 target/arm/helper-sve.h    |   5 -
 target/arm/helper.h        |  28 +++
 target/arm/translate.h     |   1 +
 target/arm/sve.decode      |  35 ++--
 target/arm/sve_helper.c    | 129 +++++-------
 target/arm/translate-a64.c | 110 ++++++++--
 target/arm/translate-sve.c | 399 +++++++++++++++----------------------
 target/arm/vec_helper.c    | 182 ++++++++++++-----
 9 files changed, 492 insertions(+), 413 deletions(-)

-- 
2.25.1




* [PATCH 01/20] qemu/int128: Add int128_lshift
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-24 16:40   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 02/20] target/arm: Split out gen_gvec_fn_zz Richard Henderson
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC
  To: qemu-devel; +Cc: peter.maydell

Add left-shift to match the existing right-shift.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/int128.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 5c9890db8b..76ea405922 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -63,6 +63,11 @@ static inline Int128 int128_rshift(Int128 a, int n)
     return a >> n;
 }
 
+static inline Int128 int128_lshift(Int128 a, int n)
+{
+    return a << n;
+}
+
 static inline Int128 int128_add(Int128 a, Int128 b)
 {
     return a + b;
@@ -217,6 +222,17 @@ static inline Int128 int128_rshift(Int128 a, int n)
     }
 }
 
+static inline Int128 int128_lshift(Int128 a, int n)
+{
+    uint64_t l = a.lo << (n & 63);
+    if (n >= 64) {
+        return int128_make128(0, l);
+    } else if (n > 0) {
+        return int128_make128(l, (a.hi << n) | (a.lo >> (64 - n)));
+    }
+    return a;
+}
+
 static inline Int128 int128_add(Int128 a, Int128 b)
 {
     uint64_t lo = a.lo + b.lo;
-- 
2.25.1
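
For readers following the fallback (non-__int128) path above, here is a
minimal standalone sketch of the same two-limb left shift.  It is not the
QEMU code: the U128 type and u128_lshift name are hypothetical, both limbs
are unsigned (QEMU's struct keeps a signed high half), and the shift count
is assumed to be in the range 0..127.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a struct-based 128-bit integer. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} U128;

/* Shift left by n bits; n is assumed to satisfy 0 <= n < 128. */
static U128 u128_lshift(U128 a, unsigned n)
{
    U128 r;

    assert(n < 128);
    if (n >= 64) {
        /* Everything that survives comes from the low limb. */
        r.lo = 0;
        r.hi = a.lo << (n - 64);
    } else if (n > 0) {
        /* The bits leaving the low limb carry into the high limb. */
        r.lo = a.lo << n;
        r.hi = (a.hi << n) | (a.lo >> (64 - n));
    } else {
        /* n == 0 is handled apart: a.lo >> 64 would be undefined. */
        r = a;
    }
    return r;
}
```

The n == 0 case is kept separate for the same reason as in the patch: the
carry term would otherwise shift a 64-bit value by 64.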




* [PATCH 02/20] target/arm: Split out gen_gvec_fn_zz
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
  2020-08-15  1:31 ` [PATCH 01/20] qemu/int128: Add int128_lshift Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-24 16:40   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 03/20] target/arm: Split out gen_gvec_fn_zzz, do_zzz_fn Richard Henderson
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC
  To: qemu-devel; +Cc: peter.maydell

Model the new function on gen_gvec_fn2 in translate-a64.c, but
with a name that indicates which kind of register is used and in
which order.  Since there is only one user of do_vector2_z, fold
it into do_mov_z.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 88a2fb271d..28e27c55b5 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -143,15 +143,13 @@ static int pred_gvec_reg_size(DisasContext *s)
 }
 
 /* Invoke a vector expander on two Zregs.  */
-static bool do_vector2_z(DisasContext *s, GVecGen2Fn *gvec_fn,
-                         int esz, int rd, int rn)
+
+static void gen_gvec_fn_zz(DisasContext *s, GVecGen2Fn *gvec_fn,
+                           int esz, int rd, int rn)
 {
-    if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        gvec_fn(esz, vec_full_reg_offset(s, rd),
-                vec_full_reg_offset(s, rn), vsz, vsz);
-    }
-    return true;
+    unsigned vsz = vec_full_reg_size(s);
+    gvec_fn(esz, vec_full_reg_offset(s, rd),
+            vec_full_reg_offset(s, rn), vsz, vsz);
 }
 
 /* Invoke a vector expander on three Zregs.  */
@@ -170,7 +168,10 @@ static bool do_vector3_z(DisasContext *s, GVecGen3Fn *gvec_fn,
 /* Invoke a vector move on two Zregs.  */
 static bool do_mov_z(DisasContext *s, int rd, int rn)
 {
-    return do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
+    if (sve_access_check(s)) {
+        gen_gvec_fn_zz(s, tcg_gen_gvec_mov, MO_8, rd, rn);
+    }
+    return true;
 }
 
 /* Initialize a Zreg with replications of a 64-bit immediate.  */
-- 
2.25.1




* [PATCH 03/20] target/arm: Split out gen_gvec_fn_zzz, do_zzz_fn
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
  2020-08-15  1:31 ` [PATCH 01/20] qemu/int128: Add int128_lshift Richard Henderson
  2020-08-15  1:31 ` [PATCH 02/20] target/arm: Split out gen_gvec_fn_zz Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-24 16:40   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 04/20] target/arm: Rearrange {sve,fp}_check_access assert Richard Henderson
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC
  To: qemu-devel; +Cc: peter.maydell

Model gen_gvec_fn_zzz on gen_gvec_fn3 in translate-a64.c, but
with a name that indicates which kind of register is used and in
which order.

Model do_zzz_fn on the other do_foo functions that take an
argument set and verify that SVE is enabled.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 43 +++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 28e27c55b5..b0fa38db1c 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -153,16 +153,13 @@ static void gen_gvec_fn_zz(DisasContext *s, GVecGen2Fn *gvec_fn,
 }
 
 /* Invoke a vector expander on three Zregs.  */
-static bool do_vector3_z(DisasContext *s, GVecGen3Fn *gvec_fn,
-                         int esz, int rd, int rn, int rm)
+static void gen_gvec_fn_zzz(DisasContext *s, GVecGen3Fn *gvec_fn,
+                            int esz, int rd, int rn, int rm)
 {
-    if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        gvec_fn(esz, vec_full_reg_offset(s, rd),
-                vec_full_reg_offset(s, rn),
-                vec_full_reg_offset(s, rm), vsz, vsz);
-    }
-    return true;
+    unsigned vsz = vec_full_reg_size(s);
+    gvec_fn(esz, vec_full_reg_offset(s, rd),
+            vec_full_reg_offset(s, rn),
+            vec_full_reg_offset(s, rm), vsz, vsz);
 }
 
 /* Invoke a vector move on two Zregs.  */
@@ -274,24 +271,32 @@ const uint64_t pred_esz_masks[4] = {
  *** SVE Logical - Unpredicated Group
  */
 
+static bool do_zzz_fn(DisasContext *s, arg_rrr_esz *a, GVecGen3Fn *gvec_fn)
+{
+    if (sve_access_check(s)) {
+        gen_gvec_fn_zzz(s, gvec_fn, a->esz, a->rd, a->rn, a->rm);
+    }
+    return true;
+}
+
 static bool trans_AND_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    return do_vector3_z(s, tcg_gen_gvec_and, 0, a->rd, a->rn, a->rm);
+    return do_zzz_fn(s, a, tcg_gen_gvec_and);
 }
 
 static bool trans_ORR_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    return do_vector3_z(s, tcg_gen_gvec_or, 0, a->rd, a->rn, a->rm);
+    return do_zzz_fn(s, a, tcg_gen_gvec_or);
 }
 
 static bool trans_EOR_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    return do_vector3_z(s, tcg_gen_gvec_xor, 0, a->rd, a->rn, a->rm);
+    return do_zzz_fn(s, a, tcg_gen_gvec_xor);
 }
 
 static bool trans_BIC_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    return do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
+    return do_zzz_fn(s, a, tcg_gen_gvec_andc);
 }
 
 /*
@@ -300,32 +305,32 @@ static bool trans_BIC_zzz(DisasContext *s, arg_rrr_esz *a)
 
 static bool trans_ADD_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    return do_vector3_z(s, tcg_gen_gvec_add, a->esz, a->rd, a->rn, a->rm);
+    return do_zzz_fn(s, a, tcg_gen_gvec_add);
 }
 
 static bool trans_SUB_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    return do_vector3_z(s, tcg_gen_gvec_sub, a->esz, a->rd, a->rn, a->rm);
+    return do_zzz_fn(s, a, tcg_gen_gvec_sub);
 }
 
 static bool trans_SQADD_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    return do_vector3_z(s, tcg_gen_gvec_ssadd, a->esz, a->rd, a->rn, a->rm);
+    return do_zzz_fn(s, a, tcg_gen_gvec_ssadd);
 }
 
 static bool trans_SQSUB_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    return do_vector3_z(s, tcg_gen_gvec_sssub, a->esz, a->rd, a->rn, a->rm);
+    return do_zzz_fn(s, a, tcg_gen_gvec_sssub);
 }
 
 static bool trans_UQADD_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    return do_vector3_z(s, tcg_gen_gvec_usadd, a->esz, a->rd, a->rn, a->rm);
+    return do_zzz_fn(s, a, tcg_gen_gvec_usadd);
 }
 
 static bool trans_UQSUB_zzz(DisasContext *s, arg_rrr_esz *a)
 {
-    return do_vector3_z(s, tcg_gen_gvec_ussub, a->esz, a->rd, a->rn, a->rm);
+    return do_zzz_fn(s, a, tcg_gen_gvec_ussub);
 }
 
 /*
-- 
2.25.1




* [PATCH 04/20] target/arm: Rearrange {sve,fp}_check_access assert
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (2 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 03/20] target/arm: Split out gen_gvec_fn_zzz, do_zzz_fn Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-24 16:59   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 05/20] target/arm: Merge do_vector2_p into do_mov_p Richard Henderson
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC
  To: qemu-devel; +Cc: peter.maydell

We want to ensure that access is checked by the time we ask
for a specific fp/vector register.  We want to ensure that
we do not emit two lots of code to raise an exception.

But sometimes it's difficult to cleanly organize the code
such that we pass through sve_access_check exactly once.
Allow multiple calls so long as the result is true, that is,
no exception is to be raised.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.h     |  1 +
 target/arm/translate-a64.c | 27 ++++++++++++++++-----------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index 16f2699ad7..ad7972eb22 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -64,6 +64,7 @@ typedef struct DisasContext {
      * that it is set at the point where we actually touch the FP regs.
      */
     bool fp_access_checked;
+    bool sve_access_checked;
     /* ARMv8 single-step state (this is distinct from the QEMU gdbstub
      * single-step support).
      */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 534c3ff5f3..42aa695dff 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1175,18 +1175,18 @@ static void do_vec_ld(DisasContext *s, int destidx, int element,
  * unallocated-encoding checks (otherwise the syndrome information
  * for the resulting exception will be incorrect).
  */
-static inline bool fp_access_check(DisasContext *s)
+static bool fp_access_check(DisasContext *s)
 {
-    assert(!s->fp_access_checked);
-    s->fp_access_checked = true;
+    if (s->fp_excp_el) {
+        assert(!s->fp_access_checked);
+        s->fp_access_checked = true;
 
-    if (!s->fp_excp_el) {
-        return true;
+        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+                           syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
+        return false;
     }
-
-    gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                       syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
-    return false;
+    s->fp_access_checked = true;
+    return true;
 }
 
 /* Check that SVE access is enabled.  If it is, return true.
@@ -1195,10 +1195,14 @@ static inline bool fp_access_check(DisasContext *s)
 bool sve_access_check(DisasContext *s)
 {
     if (s->sve_excp_el) {
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_sve_access_trap(),
-                           s->sve_excp_el);
+        assert(!s->sve_access_checked);
+        s->sve_access_checked = true;
+
+        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+                           syn_sve_access_trap(), s->sve_excp_el);
         return false;
     }
+    s->sve_access_checked = true;
     return fp_access_check(s);
 }
 
@@ -14548,6 +14552,7 @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s)
     s->base.pc_next += 4;
 
     s->fp_access_checked = false;
+    s->sve_access_checked = false;
 
     if (dc_isar_feature(aa64_bti, s)) {
         if (s->base.num_insns == 1) {
-- 
2.25.1
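
The rule described in the commit message (a passing check may be repeated,
a failing check must happen at most once) can be modelled in isolation.
This is a sketch only: the Ctx type, its fields, and access_check are
hypothetical reductions of DisasContext and fp/sve_access_check, and the
counter stands in for the call to gen_exception_insn.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much-reduced model of the translator context. */
typedef struct {
    int excp_el;            /* nonzero: the access traps to this EL */
    bool access_checked;    /* reset at the start of each instruction */
    int exceptions_emitted; /* stands in for emitted exception code */
} Ctx;

static bool access_check(Ctx *s)
{
    if (s->excp_el) {
        /* A failing check must not run twice for one instruction. */
        assert(!s->access_checked);
        s->access_checked = true;
        s->exceptions_emitted++;
        return false;
    }
    /* A passing check emits nothing, so repeating it is harmless. */
    s->access_checked = true;
    return true;
}
```

Calling access_check twice on a context with excp_el == 0 returns true
both times; calling it twice on a trapping context trips the assert, which
is the double-exception case the patch is meant to catch.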




* [PATCH 05/20] target/arm: Merge do_vector2_p into do_mov_p
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (3 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 04/20] target/arm: Rearrange {sve,fp}_check_access assert Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-24 16:41   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 06/20] target/arm: Clean up 4-operand predicate expansion Richard Henderson
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC
  To: qemu-devel; +Cc: peter.maydell

This is the only user of the function.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b0fa38db1c..d310709de3 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -178,18 +178,6 @@ static void do_dupi_z(DisasContext *s, int rd, uint64_t word)
     tcg_gen_gvec_dup_imm(MO_64, vec_full_reg_offset(s, rd), vsz, vsz, word);
 }
 
-/* Invoke a vector expander on two Pregs.  */
-static bool do_vector2_p(DisasContext *s, GVecGen2Fn *gvec_fn,
-                         int esz, int rd, int rn)
-{
-    if (sve_access_check(s)) {
-        unsigned psz = pred_gvec_reg_size(s);
-        gvec_fn(esz, pred_full_reg_offset(s, rd),
-                pred_full_reg_offset(s, rn), psz, psz);
-    }
-    return true;
-}
-
 /* Invoke a vector expander on three Pregs.  */
 static bool do_vector3_p(DisasContext *s, GVecGen3Fn *gvec_fn,
                          int esz, int rd, int rn, int rm)
@@ -221,7 +209,12 @@ static bool do_vecop4_p(DisasContext *s, const GVecGen4 *gvec_op,
 /* Invoke a vector move on two Pregs.  */
 static bool do_mov_p(DisasContext *s, int rd, int rn)
 {
-    return do_vector2_p(s, tcg_gen_gvec_mov, 0, rd, rn);
+    if (sve_access_check(s)) {
+        unsigned psz = pred_gvec_reg_size(s);
+        tcg_gen_gvec_mov(MO_8, pred_full_reg_offset(s, rd),
+                         pred_full_reg_offset(s, rn), psz, psz);
+    }
+    return true;
 }
 
 /* Set the cpu flags as per a return from an SVE helper.  */
-- 
2.25.1




* [PATCH 06/20] target/arm: Clean up 4-operand predicate expansion
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (4 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 05/20] target/arm: Merge do_vector2_p into do_mov_p Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-25 11:13   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 07/20] target/arm: Use tcg_gen_gvec_bitsel for trans_SEL_pppp Richard Henderson
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC
  To: qemu-devel; +Cc: peter.maydell

Move the check for !S into do_pppp_flags, which allows us to merge
in do_vecop4_p.  Split out gen_gvec_fn_ppp without sve_access_check,
to mirror gen_gvec_fn_zzz.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 111 ++++++++++++++-----------------------
 1 file changed, 43 insertions(+), 68 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d310709de3..13a0194d59 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -179,31 +179,13 @@ static void do_dupi_z(DisasContext *s, int rd, uint64_t word)
 }
 
 /* Invoke a vector expander on three Pregs.  */
-static bool do_vector3_p(DisasContext *s, GVecGen3Fn *gvec_fn,
-                         int esz, int rd, int rn, int rm)
+static void gen_gvec_fn_ppp(DisasContext *s, GVecGen3Fn *gvec_fn,
+                            int rd, int rn, int rm)
 {
-    if (sve_access_check(s)) {
-        unsigned psz = pred_gvec_reg_size(s);
-        gvec_fn(esz, pred_full_reg_offset(s, rd),
-                pred_full_reg_offset(s, rn),
-                pred_full_reg_offset(s, rm), psz, psz);
-    }
-    return true;
-}
-
-/* Invoke a vector operation on four Pregs.  */
-static bool do_vecop4_p(DisasContext *s, const GVecGen4 *gvec_op,
-                        int rd, int rn, int rm, int rg)
-{
-    if (sve_access_check(s)) {
-        unsigned psz = pred_gvec_reg_size(s);
-        tcg_gen_gvec_4(pred_full_reg_offset(s, rd),
-                       pred_full_reg_offset(s, rn),
-                       pred_full_reg_offset(s, rm),
-                       pred_full_reg_offset(s, rg),
-                       psz, psz, gvec_op);
-    }
-    return true;
+    unsigned psz = pred_gvec_reg_size(s);
+    gvec_fn(MO_64, pred_full_reg_offset(s, rd),
+            pred_full_reg_offset(s, rn),
+            pred_full_reg_offset(s, rm), psz, psz);
 }
 
 /* Invoke a vector move on two Pregs.  */
@@ -1067,6 +1049,11 @@ static bool do_pppp_flags(DisasContext *s, arg_rprr_s *a,
     int mofs = pred_full_reg_offset(s, a->rm);
     int gofs = pred_full_reg_offset(s, a->pg);
 
+    if (!a->s) {
+        tcg_gen_gvec_4(dofs, nofs, mofs, gofs, psz, psz, gvec_op);
+        return true;
+    }
+
     if (psz == 8) {
         /* Do the operation and the flags generation in temps.  */
         TCGv_i64 pd = tcg_temp_new_i64();
@@ -1126,19 +1113,24 @@ static bool trans_AND_pppp(DisasContext *s, arg_rprr_s *a)
         .fno = gen_helper_sve_and_pppp,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    if (a->s) {
-        return do_pppp_flags(s, a, &op);
-    } else if (a->rn == a->rm) {
-        if (a->pg == a->rn) {
-            return do_mov_p(s, a->rd, a->rn);
-        } else {
-            return do_vector3_p(s, tcg_gen_gvec_and, 0, a->rd, a->rn, a->pg);
+
+    if (!a->s) {
+        if (!sve_access_check(s)) {
+            return true;
+        }
+        if (a->rn == a->rm) {
+            if (a->pg == a->rn) {
+                do_mov_p(s, a->rd, a->rn);
+            } else {
+                gen_gvec_fn_ppp(s, tcg_gen_gvec_and, a->rd, a->rn, a->pg);
+            }
+            return true;
+        } else if (a->pg == a->rn || a->pg == a->rm) {
+            gen_gvec_fn_ppp(s, tcg_gen_gvec_and, a->rd, a->rn, a->rm);
+            return true;
         }
-    } else if (a->pg == a->rn || a->pg == a->rm) {
-        return do_vector3_p(s, tcg_gen_gvec_and, 0, a->rd, a->rn, a->rm);
-    } else {
-        return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
     }
+    return do_pppp_flags(s, a, &op);
 }
 
 static void gen_bic_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
@@ -1162,13 +1154,14 @@ static bool trans_BIC_pppp(DisasContext *s, arg_rprr_s *a)
         .fno = gen_helper_sve_bic_pppp,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    if (a->s) {
-        return do_pppp_flags(s, a, &op);
-    } else if (a->pg == a->rn) {
-        return do_vector3_p(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
-    } else {
-        return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+
+    if (!a->s && a->pg == a->rn) {
+        if (sve_access_check(s)) {
+            gen_gvec_fn_ppp(s, tcg_gen_gvec_andc, a->rd, a->rn, a->rm);
+        }
+        return true;
     }
+    return do_pppp_flags(s, a, &op);
 }
 
 static void gen_eor_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
@@ -1192,11 +1185,7 @@ static bool trans_EOR_pppp(DisasContext *s, arg_rprr_s *a)
         .fno = gen_helper_sve_eor_pppp,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    if (a->s) {
-        return do_pppp_flags(s, a, &op);
-    } else {
-        return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
-    }
+    return do_pppp_flags(s, a, &op);
 }
 
 static void gen_sel_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
@@ -1222,11 +1211,11 @@ static bool trans_SEL_pppp(DisasContext *s, arg_rprr_s *a)
         .fno = gen_helper_sve_sel_pppp,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
+
     if (a->s) {
         return false;
-    } else {
-        return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
     }
+    return do_pppp_flags(s, a, &op);
 }
 
 static void gen_orr_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
@@ -1250,13 +1239,11 @@ static bool trans_ORR_pppp(DisasContext *s, arg_rprr_s *a)
         .fno = gen_helper_sve_orr_pppp,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    if (a->s) {
-        return do_pppp_flags(s, a, &op);
-    } else if (a->pg == a->rn && a->rn == a->rm) {
+
+    if (!a->s && a->pg == a->rn && a->rn == a->rm) {
         return do_mov_p(s, a->rd, a->rn);
-    } else {
-        return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
     }
+    return do_pppp_flags(s, a, &op);
 }
 
 static void gen_orn_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
@@ -1280,11 +1267,7 @@ static bool trans_ORN_pppp(DisasContext *s, arg_rprr_s *a)
         .fno = gen_helper_sve_orn_pppp,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    if (a->s) {
-        return do_pppp_flags(s, a, &op);
-    } else {
-        return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
-    }
+    return do_pppp_flags(s, a, &op);
 }
 
 static void gen_nor_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
@@ -1308,11 +1291,7 @@ static bool trans_NOR_pppp(DisasContext *s, arg_rprr_s *a)
         .fno = gen_helper_sve_nor_pppp,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    if (a->s) {
-        return do_pppp_flags(s, a, &op);
-    } else {
-        return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
-    }
+    return do_pppp_flags(s, a, &op);
 }
 
 static void gen_nand_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
@@ -1336,11 +1315,7 @@ static bool trans_NAND_pppp(DisasContext *s, arg_rprr_s *a)
         .fno = gen_helper_sve_nand_pppp,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
-    if (a->s) {
-        return do_pppp_flags(s, a, &op);
-    } else {
-        return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
-    }
+    return do_pppp_flags(s, a, &op);
 }
 
 /*
-- 
2.25.1
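
The special cases in trans_AND_pppp above rely on simple bitwise
identities.  A minimal sketch, assuming the usual semantics of the
non-flag-setting predicate AND (Pd = Pn & Pm & Pg) applied to one 64-bit
chunk of predicate bits; and_pppp is a hypothetical name, not a QEMU
helper.

```c
#include <assert.h>
#include <stdint.h>

/* Predicated AND on one 64-bit chunk of predicate bits. */
static uint64_t and_pppp(uint64_t pn, uint64_t pm, uint64_t pg)
{
    return pn & pm & pg;
}
```

With that definition, and_pppp(x, x, x) == x (a plain move),
and_pppp(n, n, g) == (n & g), and and_pppp(n, m, g) == (n & m) whenever g
equals n or m.  Those are the three cases that fall back to do_mov_p or a
3-operand tcg_gen_gvec_and instead of the 4-operand expansion.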




* [PATCH 07/20] target/arm: Use tcg_gen_gvec_bitsel for trans_SEL_pppp
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (5 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 06/20] target/arm: Clean up 4-operand predicate expansion Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-24 16:44   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 08/20] target/arm: Split out gen_gvec_ool_zzzp Richard Henderson
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC
  To: qemu-devel; +Cc: peter.maydell

The gvec operation was added after the initial implementation
of the SEL instruction and was missed in the conversion.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 31 ++++++++-----------------------
 1 file changed, 8 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 13a0194d59..aa7ed070e3 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1188,34 +1188,19 @@ static bool trans_EOR_pppp(DisasContext *s, arg_rprr_s *a)
     return do_pppp_flags(s, a, &op);
 }
 
-static void gen_sel_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
-{
-    tcg_gen_and_i64(pn, pn, pg);
-    tcg_gen_andc_i64(pm, pm, pg);
-    tcg_gen_or_i64(pd, pn, pm);
-}
-
-static void gen_sel_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
-                           TCGv_vec pm, TCGv_vec pg)
-{
-    tcg_gen_and_vec(vece, pn, pn, pg);
-    tcg_gen_andc_vec(vece, pm, pm, pg);
-    tcg_gen_or_vec(vece, pd, pn, pm);
-}
-
 static bool trans_SEL_pppp(DisasContext *s, arg_rprr_s *a)
 {
-    static const GVecGen4 op = {
-        .fni8 = gen_sel_pg_i64,
-        .fniv = gen_sel_pg_vec,
-        .fno = gen_helper_sve_sel_pppp,
-        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
-    };
-
     if (a->s) {
         return false;
     }
-    return do_pppp_flags(s, a, &op);
+    if (sve_access_check(s)) {
+        unsigned psz = pred_gvec_reg_size(s);
+        tcg_gen_gvec_bitsel(MO_8, pred_full_reg_offset(s, a->rd),
+                            pred_full_reg_offset(s, a->pg),
+                            pred_full_reg_offset(s, a->rn),
+                            pred_full_reg_offset(s, a->rm), psz, psz);
+    }
+    return true;
 }
 
 static void gen_orr_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
-- 
2.25.1
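
A scalar sketch of why the removed and/andc/or sequence and a bit-select
agree.  The function names are hypothetical; bitsel here takes the
selector as its first operand, matching the order in which the patch
passes pg, rn, rm to tcg_gen_gvec_bitsel, and is written in an XOR form
purely to show that the two formulations give the same result.

```c
#include <assert.h>
#include <stdint.h>

/* The removed expansion: pd = (pn & pg) | (pm & ~pg). */
static uint64_t sel_and_andc_or(uint64_t pn, uint64_t pm, uint64_t pg)
{
    return (pn & pg) | (pm & ~pg);
}

/* Bit-select: take bits of a where sel is 1, bits of b where sel is 0. */
static uint64_t bitsel(uint64_t sel, uint64_t a, uint64_t b)
{
    return b ^ ((a ^ b) & sel);
}
```

For any inputs, sel_and_andc_or(pn, pm, pg) == bitsel(pg, pn, pm), so the
hand-written GVecGen4 table is redundant once a generic bitsel expander
exists.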




* [PATCH 08/20] target/arm: Split out gen_gvec_ool_zzzp
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (6 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 07/20] target/arm: Use tcg_gen_gvec_bitsel for trans_SEL_pppp Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-24 16:43   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 09/20] target/arm: Merge helper_sve_clr_* and helper_sve_movz_* Richard Henderson
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC
  To: qemu-devel; +Cc: peter.maydell

Model after gen_gvec_fn_zzz et al.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index aa7ed070e3..535d086838 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -142,8 +142,19 @@ static int pred_gvec_reg_size(DisasContext *s)
     return size_for_gvec(pred_full_reg_size(s));
 }
 
-/* Invoke a vector expander on two Zregs.  */
+/* Invoke an out-of-line helper on 3 Zregs and a predicate. */
+static void gen_gvec_ool_zzzp(DisasContext *s, gen_helper_gvec_4 *fn,
+                              int rd, int rn, int rm, int pg, int data)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_4_ool(vec_full_reg_offset(s, rd),
+                       vec_full_reg_offset(s, rn),
+                       vec_full_reg_offset(s, rm),
+                       pred_full_reg_offset(s, pg),
+                       vsz, vsz, data, fn);
+}
 
+/* Invoke a vector expander on two Zregs.  */
 static void gen_gvec_fn_zz(DisasContext *s, GVecGen2Fn *gvec_fn,
                            int esz, int rd, int rn)
 {
@@ -314,16 +325,11 @@ static bool trans_UQSUB_zzz(DisasContext *s, arg_rrr_esz *a)
 
 static bool do_zpzz_ool(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_4 *fn)
 {
-    unsigned vsz = vec_full_reg_size(s);
     if (fn == NULL) {
         return false;
     }
     if (sve_access_check(s)) {
-        tcg_gen_gvec_4_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           vec_full_reg_offset(s, a->rm),
-                           pred_full_reg_offset(s, a->pg),
-                           vsz, vsz, 0, fn);
+        gen_gvec_ool_zzzp(s, fn, a->rd, a->rn, a->rm, a->pg, 0);
     }
     return true;
 }
@@ -337,12 +343,7 @@ static void do_sel_z(DisasContext *s, int rd, int rn, int rm, int pg, int esz)
         gen_helper_sve_sel_zpzz_b, gen_helper_sve_sel_zpzz_h,
         gen_helper_sve_sel_zpzz_s, gen_helper_sve_sel_zpzz_d
     };
-    unsigned vsz = vec_full_reg_size(s);
-    tcg_gen_gvec_4_ool(vec_full_reg_offset(s, rd),
-                       vec_full_reg_offset(s, rn),
-                       vec_full_reg_offset(s, rm),
-                       pred_full_reg_offset(s, pg),
-                       vsz, vsz, 0, fns[esz]);
+    gen_gvec_ool_zzzp(s, fns[esz], rd, rn, rm, pg, 0);
 }
 
 #define DO_ZPZZ(NAME, name) \
@@ -2704,12 +2705,8 @@ static bool trans_RBIT(DisasContext *s, arg_rpr_esz *a)
 static bool trans_SPLICE(DisasContext *s, arg_rprr_esz *a)
 {
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_4_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           vec_full_reg_offset(s, a->rm),
-                           pred_full_reg_offset(s, a->pg),
-                           vsz, vsz, a->esz, gen_helper_sve_splice);
+        gen_gvec_ool_zzzp(s, gen_helper_sve_splice,
+                          a->rd, a->rn, a->rm, a->pg, a->esz);
     }
     return true;
 }
-- 
2.25.1




* [PATCH 09/20] target/arm: Merge helper_sve_clr_* and helper_sve_movz_*
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (7 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 08/20] target/arm: Split out gen_gvec_ool_zzzp Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-25 11:16   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 10/20] target/arm: Split out gen_gvec_ool_zzp Richard Henderson
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The existing clr functions have only one vector argument, and so
can only clear in place.  The existing movz functions have two
vector arguments, and so can clear while moving.  Merge them, with
a flag that controls the sense of active vs inactive elements
being cleared.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
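A minimal standalone sketch of the merged behaviour, for readers who want
to see the mask inversion in isolation.  The names expand_byte_mask and
movz_bytes are hypothetical; the real helpers use the expand_pred_*
tables and the H1() host-endian index adjustment, both omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a predicate expansion table:
 * predicate bit i set -> byte i of the result is 0xff.
 */
static uint64_t expand_byte_mask(uint8_t pg)
{
    uint64_t mask = 0;

    for (int i = 0; i < 8; i++) {
        if (pg & (1u << i)) {
            mask |= (uint64_t)0xff << (8 * i);
        }
    }
    return mask;
}

/*
 * Copy n into d one 64-bit word at a time.
 * invert == 0: zero the inactive bytes (the movz behaviour).
 * invert == 1: zero the active bytes (the old clr behaviour when d == n).
 */
static void movz_bytes(uint64_t *d, const uint64_t *n, const uint8_t *pg,
                       size_t words, int invert)
{
    assert(invert == 0 || invert == 1);

    /* All-ones when invert is set, zero otherwise. */
    uint64_t inv = -(uint64_t)(invert & 1);

    for (size_t i = 0; i < words; i++) {
        d[i] = n[i] & (expand_byte_mask(pg[i]) ^ inv);
    }
}
```

Passing the same buffer for d and n with invert set reproduces the
in-place clear that the removed clr helpers performed.
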
 target/arm/helper-sve.h    |  5 ---
 target/arm/sve_helper.c    | 70 ++++++++------------------------------
 target/arm/translate-sve.c | 53 +++++++++++------------------
 3 files changed, 34 insertions(+), 94 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 63c4a087ca..4411c47120 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -269,11 +269,6 @@ DEF_HELPER_FLAGS_3(sve_uminv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_uminv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_uminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 
-DEF_HELPER_FLAGS_3(sve_clr_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_clr_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_clr_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
-DEF_HELPER_FLAGS_3(sve_clr_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
-
 DEF_HELPER_FLAGS_4(sve_movz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_movz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_movz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 382fa82bc8..4758d46f34 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -956,85 +956,43 @@ uint32_t HELPER(sve_pnext)(void *vd, void *vg, uint32_t pred_desc)
     return flags;
 }
 
-/* Store zero into every active element of Zd.  We will use this for two
- * and three-operand predicated instructions for which logic dictates a
- * zero result.  In particular, logical shift by element size, which is
- * otherwise undefined on the host.
- *
- * For element sizes smaller than uint64_t, we use tables to expand
- * the N bits of the controlling predicate to a byte mask, and clear
- * those bytes.
+/*
+ * Copy Zn into Zd, and store zero into inactive elements.
+ * If inv, store zeros into the active elements.
  */
-void HELPER(sve_clr_b)(void *vd, void *vg, uint32_t desc)
-{
-    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
-    uint64_t *d = vd;
-    uint8_t *pg = vg;
-    for (i = 0; i < opr_sz; i += 1) {
-        d[i] &= ~expand_pred_b(pg[H1(i)]);
-    }
-}
-
-void HELPER(sve_clr_h)(void *vd, void *vg, uint32_t desc)
-{
-    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
-    uint64_t *d = vd;
-    uint8_t *pg = vg;
-    for (i = 0; i < opr_sz; i += 1) {
-        d[i] &= ~expand_pred_h(pg[H1(i)]);
-    }
-}
-
-void HELPER(sve_clr_s)(void *vd, void *vg, uint32_t desc)
-{
-    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
-    uint64_t *d = vd;
-    uint8_t *pg = vg;
-    for (i = 0; i < opr_sz; i += 1) {
-        d[i] &= ~expand_pred_s(pg[H1(i)]);
-    }
-}
-
-void HELPER(sve_clr_d)(void *vd, void *vg, uint32_t desc)
-{
-    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
-    uint64_t *d = vd;
-    uint8_t *pg = vg;
-    for (i = 0; i < opr_sz; i += 1) {
-        if (pg[H1(i)] & 1) {
-            d[i] = 0;
-        }
-    }
-}
-
-/* Copy Zn into Zd, and store zero into inactive elements.  */
 void HELPER(sve_movz_b)(void *vd, void *vn, void *vg, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t inv = -(uint64_t)(simd_data(desc) & 1);
     uint64_t *d = vd, *n = vn;
     uint8_t *pg = vg;
+
     for (i = 0; i < opr_sz; i += 1) {
-        d[i] = n[i] & expand_pred_b(pg[H1(i)]);
+        d[i] = n[i] & (expand_pred_b(pg[H1(i)]) ^ inv);
     }
 }
 
 void HELPER(sve_movz_h)(void *vd, void *vn, void *vg, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t inv = -(uint64_t)(simd_data(desc) & 1);
     uint64_t *d = vd, *n = vn;
     uint8_t *pg = vg;
+
     for (i = 0; i < opr_sz; i += 1) {
-        d[i] = n[i] & expand_pred_h(pg[H1(i)]);
+        d[i] = n[i] & (expand_pred_h(pg[H1(i)]) ^ inv);
     }
 }
 
 void HELPER(sve_movz_s)(void *vd, void *vn, void *vg, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t inv = -(uint64_t)(simd_data(desc) & 1);
     uint64_t *d = vd, *n = vn;
     uint8_t *pg = vg;
+
     for (i = 0; i < opr_sz; i += 1) {
-        d[i] = n[i] & expand_pred_s(pg[H1(i)]);
+        d[i] = n[i] & (expand_pred_s(pg[H1(i)]) ^ inv);
     }
 }
 
@@ -1043,8 +1001,10 @@ void HELPER(sve_movz_d)(void *vd, void *vn, void *vg, uint32_t desc)
     intptr_t i, opr_sz = simd_oprsz(desc) / 8;
     uint64_t *d = vd, *n = vn;
     uint8_t *pg = vg;
+    uint8_t inv = simd_data(desc);
+
     for (i = 0; i < opr_sz; i += 1) {
-        d[i] = n[i] & -(uint64_t)(pg[H1(i)] & 1);
+        d[i] = n[i] & -(uint64_t)((pg[H1(i)] ^ inv) & 1);
     }
 }
 
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 535d086838..ea6058f7ef 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -590,37 +590,26 @@ static bool trans_SADDV(DisasContext *s, arg_rpr_esz *a)
  *** SVE Shift by Immediate - Predicated Group
  */
 
-/* Store zero into every active element of Zd.  We will use this for two
- * and three-operand predicated instructions for which logic dictates a
- * zero result.
+/*
+ * Copy Zn into Zd, storing zeros into inactive elements.
+ * If invert, store zeros into the active elements.
  */
-static bool do_clr_zp(DisasContext *s, int rd, int pg, int esz)
-{
-    static gen_helper_gvec_2 * const fns[4] = {
-        gen_helper_sve_clr_b, gen_helper_sve_clr_h,
-        gen_helper_sve_clr_s, gen_helper_sve_clr_d,
-    };
-    if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_2_ool(vec_full_reg_offset(s, rd),
-                           pred_full_reg_offset(s, pg),
-                           vsz, vsz, 0, fns[esz]);
-    }
-    return true;
-}
-
-/* Copy Zn into Zd, storing zeros into inactive elements.  */
-static void do_movz_zpz(DisasContext *s, int rd, int rn, int pg, int esz)
+static bool do_movz_zpz(DisasContext *s, int rd, int rn, int pg,
+                        int esz, bool invert)
 {
     static gen_helper_gvec_3 * const fns[4] = {
         gen_helper_sve_movz_b, gen_helper_sve_movz_h,
         gen_helper_sve_movz_s, gen_helper_sve_movz_d,
     };
-    unsigned vsz = vec_full_reg_size(s);
-    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, rd),
-                       vec_full_reg_offset(s, rn),
-                       pred_full_reg_offset(s, pg),
-                       vsz, vsz, 0, fns[esz]);
+
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, rd),
+                           vec_full_reg_offset(s, rn),
+                           pred_full_reg_offset(s, pg),
+                           vsz, vsz, invert, fns[esz]);
+    }
+    return true;
 }
 
 static bool do_zpzi_ool(DisasContext *s, arg_rpri_esz *a,
@@ -664,7 +653,7 @@ static bool trans_LSR_zpzi(DisasContext *s, arg_rpri_esz *a)
     /* Shift by element size is architecturally valid.
        For logical shifts, it is a zeroing operation.  */
     if (a->imm >= (8 << a->esz)) {
-        return do_clr_zp(s, a->rd, a->pg, a->esz);
+        return do_movz_zpz(s, a->rd, a->rd, a->pg, a->esz, true);
     } else {
         return do_zpzi_ool(s, a, fns[a->esz]);
     }
@@ -682,7 +671,7 @@ static bool trans_LSL_zpzi(DisasContext *s, arg_rpri_esz *a)
     /* Shift by element size is architecturally valid.
        For logical shifts, it is a zeroing operation.  */
     if (a->imm >= (8 << a->esz)) {
-        return do_clr_zp(s, a->rd, a->pg, a->esz);
+        return do_movz_zpz(s, a->rd, a->rd, a->pg, a->esz, true);
     } else {
         return do_zpzi_ool(s, a, fns[a->esz]);
     }
@@ -700,7 +689,7 @@ static bool trans_ASRD(DisasContext *s, arg_rpri_esz *a)
     /* Shift by element size is architecturally valid.  For arithmetic
        right shift for division, it is a zeroing operation.  */
     if (a->imm >= (8 << a->esz)) {
-        return do_clr_zp(s, a->rd, a->pg, a->esz);
+        return do_movz_zpz(s, a->rd, a->rd, a->pg, a->esz, true);
     } else {
         return do_zpzi_ool(s, a, fns[a->esz]);
     }
@@ -5049,8 +5038,7 @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a)
 
     /* Zero the inactive elements.  */
     gen_set_label(over);
-    do_movz_zpz(s, a->rd, a->rd, a->pg, esz);
-    return true;
+    return do_movz_zpz(s, a->rd, a->rd, a->pg, esz, false);
 }
 
 static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
@@ -5833,8 +5821,5 @@ static bool trans_MOVPRFX_m(DisasContext *s, arg_rpr_esz *a)
 
 static bool trans_MOVPRFX_z(DisasContext *s, arg_rpr_esz *a)
 {
-    if (sve_access_check(s)) {
-        do_movz_zpz(s, a->rd, a->rn, a->pg, a->esz);
-    }
-    return true;
+    return do_movz_zpz(s, a->rd, a->rn, a->pg, a->esz, false);
 }
-- 
2.25.1




* [PATCH 10/20] target/arm: Split out gen_gvec_ool_zzp
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (8 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 09/20] target/arm: Merge helper_sve_clr_* and helper_sve_movz_* Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-24 16:46   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 11/20] target/arm: Split out gen_gvec_ool_zzz Richard Henderson
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Model after gen_gvec_fn_zzz et al.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
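The shape of these wrappers can be shown with a toy model: take register
numbers, turn them into byte offsets within the CPU state, and hand the
offsets to a generic dispatcher.  Everything below (ToyState, the toy_*
functions, the register counts and sizes) is hypothetical and only for
illustration.  In particular the toy dispatcher calls the helper
immediately, whereas tcg_gen_gvec_3_ool emits a call into the
translated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VEC_BYTES = 16, PRED_BYTES = 2, NUM_Z = 4, NUM_P = 2 };

/* Hypothetical stand-in for the vector part of the CPU state. */
typedef struct {
    uint8_t z[NUM_Z][VEC_BYTES];
    uint8_t p[NUM_P][PRED_BYTES];
} ToyState;

typedef void toy_helper_3(void *vd, void *vn, void *vg, uint32_t data);

static size_t toy_zreg_offset(int r)
{
    assert(r >= 0 && r < NUM_Z);
    return offsetof(ToyState, z) + (size_t)r * VEC_BYTES;
}

static size_t toy_preg_offset(int r)
{
    assert(r >= 0 && r < NUM_P);
    return offsetof(ToyState, p) + (size_t)r * PRED_BYTES;
}

/* Generic three-operand dispatch, addressed by offset. */
static void toy_gvec_3_ool(ToyState *st, size_t dofs, size_t aofs,
                           size_t bofs, uint32_t data, toy_helper_3 *fn)
{
    uint8_t *base = (uint8_t *)st;
    fn(base + dofs, base + aofs, base + bofs, data);
}

/* The wrapper: two Zregs and a predicate, addressed by number. */
static void toy_ool_zzp(ToyState *st, toy_helper_3 *fn,
                        int rd, int rn, int pg, uint32_t data)
{
    toy_gvec_3_ool(st, toy_zreg_offset(rd), toy_zreg_offset(rn),
                   toy_preg_offset(pg), data, fn);
}

/* Example helper: copy active bytes of Zn, zero the inactive ones. */
static void toy_movz_b(void *vd, void *vn, void *vg, uint32_t data)
{
    uint8_t *d = vd, *n = vn, *g = vg;

    (void)data;
    for (int i = 0; i < VEC_BYTES; i++) {
        int active = (g[i / 8] >> (i % 8)) & 1;
        d[i] = active ? n[i] : 0;
    }
}
```

With the wrapper, each call site names only the helper, the register
numbers and the immediate, which is what the conversions in this
patch do.
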
 target/arm/translate-sve.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index ea6058f7ef..3361e1c01f 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -142,6 +142,17 @@ static int pred_gvec_reg_size(DisasContext *s)
     return size_for_gvec(pred_full_reg_size(s));
 }
 
+/* Invoke an out-of-line helper on 2 Zregs and a predicate. */
+static void gen_gvec_ool_zzp(DisasContext *s, gen_helper_gvec_3 *fn,
+                             int rd, int rn, int pg, int data)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, rd),
+                       vec_full_reg_offset(s, rn),
+                       pred_full_reg_offset(s, pg),
+                       vsz, vsz, data, fn);
+}
+
 /* Invoke an out-of-line helper on 3 Zregs and a predicate. */
 static void gen_gvec_ool_zzzp(DisasContext *s, gen_helper_gvec_4 *fn,
                               int rd, int rn, int rm, int pg, int data)
@@ -415,11 +426,7 @@ static bool do_zpz_ool(DisasContext *s, arg_rpr_esz *a, gen_helper_gvec_3 *fn)
         return false;
     }
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           pred_full_reg_offset(s, a->pg),
-                           vsz, vsz, 0, fn);
+        gen_gvec_ool_zzp(s, fn, a->rd, a->rn, a->pg, 0);
     }
     return true;
 }
@@ -603,11 +610,7 @@ static bool do_movz_zpz(DisasContext *s, int rd, int rn, int pg,
     };
 
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, rd),
-                           vec_full_reg_offset(s, rn),
-                           pred_full_reg_offset(s, pg),
-                           vsz, vsz, invert, fns[esz]);
+        gen_gvec_ool_zzp(s, fns[esz], rd, rn, pg, invert);
     }
     return true;
 }
@@ -616,11 +619,7 @@ static bool do_zpzi_ool(DisasContext *s, arg_rpri_esz *a,
                         gen_helper_gvec_3 *fn)
 {
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           pred_full_reg_offset(s, a->pg),
-                           vsz, vsz, a->imm, fn);
+        gen_gvec_ool_zzp(s, fn, a->rd, a->rn, a->pg, a->imm);
     }
     return true;
 }
-- 
2.25.1




* [PATCH 11/20] target/arm: Split out gen_gvec_ool_zzz
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (9 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 10/20] target/arm: Split out gen_gvec_ool_zzp Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-24 16:47   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 12/20] target/arm: Split out gen_gvec_ool_zz Richard Henderson
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 53 +++++++++++++-------------------------
 1 file changed, 18 insertions(+), 35 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 3361e1c01f..3a90a645fd 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -142,6 +142,17 @@ static int pred_gvec_reg_size(DisasContext *s)
     return size_for_gvec(pred_full_reg_size(s));
 }
 
+/* Invoke an out-of-line helper on 3 Zregs. */
+static void gen_gvec_ool_zzz(DisasContext *s, gen_helper_gvec_3 *fn,
+                             int rd, int rn, int rm, int data)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, rd),
+                       vec_full_reg_offset(s, rn),
+                       vec_full_reg_offset(s, rm),
+                       vsz, vsz, data, fn);
+}
+
 /* Invoke an out-of-line helper on 2 Zregs and a predicate. */
 static void gen_gvec_ool_zzp(DisasContext *s, gen_helper_gvec_3 *fn,
                              int rd, int rn, int pg, int data)
@@ -769,11 +780,7 @@ static bool do_zzw_ool(DisasContext *s, arg_rrr_esz *a, gen_helper_gvec_3 *fn)
         return false;
     }
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           vec_full_reg_offset(s, a->rm),
-                           vsz, vsz, 0, fn);
+        gen_gvec_ool_zzz(s, fn, a->rd, a->rn, a->rm, 0);
     }
     return true;
 }
@@ -947,11 +954,7 @@ static bool trans_RDVL(DisasContext *s, arg_RDVL *a)
 static bool do_adr(DisasContext *s, arg_rrri *a, gen_helper_gvec_3 *fn)
 {
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           vec_full_reg_offset(s, a->rm),
-                           vsz, vsz, a->imm, fn);
+        gen_gvec_ool_zzz(s, fn, a->rd, a->rn, a->rm, a->imm);
     }
     return true;
 }
@@ -1012,11 +1015,7 @@ static bool trans_FTSSEL(DisasContext *s, arg_rrr_esz *a)
         return false;
     }
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           vec_full_reg_offset(s, a->rm),
-                           vsz, vsz, 0, fns[a->esz]);
+        gen_gvec_ool_zzz(s, fns[a->esz], a->rd, a->rn, a->rm, 0);
     }
     return true;
 }
@@ -2067,11 +2066,7 @@ static bool trans_TBL(DisasContext *s, arg_rrr_esz *a)
     };
 
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           vec_full_reg_offset(s, a->rm),
-                           vsz, vsz, 0, fns[a->esz]);
+        gen_gvec_ool_zzz(s, fns[a->esz], a->rd, a->rn, a->rm, 0);
     }
     return true;
 }
@@ -2244,11 +2239,7 @@ static bool do_zzz_data_ool(DisasContext *s, arg_rrr_esz *a, int data,
                             gen_helper_gvec_3 *fn)
 {
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           vec_full_reg_offset(s, a->rm),
-                           vsz, vsz, data, fn);
+        gen_gvec_ool_zzz(s, fn, a->rd, a->rn, a->rm, data);
     }
     return true;
 }
@@ -3373,11 +3364,7 @@ static bool trans_DOT_zzz(DisasContext *s, arg_DOT_zzz *a)
     };
 
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           vec_full_reg_offset(s, a->rm),
-                           vsz, vsz, 0, fns[a->u][a->sz]);
+        gen_gvec_ool_zzz(s, fns[a->u][a->sz], a->rd, a->rn, a->rm, 0);
     }
     return true;
 }
@@ -3390,11 +3377,7 @@ static bool trans_DOT_zzx(DisasContext *s, arg_DOT_zzx *a)
     };
 
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           vec_full_reg_offset(s, a->rm),
-                           vsz, vsz, a->index, fns[a->u][a->sz]);
+        gen_gvec_ool_zzz(s, fns[a->u][a->sz], a->rd, a->rn, a->rm, a->index);
     }
     return true;
 }
-- 
2.25.1




* [PATCH 12/20] target/arm: Split out gen_gvec_ool_zz
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (10 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 11/20] target/arm: Split out gen_gvec_ool_zzz Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-24 16:47   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 13/20] target/arm: Tidy SVE tszimm shift formats Richard Henderson
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 3a90a645fd..a2948b5128 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -142,6 +142,16 @@ static int pred_gvec_reg_size(DisasContext *s)
     return size_for_gvec(pred_full_reg_size(s));
 }
 
+/* Invoke an out-of-line helper on 2 Zregs. */
+static void gen_gvec_ool_zz(DisasContext *s, gen_helper_gvec_2 *fn,
+                            int rd, int rn, int data)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_2_ool(vec_full_reg_offset(s, rd),
+                       vec_full_reg_offset(s, rn),
+                       vsz, vsz, data, fn);
+}
+
 /* Invoke an out-of-line helper on 3 Zregs. */
 static void gen_gvec_ool_zzz(DisasContext *s, gen_helper_gvec_3 *fn,
                              int rd, int rn, int rm, int data)
@@ -995,10 +1005,7 @@ static bool trans_FEXPA(DisasContext *s, arg_rr_esz *a)
         return false;
     }
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           vsz, vsz, 0, fns[a->esz]);
+        gen_gvec_ool_zz(s, fns[a->esz], a->rd, a->rn, 0);
     }
     return true;
 }
@@ -2050,10 +2057,7 @@ static bool trans_REV_v(DisasContext *s, arg_rr_esz *a)
     };
 
     if (sve_access_check(s)) {
-        unsigned vsz = vec_full_reg_size(s);
-        tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->rd),
-                           vec_full_reg_offset(s, a->rn),
-                           vsz, vsz, 0, fns[a->esz]);
+        gen_gvec_ool_zz(s, fns[a->esz], a->rd, a->rn, 0);
     }
     return true;
 }
-- 
2.25.1




* [PATCH 13/20] target/arm: Tidy SVE tszimm shift formats
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (11 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 12/20] target/arm: Split out gen_gvec_ool_zz Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-25 11:18   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 14/20] target/arm: Generalize inl_qrdmlah_* helper functions Richard Henderson
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Rather than require the user to fill in the immediate (shl or shr),
create full formats that include the immediate.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
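For context, a sketch of what the %tszimm_esz, %tszimm_shl and
%tszimm_shr fields are expected to compute.  This is not taken from the
patch; it reflects my understanding of the SVE tsz:imm3 encoding and
assumes the 7-bit tsz:imm3 value has already been assembled from the
instruction word.  The function names mirror the field names but the
bodies are illustrative.

```c
#include <assert.h>

/*
 * x is the 7-bit tsz:imm3 value.  The position of the highest set bit
 * in tsz (the upper four bits) selects the element size:
 * 0001 -> bytes, 001x -> halfwords, 01xx -> words, 1xxx -> doublewords.
 */
static int tszimm_esz(int x)
{
    assert(x >= 8 && x < 128);   /* tsz == 0 is not a valid encoding */

    int tsz = x >> 3;            /* discard imm3 */
    int esz = 0;

    while (tsz > 1) {
        tsz >>= 1;
        esz++;
    }
    return esz;
}

/* Right shifts encode 1..esize as (2 * esize) - x. */
static int tszimm_shr(int x)
{
    return (16 << tszimm_esz(x)) - x;
}

/* Left shifts encode 0..esize-1 as x - esize. */
static int tszimm_shl(int x)
{
    return x - (8 << tszimm_esz(x));
}
```

For byte elements x ranges over 8..15, giving right shifts of 8 down to
1 and left shifts of 0 up to 7.  Folding the choice of shl or shr into
the format means each instruction pattern no longer has to repeat it.
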
 target/arm/sve.decode | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 4f580a25e7..6425396ac1 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -150,13 +150,17 @@
 @rd_rn_i6       ........ ... rn:5 ..... imm:s6 rd:5             &rri
 
 # Two register operand, one immediate operand, with predicate,
-# element size encoded as TSZHL.  User must fill in imm.
-@rdn_pg_tszimm  ........ .. ... ... ... pg:3 ..... rd:5 \
-                &rpri_esz rn=%reg_movprfx esz=%tszimm_esz
+# element size encoded as TSZHL.
+@rdn_pg_tszimm_shl  ........ .. ... ... ... pg:3 ..... rd:5 \
+                    &rpri_esz rn=%reg_movprfx esz=%tszimm_esz imm=%tszimm_shl
+@rdn_pg_tszimm_shr  ........ .. ... ... ... pg:3 ..... rd:5 \
+                    &rpri_esz rn=%reg_movprfx esz=%tszimm_esz imm=%tszimm_shr
 
 # Similarly without predicate.
-@rd_rn_tszimm   ........ .. ... ... ...... rn:5 rd:5 \
-                &rri_esz esz=%tszimm16_esz
+@rd_rn_tszimm_shl   ........ .. ... ... ...... rn:5 rd:5 \
+                    &rri_esz esz=%tszimm16_esz imm=%tszimm16_shl
+@rd_rn_tszimm_shr   ........ .. ... ... ...... rn:5 rd:5 \
+                    &rri_esz esz=%tszimm16_esz imm=%tszimm16_shr
 
 # Two register operand, one immediate operand, with 4-bit predicate.
 # User must fill in imm.
@@ -289,14 +293,10 @@ UMINV           00000100 .. 001 011 001 ... ..... .....         @rd_pg_rn
 ### SVE Shift by Immediate - Predicated Group
 
 # SVE bitwise shift by immediate (predicated)
-ASR_zpzi        00000100 .. 000 000 100 ... .. ... ..... \
-                @rdn_pg_tszimm imm=%tszimm_shr
-LSR_zpzi        00000100 .. 000 001 100 ... .. ... ..... \
-                @rdn_pg_tszimm imm=%tszimm_shr
-LSL_zpzi        00000100 .. 000 011 100 ... .. ... ..... \
-                @rdn_pg_tszimm imm=%tszimm_shl
-ASRD            00000100 .. 000 100 100 ... .. ... ..... \
-                @rdn_pg_tszimm imm=%tszimm_shr
+ASR_zpzi        00000100 .. 000 000 100 ... .. ... .....  @rdn_pg_tszimm_shr
+LSR_zpzi        00000100 .. 000 001 100 ... .. ... .....  @rdn_pg_tszimm_shr
+LSL_zpzi        00000100 .. 000 011 100 ... .. ... .....  @rdn_pg_tszimm_shl
+ASRD            00000100 .. 000 100 100 ... .. ... .....  @rdn_pg_tszimm_shr
 
 # SVE bitwise shift by vector (predicated)
 ASR_zpzz        00000100 .. 010 000 100 ... ..... .....   @rdn_pg_rm
@@ -400,12 +400,9 @@ RDVL            00000100 101 11111 01010 imm:s6 rd:5
 ### SVE Bitwise Shift - Unpredicated Group
 
 # SVE bitwise shift by immediate (unpredicated)
-ASR_zzi         00000100 .. 1 ..... 1001 00 ..... ..... \
-                @rd_rn_tszimm imm=%tszimm16_shr
-LSR_zzi         00000100 .. 1 ..... 1001 01 ..... ..... \
-                @rd_rn_tszimm imm=%tszimm16_shr
-LSL_zzi         00000100 .. 1 ..... 1001 11 ..... ..... \
-                @rd_rn_tszimm imm=%tszimm16_shl
+ASR_zzi         00000100 .. 1 ..... 1001 00 ..... .....  @rd_rn_tszimm_shr
+LSR_zzi         00000100 .. 1 ..... 1001 01 ..... .....  @rd_rn_tszimm_shr
+LSL_zzi         00000100 .. 1 ..... 1001 11 ..... .....  @rd_rn_tszimm_shl
 
 # SVE bitwise shift by wide elements (unpredicated)
 # Note esz != 3
-- 
2.25.1




* [PATCH 14/20] target/arm: Generalize inl_qrdmlah_* helper functions
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (12 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 13/20] target/arm: Tidy SVE tszimm shift formats Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-25 13:06   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 15/20] target/arm: Fix sve_uzp_p vs odd vector lengths Richard Henderson
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Unify add/sub helpers and add a parameter for rounding.
This will allow saturating non-rounding to reuse this code.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
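A standalone sketch of the unified 16-bit helper, to make the effect of
the two new flags concrete.  The name sqrdmlah_h_sketch is hypothetical.
It differs from the patch in multiplying src3 by 1 << 15 instead of
shifting it, which keeps a negative src3 well defined in ISO C, and it
assumes that >> on a negative value is an arithmetic shift, as GCC and
Clang implement it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int16_t sqrdmlah_h_sketch(int16_t src1, int16_t src2, int16_t src3,
                                 bool neg, bool round, uint32_t *sat)
{
    assert(sat);

    int32_t ret = (int32_t)src1 * src2;

    if (neg) {
        ret = -ret;              /* multiply-subtract instead of -add */
    }
    /* Accumulator scaled to Q15, plus the optional rounding constant. */
    ret += (int32_t)src3 * (1 << 15) + ((int32_t)round << 14);
    ret >>= 15;                  /* assumed arithmetic shift */

    if (ret != (int16_t)ret) {
        *sat = 1;
        ret = (ret < 0 ? INT16_MIN : INT16_MAX);
    }
    return (int16_t)ret;
}
```

With src1 = 1 and src2 = 0x4000 the product is exactly half a unit in
the last place, so round = true yields 1 and round = false yields 0.
Passing INT16_MIN for both multiplicands overflows and saturates to
INT16_MAX with *sat set.
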
 target/arm/vec_helper.c | 80 +++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 51 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 7d76412ee0..bbd1141dfc 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -37,19 +37,24 @@
 #endif
 
 /* Signed saturating rounding doubling multiply-accumulate high half, 16-bit */
-static int16_t inl_qrdmlah_s16(int16_t src1, int16_t src2,
-                               int16_t src3, uint32_t *sat)
+static int16_t do_sqrdmlah_h(int16_t src1, int16_t src2, int16_t src3,
+                             bool neg, bool round, uint32_t *sat)
 {
-    /* Simplify:
+    /*
+     * Simplify:
      * = ((a3 << 16) + ((e1 * e2) << 1) + (1 << 15)) >> 16
      * = ((a3 << 15) + (e1 * e2) + (1 << 14)) >> 15
      */
     int32_t ret = (int32_t)src1 * src2;
-    ret = ((int32_t)src3 << 15) + ret + (1 << 14);
+    if (neg) {
+        ret = -ret;
+    }
+    ret += ((int32_t)src3 << 15) + (round << 14);
     ret >>= 15;
+
     if (ret != (int16_t)ret) {
         *sat = 1;
-        ret = (ret < 0 ? -0x8000 : 0x7fff);
+        ret = (ret < 0 ? INT16_MIN : INT16_MAX);
     }
     return ret;
 }
@@ -58,8 +63,9 @@ uint32_t HELPER(neon_qrdmlah_s16)(CPUARMState *env, uint32_t src1,
                                   uint32_t src2, uint32_t src3)
 {
     uint32_t *sat = &env->vfp.qc[0];
-    uint16_t e1 = inl_qrdmlah_s16(src1, src2, src3, sat);
-    uint16_t e2 = inl_qrdmlah_s16(src1 >> 16, src2 >> 16, src3 >> 16, sat);
+    uint16_t e1 = do_sqrdmlah_h(src1, src2, src3, false, true, sat);
+    uint16_t e2 = do_sqrdmlah_h(src1 >> 16, src2 >> 16, src3 >> 16,
+                                false, true, sat);
     return deposit32(e1, 16, 16, e2);
 }
 
@@ -73,35 +79,18 @@ void HELPER(gvec_qrdmlah_s16)(void *vd, void *vn, void *vm,
     uintptr_t i;
 
     for (i = 0; i < opr_sz / 2; ++i) {
-        d[i] = inl_qrdmlah_s16(n[i], m[i], d[i], vq);
+        d[i] = do_sqrdmlah_h(n[i], m[i], d[i], false, true, vq);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
 
-/* Signed saturating rounding doubling multiply-subtract high half, 16-bit */
-static int16_t inl_qrdmlsh_s16(int16_t src1, int16_t src2,
-                               int16_t src3, uint32_t *sat)
-{
-    /* Similarly, using subtraction:
-     * = ((a3 << 16) - ((e1 * e2) << 1) + (1 << 15)) >> 16
-     * = ((a3 << 15) - (e1 * e2) + (1 << 14)) >> 15
-     */
-    int32_t ret = (int32_t)src1 * src2;
-    ret = ((int32_t)src3 << 15) - ret + (1 << 14);
-    ret >>= 15;
-    if (ret != (int16_t)ret) {
-        *sat = 1;
-        ret = (ret < 0 ? -0x8000 : 0x7fff);
-    }
-    return ret;
-}
-
 uint32_t HELPER(neon_qrdmlsh_s16)(CPUARMState *env, uint32_t src1,
                                   uint32_t src2, uint32_t src3)
 {
     uint32_t *sat = &env->vfp.qc[0];
-    uint16_t e1 = inl_qrdmlsh_s16(src1, src2, src3, sat);
-    uint16_t e2 = inl_qrdmlsh_s16(src1 >> 16, src2 >> 16, src3 >> 16, sat);
+    uint16_t e1 = do_sqrdmlah_h(src1, src2, src3, true, true, sat);
+    uint16_t e2 = do_sqrdmlah_h(src1 >> 16, src2 >> 16, src3 >> 16,
+                                true, true, sat);
     return deposit32(e1, 16, 16, e2);
 }
 
@@ -115,19 +104,23 @@ void HELPER(gvec_qrdmlsh_s16)(void *vd, void *vn, void *vm,
     uintptr_t i;
 
     for (i = 0; i < opr_sz / 2; ++i) {
-        d[i] = inl_qrdmlsh_s16(n[i], m[i], d[i], vq);
+        d[i] = do_sqrdmlah_h(n[i], m[i], d[i], true, true, vq);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
 
 /* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */
-static int32_t inl_qrdmlah_s32(int32_t src1, int32_t src2,
-                               int32_t src3, uint32_t *sat)
+static int32_t do_sqrdmlah_s(int32_t src1, int32_t src2, int32_t src3,
+                             bool neg, bool round, uint32_t *sat)
 {
     /* Simplify similarly to int_qrdmlah_s16 above.  */
     int64_t ret = (int64_t)src1 * src2;
-    ret = ((int64_t)src3 << 31) + ret + (1 << 30);
+    if (neg) {
+        ret = -ret;
+    }
+    ret += ((int64_t)src3 << 31) + (round << 30);
     ret >>= 31;
+
     if (ret != (int32_t)ret) {
         *sat = 1;
         ret = (ret < 0 ? INT32_MIN : INT32_MAX);
@@ -139,7 +132,7 @@ uint32_t HELPER(neon_qrdmlah_s32)(CPUARMState *env, int32_t src1,
                                   int32_t src2, int32_t src3)
 {
     uint32_t *sat = &env->vfp.qc[0];
-    return inl_qrdmlah_s32(src1, src2, src3, sat);
+    return do_sqrdmlah_s(src1, src2, src3, false, true, sat);
 }
 
 void HELPER(gvec_qrdmlah_s32)(void *vd, void *vn, void *vm,
@@ -152,31 +145,16 @@ void HELPER(gvec_qrdmlah_s32)(void *vd, void *vn, void *vm,
     uintptr_t i;
 
     for (i = 0; i < opr_sz / 4; ++i) {
-        d[i] = inl_qrdmlah_s32(n[i], m[i], d[i], vq);
+        d[i] = do_sqrdmlah_s(n[i], m[i], d[i], false, true, vq);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
 
-/* Signed saturating rounding doubling multiply-subtract high half, 32-bit */
-static int32_t inl_qrdmlsh_s32(int32_t src1, int32_t src2,
-                               int32_t src3, uint32_t *sat)
-{
-    /* Simplify similarly to int_qrdmlsh_s16 above.  */
-    int64_t ret = (int64_t)src1 * src2;
-    ret = ((int64_t)src3 << 31) - ret + (1 << 30);
-    ret >>= 31;
-    if (ret != (int32_t)ret) {
-        *sat = 1;
-        ret = (ret < 0 ? INT32_MIN : INT32_MAX);
-    }
-    return ret;
-}
-
 uint32_t HELPER(neon_qrdmlsh_s32)(CPUARMState *env, int32_t src1,
                                   int32_t src2, int32_t src3)
 {
     uint32_t *sat = &env->vfp.qc[0];
-    return inl_qrdmlsh_s32(src1, src2, src3, sat);
+    return do_sqrdmlah_s(src1, src2, src3, true, true, sat);
 }
 
 void HELPER(gvec_qrdmlsh_s32)(void *vd, void *vn, void *vm,
@@ -189,7 +167,7 @@ void HELPER(gvec_qrdmlsh_s32)(void *vd, void *vn, void *vm,
     uintptr_t i;
 
     for (i = 0; i < opr_sz / 4; ++i) {
-        d[i] = inl_qrdmlsh_s32(n[i], m[i], d[i], vq);
+        d[i] = do_sqrdmlah_s(n[i], m[i], d[i], true, true, vq);
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
-- 
2.25.1




* [PATCH 15/20] target/arm: Fix sve_uzp_p vs odd vector lengths
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (13 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 14/20] target/arm: Generalize inl_qrdmlah_* helper functions Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-25 13:43   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 16/20] target/arm: Fix sve_zip_p " Richard Henderson
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Desnogues, peter.maydell

The tail handling missed compressing the second half of a
predicate with length vl % 512 > 256.

Change every x + (y << s) to x | (y << s) as a
general style fix.

Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4758d46f34..fcb46f150f 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1938,7 +1938,7 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
     if (oprsz <= 8) {
         l = compress_bits(n[0] >> odd, esz);
         h = compress_bits(m[0] >> odd, esz);
-        d[0] = extract64(l + (h << (4 * oprsz)), 0, 8 * oprsz);
+        d[0] = l | (h << (4 * oprsz));
     } else {
         ARMPredicateReg tmp_m;
         intptr_t oprsz_16 = oprsz / 16;
@@ -1952,23 +1952,35 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
             h = n[2 * i + 1];
             l = compress_bits(l >> odd, esz);
             h = compress_bits(h >> odd, esz);
-            d[i] = l + (h << 32);
+            d[i] = l | (h << 32);
         }
 
-        /* For VL which is not a power of 2, the results from M do not
-           align nicely with the uint64_t for D.  Put the aligned results
-           from M into TMP_M and then copy it into place afterward.  */
+        /*
+         * For VL which is not a multiple of 512, the results from M do not
+         * align nicely with the uint64_t for D.  Put the aligned results
+         * from M into TMP_M and then copy it into place afterward.
+         */
         if (oprsz & 15) {
-            d[i] = compress_bits(n[2 * i] >> odd, esz);
+            int final_shift = (oprsz & 15) * 2;
+
+            l = n[2 * i + 0];
+            h = n[2 * i + 1];
+            l = compress_bits(l >> odd, esz);
+            h = compress_bits(h >> odd, esz);
+            d[i] = l | (h << final_shift);
 
             for (i = 0; i < oprsz_16; i++) {
                 l = m[2 * i + 0];
                 h = m[2 * i + 1];
                 l = compress_bits(l >> odd, esz);
                 h = compress_bits(h >> odd, esz);
-                tmp_m.p[i] = l + (h << 32);
+                tmp_m.p[i] = l | (h << 32);
             }
-            tmp_m.p[i] = compress_bits(m[2 * i] >> odd, esz);
+            l = m[2 * i + 0];
+            h = m[2 * i + 1];
+            l = compress_bits(l >> odd, esz);
+            h = compress_bits(h >> odd, esz);
+            tmp_m.p[i] = l | (h << final_shift);
 
             swap_memmove(vd + oprsz / 2, &tmp_m, oprsz / 2);
         } else {
@@ -1977,7 +1989,7 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
                 h = m[2 * i + 1];
                 l = compress_bits(l >> odd, esz);
                 h = compress_bits(h >> odd, esz);
-                d[oprsz_16 + i] = l + (h << 32);
+                d[oprsz_16 + i] = l | (h << 32);
             }
         }
     }
-- 
2.25.1
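
As an illustration of the two operations this patch combines, the
sketch below gives a loop-based stand-in for compress_bits() and a
helper that joins two compressed halves.  Both names are hypothetical,
and the compress semantics (keep the low 2**esz bits of every
2**(esz+1)-bit unit) are an assumption about the real helper, whose
body is not part of this diff.  The join shows why '|' and '+' are
interchangeable here: they agree whenever the low half fits below the
shift.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Loop-based stand-in for compress_bits(): keep the low 2**esz bits
 * of every 2**(esz+1)-bit unit and pack them together, so a 64-bit
 * input yields a 32-bit result.
 */
static uint64_t compress_bits_sketch(uint64_t x, int esz)
{
    assert(esz >= 0 && esz <= 3);

    int unit = 1 << esz;
    uint64_t mask = (UINT64_C(1) << unit) - 1;
    uint64_t r = 0;

    for (int i = 0; i < 64 / (2 * unit); i++) {
        r |= ((x >> (2 * unit * i)) & mask) << (unit * i);
    }
    return r;
}

/* Join two compressed halves; l must fit entirely below 'shift'. */
static uint64_t join_halves_sketch(uint64_t l, uint64_t h, int shift)
{
    assert(shift > 0 && shift < 64 && (l >> shift) == 0);
    return l | (h << shift);   /* no overlapping bits, so same as '+' */
}
```

Shifting the input right by the element size before compressing, as
the helper does with "odd", selects the odd-numbered elements instead
of the even ones.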




* [PATCH 16/20] target/arm: Fix sve_zip_p vs odd vector lengths
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (14 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 15/20] target/arm: Fix sve_uzp_p vs odd vector lengths Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-25 13:49   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 17/20] target/arm: Fix sve_punpk_p " Richard Henderson
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Desnogues, peter.maydell

The low-half zip (zip1) wrote too much output when vl % 512 != 0.

Change every x + (y << s) to x | (y << s) as a style fix.

Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index fcb46f150f..b8651ae173 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1870,6 +1870,7 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
     intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
     int esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
     intptr_t high = extract32(pred_desc, SIMD_DATA_SHIFT + 2, 1);
+    int esize = 1 << esz;
     uint64_t *d = vd;
     intptr_t i;
 
@@ -1882,33 +1883,35 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
         mm = extract64(mm, high * half, half);
         nn = expand_bits(nn, esz);
         mm = expand_bits(mm, esz);
-        d[0] = nn + (mm << (1 << esz));
+        d[0] = nn | (mm << esize);
     } else {
-        ARMPredicateReg tmp_n, tmp_m;
+        ARMPredicateReg tmp;
 
         /* We produce output faster than we consume input.
            Therefore we must be mindful of possible overlap.  */
-        if ((vn - vd) < (uintptr_t)oprsz) {
-            vn = memcpy(&tmp_n, vn, oprsz);
-        }
-        if ((vm - vd) < (uintptr_t)oprsz) {
-            vm = memcpy(&tmp_m, vm, oprsz);
+        if (vd == vn) {
+            vn = memcpy(&tmp, vn, oprsz);
+            if (vd == vm) {
+                vm = vn;
+            }
+        } else if (vd == vm) {
+            vm = memcpy(&tmp, vm, oprsz);
         }
         if (high) {
             high = oprsz >> 1;
         }
 
-        if ((high & 3) == 0) {
+        if ((oprsz & 7) == 0) {
             uint32_t *n = vn, *m = vm;
             high >>= 2;
 
-            for (i = 0; i < DIV_ROUND_UP(oprsz, 8); i++) {
+            for (i = 0; i < oprsz / 8; i++) {
                 uint64_t nn = n[H4(high + i)];
                 uint64_t mm = m[H4(high + i)];
 
                 nn = expand_bits(nn, esz);
                 mm = expand_bits(mm, esz);
-                d[i] = nn + (mm << (1 << esz));
+                d[i] = nn | (mm << esize);
             }
         } else {
             uint8_t *n = vn, *m = vm;
@@ -1920,7 +1923,7 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
 
                 nn = expand_bits(nn, esz);
                 mm = expand_bits(mm, esz);
-                d16[H2(i)] = nn + (mm << (1 << esz));
+                d16[H2(i)] = nn | (mm << esize);
             }
         }
     }
-- 
2.25.1
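
The interleave that sve_zip_p performs on each word can be sketched as
follows.  expand_bits_sketch and zip_word_sketch are hypothetical
names; the expand semantics (each 2**esz-bit unit moves to the low
half of a 2**(esz+1)-bit unit, upper half zero) are assumed, since
expand_bits() itself is not shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Loop-based stand-in for expand_bits(): spread each 2**esz-bit unit
 * of the low 32 bits of x into the low half of a 2**(esz+1)-bit unit.
 */
static uint64_t expand_bits_sketch(uint64_t x, int esz)
{
    assert(esz >= 0 && esz <= 3);

    int unit = 1 << esz;
    uint64_t mask = (UINT64_C(1) << unit) - 1;
    uint64_t r = 0;

    for (int i = 0; i < 32 / unit; i++) {
        r |= ((x >> (unit * i)) & mask) << (2 * unit * i);
    }
    return r;
}

/* Interleave 32 predicate bits from N with 32 from M, as in d[i] above. */
static uint64_t zip_word_sketch(uint32_t nn, uint32_t mm, int esz)
{
    int esize = 1 << esz;

    /* The expanded N units leave gaps that the shifted M units fill. */
    return expand_bits_sketch(nn, esz) | (expand_bits_sketch(mm, esz) << esize);
}
```

Because the two expanded operands never share a set bit, the '|' in
the patch produces the same value as the old '+'.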




* [PATCH 17/20] target/arm: Fix sve_punpk_p vs odd vector lengths
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (15 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 16/20] target/arm: Fix sve_zip_p " Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-25 13:53   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 18/20] target/arm: Convert integer multiply (indexed) to gvec for aa64 advsimd Richard Henderson
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laurent Desnogues, peter.maydell

With punpk1, too much output was written when vl % 512 != 0.

Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index b8651ae173..c983cd4356 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2104,11 +2104,11 @@ void HELPER(sve_punpk_p)(void *vd, void *vn, uint32_t pred_desc)
             high = oprsz >> 1;
         }
 
-        if ((high & 3) == 0) {
+        if ((oprsz & 7) == 0) {
             uint32_t *n = vn;
             high >>= 2;
 
-            for (i = 0; i < DIV_ROUND_UP(oprsz, 8); i++) {
+            for (i = 0; i < oprsz / 8; i++) {
                 uint64_t nn = n[H4(high + i)];
                 d[i] = expand_bits(nn, 0);
             }
-- 
2.25.1
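
The effect of the loop-bound change can be checked with simple
arithmetic.  The sketch below counts the bytes stored by the 64-bit
loop under the old and new bounds; the function names are hypothetical
and oprsz is taken to be the predicate size in bytes, as in the
helper.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes stored by the 64-bit loop with the old bound, DIV_ROUND_UP(oprsz, 8). */
static size_t bytes_written_old_bound(size_t oprsz)
{
    return ((oprsz + 7) / 8) * 8;
}

/* Bytes stored with the new bound, oprsz / 8; reached only if oprsz % 8 == 0. */
static size_t bytes_written_new_bound(size_t oprsz)
{
    assert((oprsz & 7) == 0);
    return (oprsz / 8) * 8;
}
```

For a 10-byte predicate (vl = 640 bits) the old bound stores 16 bytes,
6 more than the predicate holds.  With the new test, such sizes no
longer take the 64-bit path at all.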




* [PATCH 18/20] target/arm: Convert integer multiply (indexed) to gvec for aa64 advsimd
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (16 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 17/20] target/arm: Fix sve_punpk_p " Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-25 13:54   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 19/20] target/arm: Convert integer multiply-add " Richard Henderson
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h        |  4 ++++
 target/arm/translate-a64.c | 16 ++++++++++++++++
 target/arm/vec_helper.c    | 29 +++++++++++++++++++++++++----
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 759639a63a..d0573a53c8 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -758,6 +758,10 @@ DEF_HELPER_FLAGS_4(gvec_uaba_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_uaba_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_uaba_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(gvec_mul_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_mul_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_mul_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 42aa695dff..d08960a1c8 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -13507,6 +13507,22 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                                data, gen_helper_gvec_fmlal_idx_a64);
         }
         return;
+
+    case 0x08: /* MUL */
+        if (!is_long && !is_scalar) {
+            static gen_helper_gvec_3 * const fns[3] = {
+                gen_helper_gvec_mul_idx_h,
+                gen_helper_gvec_mul_idx_s,
+                gen_helper_gvec_mul_idx_d,
+            };
+            tcg_gen_gvec_3_ool(vec_full_reg_offset(s, rd),
+                               vec_full_reg_offset(s, rn),
+                               vec_full_reg_offset(s, rm),
+                               is_q ? 16 : 8, vec_full_reg_size(s),
+                               index, fns[size - 1]);
+            return;
+        }
+        break;
     }
 
     if (size == 3) {
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index bbd1141dfc..aa1de36921 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -711,6 +711,27 @@ DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
  */
 
 #define DO_MUL_IDX(NAME, TYPE, H) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
+{                                                                          \
+    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
+    intptr_t idx = simd_data(desc);                                        \
+    TYPE *d = vd, *n = vn, *m = vm;                                        \
+    for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
+        TYPE mm = m[H(i + idx)];                                           \
+        for (j = 0; j < segment; j++) {                                    \
+            d[i + j] = n[i + j] * mm;                                      \
+        }                                                                  \
+    }                                                                      \
+    clear_tail(d, oprsz, simd_maxsz(desc));                                \
+}
+
+DO_MUL_IDX(gvec_mul_idx_h, uint16_t, H2)
+DO_MUL_IDX(gvec_mul_idx_s, uint32_t, H4)
+DO_MUL_IDX(gvec_mul_idx_d, uint64_t, )
+
+#undef DO_MUL_IDX
+
+#define DO_FMUL_IDX(NAME, TYPE, H) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
 {                                                                          \
     intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
@@ -725,11 +746,11 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
     clear_tail(d, oprsz, simd_maxsz(desc));                                \
 }
 
-DO_MUL_IDX(gvec_fmul_idx_h, float16, H2)
-DO_MUL_IDX(gvec_fmul_idx_s, float32, H4)
-DO_MUL_IDX(gvec_fmul_idx_d, float64, )
+DO_FMUL_IDX(gvec_fmul_idx_h, float16, H2)
+DO_FMUL_IDX(gvec_fmul_idx_s, float32, H4)
+DO_FMUL_IDX(gvec_fmul_idx_d, float64, )
 
-#undef DO_MUL_IDX
+#undef DO_FMUL_IDX
 
 #define DO_FMLA_IDX(NAME, TYPE, H)                                         \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
-- 
2.25.1
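
To make the segment logic of DO_MUL_IDX concrete, here is a
simplified uint32_t instance written as an ordinary function.  It is
a sketch under stated assumptions: mul_idx_s_sketch is an invented
name, the H4() host-endian index adjustment is treated as the
identity (little-endian host), and clear_tail() is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Indexed multiply: within every 128-bit segment, each lane of N is
 * multiplied by the single lane of M selected by idx.
 */
static void mul_idx_s_sketch(uint32_t *d, const uint32_t *n,
                             const uint32_t *m, size_t oprsz, size_t idx)
{
    const size_t segment = 16 / sizeof(uint32_t);   /* 4 lanes per 128 bits */

    assert(oprsz % 16 == 0 && idx < segment);
    for (size_t i = 0; i < oprsz / sizeof(uint32_t); i += segment) {
        uint32_t mm = m[i + idx];   /* read before any store to this segment */
        for (size_t j = 0; j < segment; j++) {
            d[i + j] = n[i + j] * mm;
        }
    }
}
```

With the operation sizes passed by disas_simd_indexed (8 or 16 bytes)
the outer loop runs once; the per-segment reload of the multiplier
only matters for longer vectors.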




* [PATCH 19/20] target/arm: Convert integer multiply-add (indexed) to gvec for aa64 advsimd
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (17 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 18/20] target/arm: Convert integer multiply (indexed) to gvec for aa64 advsimd Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-25 13:55   ` Peter Maydell
  2020-08-15  1:31 ` [PATCH 20/20] target/arm: Convert sq{,r}dmulh " Richard Henderson
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h        | 14 ++++++++++++++
 target/arm/translate-a64.c | 34 ++++++++++++++++++++++++++++++++++
 target/arm/vec_helper.c    | 25 +++++++++++++++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index d0573a53c8..378bb1898b 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -762,6 +762,20 @@ DEF_HELPER_FLAGS_4(gvec_mul_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_mul_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_mul_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_mla_idx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_mla_idx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_mla_idx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(gvec_mls_idx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_mls_idx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_mls_idx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index d08960a1c8..c74c6e854c 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -13523,6 +13523,40 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             return;
         }
         break;
+
+    case 0x10: /* MLA */
+        if (!is_long && !is_scalar) {
+            static gen_helper_gvec_4 * const fns[3] = {
+                gen_helper_gvec_mla_idx_h,
+                gen_helper_gvec_mla_idx_s,
+                gen_helper_gvec_mla_idx_d,
+            };
+            tcg_gen_gvec_4_ool(vec_full_reg_offset(s, rd),
+                               vec_full_reg_offset(s, rn),
+                               vec_full_reg_offset(s, rm),
+                               vec_full_reg_offset(s, rd),
+                               is_q ? 16 : 8, vec_full_reg_size(s),
+                               index, fns[size - 1]);
+            return;
+        }
+        break;
+
+    case 0x14: /* MLS */
+        if (!is_long && !is_scalar) {
+            static gen_helper_gvec_4 * const fns[3] = {
+                gen_helper_gvec_mls_idx_h,
+                gen_helper_gvec_mls_idx_s,
+                gen_helper_gvec_mls_idx_d,
+            };
+            tcg_gen_gvec_4_ool(vec_full_reg_offset(s, rd),
+                               vec_full_reg_offset(s, rn),
+                               vec_full_reg_offset(s, rm),
+                               vec_full_reg_offset(s, rd),
+                               is_q ? 16 : 8, vec_full_reg_size(s),
+                               index, fns[size - 1]);
+            return;
+        }
+        break;
     }
 
     if (size == 3) {
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index aa1de36921..fb53684ce3 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -731,6 +731,31 @@ DO_MUL_IDX(gvec_mul_idx_d, uint64_t, )
 
 #undef DO_MUL_IDX
 
+#define DO_MLA_IDX(NAME, TYPE, OP, H) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)   \
+{                                                                          \
+    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
+    intptr_t idx = simd_data(desc);                                        \
+    TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
+    for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
+        TYPE mm = m[H(i + idx)];                                           \
+        for (j = 0; j < segment; j++) {                                    \
+            d[i + j] = a[i + j] OP n[i + j] * mm;                          \
+        }                                                                  \
+    }                                                                      \
+    clear_tail(d, oprsz, simd_maxsz(desc));                                \
+}
+
+DO_MLA_IDX(gvec_mla_idx_h, uint16_t, +, H2)
+DO_MLA_IDX(gvec_mla_idx_s, uint32_t, +, H4)
+DO_MLA_IDX(gvec_mla_idx_d, uint64_t, +,   )
+
+DO_MLA_IDX(gvec_mls_idx_h, uint16_t, -, H2)
+DO_MLA_IDX(gvec_mls_idx_s, uint32_t, -, H4)
+DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -,   )
+
+#undef DO_MLA_IDX
+
 #define DO_FMUL_IDX(NAME, TYPE, H) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
 {                                                                          \
-- 
2.25.1
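
One detail of the 16-bit instances of DO_MLA_IDX is worth spelling
out: in "a OP n * mm" the uint16_t operands are promoted to int, so
0xffff * 0xffff exceeds INT_MAX before the result is truncated.  The
sketch below is a hypothetical rewrite (mla_idx_h_sketch is an
invented name) that does the arithmetic in uint32_t so the wraparound
is well defined regardless of compiler flags.  It assumes an identity
H2() (little-endian host) and omits clear_tail(); whether the real
build relies on wrapping signed arithmetic is outside what the thread
shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Indexed multiply-accumulate (or multiply-subtract) on 16-bit lanes. */
static void mla_idx_h_sketch(uint16_t *d, const uint16_t *n,
                             const uint16_t *m, const uint16_t *a,
                             size_t oprsz, size_t idx, bool subtract)
{
    const size_t segment = 16 / sizeof(uint16_t);   /* 8 lanes per 128 bits */

    assert(oprsz % 16 == 0 && idx < segment);
    for (size_t i = 0; i < oprsz / sizeof(uint16_t); i += segment) {
        uint32_t mm = m[i + idx];
        for (size_t j = 0; j < segment; j++) {
            /* Unsigned 32-bit math: wraps modulo 2**32, never overflows. */
            uint32_t prod = (uint32_t)n[i + j] * mm;
            uint32_t acc = a[i + j];

            d[i + j] = (uint16_t)(subtract ? acc - prod : acc + prod);
        }
    }
}
```

Only the low 16 bits of the result are kept, which matches the
modulo-2**16 behaviour of the lane type used by the macro.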




* [PATCH 20/20] target/arm: Convert sq{,r}dmulh to gvec for aa64 advsimd
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (18 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 19/20] target/arm: Convert integer multiply-add " Richard Henderson
@ 2020-08-15  1:31 ` Richard Henderson
  2020-08-25 13:57   ` Peter Maydell
  2020-08-15 17:55 ` [PATCH 00/20] target/arm: SVE2 preparatory patches no-reply
  2020-08-27 18:28 ` Peter Maydell
  21 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-15  1:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h        | 10 ++++++++
 target/arm/translate-a64.c | 33 ++++++++++++++++++--------
 target/arm/vec_helper.c    | 48 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 10 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 378bb1898b..3ca73a1764 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -776,6 +776,16 @@ DEF_HELPER_FLAGS_5(gvec_mls_idx_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_mls_idx_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(neon_sqdmulh_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqdmulh_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(neon_sqrdmulh_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrdmulh_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index c74c6e854c..d4da12268c 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -697,6 +697,20 @@ static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn,
     tcg_temp_free_ptr(fpst);
 }
 
+/* Expand a 3-operand + qc + operation using an out-of-line helper.  */
+static void gen_gvec_op3_qc(DisasContext *s, bool is_q, int rd, int rn,
+                            int rm, gen_helper_gvec_3_ptr *fn)
+{
+    TCGv_ptr qc_ptr = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(qc_ptr, cpu_env, offsetof(CPUARMState, vfp.qc));
+    tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
+                       vec_full_reg_offset(s, rn),
+                       vec_full_reg_offset(s, rm), qc_ptr,
+                       is_q ? 16 : 8, vec_full_reg_size(s), 0, fn);
+    tcg_temp_free_ptr(qc_ptr);
+}
+
 /* Set ZF and NF based on a 64 bit result. This is alas fiddlier
  * than the 32 bit equivalent.
  */
@@ -11753,6 +11767,15 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_mla, size);
         }
         return;
+    case 0x16: /* SQDMULH, SQRDMULH */
+        {
+            static gen_helper_gvec_3_ptr * const fns[2][2] = {
+                { gen_helper_neon_sqdmulh_h, gen_helper_neon_sqrdmulh_h },
+                { gen_helper_neon_sqdmulh_s, gen_helper_neon_sqrdmulh_s },
+            };
+            gen_gvec_op3_qc(s, is_q, rd, rn, rm, fns[size - 1][u]);
+        }
+        return;
     case 0x11:
         if (!u) { /* CMTST */
             gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_cmtst, size);
@@ -11864,16 +11887,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genenvfn = fns[size][u];
                 break;
             }
-            case 0x16: /* SQDMULH, SQRDMULH */
-            {
-                static NeonGenTwoOpEnvFn * const fns[2][2] = {
-                    { gen_helper_neon_qdmulh_s16, gen_helper_neon_qrdmulh_s16 },
-                    { gen_helper_neon_qdmulh_s32, gen_helper_neon_qrdmulh_s32 },
-                };
-                assert(size == 1 || size == 2);
-                genenvfn = fns[size - 1][u];
-                break;
-            }
             default:
                 g_assert_not_reached();
             }
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index fb53684ce3..73d62c4e4f 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -109,6 +109,30 @@ void HELPER(gvec_qrdmlsh_s16)(void *vd, void *vn, void *vm,
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
 
+void HELPER(neon_sqdmulh_h)(void *vd, void *vn, void *vm,
+                            void *vq, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        d[i] = do_sqrdmlah_h(n[i], m[i], 0, false, false, vq);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(neon_sqrdmulh_h)(void *vd, void *vn, void *vm,
+                             void *vq, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        d[i] = do_sqrdmlah_h(n[i], m[i], 0, false, true, vq);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
 /* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */
 static int32_t do_sqrdmlah_s(int32_t src1, int32_t src2, int32_t src3,
                              bool neg, bool round, uint32_t *sat)
@@ -172,6 +196,30 @@ void HELPER(gvec_qrdmlsh_s32)(void *vd, void *vn, void *vm,
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
 
+void HELPER(neon_sqdmulh_s)(void *vd, void *vn, void *vm,
+                            void *vq, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int32_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        d[i] = do_sqrdmlah_s(n[i], m[i], 0, false, false, vq);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(neon_sqrdmulh_s)(void *vd, void *vn, void *vm,
+                             void *vq, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int32_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        d[i] = do_sqrdmlah_s(n[i], m[i], 0, false, true, vq);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
 /* Integer 8 and 16-bit dot-product.
  *
  * Note that for the loops herein, host endianness does not matter
-- 
2.25.1
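
The shape of the new out-of-line helpers (a lane loop, a sticky
saturation flag reached through the extra pointer, and a cleared
tail) can be sketched for the 16-bit case as below.  All names are
hypothetical; the lane arithmetic is an assumption modelled on the
32-bit code earlier in the series, since do_sqrdmlah_h is not shown
here, and memset() stands in for clear_tail().  The sketch assumes >>
on a negative int32_t is an arithmetic shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One lane: high half of 2 * a * b, optionally rounded, saturated. */
static int16_t sqdmulh_h_lane_sketch(int16_t a, int16_t b, bool round,
                                     uint32_t *qc)
{
    int32_t ret = (int32_t)a * b + (round ? (1 << 14) : 0);

    ret >>= 15;
    if (ret < INT16_MIN || ret > INT16_MAX) {
        *qc = 1;   /* sticky: any saturating lane sets the flag */
        ret = (ret < 0 ? INT16_MIN : INT16_MAX);
    }
    return (int16_t)ret;
}

/* Vector loop over oprsz bytes, then zero the bytes up to maxsz. */
static void sqdmulh_h_sketch(int16_t *d, const int16_t *n, const int16_t *m,
                             uint32_t *qc, size_t oprsz, size_t maxsz,
                             bool round)
{
    assert(qc != NULL && oprsz <= maxsz && oprsz % 2 == 0);
    for (size_t i = 0; i < oprsz / 2; i++) {
        d[i] = sqdmulh_h_lane_sketch(n[i], m[i], round, qc);
    }
    memset((char *)d + oprsz, 0, maxsz - oprsz);
}
```

With oprsz = 8 and maxsz = 16, as for a 64-bit AdvSIMD operation on a
128-bit register, the upper four lanes are zeroed.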




* Re: [PATCH 00/20] target/arm: SVE2 preparatory patches
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (19 preceding siblings ...)
  2020-08-15  1:31 ` [PATCH 20/20] target/arm: Convert sq{,r}dmulh " Richard Henderson
@ 2020-08-15 17:55 ` no-reply
  2020-08-27 18:28 ` Peter Maydell
  21 siblings, 0 replies; 49+ messages in thread
From: no-reply @ 2020-08-15 17:55 UTC (permalink / raw)
  To: richard.henderson; +Cc: peter.maydell, qemu-devel

Patchew URL: https://patchew.org/QEMU/20200815013145.539409-1-richard.henderson@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200815013145.539409-1-richard.henderson@linaro.org
Subject: [PATCH 00/20] target/arm: SVE2 preparatory patches

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/20200812183250.9221-1-cfontana@suse.de -> patchew/20200812183250.9221-1-cfontana@suse.de
 - [tag update]      patchew/20200814082841.27000-1-f4bug@amsat.org -> patchew/20200814082841.27000-1-f4bug@amsat.org
 * [new tag]         patchew/20200815013145.539409-1-richard.henderson@linaro.org -> patchew/20200815013145.539409-1-richard.henderson@linaro.org
Switched to a new branch 'test'
ee9e70c target/arm: Convert sq{, r}dmulh to gvec for aa64 advsimd
68d3120 target/arm: Convert integer multiply-add (indexed) to gvec for aa64 advsimd
4915d69 target/arm: Convert integer multiply (indexed) to gvec for aa64 advsimd
4ede158 target/arm: Fix sve_punpk_p vs odd vector lengths
94aae8d target/arm: Fix sve_zip_p vs odd vector lengths
dd7dc33 target/arm: Fix sve_uzp_p vs odd vector lengths
b32338b target/arm: Generalize inl_qrdmlah_* helper functions
095ea16 target/arm: Tidy SVE tszimm shift formats
e34d62c target/arm: Split out gen_gvec_ool_zz
c0d82b9 target/arm: Split out gen_gvec_ool_zzz
a99dbac target/arm: Split out gen_gvec_ool_zzp
cfb28d1 target/arm: Merge helper_sve_clr_* and helper_sve_movz_*
0592c7a target/arm: Split out gen_gvec_ool_zzzp
6e5fc25 target/arm: Use tcg_gen_gvec_bitsel for trans_SEL_pppp
05415f2 target/arm: Clean up 4-operand predicate expansion
7adbccc target/arm: Merge do_vector2_p into do_mov_p
c7cd875 target/arm: Rearrange {sve,fp}_check_access assert
a86390b target/arm: Split out gen_gvec_fn_zzz, do_zzz_fn
289f152 target/arm: Split out gen_gvec_fn_zz
4bbc96a qemu/int128: Add int128_lshift

=== OUTPUT BEGIN ===
1/20 Checking commit 4bbc96a0e6a5 (qemu/int128: Add int128_lshift)
2/20 Checking commit 289f152a8194 (target/arm: Split out gen_gvec_fn_zz)
3/20 Checking commit a86390b7a972 (target/arm: Split out gen_gvec_fn_zzz, do_zzz_fn)
4/20 Checking commit c7cd87586c88 (target/arm: Rearrange {sve,fp}_check_access assert)
5/20 Checking commit 7adbccc844ea (target/arm: Merge do_vector2_p into do_mov_p)
6/20 Checking commit 05415f287adb (target/arm: Clean up 4-operand predicate expansion)
7/20 Checking commit 6e5fc25bd7b0 (target/arm: Use tcg_gen_gvec_bitsel for trans_SEL_pppp)
8/20 Checking commit 0592c7a7e8a5 (target/arm: Split out gen_gvec_ool_zzzp)
9/20 Checking commit cfb28d174ae5 (target/arm: Merge helper_sve_clr_* and helper_sve_movz_*)
10/20 Checking commit a99dbace62f6 (target/arm: Split out gen_gvec_ool_zzp)
11/20 Checking commit c0d82b9048d4 (target/arm: Split out gen_gvec_ool_zzz)
12/20 Checking commit e34d62ca031f (target/arm: Split out gen_gvec_ool_zz)
13/20 Checking commit 095ea164e906 (target/arm: Tidy SVE tszimm shift formats)
14/20 Checking commit b32338b23247 (target/arm: Generalize inl_qrdmlah_* helper functions)
15/20 Checking commit dd7dc33f523b (target/arm: Fix sve_uzp_p vs odd vector lengths)
16/20 Checking commit 94aae8d657bb (target/arm: Fix sve_zip_p vs odd vector lengths)
17/20 Checking commit 4ede158d452e (target/arm: Fix sve_punpk_p vs odd vector lengths)
18/20 Checking commit 4915d696b4a9 (target/arm: Convert integer multiply (indexed) to gvec for aa64 advsimd)
ERROR: space prohibited before that close parenthesis ')'
#76: FILE: target/arm/vec_helper.c:730:
+DO_MUL_IDX(gvec_mul_idx_d, uint64_t, )

ERROR: space prohibited before that close parenthesis ')'
#93: FILE: target/arm/vec_helper.c:751:
+DO_FMUL_IDX(gvec_fmul_idx_d, float64, )

total: 2 errors, 0 warnings, 74 lines checked

Patch 18/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

19/20 Checking commit 68d3120b2c82 (target/arm: Convert integer multiply-add (indexed) to gvec for aa64 advsimd)
ERROR: space prohibited before that close parenthesis ')'
#105: FILE: target/arm/vec_helper.c:751:
+DO_MLA_IDX(gvec_mla_idx_d, uint64_t, +,   )

ERROR: space prohibited before that close parenthesis ')'
#109: FILE: target/arm/vec_helper.c:755:
+DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -,   )

total: 2 errors, 0 warnings, 91 lines checked

Patch 19/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

20/20 Checking commit ee9e70cbd125 (target/arm: Convert sq{, r}dmulh to gvec for aa64 advsimd)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200815013145.539409-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH 01/20] qemu/int128: Add int128_lshift
  2020-08-15  1:31 ` [PATCH 01/20] qemu/int128: Add int128_lshift Richard Henderson
@ 2020-08-24 16:40   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-24 16:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Add left-shift to match the existing right-shift.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
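
For readers without the patch to hand, a minimal sketch of a 128-bit
left shift built from two 64-bit halves.  This is only an illustration:
the type U128Sketch and the function u128_lshift_sketch are hypothetical
names, the sketch is unsigned, and it is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-halves representation, for illustration only. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} U128Sketch;

/* Shift a left by n bits, 0 <= n < 128. */
static U128Sketch u128_lshift_sketch(U128Sketch a, int n)
{
    U128Sketch r;

    assert(n >= 0 && n < 128);
    if (n == 0) {
        /* Avoid the undefined 64-bit shift by 64 below. */
        return a;
    }
    if (n >= 64) {
        /* Everything in the low half moves into the high half. */
        r.lo = 0;
        r.hi = a.lo << (n - 64);
    } else {
        /* Bits shifted out of the low half carry into the high half. */
        r.lo = a.lo << n;
        r.hi = (a.hi << n) | (a.lo >> (64 - n));
    }
    return r;
}
```

The n == 0 and n >= 64 cases are split out because shifting a 64-bit
value by 64 or more is undefined in C.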

thanks
-- PMM



* Re: [PATCH 02/20] target/arm: Split out gen_gvec_fn_zz
  2020-08-15  1:31 ` [PATCH 02/20] target/arm: Split out gen_gvec_fn_zz Richard Henderson
@ 2020-08-24 16:40   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-24 16:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Model the new function on gen_gvec_fn2 in translate-a64.c, but
> indicating which kind of register and in which order.  Since there
> is only one user of do_vector2_z, fold it into do_mov_z.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 03/20] target/arm: Split out gen_gvec_fn_zzz, do_zzz_fn
  2020-08-15  1:31 ` [PATCH 03/20] target/arm: Split out gen_gvec_fn_zzz, do_zzz_fn Richard Henderson
@ 2020-08-24 16:40   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-24 16:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Model gen_gvec_fn_zzz on gen_gvec_fn3 in translate-a64.c, but
> indicating which kind of register and in which order.
>
> Model do_zzz_fn on the other do_foo functions that take an
> argument set and verify sve enabled.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 05/20] target/arm: Merge do_vector2_p into do_mov_p
  2020-08-15  1:31 ` [PATCH 05/20] target/arm: Merge do_vector2_p into do_mov_p Richard Henderson
@ 2020-08-24 16:41   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-24 16:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This is the only user of the function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 19 ++++++-------------
>  1 file changed, 6 insertions(+), 13 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 08/20] target/arm: Split out gen_gvec_ool_zzzp
  2020-08-15  1:31 ` [PATCH 08/20] target/arm: Split out gen_gvec_ool_zzzp Richard Henderson
@ 2020-08-24 16:43   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-24 16:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Model after gen_gvec_fn_zzz et al.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 07/20] target/arm: Use tcg_gen_gvec_bitsel for trans_SEL_pppp
  2020-08-15  1:31 ` [PATCH 07/20] target/arm: Use tcg_gen_gvec_bitsel for trans_SEL_pppp Richard Henderson
@ 2020-08-24 16:44   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-24 16:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The gvec operation was added after the initial implementation
> of the SEL instruction and was missed in the conversion.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 31 ++++++++-----------------------
>  1 file changed, 8 insertions(+), 23 deletions(-)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 10/20] target/arm: Split out gen_gvec_ool_zzp
  2020-08-15  1:31 ` [PATCH 10/20] target/arm: Split out gen_gvec_ool_zzp Richard Henderson
@ 2020-08-24 16:46   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-24 16:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Model after gen_gvec_fn_zzz et al.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 11/20] target/arm: Split out gen_gvec_ool_zzz
  2020-08-15  1:31 ` [PATCH 11/20] target/arm: Split out gen_gvec_ool_zzz Richard Henderson
@ 2020-08-24 16:47   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-24 16:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 53 +++++++++++++-------------------------
>  1 file changed, 18 insertions(+), 35 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 12/20] target/arm: Split out gen_gvec_ool_zz
  2020-08-15  1:31 ` [PATCH 12/20] target/arm: Split out gen_gvec_ool_zz Richard Henderson
@ 2020-08-24 16:47   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-24 16:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)


Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 04/20] target/arm: Rearrange {sve,fp}_check_access assert
  2020-08-15  1:31 ` [PATCH 04/20] target/arm: Rearrange {sve,fp}_check_access assert Richard Henderson
@ 2020-08-24 16:59   ` Peter Maydell
  2020-08-25 13:47     ` Richard Henderson
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Maydell @ 2020-08-24 16:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We want to ensure that access is checked by the time we ask
> for a specific fp/vector register.  We want to ensure that
> we do not emit two lots of code to raise an exception.
>
> But sometimes it's difficult to cleanly organize the code
> such that we pass through sve_check_access exactly once.
> Allow multiple calls so long as the result is true, that is,
> no exception to be raised.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate.h     |  1 +
>  target/arm/translate-a64.c | 27 ++++++++++++++++-----------
>  2 files changed, 17 insertions(+), 11 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
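
A minimal sketch of the rule described in the commit message, assuming
a simplified context: repeated checks are fine while they succeed, but
a failing check (which emits exception code) must happen only once.
CheckCtxSketch, its fields, and access_check_sketch are hypothetical
stand-ins, not the DisasContext code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the translator context. */
typedef struct {
    bool trap;               /* access would raise an exception */
    bool access_checked;     /* a check has already been made */
    int exceptions_emitted;  /* how many times exception code was emitted */
} CheckCtxSketch;

static bool access_check_sketch(CheckCtxSketch *s)
{
    if (s->trap) {
        /* A failing check emits code, so it must not be repeated. */
        assert(!s->access_checked);
        s->access_checked = true;
        s->exceptions_emitted++;
        return false;
    }
    /* A passing check emits nothing, so repeating it is harmless. */
    s->access_checked = true;
    return true;
}
```

With this shape, a second call after a failure trips the assert, while
any number of successful calls are allowed.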

> diff --git a/target/arm/translate.h b/target/arm/translate.h
> index 16f2699ad7..ad7972eb22 100644
> --- a/target/arm/translate.h
> +++ b/target/arm/translate.h
> @@ -64,6 +64,7 @@ typedef struct DisasContext {
>       * that it is set at the point where we actually touch the FP regs.
>       */
>      bool fp_access_checked;
> +    bool sve_access_checked;

Is there anywhere it's worthwhile to put in an equivalent
of assert_fp_access_checked() for sve_access_checked, or is
there no point that's both (a) common to SVE accesses and
(b) not common to generic FP accesses ? One could put it
in pred_full_reg_offset() I suppose but I dunno if that
really gains us much. The existing fp_access_checked will
catch "forgot the check entirely" which seems more likely
as a bug than "put in the FP check when we wanted SVE".

thanks
-- PMM



* Re: [PATCH 06/20] target/arm: Clean up 4-operand predicate expansion
  2020-08-15  1:31 ` [PATCH 06/20] target/arm: Clean up 4-operand predicate expansion Richard Henderson
@ 2020-08-25 11:13   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-25 11:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Move the check for !S into do_pppp_flags, which allows merging in
> do_vecop4_p.  Split out gen_gvec_fn_ppp without sve_access_check,
> to mirror gen_gvec_fn_zzz.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 111 ++++++++++++++-----------------------
>  1 file changed, 43 insertions(+), 68 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 09/20] target/arm: Merge helper_sve_clr_* and helper_sve_movz_*
  2020-08-15  1:31 ` [PATCH 09/20] target/arm: Merge helper_sve_clr_* and helper_sve_movz_* Richard Henderson
@ 2020-08-25 11:16   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-25 11:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The existing clr functions have only one vector argument, and so
> can only clear in place.  The existing movz functions have two
> vector arguments, and so can clear while moving.  Merge them, with
> a flag that controls the sense of active vs inactive elements
> being cleared.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 13/20] target/arm: Tidy SVE tszimm shift formats
  2020-08-15  1:31 ` [PATCH 13/20] target/arm: Tidy SVE tszimm shift formats Richard Henderson
@ 2020-08-25 11:18   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-25 11:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Rather than require the user to fill in the immediate (shl or shr),
> create full formats that include the immediate.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/sve.decode | 35 ++++++++++++++++-------------------
>  1 file changed, 16 insertions(+), 19 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 14/20] target/arm: Generalize inl_qrdmlah_* helper functions
  2020-08-15  1:31 ` [PATCH 14/20] target/arm: Generalize inl_qrdmlah_* helper functions Richard Henderson
@ 2020-08-25 13:06   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-25 13:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Unify add/sub helpers and add a parameter for rounding.
> This will allow saturating non-rounding to reuse this code.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

>  /* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */
> -static int32_t inl_qrdmlah_s32(int32_t src1, int32_t src2,
> -                               int32_t src3, uint32_t *sat)
> +static int32_t do_sqrdmlah_s(int32_t src1, int32_t src2, int32_t src3,
> +                             bool neg, bool round, uint32_t *sat)
>  {
>      /* Simplify similarly to int_qrdmlah_s16 above.  */
>      int64_t ret = (int64_t)src1 * src2;
> -    ret = ((int64_t)src3 << 31) + ret + (1 << 30);
> +    if (neg) {
> +        ret = -ret;
> +    }
> +    ret = ((int64_t)src3 << 31) + (round << 30);

Shouldn't this be "+=" as with the _h version earlier ?
(risu testing ought to catch this -- do we have a coverage hole?)

>      ret >>= 31;
> +
>      if (ret != (int32_t)ret) {
>          *sat = 1;
>          ret = (ret < 0 ? INT32_MIN : INT32_MAX);

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 15/20] target/arm: Fix sve_uzp_p vs odd vector lengths
  2020-08-15  1:31 ` [PATCH 15/20] target/arm: Fix sve_uzp_p vs odd vector lengths Richard Henderson
@ 2020-08-25 13:43   ` Peter Maydell
  2020-08-25 14:02     ` Richard Henderson
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Maydell @ 2020-08-25 13:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Laurent Desnogues, QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Missed out on compressing the second half of a predicate
> with length vl % 512 > 256.
>
> Adjust all of the x + (y << s) to x | (y << s) as a
> general style fix.
>
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/sve_helper.c | 30 +++++++++++++++++++++---------
>  1 file changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 4758d46f34..fcb46f150f 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1938,7 +1938,7 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
>      if (oprsz <= 8) {
>          l = compress_bits(n[0] >> odd, esz);
>          h = compress_bits(m[0] >> odd, esz);
> -        d[0] = extract64(l + (h << (4 * oprsz)), 0, 8 * oprsz);
> +        d[0] = l | (h << (4 * oprsz));

Why did we drop the extract64() here ? This doesn't seem
to correspond to either of the things the commit message
says we're doing.

Also, if oprsz is < 8, don't we need to mask out the high
bits in l that would otherwise overlap with h << (4 * oprsz) ?
Are they guaranteed zeroes somehow?

>      } else {
>          ARMPredicateReg tmp_m;
>          intptr_t oprsz_16 = oprsz / 16;
> @@ -1952,23 +1952,35 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
>              h = n[2 * i + 1];
>              l = compress_bits(l >> odd, esz);
>              h = compress_bits(h >> odd, esz);
> -            d[i] = l + (h << 32);
> +            d[i] = l | (h << 32);
>          }
>
> -        /* For VL which is not a power of 2, the results from M do not
> -           align nicely with the uint64_t for D.  Put the aligned results
> -           from M into TMP_M and then copy it into place afterward.  */
> +        /*
> +         * For VL which is not a multiple of 512, the results from M do not
> +         * align nicely with the uint64_t for D.  Put the aligned results
> +         * from M into TMP_M and then copy it into place afterward.
> +         */
>          if (oprsz & 15) {
> -            d[i] = compress_bits(n[2 * i] >> odd, esz);
> +            int final_shift = (oprsz & 15) * 2;
> +
> +            l = n[2 * i + 0];
> +            h = n[2 * i + 1];
> +            l = compress_bits(l >> odd, esz);
> +            h = compress_bits(h >> odd, esz);
> +            d[i] = l | (h << final_shift);

Similarly here, why don't we need to mask out the top parts of l and h ?

>
>              for (i = 0; i < oprsz_16; i++) {
>                  l = m[2 * i + 0];
>                  h = m[2 * i + 1];
>                  l = compress_bits(l >> odd, esz);
>                  h = compress_bits(h >> odd, esz);
> -                tmp_m.p[i] = l + (h << 32);
> +                tmp_m.p[i] = l | (h << 32);
>              }
> -            tmp_m.p[i] = compress_bits(m[2 * i] >> odd, esz);
> +            l = m[2 * i + 0];
> +            h = m[2 * i + 1];
> +            l = compress_bits(l >> odd, esz);
> +            h = compress_bits(h >> odd, esz);
> +            tmp_m.p[i] = l | (h << final_shift);
>
>              swap_memmove(vd + oprsz / 2, &tmp_m, oprsz / 2);

Aren't there cases where the 'n' part of the result doesn't
end up a whole number of bytes and we have to do a shift as
well as a byte copy?

>          } else {
> @@ -1977,7 +1989,7 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
>                  h = m[2 * i + 1];
>                  l = compress_bits(l >> odd, esz);
>                  h = compress_bits(h >> odd, esz);
> -                d[oprsz_16 + i] = l + (h << 32);
> +                d[oprsz_16 + i] = l | (h << 32);
>              }
>          }
>      }

thanks
-- PMM



* Re: [PATCH 04/20] target/arm: Rearrange {sve,fp}_check_access assert
  2020-08-24 16:59   ` Peter Maydell
@ 2020-08-25 13:47     ` Richard Henderson
  0 siblings, 0 replies; 49+ messages in thread
From: Richard Henderson @ 2020-08-25 13:47 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 8/24/20 9:59 AM, Peter Maydell wrote:
>> +    bool sve_access_checked;
> 
> Is there anywhere it's worthwhile to put in an equivalent
> of assert_fp_access_checked() for sve_access_checked, or is
> there no point that's both (a) common to SVE accesses and
> (b) not common to generic FP accesses ? One could put it
> in pred_full_reg_offset() I suppose but I dunno if that
> really gains us much. The existing fp_access_checked will
> catch "forgot the check entirely" which seems more likely
> as a bug than "put in the FP check when we wanted SVE".

While adding one to pred_full_reg_offset() might be useful, there are plenty of
sve instructions that don't touch predicate registers.

I suppose I could make vec_full_reg_offset() be different between
translate-a64.c and translate-sve.c, rather than sharing it via translate-a64.h.


r~



* Re: [PATCH 16/20] target/arm: Fix sve_zip_p vs odd vector lengths
  2020-08-15  1:31 ` [PATCH 16/20] target/arm: Fix sve_zip_p " Richard Henderson
@ 2020-08-25 13:49   ` Peter Maydell
  2020-08-28 19:26     ` Richard Henderson
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Maydell @ 2020-08-25 13:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Laurent Desnogues, QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Wrote too much with low-half zip (zip1) with vl % 512 != 0.
>
> Adjust all of the x + (y << s) to x | (y << s) as a style fix.
>
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/sve_helper.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index fcb46f150f..b8651ae173 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1870,6 +1870,7 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
>      intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
>      int esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
>      intptr_t high = extract32(pred_desc, SIMD_DATA_SHIFT + 2, 1);
> +    int esize = 1 << esz;
>      uint64_t *d = vd;
>      intptr_t i;
>
> @@ -1882,33 +1883,35 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
>          mm = extract64(mm, high * half, half);
>          nn = expand_bits(nn, esz);
>          mm = expand_bits(mm, esz);
> -        d[0] = nn + (mm << (1 << esz));
> +        d[0] = nn | (mm << esize);
>      } else {
> -        ARMPredicateReg tmp_n, tmp_m;
> +        ARMPredicateReg tmp;
>
>          /* We produce output faster than we consume input.
>             Therefore we must be mindful of possible overlap.  */
> -        if ((vn - vd) < (uintptr_t)oprsz) {
> -            vn = memcpy(&tmp_n, vn, oprsz);
> -        }
> -        if ((vm - vd) < (uintptr_t)oprsz) {
> -            vm = memcpy(&tmp_m, vm, oprsz);
> +        if (vd == vn) {
> +            vn = memcpy(&tmp, vn, oprsz);
> +            if (vd == vm) {
> +                vm = vn;
> +            }
> +        } else if (vd == vm) {
> +            vm = memcpy(&tmp, vm, oprsz);

Why is it OK to only check vd==vn etc rather than checking for
overlap the way the old code did ? The commit message doesn't
mention this.
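
For comparison, a sketch of the two tests side by side.  The helper
names starts_within and same_register are hypothetical; the first
mirrors the shape of the removed "(vn - vd) < (uintptr_t)oprsz" check,
the second the new pointer equality.  The two agree only if the
operands never partially overlap, which is the point in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Old-style test: true when src starts inside [dst, dst + size). */
static bool starts_within(const void *dst, const void *src, size_t size)
{
    assert(size > 0);
    /* Unsigned wraparound makes src < dst compare as "far away". */
    return (uintptr_t)src - (uintptr_t)dst < (uintptr_t)size;
}

/* New-style test: true only when both point at the same storage. */
static bool same_register(const void *dst, const void *src)
{
    return dst == src;
}
```

A source that starts part-way into the destination is caught by the
first test but not by the second.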

thanks
-- PMM



* Re: [PATCH 17/20] target/arm: Fix sve_punpk_p vs odd vector lengths
  2020-08-15  1:31 ` [PATCH 17/20] target/arm: Fix sve_punpk_p " Richard Henderson
@ 2020-08-25 13:53   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-25 13:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Laurent Desnogues, QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Wrote too much with punpk1 with vl % 512 != 0.
>
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/sve_helper.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index b8651ae173..c983cd4356 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -2104,11 +2104,11 @@ void HELPER(sve_punpk_p)(void *vd, void *vn, uint32_t pred_desc)
>              high = oprsz >> 1;
>          }
>
> -        if ((high & 3) == 0) {
> +        if ((oprsz & 7) == 0) {
>              uint32_t *n = vn;
>              high >>= 2;
>
> -            for (i = 0; i < DIV_ROUND_UP(oprsz, 8); i++) {
> +            for (i = 0; i < oprsz / 8; i++) {
>                  uint64_t nn = n[H4(high + i)];
>                  d[i] = expand_bits(nn, 0);
>              }
> --
> 2.25.1

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 18/20] target/arm: Convert integer multiply (indexed) to gvec for aa64 advsimd
  2020-08-15  1:31 ` [PATCH 18/20] target/arm: Convert integer multiply (indexed) to gvec for aa64 advsimd Richard Henderson
@ 2020-08-25 13:54   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-25 13:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper.h        |  4 ++++
>  target/arm/translate-a64.c | 16 ++++++++++++++++
>  target/arm/vec_helper.c    | 29 +++++++++++++++++++++++++----
>  3 files changed, 45 insertions(+), 4 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 19/20] target/arm: Convert integer multiply-add (indexed) to gvec for aa64 advsimd
  2020-08-15  1:31 ` [PATCH 19/20] target/arm: Convert integer multiply-add " Richard Henderson
@ 2020-08-25 13:55   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-25 13:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper.h        | 14 ++++++++++++++
>  target/arm/translate-a64.c | 34 ++++++++++++++++++++++++++++++++++
>  target/arm/vec_helper.c    | 25 +++++++++++++++++++++++++
>  3 files changed, 73 insertions(+)


Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 20/20] target/arm: Convert sq{, r}dmulh to gvec for aa64 advsimd
  2020-08-15  1:31 ` [PATCH 20/20] target/arm: Convert sq{, r}dmulh " Richard Henderson
@ 2020-08-25 13:57   ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-25 13:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:32, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper.h        | 10 ++++++++
>  target/arm/translate-a64.c | 33 ++++++++++++++++++--------
>  target/arm/vec_helper.c    | 48 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 81 insertions(+), 10 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 15/20] target/arm: Fix sve_uzp_p vs odd vector lengths
  2020-08-25 13:43   ` Peter Maydell
@ 2020-08-25 14:02     ` Richard Henderson
  2020-08-25 14:09       ` Peter Maydell
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-25 14:02 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Laurent Desnogues, QEMU Developers

On 8/25/20 6:43 AM, Peter Maydell wrote:
> On Sat, 15 Aug 2020 at 02:32, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Missed out on compressing the second half of a predicate
>> with length vl % 512 > 256.
>>
>> Adjust all of the x + (y << s) to x | (y << s) as a
>> general style fix.
>>
>> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/sve_helper.c | 30 +++++++++++++++++++++---------
>>  1 file changed, 21 insertions(+), 9 deletions(-)
>>
>> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
>> index 4758d46f34..fcb46f150f 100644
>> --- a/target/arm/sve_helper.c
>> +++ b/target/arm/sve_helper.c
>> @@ -1938,7 +1938,7 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
>>      if (oprsz <= 8) {
>>          l = compress_bits(n[0] >> odd, esz);
>>          h = compress_bits(m[0] >> odd, esz);
>> -        d[0] = extract64(l + (h << (4 * oprsz)), 0, 8 * oprsz);
>> +        d[0] = l | (h << (4 * oprsz));
> 
> Why did we drop the extract64() here ? This doesn't seem
> to correspond to either of the things the commit message
> says we're doing.

Indeed, the commit message could use expansion.

> Also, if oprsz is < 8, don't we need to mask out the high
> bits in l that would otherwise overlap with h << (4 * oprsz) ?
> Are they guaranteed zeroes somehow?

They are guaranteed zeros.  See aarch64_sve_narrow_vq.

>>              for (i = 0; i < oprsz_16; i++) {
>>                  l = m[2 * i + 0];
>>                  h = m[2 * i + 1];
>>                  l = compress_bits(l >> odd, esz);
>>                  h = compress_bits(h >> odd, esz);
>> -                tmp_m.p[i] = l + (h << 32);
>> +                tmp_m.p[i] = l | (h << 32);
>>              }
>> -            tmp_m.p[i] = compress_bits(m[2 * i] >> odd, esz);
>> +            l = m[2 * i + 0];
>> +            h = m[2 * i + 1];
>> +            l = compress_bits(l >> odd, esz);
>> +            h = compress_bits(h >> odd, esz);
>> +            tmp_m.p[i] = l | (h << final_shift);
>>
>>              swap_memmove(vd + oprsz / 2, &tmp_m, oprsz / 2);
> 
> Aren't there cases where the 'n' part of the result doesn't
> end up a whole number of bytes and we have to do a shift as
> well as a byte copy?

No, oprsz will always be a multiple of 2 for predicates.
Just like oprsz will always be a multiple of 16 for sve vectors.


r~



* Re: [PATCH 15/20] target/arm: Fix sve_uzp_p vs odd vector lengths
  2020-08-25 14:02     ` Richard Henderson
@ 2020-08-25 14:09       ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-25 14:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Laurent Desnogues, QEMU Developers

On Tue, 25 Aug 2020 at 15:02, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 8/25/20 6:43 AM, Peter Maydell wrote:
> > On Sat, 15 Aug 2020 at 02:32, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> Missed out on compressing the second half of a predicate
> >> with length vl % 512 > 256.
> >>
> >> Adjust all of the x + (y << s) to x | (y << s) as a
> >> general style fix.
> >>
> >> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >> ---
> >>  target/arm/sve_helper.c | 30 +++++++++++++++++++++---------
> >>  1 file changed, 21 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> >> index 4758d46f34..fcb46f150f 100644
> >> --- a/target/arm/sve_helper.c
> >> +++ b/target/arm/sve_helper.c
> >> @@ -1938,7 +1938,7 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
> >>      if (oprsz <= 8) {
> >>          l = compress_bits(n[0] >> odd, esz);
> >>          h = compress_bits(m[0] >> odd, esz);
> >> -        d[0] = extract64(l + (h << (4 * oprsz)), 0, 8 * oprsz);
> >> +        d[0] = l | (h << (4 * oprsz));
> >
> > Why did we drop the extract64() here ? This doesn't seem
> > to correspond to either of the things the commit message
> > says we're doing.
>
> Indeed, the commit message could use expansion.
>
> > Also, if oprsz is < 8, don't we need to mask out the high
> > bits in l that would otherwise overlap with h << (4 * oprsz) ?
> > Are they guaranteed zeroes somehow?
>
> They are guaranteed zeros.  See aarch64_sve_narrow_vq.
>
> >>              for (i = 0; i < oprsz_16; i++) {
> >>                  l = m[2 * i + 0];
> >>                  h = m[2 * i + 1];
> >>                  l = compress_bits(l >> odd, esz);
> >>                  h = compress_bits(h >> odd, esz);
> >> -                tmp_m.p[i] = l + (h << 32);
> >> +                tmp_m.p[i] = l | (h << 32);
> >>              }
> >> -            tmp_m.p[i] = compress_bits(m[2 * i] >> odd, esz);
> >> +            l = m[2 * i + 0];
> >> +            h = m[2 * i + 1];
> >> +            l = compress_bits(l >> odd, esz);
> >> +            h = compress_bits(h >> odd, esz);
> >> +            tmp_m.p[i] = l | (h << final_shift);
> >>
> >>              swap_memmove(vd + oprsz / 2, &tmp_m, oprsz / 2);
> >
> > Aren't there cases where the 'n' part of the result doesn't
> > end up a whole number of bytes and we have to do a shift as
> > well as a byte copy?
>
> No, oprsz will always be a multiple of 2 for predicates.

Ah, I see, so final_shift is a multiple of 4, and (if it's
not also a multiple of 8) the last byte of the 'n' part of
the result is then 4 bits from n[2 * i] and 4 bits from
n[2 * i + 1] making up a complete byte.
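
The arithmetic behind that can be checked with a small sketch, assuming
only what is stated above: oprsz is even and final_shift is
(oprsz & 15) * 2 as in the quoted hunk.  The function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same expression as the quoted hunk. */
static int final_shift_for(intptr_t oprsz)
{
    return (int)((oprsz & 15) * 2);
}

/* Check every even oprsz up to max_oprsz gives a multiple of 4. */
static bool shift_is_nibble_aligned(intptr_t max_oprsz)
{
    assert(max_oprsz >= 2);
    for (intptr_t oprsz = 2; oprsz <= max_oprsz; oprsz += 2) {
        if (final_shift_for(oprsz) % 4 != 0) {
            return false;
        }
    }
    return true;
}
```

Since oprsz is even, oprsz & 15 is even too, so doubling it always
lands on a 4-bit boundary, and oprsz / 2 is a whole number of bytes
for the swap_memmove.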

thanks
-- PMM



* Re: [PATCH 00/20] target/arm: SVE2 preparatory patches
  2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
                   ` (20 preceding siblings ...)
  2020-08-15 17:55 ` [PATCH 00/20] target/arm: SVE2 preparatory patches no-reply
@ 2020-08-27 18:28 ` Peter Maydell
  2020-08-27 21:12   ` Richard Henderson
  21 siblings, 1 reply; 49+ messages in thread
From: Peter Maydell @ 2020-08-27 18:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 15 Aug 2020 at 02:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This is collection of cleanups and changes that are required by
> SVE2, but do not directly implement it.  The final 3 patches are
> relevant to Peter's aa32 neon work.

If you agree with me about my suggested bugfix (s/=/+=/) in patch 14,
I can take the reviewed patches (1-14,18-20) into target-arm.next
(which will be useful for me as I need 14,18-20 for my neon work).

thanks
-- PMM



* Re: [PATCH 00/20] target/arm: SVE2 preparatory patches
  2020-08-27 18:28 ` Peter Maydell
@ 2020-08-27 21:12   ` Richard Henderson
  0 siblings, 0 replies; 49+ messages in thread
From: Richard Henderson @ 2020-08-27 21:12 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 8/27/20 11:28 AM, Peter Maydell wrote:
> On Sat, 15 Aug 2020 at 02:31, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> This is collection of cleanups and changes that are required by
>> SVE2, but do not directly implement it.  The final 3 patches are
>> relevant to Peter's aa32 neon work.
> 
> If you agree with me about my suggested bugfix (s/=/+=/) in patch 14,
> I can take the reviewed patches (1-14,18-20) into target-arm.next
> (which will be useful for me as I need 14,18-20 for my neon work).

Yes, please.  Once you've pushed to your target-arm.next, I can rebase
the rest of the patches on that.


r~



* Re: [PATCH 16/20] target/arm: Fix sve_zip_p vs odd vector lengths
  2020-08-25 13:49   ` Peter Maydell
@ 2020-08-28 19:26     ` Richard Henderson
  2020-08-28 23:01       ` Peter Maydell
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2020-08-28 19:26 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Laurent Desnogues, QEMU Developers

On 8/25/20 6:49 AM, Peter Maydell wrote:
> On Sat, 15 Aug 2020 at 02:32, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Wrote too much with low-half zip (zip1) with vl % 512 != 0.
>>
>> Adjust all of the x + (y << s) to x | (y << s) as a style fix.
>>
>> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/sve_helper.c | 25 ++++++++++++++-----------
>>  1 file changed, 14 insertions(+), 11 deletions(-)
>>
>> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
>> index fcb46f150f..b8651ae173 100644
>> --- a/target/arm/sve_helper.c
>> +++ b/target/arm/sve_helper.c
>> @@ -1870,6 +1870,7 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
>>      intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
>>      int esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
>>      intptr_t high = extract32(pred_desc, SIMD_DATA_SHIFT + 2, 1);
>> +    int esize = 1 << esz;
>>      uint64_t *d = vd;
>>      intptr_t i;
>>
>> @@ -1882,33 +1883,35 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
>>          mm = extract64(mm, high * half, half);
>>          nn = expand_bits(nn, esz);
>>          mm = expand_bits(mm, esz);
>> -        d[0] = nn + (mm << (1 << esz));
>> +        d[0] = nn | (mm << esize);
>>      } else {
>> -        ARMPredicateReg tmp_n, tmp_m;
>> +        ARMPredicateReg tmp;
>>
>>          /* We produce output faster than we consume input.
>>             Therefore we must be mindful of possible overlap.  */
>> -        if ((vn - vd) < (uintptr_t)oprsz) {
>> -            vn = memcpy(&tmp_n, vn, oprsz);
>> -        }
>> -        if ((vm - vd) < (uintptr_t)oprsz) {
>> -            vm = memcpy(&tmp_m, vm, oprsz);
>> +        if (vd == vn) {
>> +            vn = memcpy(&tmp, vn, oprsz);
>> +            if (vd == vm) {
>> +                vm = vn;
>> +            }
>> +        } else if (vd == vm) {
>> +            vm = memcpy(&tmp, vm, oprsz);
> 
> Why is it OK to only check vd==vn etc rather than checking for
> overlap the way the old code did ? The commit message doesn't
> mention this.

We only ever pass pred_full_reg_offset, so there will only ever be exact
overlap.  I can either split this out as a separate change or simply add it to
the patch description.
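
As a rough illustration of why the exact-alias check is enough under
that assumption, here is a simplified byte-level sketch. It is not the
QEMU helper: interleave_low and MAX_LEN are hypothetical names, and it
interleaves whole bytes rather than predicate bits. The point is only
that when the destination is either identical to a source or fully
disjoint from it, one temporary copy covers every case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LEN 32  /* hypothetical bound for this sketch */

/*
 * Hypothetical example: interleave the low halves of n and m into d.
 * Output is produced faster than input is consumed, so a source that
 * aliases d must be copied first.  Assumes d is either exactly equal
 * to a source pointer or does not overlap it at all.
 */
static void interleave_low(uint8_t *d, const uint8_t *n,
                           const uint8_t *m, size_t len)
{
    uint8_t tmp[MAX_LEN];
    size_t i;

    assert(len <= MAX_LEN && len % 2 == 0);

    if (d == n) {
        n = memcpy(tmp, n, len);
        if (d == m) {
            m = n;              /* both sources alias d: share the copy */
        }
    } else if (d == m) {
        m = memcpy(tmp, m, len);
    }

    for (i = 0; i < len / 2; i++) {
        d[2 * i + 0] = n[i];
        d[2 * i + 1] = m[i];
    }
}
```

If a caller could pass partially overlapping buffers, this check would
not be sufficient and a range test like the old code's would be needed.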


r~




* Re: [PATCH 16/20] target/arm: Fix sve_zip_p vs odd vector lengths
  2020-08-28 19:26     ` Richard Henderson
@ 2020-08-28 23:01       ` Peter Maydell
  0 siblings, 0 replies; 49+ messages in thread
From: Peter Maydell @ 2020-08-28 23:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Laurent Desnogues, QEMU Developers

On Fri, 28 Aug 2020 at 20:26, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 8/25/20 6:49 AM, Peter Maydell wrote:
> > Why is it OK to only check vd==vn etc rather than checking for
> > overlap the way the old code did ? The commit message doesn't
> > mention this.
>
> We only ever pass pred_full_reg_offset, so there will only ever be exact
> overlap.  I can either split this out as a separate change or simply add it to
> the patch description.

Whichever you prefer, I guess.

thanks
-- PMM



end of thread, other threads:[~2020-08-28 23:02 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-15  1:31 [PATCH 00/20] target/arm: SVE2 preparatory patches Richard Henderson
2020-08-15  1:31 ` [PATCH 01/20] qemu/int128: Add int128_lshift Richard Henderson
2020-08-24 16:40   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 02/20] target/arm: Split out gen_gvec_fn_zz Richard Henderson
2020-08-24 16:40   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 03/20] target/arm: Split out gen_gvec_fn_zzz, do_zzz_fn Richard Henderson
2020-08-24 16:40   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 04/20] target/arm: Rearrange {sve,fp}_check_access assert Richard Henderson
2020-08-24 16:59   ` Peter Maydell
2020-08-25 13:47     ` Richard Henderson
2020-08-15  1:31 ` [PATCH 05/20] target/arm: Merge do_vector2_p into do_mov_p Richard Henderson
2020-08-24 16:41   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 06/20] target/arm: Clean up 4-operand predicate expansion Richard Henderson
2020-08-25 11:13   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 07/20] target/arm: Use tcg_gen_gvec_bitsel for trans_SEL_pppp Richard Henderson
2020-08-24 16:44   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 08/20] target/arm: Split out gen_gvec_ool_zzzp Richard Henderson
2020-08-24 16:43   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 09/20] target/arm: Merge helper_sve_clr_* and helper_sve_movz_* Richard Henderson
2020-08-25 11:16   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 10/20] target/arm: Split out gen_gvec_ool_zzp Richard Henderson
2020-08-24 16:46   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 11/20] target/arm: Split out gen_gvec_ool_zzz Richard Henderson
2020-08-24 16:47   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 12/20] target/arm: Split out gen_gvec_ool_zz Richard Henderson
2020-08-24 16:47   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 13/20] target/arm: Tidy SVE tszimm shift formats Richard Henderson
2020-08-25 11:18   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 14/20] target/arm: Generalize inl_qrdmlah_* helper functions Richard Henderson
2020-08-25 13:06   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 15/20] target/arm: Fix sve_uzp_p vs odd vector lengths Richard Henderson
2020-08-25 13:43   ` Peter Maydell
2020-08-25 14:02     ` Richard Henderson
2020-08-25 14:09       ` Peter Maydell
2020-08-15  1:31 ` [PATCH 16/20] target/arm: Fix sve_zip_p " Richard Henderson
2020-08-25 13:49   ` Peter Maydell
2020-08-28 19:26     ` Richard Henderson
2020-08-28 23:01       ` Peter Maydell
2020-08-15  1:31 ` [PATCH 17/20] target/arm: Fix sve_punpk_p " Richard Henderson
2020-08-25 13:53   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 18/20] target/arm: Convert integer multiply (indexed) to gvec for aa64 advsimd Richard Henderson
2020-08-25 13:54   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 19/20] target/arm: Convert integer multiply-add " Richard Henderson
2020-08-25 13:55   ` Peter Maydell
2020-08-15  1:31 ` [PATCH 20/20] target/arm: Convert sq{, r}dmulh " Richard Henderson
2020-08-25 13:57   ` Peter Maydell
2020-08-15 17:55 ` [PATCH 00/20] target/arm: SVE2 preparatory patches no-reply
2020-08-27 18:28 ` Peter Maydell
2020-08-27 21:12   ` Richard Henderson
