* [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension
@ 2018-02-17 18:22 Richard Henderson
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 01/67] target/arm: Enable SVE for aarch64-linux-user Richard Henderson
                   ` (68 more replies)
  0 siblings, 69 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

This is 99% of the instruction set.  There are a few things missing,
notably first-fault and non-fault loads (even these are decoded, but
simply treated as normal loads for now).

The patch set is dependent on at least 3 other branches.
A fully composed tree is available as

  git://github.com/rth7680/qemu.git tgt-arm-sve-7
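For anyone wanting to reproduce that tree locally, a minimal fetch recipe might look like the following. This is an editorial sketch: it assumes the branch is still published at that remote, and the configure flags are only a plausible user-mode build setup, not taken from the cover letter.

```shell
# Fetch just the composed branch named in the cover letter.
git clone --branch tgt-arm-sve-7 --single-branch \
    git://github.com/rth7680/qemu.git qemu-sve
cd qemu-sve

# Build only the aarch64 linux-user target used in the benchmarks below.
./configure --target-list=aarch64-linux-user
make -j"$(nproc)"
```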

There are a few checkpatch errors due to macros and typedefs, but
nothing that isn't obviously a false positive.

This is able to run SVE-enabled Himeno and LULESH benchmarks as
compiled by last week's gcc-8:

$ ./aarch64-linux-user/qemu-aarch64 ~/himeno-advsimd
mimax = 129 mjmax = 65 mkmax = 65
imax = 128 jmax = 64 kmax =64
cpu : 67.028643 sec.
Loop executed for 200 times
Gosa : 1.688752e-03 
MFLOPS measured : 49.136295
Score based on MMX Pentium 200MHz : 1.522662

$ ./aarch64-linux-user/qemu-aarch64 ~/himeno-sve 
mimax = 129 mjmax = 65 mkmax = 65
imax = 128 jmax = 64 kmax =64
cpu : 43.481213 sec.
Loop executed for 200 times
Gosa : 3.830036e-06 
MFLOPS measured : 75.746259
Score based on MMX Pentium 200MHz : 2.347266

Hopefully the size of the patch set isn't too daunting...


r~


Richard Henderson (67):
  target/arm: Enable SVE for aarch64-linux-user
  target/arm: Introduce translate-a64.h
  target/arm: Add SVE decode skeleton
  target/arm: Implement SVE Bitwise Logical - Unpredicated Group
  target/arm: Implement SVE load vector/predicate
  target/arm: Implement SVE predicate test
  target/arm: Implement SVE Predicate Logical Operations Group
  target/arm: Implement SVE Predicate Misc Group
  target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group
  target/arm: Implement SVE Integer Reduction Group
  target/arm: Implement SVE bitwise shift by immediate (predicated)
  target/arm: Implement SVE bitwise shift by vector (predicated)
  target/arm: Implement SVE bitwise shift by wide elements (predicated)
  target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
  target/arm: Implement SVE Integer Multiply-Add Group
  target/arm: Implement SVE Integer Arithmetic - Unpredicated Group
  target/arm: Implement SVE Index Generation Group
  target/arm: Implement SVE Stack Allocation Group
  target/arm: Implement SVE Bitwise Shift - Unpredicated Group
  target/arm: Implement SVE Compute Vector Address Group
  target/arm: Implement SVE floating-point exponential accelerator
  target/arm: Implement SVE floating-point trig select coefficient
  target/arm: Implement SVE Element Count Group
  target/arm: Implement SVE Bitwise Immediate Group
  target/arm: Implement SVE Integer Wide Immediate - Predicated Group
  target/arm: Implement SVE Permute - Extract Group
  target/arm: Implement SVE Permute - Unpredicated Group
  target/arm: Implement SVE Permute - Predicates Group
  target/arm: Implement SVE Permute - Interleaving Group
  target/arm: Implement SVE compress active elements
  target/arm: Implement SVE conditionally broadcast/extract element
  target/arm: Implement SVE copy to vector (predicated)
  target/arm: Implement SVE reverse within elements
  target/arm: Implement SVE vector splice (predicated)
  target/arm: Implement SVE Select Vectors Group
  target/arm: Implement SVE Integer Compare - Vectors Group
  target/arm: Implement SVE Integer Compare - Immediate Group
  target/arm: Implement SVE Partition Break Group
  target/arm: Implement SVE Predicate Count Group
  target/arm: Implement SVE Integer Compare - Scalars Group
  target/arm: Implement FDUP/DUP
  target/arm: Implement SVE Integer Wide Immediate - Unpredicated Group
  target/arm: Implement SVE Floating Point Arithmetic - Unpredicated
    Group
  target/arm: Implement SVE Memory Contiguous Load Group
  target/arm: Implement SVE Memory Contiguous Store Group
  target/arm: Implement SVE load and broadcast quadword
  target/arm: Implement SVE integer convert to floating-point
  target/arm: Implement SVE floating-point arithmetic (predicated)
  target/arm: Implement SVE FP Multiply-Add Group
  target/arm: Implement SVE Floating Point Accumulating Reduction Group
  target/arm: Implement SVE load and broadcast element
  target/arm: Implement SVE store vector/predicate register
  target/arm: Implement SVE scatter stores
  target/arm: Implement SVE prefetches
  target/arm: Implement SVE gather loads
  target/arm: Implement SVE scatter store vector immediate
  target/arm: Implement SVE floating-point compare vectors
  target/arm: Implement SVE floating-point arithmetic with immediate
  target/arm: Implement SVE Floating Point Multiply Indexed Group
  target/arm: Implement SVE FP Fast Reduction Group
  target/arm: Implement SVE Floating Point Unary Operations -
    Unpredicated Group
  target/arm: Implement SVE FP Compare with Zero Group
  target/arm: Implement SVE floating-point trig multiply-add coefficient
  target/arm: Implement SVE floating-point convert precision
  target/arm: Implement SVE floating-point convert to integer
  target/arm: Implement SVE floating-point round to integral value
  target/arm: Implement SVE floating-point unary operations

 target/arm/cpu.h           |    7 +-
 target/arm/helper-sve.h    | 1285 ++++++++++++
 target/arm/helper.h        |   42 +
 target/arm/translate-a64.h |  110 ++
 target/arm/cpu.c           |    7 +
 target/arm/cpu64.c         |    1 +
 target/arm/sve_helper.c    | 4051 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-a64.c |  112 +-
 target/arm/translate-sve.c | 4626 ++++++++++++++++++++++++++++++++++++++++++++
 target/arm/vec_helper.c    |  178 ++
 .gitignore                 |    1 +
 target/arm/Makefile.objs   |   12 +-
 target/arm/sve.decode      | 1067 ++++++++++
 13 files changed, 11408 insertions(+), 91 deletions(-)
 create mode 100644 target/arm/helper-sve.h
 create mode 100644 target/arm/translate-a64.h
 create mode 100644 target/arm/sve_helper.c
 create mode 100644 target/arm/translate-sve.c
 create mode 100644 target/arm/vec_helper.c
 create mode 100644 target/arm/sve.decode

-- 
2.14.3


* [Qemu-devel] [PATCH v2 01/67] target/arm: Enable SVE for aarch64-linux-user
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-22 17:28   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-23 17:00   ` Alex Bennée
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 02/67] target/arm: Introduce translate-a64.h Richard Henderson
                   ` (67 subsequent siblings)
  68 siblings, 2 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Enable ARM_FEATURE_SVE for the generic "any" cpu.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.c   | 7 +++++++
 target/arm/cpu64.c | 1 +
 2 files changed, 8 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 1b3ae62db6..10843994c3 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -150,6 +150,13 @@ static void arm_cpu_reset(CPUState *s)
         env->cp15.sctlr_el[1] |= SCTLR_UCT | SCTLR_UCI | SCTLR_DZE;
         /* and to the FP/Neon instructions */
         env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 20, 2, 3);
+        /* and to the SVE instructions */
+        env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
+        env->cp15.cptr_el[3] |= CPTR_EZ;
+        /* with maximum vector length */
+        env->vfp.zcr_el[1] = ARM_MAX_VQ - 1;
+        env->vfp.zcr_el[2] = ARM_MAX_VQ - 1;
+        env->vfp.zcr_el[3] = ARM_MAX_VQ - 1;
 #else
         /* Reset into the highest available EL */
         if (arm_feature(env, ARM_FEATURE_EL3)) {
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index efc519b49b..36ef9e9d9d 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -231,6 +231,7 @@ static void aarch64_any_initfn(Object *obj)
     set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
     set_feature(&cpu->env, ARM_FEATURE_CRC);
     set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
+    set_feature(&cpu->env, ARM_FEATURE_SVE);
     cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
     cpu->dcz_blocksize = 7; /*  512 bytes */
 }
-- 
2.14.3


* [Qemu-devel] [PATCH v2 02/67] target/arm: Introduce translate-a64.h
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 01/67] target/arm: Enable SVE for aarch64-linux-user Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-22 17:30   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-04-03  9:01   ` Alex Bennée
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 03/67] target/arm: Add SVE decode skeleton Richard Henderson
                   ` (66 subsequent siblings)
  68 siblings, 2 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Move some stuff that will be common to both translate-a64.c
and translate-sve.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.h | 110 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-a64.c | 101 ++++++-----------------------------------
 2 files changed, 123 insertions(+), 88 deletions(-)
 create mode 100644 target/arm/translate-a64.h

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
new file mode 100644
index 0000000000..e519aee314
--- /dev/null
+++ b/target/arm/translate-a64.h
@@ -0,0 +1,110 @@
+/*
+ *  AArch64 translation, common definitions.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TARGET_ARM_TRANSLATE_A64_H
+#define TARGET_ARM_TRANSLATE_A64_H
+
+void unallocated_encoding(DisasContext *s);
+
+#define unsupported_encoding(s, insn)                                    \
+    do {                                                                 \
+        qemu_log_mask(LOG_UNIMP,                                         \
+                      "%s:%d: unsupported instruction encoding 0x%08x "  \
+                      "at pc=%016" PRIx64 "\n",                          \
+                      __FILE__, __LINE__, insn, s->pc - 4);              \
+        unallocated_encoding(s);                                         \
+    } while (0)
+
+TCGv_i64 new_tmp_a64(DisasContext *s);
+TCGv_i64 new_tmp_a64_zero(DisasContext *s);
+TCGv_i64 cpu_reg(DisasContext *s, int reg);
+TCGv_i64 cpu_reg_sp(DisasContext *s, int reg);
+TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf);
+TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf);
+void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v);
+TCGv_ptr get_fpstatus_ptr(bool);
+bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
+                            unsigned int imms, unsigned int immr);
+uint64_t vfp_expand_imm(int size, uint8_t imm8);
+
+/* We should have at some point before trying to access an FP register
+ * done the necessary access check, so assert that
+ * (a) we did the check and
+ * (b) we didn't then just plough ahead anyway if it failed.
+ * Print the instruction pattern in the abort message so we can figure
+ * out what we need to fix if a user encounters this problem in the wild.
+ */
+static inline void assert_fp_access_checked(DisasContext *s)
+{
+#ifdef CONFIG_DEBUG_TCG
+    if (unlikely(!s->fp_access_checked || s->fp_excp_el)) {
+        fprintf(stderr, "target-arm: FP access check missing for "
+                "instruction 0x%08x\n", s->insn);
+        abort();
+    }
+#endif
+}
+
+/* Return the offset into CPUARMState of an element of specified
+ * size, 'element' places in from the least significant end of
+ * the FP/vector register Qn.
+ */
+static inline int vec_reg_offset(DisasContext *s, int regno,
+                                 int element, TCGMemOp size)
+{
+    int offs = 0;
+#ifdef HOST_WORDS_BIGENDIAN
+    /* This is complicated slightly because vfp.zregs[n].d[0] is
+     * still the low half and vfp.zregs[n].d[1] the high half
+     * of the 128 bit vector, even on big endian systems.
+     * Calculate the offset assuming a fully bigendian 128 bits,
+     * then XOR to account for the order of the two 64 bit halves.
+     */
+    offs += (16 - ((element + 1) * (1 << size)));
+    offs ^= 8;
+#else
+    offs += element * (1 << size);
+#endif
+    offs += offsetof(CPUARMState, vfp.zregs[regno]);
+    assert_fp_access_checked(s);
+    return offs;
+}
+
+/* Return the offset into CPUARMState of the "whole" vector register Qn.  */
+static inline int vec_full_reg_offset(DisasContext *s, int regno)
+{
+    assert_fp_access_checked(s);
+    return offsetof(CPUARMState, vfp.zregs[regno]);
+}
+
+/* Return a newly allocated pointer to the vector register.  */
+static inline TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
+{
+    TCGv_ptr ret = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(ret, cpu_env, vec_full_reg_offset(s, regno));
+    return ret;
+}
+
+/* Return the byte size of the "whole" vector register, VL / 8.  */
+static inline int vec_full_reg_size(DisasContext *s)
+{
+    return s->sve_len;
+}
+
+bool disas_sve(DisasContext *, uint32_t);
+
+#endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 032cbfa17d..e0e7ebf68c 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -36,13 +36,13 @@
 #include "exec/log.h"
 
 #include "trace-tcg.h"
+#include "translate-a64.h"
 
 static TCGv_i64 cpu_X[32];
 static TCGv_i64 cpu_pc;
 
 /* Load/store exclusive handling */
 static TCGv_i64 cpu_exclusive_high;
-static TCGv_i64 cpu_reg(DisasContext *s, int reg);
 
 static const char *regnames[] = {
     "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7",
@@ -392,22 +392,13 @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
     }
 }
 
-static void unallocated_encoding(DisasContext *s)
+void unallocated_encoding(DisasContext *s)
 {
     /* Unallocated and reserved encodings are uncategorized */
     gen_exception_insn(s, 4, EXCP_UDEF, syn_uncategorized(),
                        default_exception_el(s));
 }
 
-#define unsupported_encoding(s, insn)                                    \
-    do {                                                                 \
-        qemu_log_mask(LOG_UNIMP,                                         \
-                      "%s:%d: unsupported instruction encoding 0x%08x "  \
-                      "at pc=%016" PRIx64 "\n",                          \
-                      __FILE__, __LINE__, insn, s->pc - 4);              \
-        unallocated_encoding(s);                                         \
-    } while (0)
-
 static void init_tmp_a64_array(DisasContext *s)
 {
 #ifdef CONFIG_DEBUG_TCG
@@ -425,13 +416,13 @@ static void free_tmp_a64(DisasContext *s)
     init_tmp_a64_array(s);
 }
 
-static TCGv_i64 new_tmp_a64(DisasContext *s)
+TCGv_i64 new_tmp_a64(DisasContext *s)
 {
     assert(s->tmp_a64_count < TMP_A64_MAX);
     return s->tmp_a64[s->tmp_a64_count++] = tcg_temp_new_i64();
 }
 
-static TCGv_i64 new_tmp_a64_zero(DisasContext *s)
+TCGv_i64 new_tmp_a64_zero(DisasContext *s)
 {
     TCGv_i64 t = new_tmp_a64(s);
     tcg_gen_movi_i64(t, 0);
@@ -453,7 +444,7 @@ static TCGv_i64 new_tmp_a64_zero(DisasContext *s)
  * to cpu_X[31] and ZR accesses to a temporary which can be discarded.
  * This is the point of the _sp forms.
  */
-static TCGv_i64 cpu_reg(DisasContext *s, int reg)
+TCGv_i64 cpu_reg(DisasContext *s, int reg)
 {
     if (reg == 31) {
         return new_tmp_a64_zero(s);
@@ -463,7 +454,7 @@ static TCGv_i64 cpu_reg(DisasContext *s, int reg)
 }
 
 /* register access for when 31 == SP */
-static TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
+TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
 {
     return cpu_X[reg];
 }
@@ -472,7 +463,7 @@ static TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
  * representing the register contents. This TCGv is an auto-freed
  * temporary so it need not be explicitly freed, and may be modified.
  */
-static TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
+TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
 {
     TCGv_i64 v = new_tmp_a64(s);
     if (reg != 31) {
@@ -487,7 +478,7 @@ static TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
     return v;
 }
 
-static TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
+TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
 {
     TCGv_i64 v = new_tmp_a64(s);
     if (sf) {
@@ -498,72 +489,6 @@ static TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
     return v;
 }
 
-/* We should have at some point before trying to access an FP register
- * done the necessary access check, so assert that
- * (a) we did the check and
- * (b) we didn't then just plough ahead anyway if it failed.
- * Print the instruction pattern in the abort message so we can figure
- * out what we need to fix if a user encounters this problem in the wild.
- */
-static inline void assert_fp_access_checked(DisasContext *s)
-{
-#ifdef CONFIG_DEBUG_TCG
-    if (unlikely(!s->fp_access_checked || s->fp_excp_el)) {
-        fprintf(stderr, "target-arm: FP access check missing for "
-                "instruction 0x%08x\n", s->insn);
-        abort();
-    }
-#endif
-}
-
-/* Return the offset into CPUARMState of an element of specified
- * size, 'element' places in from the least significant end of
- * the FP/vector register Qn.
- */
-static inline int vec_reg_offset(DisasContext *s, int regno,
-                                 int element, TCGMemOp size)
-{
-    int offs = 0;
-#ifdef HOST_WORDS_BIGENDIAN
-    /* This is complicated slightly because vfp.zregs[n].d[0] is
-     * still the low half and vfp.zregs[n].d[1] the high half
-     * of the 128 bit vector, even on big endian systems.
-     * Calculate the offset assuming a fully bigendian 128 bits,
-     * then XOR to account for the order of the two 64 bit halves.
-     */
-    offs += (16 - ((element + 1) * (1 << size)));
-    offs ^= 8;
-#else
-    offs += element * (1 << size);
-#endif
-    offs += offsetof(CPUARMState, vfp.zregs[regno]);
-    assert_fp_access_checked(s);
-    return offs;
-}
-
-/* Return the offset info CPUARMState of the "whole" vector register Qn.  */
-static inline int vec_full_reg_offset(DisasContext *s, int regno)
-{
-    assert_fp_access_checked(s);
-    return offsetof(CPUARMState, vfp.zregs[regno]);
-}
-
-/* Return a newly allocated pointer to the vector register.  */
-static TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
-{
-    TCGv_ptr ret = tcg_temp_new_ptr();
-    tcg_gen_addi_ptr(ret, cpu_env, vec_full_reg_offset(s, regno));
-    return ret;
-}
-
-/* Return the byte size of the "whole" vector register, VL / 8.  */
-static inline int vec_full_reg_size(DisasContext *s)
-{
-    /* FIXME SVE: We should put the composite ZCR_EL* value into tb->flags.
-       In the meantime this is just the AdvSIMD length of 128.  */
-    return 128 / 8;
-}
-
 /* Return the offset into CPUARMState of a slice (from
  * the least significant end) of FP register Qn (ie
  * Dn, Sn, Hn or Bn).
@@ -620,7 +545,7 @@ static void clear_vec_high(DisasContext *s, bool is_q, int rd)
     }
 }
 
-static void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v)
+void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v)
 {
     unsigned ofs = fp_reg_offset(s, reg, MO_64);
 
@@ -637,7 +562,7 @@ static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
     tcg_temp_free_i64(tmp);
 }
 
-static TCGv_ptr get_fpstatus_ptr(bool is_f16)
+TCGv_ptr get_fpstatus_ptr(bool is_f16)
 {
     TCGv_ptr statusptr = tcg_temp_new_ptr();
     int offset;
@@ -3130,8 +3055,8 @@ static inline uint64_t bitmask64(unsigned int length)
  * value (ie should cause a guest UNDEF exception), and true if they are
  * valid, in which case the decoded bit pattern is written to result.
  */
-static bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
-                                   unsigned int imms, unsigned int immr)
+bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
+                            unsigned int imms, unsigned int immr)
 {
     uint64_t mask;
     unsigned e, levels, s, r;
@@ -5164,7 +5089,7 @@ static void disas_fp_3src(DisasContext *s, uint32_t insn)
  * the range 01....1xx to 10....0xx, and the most significant 4 bits of
  * the mantissa; see VFPExpandImm() in the v8 ARM ARM.
  */
-static uint64_t vfp_expand_imm(int size, uint8_t imm8)
+uint64_t vfp_expand_imm(int size, uint8_t imm8)
 {
     uint64_t imm;
 
-- 
2.14.3


* [Qemu-devel] [PATCH v2 03/67] target/arm: Add SVE decode skeleton
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 01/67] target/arm: Enable SVE for aarch64-linux-user Richard Henderson
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 02/67] target/arm: Introduce translate-a64.h Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-22 18:00   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-23 11:40   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 04/67] target/arm: Implement SVE Bitwise Logical - Unpredicated Group Richard Henderson
                   ` (65 subsequent siblings)
  68 siblings, 2 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

This includes only four as-yet-unimplemented instruction patterns,
so that the whole thing compiles.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 11 +++++++-
 target/arm/translate-sve.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++
 .gitignore                 |  1 +
 target/arm/Makefile.objs   | 10 ++++++++
 target/arm/sve.decode      | 45 +++++++++++++++++++++++++++++++++
 5 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 target/arm/translate-sve.c
 create mode 100644 target/arm/sve.decode

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index e0e7ebf68c..a50fef98af 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -12772,9 +12772,18 @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s)
     s->fp_access_checked = false;
 
     switch (extract32(insn, 25, 4)) {
-    case 0x0: case 0x1: case 0x2: case 0x3: /* UNALLOCATED */
+    case 0x0: case 0x1: case 0x3: /* UNALLOCATED */
         unallocated_encoding(s);
         break;
+    case 0x2:
+        if (!arm_dc_feature(s, ARM_FEATURE_SVE)) {
+            unallocated_encoding(s);
+        } else if (!sve_access_check(s) || !fp_access_check(s)) {
+            /* exception raised */
+        } else if (!disas_sve(s, insn)) {
+            unallocated_encoding(s);
+        }
+        break;
     case 0x8: case 0x9: /* Data processing - immediate */
         disas_data_proc_imm(s, insn);
         break;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
new file mode 100644
index 0000000000..2c9e4733cb
--- /dev/null
+++ b/target/arm/translate-sve.c
@@ -0,0 +1,63 @@
+/*
+ * AArch64 SVE translation
+ *
+ * Copyright (c) 2018 Linaro, Ltd
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "tcg-op.h"
+#include "tcg-op-gvec.h"
+#include "qemu/log.h"
+#include "arm_ldst.h"
+#include "translate.h"
+#include "internals.h"
+#include "exec/helper-proto.h"
+#include "exec/helper-gen.h"
+#include "exec/log.h"
+#include "trace-tcg.h"
+#include "translate-a64.h"
+
+/*
+ * Include the generated decoder.
+ */
+
+#include "decode-sve.inc.c"
+
+/*
+ * Implement all of the translator functions referenced by the decoder.
+ */
+
+static void trans_AND_zzz(DisasContext *s, arg_AND_zzz *a, uint32_t insn)
+{
+    unsupported_encoding(s, insn);
+}
+
+static void trans_ORR_zzz(DisasContext *s, arg_ORR_zzz *a, uint32_t insn)
+{
+    unsupported_encoding(s, insn);
+}
+
+static void trans_EOR_zzz(DisasContext *s, arg_EOR_zzz *a, uint32_t insn)
+{
+    unsupported_encoding(s, insn);
+}
+
+static void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
+{
+    unsupported_encoding(s, insn);
+}
diff --git a/.gitignore b/.gitignore
index 704b22285d..abe2b81a26 100644
--- a/.gitignore
+++ b/.gitignore
@@ -140,3 +140,4 @@ trace-dtrace-root.h
 trace-dtrace-root.dtrace
 trace-ust-all.h
 trace-ust-all.c
+/target/arm/decode-sve.inc.c
diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index 847fb52ee0..9934cf1d4d 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -10,3 +10,13 @@ obj-y += gdbstub.o
 obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
 obj-y += crypto_helper.o
 obj-$(CONFIG_SOFTMMU) += arm-powerctl.o
+
+DECODETREE = $(SRC_PATH)/scripts/decodetree.py
+
+target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
+target/arm/translate-sve.o: target/arm/decode-sve.inc.c
+obj-$(TARGET_AARCH64) += translate-sve.o
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
new file mode 100644
index 0000000000..2c13a6024a
--- /dev/null
+++ b/target/arm/sve.decode
@@ -0,0 +1,45 @@
+# AArch64 SVE instruction descriptions
+#
+#  Copyright (c) 2017 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+###########################################################################
+# Named attribute sets.  These are used to make nice(er) names
+# when creating helpers common to those for the individual
+# instruction patterns.
+
+&rrr_esz	rd rn rm esz
+
+###########################################################################
+# Named instruction formats.  These are generally used to
+# reduce the amount of duplication between instruction patterns.
+
+# Three operand with unused vector element size
+@rd_rn_rm_e0	........ ... rm:5  ... ...  rn:5 rd:5		&rrr_esz esz=0
+
+###########################################################################
+# Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
+
+### SVE Logical - Unpredicated Group
+
+# SVE bitwise logical operations (unpredicated)
+AND_zzz		00000100 00 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
+ORR_zzz		00000100 01 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
+EOR_zzz		00000100 10 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
+BIC_zzz		00000100 11 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
-- 
2.14.3


* [Qemu-devel] [PATCH v2 04/67] target/arm: Implement SVE Bitwise Logical - Unpredicated Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (2 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 03/67] target/arm: Add SVE decode skeleton Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-22 18:04   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate Richard Henderson
                   ` (64 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

These were the instructions that were stubbed out when
introducing the decode skeleton.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 50 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 2c9e4733cb..50cf2a1fdd 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -32,6 +32,10 @@
 #include "trace-tcg.h"
 #include "translate-a64.h"
 
+typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
+typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
+
 /*
  * Include the generated decoder.
  */
@@ -42,22 +46,54 @@
  * Implement all of the translator functions referenced by the decoder.
  */
 
-static void trans_AND_zzz(DisasContext *s, arg_AND_zzz *a, uint32_t insn)
+/* Invoke a vector expander on two Zregs.  */
+static void do_vector2_z(DisasContext *s, GVecGen2Fn *gvec_fn,
+                         int esz, int rd, int rn)
 {
-    unsupported_encoding(s, insn);
+    unsigned vsz = vec_full_reg_size(s);
+    gvec_fn(esz, vec_full_reg_offset(s, rd),
+            vec_full_reg_offset(s, rn), vsz, vsz);
 }
 
-static void trans_ORR_zzz(DisasContext *s, arg_ORR_zzz *a, uint32_t insn)
+/* Invoke a vector expander on three Zregs.  */
+static void do_vector3_z(DisasContext *s, GVecGen3Fn *gvec_fn,
+                         int esz, int rd, int rn, int rm)
 {
-    unsupported_encoding(s, insn);
+    unsigned vsz = vec_full_reg_size(s);
+    gvec_fn(esz, vec_full_reg_offset(s, rd), vec_full_reg_offset(s, rn),
+            vec_full_reg_offset(s, rm), vsz, vsz);
 }
 
-static void trans_EOR_zzz(DisasContext *s, arg_EOR_zzz *a, uint32_t insn)
+/* Invoke a vector move on two Zregs.  */
+static void do_mov_z(DisasContext *s, int rd, int rn)
 {
-    unsupported_encoding(s, insn);
+    do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
+}
+
+/*
+ *** SVE Logical - Unpredicated Group
+ */
+
+static void trans_AND_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_vector3_z(s, tcg_gen_gvec_and, 0, a->rd, a->rn, a->rm);
+}
+
+static void trans_ORR_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    if (a->rn == a->rm) { /* MOV */
+        do_mov_z(s, a->rd, a->rn);
+    } else {
+        do_vector3_z(s, tcg_gen_gvec_or, 0, a->rd, a->rn, a->rm);
+    }
+}
+
+static void trans_EOR_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_vector3_z(s, tcg_gen_gvec_xor, 0, a->rd, a->rn, a->rm);
 }
 
 static void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
 }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
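The translators in patch 04 map the unpredicated Z-register logical group directly onto gvec expanders, with ORR Zd, Zn, Zn folded into a plain move. As a rough host-side model of the lane-wise semantics (ordinary C, not QEMU code; the `model_*` names and the fixed 16-byte vector length are invented here for illustration):

```c
/* Host-side sketch of the SVE unpredicated logical operations,
 * one byte per lane on plain arrays.  Not QEMU code.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VL = 16 };   /* example vector length in bytes */

static void model_and(uint8_t *d, const uint8_t *n, const uint8_t *m)
{
    for (size_t i = 0; i < VL; ++i) d[i] = n[i] & m[i];
}
static void model_orr(uint8_t *d, const uint8_t *n, const uint8_t *m)
{
    for (size_t i = 0; i < VL; ++i) d[i] = n[i] | m[i];
}
static void model_eor(uint8_t *d, const uint8_t *n, const uint8_t *m)
{
    for (size_t i = 0; i < VL; ++i) d[i] = n[i] ^ m[i];
}
static void model_bic(uint8_t *d, const uint8_t *n, const uint8_t *m)
{
    for (size_t i = 0; i < VL; ++i) d[i] = n[i] & ~m[i];   /* AND NOT */
}
```

The ORR special case in trans_ORR_zzz corresponds to the identity `x | x == x`, so when rn == rm the expander can be skipped entirely in favor of a vector move.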

* [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (3 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 04/67] target/arm: Implement SVE Bitwise Logical - Unpredicated Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-22 18:20   ` Peter Maydell
  2018-04-03  9:26   ` Alex Bennée
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 06/67] target/arm: Implement SVE predicate test Richard Henderson
                   ` (63 subsequent siblings)
  68 siblings, 2 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 132 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  22 +++++++-
 2 files changed, 153 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 50cf2a1fdd..c0cccfda6f 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -46,6 +46,19 @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
  * Implement all of the translator functions referenced by the decoder.
  */
 
+/* Return the offset into CPUARMState of the predicate vector register Pn.
+ * Note that for this purpose, FFR is P16.  */
+static inline int pred_full_reg_offset(DisasContext *s, int regno)
+{
+    return offsetof(CPUARMState, vfp.pregs[regno]);
+}
+
+/* Return the byte size of the whole predicate register, VL / 64.  */
+static inline int pred_full_reg_size(DisasContext *s)
+{
+    return s->sve_len >> 3;
+}
+
 /* Invoke a vector expander on two Zregs.  */
 static void do_vector2_z(DisasContext *s, GVecGen2Fn *gvec_fn,
                          int esz, int rd, int rn)
@@ -97,3 +110,122 @@ static void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
 {
     do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
 }
+
+/*
+ *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
+ */
+
+/* Subroutine loading a vector register at VOFS of LEN bytes.
+ * The load should begin at the address Rn + IMM.
+ */
+
+#if UINTPTR_MAX == UINT32_MAX
+# define ptr i32
+#else
+# define ptr i64
+#endif
+
+static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
+                   int rn, int imm)
+{
+    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
+    uint32_t len_remain = len % 8;
+    uint32_t nparts = len / 8 + ctpop8(len_remain);
+    int midx = get_mem_index(s);
+    TCGv_i64 addr, t0, t1;
+
+    addr = tcg_temp_new_i64();
+    t0 = tcg_temp_new_i64();
+
+    /* Note that unpredicated load/store of vector/predicate registers
+     * are defined as a stream of bytes, which equates to little-endian
+     * operations on larger quantities.  There is no nice way to force
+     * a little-endian load for aarch64_be-linux-user out of line.
+     *
+     * Attempt to keep code expansion to a minimum by limiting the
+     * amount of unrolling done.
+     */
+    if (nparts <= 4) {
+        int i;
+
+        for (i = 0; i < len_align; i += 8) {
+            tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + i);
+            tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEQ);
+            tcg_gen_st_i64(t0, cpu_env, vofs + i);
+        }
+    } else {
+        TCGLabel *loop = gen_new_label();
+        TCGv_ptr i = TCGV_NAT_TO_PTR(glue(tcg_const_local_, ptr)(0));
+        TCGv_ptr dest;
+
+        gen_set_label(loop);
+
+        /* Minimize the number of local temps that must be re-read from
+         * the stack each iteration.  Instead, re-compute values other
+         * than the loop counter.
+         */
+        dest = tcg_temp_new_ptr();
+        tcg_gen_addi_ptr(dest, i, imm);
+#if UINTPTR_MAX == UINT32_MAX
+        tcg_gen_extu_i32_i64(addr, TCGV_PTR_TO_NAT(dest));
+        tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, rn));
+#else
+        tcg_gen_add_i64(addr, TCGV_PTR_TO_NAT(dest), cpu_reg_sp(s, rn));
+#endif
+
+        tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEQ);
+
+        tcg_gen_add_ptr(dest, cpu_env, i);
+        tcg_gen_addi_ptr(i, i, 8);
+        tcg_gen_st_i64(t0, dest, vofs);
+        tcg_temp_free_ptr(dest);
+
+        glue(tcg_gen_brcondi_, ptr)(TCG_COND_LTU, TCGV_PTR_TO_NAT(i),
+                                    len_align, loop);
+        tcg_temp_free_ptr(i);
+    }
+
+    /* Predicate register loads can be any multiple of 2.
+     * Note that we still store the entire 64-bit unit into cpu_env.
+     */
+    if (len_remain) {
+        tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + len_align);
+
+        switch (len_remain) {
+        case 2:
+        case 4:
+        case 8:
+            tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LE | ctz32(len_remain));
+            break;
+
+        case 6:
+            t1 = tcg_temp_new_i64();
+            tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEUL);
+            tcg_gen_addi_i64(addr, addr, 4);
+            tcg_gen_qemu_ld_i64(t1, addr, midx, MO_LEUW);
+            tcg_gen_deposit_i64(t0, t0, t1, 32, 32);
+            tcg_temp_free_i64(t1);
+            break;
+
+        default:
+            g_assert_not_reached();
+        }
+        tcg_gen_st_i64(t0, cpu_env, vofs + len_align);
+    }
+    tcg_temp_free_i64(addr);
+    tcg_temp_free_i64(t0);
+}
+
+#undef ptr
+
+static void trans_LDR_zri(DisasContext *s, arg_rri *a, uint32_t insn)
+{
+    int size = vec_full_reg_size(s);
+    do_ldr(s, vec_full_reg_offset(s, a->rd), size, a->rn, a->imm * size);
+}
+
+static void trans_LDR_pri(DisasContext *s, arg_rri *a, uint32_t insn)
+{
+    int size = pred_full_reg_size(s);
+    do_ldr(s, pred_full_reg_offset(s, a->rd), size, a->rn, a->imm * size);
+}
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 2c13a6024a..0c6a7ba34d 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -19,11 +19,17 @@
 # This file is processed by scripts/decodetree.py
 #
 
+###########################################################################
+# Named fields.  These are primarily for disjoint fields.
+
+%imm9_16_10	16:s6 10:3
+
 ###########################################################################
 # Named attribute sets.  These are used to make nice(er) names
 # when creating helpers common to those for the individual
 # instruction patterns.
 
+&rri		rd rn imm
 &rrr_esz	rd rn rm esz
 
 ###########################################################################
@@ -31,7 +37,13 @@
 # reduce the amount of duplication between instruction patterns.
 
 # Three operand with unused vector element size
-@rd_rn_rm_e0	........ ... rm:5  ... ...  rn:5 rd:5		&rrr_esz esz=0
+@rd_rn_rm_e0	........ ... rm:5 ... ... rn:5 rd:5		&rrr_esz esz=0
+
+# Basic Load/Store with 9-bit immediate offset
+@pd_rn_i9	........ ........ ...... rn:5 . rd:4	\
+		&rri imm=%imm9_16_10
+@rd_rn_i9	........ ........ ...... rn:5 rd:5	\
+		&rri imm=%imm9_16_10
 
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
@@ -43,3 +55,11 @@ AND_zzz		00000100 00 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 ORR_zzz		00000100 01 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 EOR_zzz		00000100 10 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 BIC_zzz		00000100 11 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
+
+### SVE Memory - 32-bit Gather and Unsized Contiguous Group
+
+# SVE load predicate register
+LDR_pri		10000101 10 ...... 000 ... ..... 0 ....		@pd_rn_i9
+
+# SVE load vector register
+LDR_zri		10000101 10 ...... 010 ... ..... .....		@rd_rn_i9
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
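do_ldr in patch 05 decomposes an unpredicated register load into little-endian 64-bit units plus a 2-, 4- or 6-byte tail, the 6-byte case being a 4-byte and a 2-byte load merged by a deposit. A standalone sketch of that decomposition (ordinary C, independent of TCG; `model_ldr` and the helper names are invented here):

```c
/* Host-side sketch of the do_ldr length split: whole 8-byte
 * little-endian units, then a 2/4/6-byte remainder.  Not QEMU code.
 */
#include <assert.h>
#include <stdint.h>

static uint64_t ld_le(const uint8_t *p, unsigned size)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < size; ++i) {
        v |= (uint64_t)p[i] << (8 * i);   /* byte stream == little-endian */
    }
    return v;
}

static void st_bytes(uint8_t *p, uint64_t v, unsigned size)
{
    for (unsigned i = 0; i < size; ++i) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static void model_ldr(uint8_t *dst, const uint8_t *src, uint32_t len)
{
    uint32_t len_align = len & ~7u;
    uint32_t len_remain = len % 8;
    uint32_t i;

    for (i = 0; i < len_align; i += 8) {
        st_bytes(dst + i, ld_le(src + i, 8), 8);
    }
    if (len_remain) {
        uint64_t t;
        if (len_remain == 6) {
            /* 4-byte + 2-byte loads, merged as by the deposit in do_ldr */
            t = ld_le(src + i, 4) | (ld_le(src + i + 4, 2) << 32);
        } else {
            t = ld_le(src + i, len_remain);   /* 2, 4 or 8 */
        }
        st_bytes(dst + i, t, len_remain);
    }
}
```

Since predicate registers are VL/64 bytes, only even remainders can occur, which is why the switch in do_ldr handles exactly 2, 4, 6 and 8.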

* [Qemu-devel] [PATCH v2 06/67] target/arm: Implement SVE predicate test
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (4 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-22 18:38   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-04-03  9:16   ` Alex Bennée
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 07/67] target/arm: Implement SVE Predicate Logical Operations Group Richard Henderson
                   ` (62 subsequent siblings)
  68 siblings, 2 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 21 +++++++++++++
 target/arm/helper.h        |  1 +
 target/arm/sve_helper.c    | 77 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 62 +++++++++++++++++++++++++++++++++++++
 target/arm/Makefile.objs   |  2 +-
 target/arm/sve.decode      |  5 +++
 6 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 target/arm/helper-sve.h
 create mode 100644 target/arm/sve_helper.c

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
new file mode 100644
index 0000000000..b6e91539ae
--- /dev/null
+++ b/target/arm/helper-sve.h
@@ -0,0 +1,21 @@
+/*
+ *  AArch64 SVE specific helper definitions
+ *
+ *  Copyright (c) 2018 Linaro, Ltd
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+DEF_HELPER_FLAGS_2(sve_predtest1, TCG_CALL_NO_WG, i32, i64, i64)
+DEF_HELPER_FLAGS_3(sve_predtest, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 6dd8504ec3..be3c2fcdc0 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -567,4 +567,5 @@ DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
+#include "helper-sve.h"
 #endif
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
new file mode 100644
index 0000000000..7d13fd40ed
--- /dev/null
+++ b/target/arm/sve_helper.c
@@ -0,0 +1,77 @@
+/*
+ *  ARM SVE Operations
+ *
+ *  Copyright (c) 2018 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
+#include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
+
+
+/* Return a value for NZCV as per the ARM PredTest pseudofunction.
+ *
+ * The return value has bit 31 set if N is set, bit 1 set if Z is clear,
+ * and bit 0 set if C is set.
+ *
+ * This is an iterative function, called for each Pd and Pg word
+ * moving forward.
+ */
+
+/* For no G bits set, NZCV = C.  */
+#define PREDTEST_INIT  1
+
+static uint32_t iter_predtest_fwd(uint64_t d, uint64_t g, uint32_t flags)
+{
+    if (g) {
+        /* Compute N from first D & G.
+           Use bit 2 to signal first G bit seen.  */
+        if (!(flags & 4)) {
+            flags |= ((d & (g & -g)) != 0) << 31;
+            flags |= 4;
+        }
+
+        /* Accumulate Z from each D & G.  */
+        flags |= ((d & g) != 0) << 1;
+
+        /* Compute C from last !(D & G).  Replace previous.  */
+        flags = deposit32(flags, 0, 1, (d & pow2floor(g)) == 0);
+    }
+    return flags;
+}
+
+/* The same for a single word predicate.  */
+uint32_t HELPER(sve_predtest1)(uint64_t d, uint64_t g)
+{
+    return iter_predtest_fwd(d, g, PREDTEST_INIT);
+}
+
+/* The same for a multi-word predicate.  */
+uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
+{
+    uint32_t flags = PREDTEST_INIT;
+    uint64_t *d = vd, *g = vg;
+    uintptr_t i = 0;
+
+    do {
+        flags = iter_predtest_fwd(d[i], g[i], flags);
+    } while (++i < words);
+
+    return flags;
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index c0cccfda6f..c2e7fac938 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -83,6 +83,43 @@ static void do_mov_z(DisasContext *s, int rd, int rn)
     do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
 }
 
+/* Set the cpu flags as per a return from an SVE helper.  */
+static void do_pred_flags(TCGv_i32 t)
+{
+    tcg_gen_mov_i32(cpu_NF, t);
+    tcg_gen_andi_i32(cpu_ZF, t, 2);
+    tcg_gen_andi_i32(cpu_CF, t, 1);
+    tcg_gen_movi_i32(cpu_VF, 0);
+}
+
+/* Subroutines computing the ARM PredTest pseudofunction.  */
+static void do_predtest1(TCGv_i64 d, TCGv_i64 g)
+{
+    TCGv_i32 t = tcg_temp_new_i32();
+
+    gen_helper_sve_predtest1(t, d, g);
+    do_pred_flags(t);
+    tcg_temp_free_i32(t);
+}
+
+static void do_predtest(DisasContext *s, int dofs, int gofs, int words)
+{
+    TCGv_ptr dptr = tcg_temp_new_ptr();
+    TCGv_ptr gptr = tcg_temp_new_ptr();
+    TCGv_i32 t;
+
+    tcg_gen_addi_ptr(dptr, cpu_env, dofs);
+    tcg_gen_addi_ptr(gptr, cpu_env, gofs);
+    t = tcg_const_i32(words);
+
+    gen_helper_sve_predtest(t, dptr, gptr, t);
+    tcg_temp_free_ptr(dptr);
+    tcg_temp_free_ptr(gptr);
+
+    do_pred_flags(t);
+    tcg_temp_free_i32(t);
+}
+
 /*
  *** SVE Logical - Unpredicated Group
  */
@@ -111,6 +148,31 @@ static void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
     do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
 }
 
+/*
+ *** SVE Predicate Misc Group
+ */
+
+void trans_PTEST(DisasContext *s, arg_PTEST *a, uint32_t insn)
+{
+    int nofs = pred_full_reg_offset(s, a->rn);
+    int gofs = pred_full_reg_offset(s, a->pg);
+    int words = DIV_ROUND_UP(pred_full_reg_size(s), 8);
+
+    if (words == 1) {
+        TCGv_i64 pn = tcg_temp_new_i64();
+        TCGv_i64 pg = tcg_temp_new_i64();
+
+        tcg_gen_ld_i64(pn, cpu_env, nofs);
+        tcg_gen_ld_i64(pg, cpu_env, gofs);
+        do_predtest1(pn, pg);
+
+        tcg_temp_free_i64(pn);
+        tcg_temp_free_i64(pg);
+    } else {
+        do_predtest(s, nofs, gofs, words);
+    }
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index 9934cf1d4d..452ac6f453 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -19,4 +19,4 @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
 	  "GEN", $(TARGET_DIR)$@)
 
 target/arm/translate-sve.o: target/arm/decode-sve.inc.c
-obj-$(TARGET_AARCH64) += translate-sve.o
+obj-$(TARGET_AARCH64) += translate-sve.o sve_helper.o
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 0c6a7ba34d..7efaa8fe8e 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -56,6 +56,11 @@ ORR_zzz		00000100 01 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 EOR_zzz		00000100 10 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 BIC_zzz		00000100 11 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 
+### SVE Predicate Misc Group
+
+# SVE predicate test
+PTEST		00100101 01010000 11 pg:4 0 rn:4 00000
+
 ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
 
 # SVE load predicate register
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
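The PredTest computation in sve_helper.c above packs N into bit 31, "Z is clear" into bit 1 and C into bit 0, with bit 2 as an internal marker for having seen the first active element. Restated as a standalone single-word sketch (plain C; `predtest1` and `pow2floor_u64` are names invented here, not the QEMU helpers):

```c
/* Host-side sketch of PredTest for one 64-bit predicate word.
 * Bit 31 = N, bit 1 set = Z clear, bit 0 = C.  Not QEMU code.
 */
#include <assert.h>
#include <stdint.h>

static uint64_t pow2floor_u64(uint64_t x)
{
    while (x & (x - 1)) {
        x &= x - 1;          /* clear lowest set bits until one remains */
    }
    return x;
}

static uint32_t predtest1(uint64_t d, uint64_t g)
{
    uint32_t flags = 1;      /* for no G bits set, NZCV = C */
    if (g) {
        /* N from the first (lowest) active element of D */
        flags |= (uint32_t)((d & (g & -g)) != 0) << 31;
        flags |= 4;          /* mark first active element seen */
        /* Z is clear if any active element of D is set */
        flags |= (uint32_t)((d & g) != 0) << 1;
        /* C is clear if the last (highest) active element of D is set */
        flags = (flags & ~1u) | (uint32_t)((d & pow2floor_u64(g)) == 0);
    }
    return flags;
}
```

The multi-word helper simply feeds each Pd/Pg word through this step in order, so N is taken from the first active element overall and C from the last.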

* [Qemu-devel] [PATCH v2 07/67] target/arm: Implement SVE Predicate Logical Operations Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (5 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 06/67] target/arm: Implement SVE predicate test Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-22 18:55   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 08/67] target/arm: Implement SVE Predicate Misc Group Richard Henderson
                   ` (61 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h           |   4 +-
 target/arm/helper-sve.h    |  10 ++
 target/arm/sve_helper.c    |  39 ++++++
 target/arm/translate-sve.c | 338 ++++++++++++++++++++++++++++++++++++++++++++-
 target/arm/sve.decode      |  16 +++
 5 files changed, 405 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 70e05f00fe..8befe43a01 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -527,6 +527,8 @@ typedef struct CPUARMState {
 #ifdef TARGET_AARCH64
         /* Store FFR as pregs[16] to make it easier to treat as any other.  */
         ARMPredicateReg pregs[17];
+        /* Scratch space for aa64 sve predicate temporary.  */
+        ARMPredicateReg preg_tmp;
 #endif
 
         uint32_t xregs[16];
@@ -534,7 +536,7 @@ typedef struct CPUARMState {
         int vec_len;
         int vec_stride;
 
-        /* scratch space when Tn are not sufficient.  */
+        /* Scratch space for aa32 neon expansion.  */
         uint32_t scratch[8];
 
         /* There are a number of distinct float control structures:
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index b6e91539ae..57adc4d912 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -19,3 +19,13 @@
 
 DEF_HELPER_FLAGS_2(sve_predtest1, TCG_CALL_NO_WG, i32, i64, i64)
 DEF_HELPER_FLAGS_3(sve_predtest, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sel_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orr_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orn_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_nor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_nand_pppp, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 7d13fd40ed..b63e7cc90e 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -75,3 +75,42 @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
 
     return flags;
 }
+
+#define LOGICAL_PPPP(NAME, FUNC) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc)  \
+{                                                                         \
+    uintptr_t opr_sz = simd_oprsz(desc);                                  \
+    uint64_t *d = vd, *n = vn, *m = vm, *g = vg;                          \
+    uintptr_t i;                                                          \
+    for (i = 0; i < opr_sz / 8; ++i) {                                    \
+        d[i] = FUNC(n[i], m[i], g[i]);                                    \
+    }                                                                     \
+}
+
+#define DO_AND(N, M, G)  (((N) & (M)) & (G))
+#define DO_BIC(N, M, G)  (((N) & ~(M)) & (G))
+#define DO_EOR(N, M, G)  (((N) ^ (M)) & (G))
+#define DO_ORR(N, M, G)  (((N) | (M)) & (G))
+#define DO_ORN(N, M, G)  (((N) | ~(M)) & (G))
+#define DO_NOR(N, M, G)  (~((N) | (M)) & (G))
+#define DO_NAND(N, M, G) (~((N) & (M)) & (G))
+#define DO_SEL(N, M, G)  (((N) & (G)) | ((M) & ~(G)))
+
+LOGICAL_PPPP(sve_and_pppp, DO_AND)
+LOGICAL_PPPP(sve_bic_pppp, DO_BIC)
+LOGICAL_PPPP(sve_eor_pppp, DO_EOR)
+LOGICAL_PPPP(sve_sel_pppp, DO_SEL)
+LOGICAL_PPPP(sve_orr_pppp, DO_ORR)
+LOGICAL_PPPP(sve_orn_pppp, DO_ORN)
+LOGICAL_PPPP(sve_nor_pppp, DO_NOR)
+LOGICAL_PPPP(sve_nand_pppp, DO_NAND)
+
+#undef DO_AND
+#undef DO_BIC
+#undef DO_EOR
+#undef DO_ORR
+#undef DO_ORN
+#undef DO_NOR
+#undef DO_NAND
+#undef DO_SEL
+#undef LOGICAL_PPPP
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index c2e7fac938..405f9397a1 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -59,6 +59,24 @@ static inline int pred_full_reg_size(DisasContext *s)
     return s->sve_len >> 3;
 }
 
+/* Round up the size of a predicate register to a size allowed by
+ * the tcg vector infrastructure.  Any operation which uses this
+ * size may assume that the bits above pred_full_reg_size are zero,
+ * and must leave them the same way.
+ *
+ * Note that this is not needed for the vector registers as they
+ * are always properly sized for tcg vectors.
+ */
+static int pred_gvec_reg_size(DisasContext *s)
+{
+    int size = pred_full_reg_size(s);
+    if (size <= 8) {
+        return 8;
+    } else {
+        return QEMU_ALIGN_UP(size, 16);
+    }
+}
+
 /* Invoke a vector expander on two Zregs.  */
 static void do_vector2_z(DisasContext *s, GVecGen2Fn *gvec_fn,
                          int esz, int rd, int rn)
@@ -83,6 +101,40 @@ static void do_mov_z(DisasContext *s, int rd, int rn)
     do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
 }
 
+/* Invoke a vector expander on two Pregs.  */
+static void do_vector2_p(DisasContext *s, GVecGen2Fn *gvec_fn,
+                         int esz, int rd, int rn)
+{
+    unsigned psz = pred_gvec_reg_size(s);
+    gvec_fn(esz, pred_full_reg_offset(s, rd),
+            pred_full_reg_offset(s, rn), psz, psz);
+}
+
+/* Invoke a vector expander on three Pregs.  */
+static void do_vector3_p(DisasContext *s, GVecGen3Fn *gvec_fn,
+                         int esz, int rd, int rn, int rm)
+{
+    unsigned psz = pred_gvec_reg_size(s);
+    gvec_fn(esz, pred_full_reg_offset(s, rd), pred_full_reg_offset(s, rn),
+            pred_full_reg_offset(s, rm), psz, psz);
+}
+
+/* Invoke a vector operation on four Pregs.  */
+static void do_vecop4_p(DisasContext *s, const GVecGen4 *gvec_op,
+                        int rd, int rn, int rm, int rg)
+{
+    unsigned psz = pred_gvec_reg_size(s);
+    tcg_gen_gvec_4(pred_full_reg_offset(s, rd), pred_full_reg_offset(s, rn),
+                   pred_full_reg_offset(s, rm), pred_full_reg_offset(s, rg),
+                   psz, psz, gvec_op);
+}
+
+/* Invoke a vector move on two Pregs.  */
+static void do_mov_p(DisasContext *s, int rd, int rn)
+{
+    do_vector2_p(s, tcg_gen_gvec_mov, 0, rd, rn);
+}
+
 /* Set the cpu flags as per a return from an SVE helper.  */
 static void do_pred_flags(TCGv_i32 t)
 {
@@ -148,11 +200,295 @@ static void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
     do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
 }
 
+/*
+ *** SVE Predicate Logical Operations Group
+ */
+
+static void do_pppp_flags(DisasContext *s, arg_rprr_s *a,
+                          const GVecGen4 *gvec_op)
+{
+    unsigned psz = pred_gvec_reg_size(s);
+    int dofs = pred_full_reg_offset(s, a->rd);
+    int nofs = pred_full_reg_offset(s, a->rn);
+    int mofs = pred_full_reg_offset(s, a->rm);
+    int gofs = pred_full_reg_offset(s, a->pg);
+
+    if (psz == 8) {
+        /* Do the operation and the flags generation in temps.  */
+        TCGv_i64 pd = tcg_temp_new_i64();
+        TCGv_i64 pn = tcg_temp_new_i64();
+        TCGv_i64 pm = tcg_temp_new_i64();
+        TCGv_i64 pg = tcg_temp_new_i64();
+
+        tcg_gen_ld_i64(pn, cpu_env, nofs);
+        tcg_gen_ld_i64(pm, cpu_env, mofs);
+        tcg_gen_ld_i64(pg, cpu_env, gofs);
+
+        gvec_op->fni8(pd, pn, pm, pg);
+        tcg_gen_st_i64(pd, cpu_env, dofs);
+
+        do_predtest1(pd, pg);
+
+        tcg_temp_free_i64(pd);
+        tcg_temp_free_i64(pn);
+        tcg_temp_free_i64(pm);
+        tcg_temp_free_i64(pg);
+    } else {
+        /* The operation and flags generation is large.  The computation
+         * of the flags depends on the original contents of the guarding
+         * predicate.  If the destination overwrites the guarding predicate,
+         * then the easiest way to get this right is to save a copy.
+         */
+        int tofs = gofs;
+        if (a->rd == a->pg) {
+            tofs = offsetof(CPUARMState, vfp.preg_tmp);
+            tcg_gen_gvec_mov(0, tofs, gofs, psz, psz);
+        }
+
+        tcg_gen_gvec_4(dofs, nofs, mofs, gofs, psz, psz, gvec_op);
+        do_predtest(s, dofs, tofs, psz / 8);
+    }
+}
+
+static void gen_and_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_and_i64(pd, pn, pm);
+    tcg_gen_and_i64(pd, pd, pg);
+}
+
+static void gen_and_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_and_vec(vece, pd, pn, pm);
+    tcg_gen_and_vec(vece, pd, pd, pg);
+}
+
+static void trans_AND_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+    static const GVecGen4 op = {
+        .fni8 = gen_and_pg_i64,
+        .fniv = gen_and_pg_vec,
+        .fno = gen_helper_sve_and_pppp,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    if (a->s) {
+        do_pppp_flags(s, a, &op);
+    } else if (a->pg == a->rn && a->rn == a->rm) {
+        do_mov_p(s, a->rd, a->rn);
+    } else if (a->pg == a->rn || a->pg == a->rm) {
+        do_vector3_p(s, tcg_gen_gvec_and, 0, a->rd, a->rn, a->rm);
+    } else {
+        do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+    }
+}
+
+static void gen_bic_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_andc_i64(pd, pn, pm);
+    tcg_gen_and_i64(pd, pd, pg);
+}
+
+static void gen_bic_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_andc_vec(vece, pd, pn, pm);
+    tcg_gen_and_vec(vece, pd, pd, pg);
+}
+
+static void trans_BIC_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+    static const GVecGen4 op = {
+        .fni8 = gen_bic_pg_i64,
+        .fniv = gen_bic_pg_vec,
+        .fno = gen_helper_sve_bic_pppp,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    if (a->s) {
+        do_pppp_flags(s, a, &op);
+    } else if (a->pg == a->rn) {
+        do_vector3_p(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
+    } else {
+        do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+    }
+}
+
+static void gen_eor_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_xor_i64(pd, pn, pm);
+    tcg_gen_and_i64(pd, pd, pg);
+}
+
+static void gen_eor_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_xor_vec(vece, pd, pn, pm);
+    tcg_gen_and_vec(vece, pd, pd, pg);
+}
+
+static void trans_EOR_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+    static const GVecGen4 op = {
+        .fni8 = gen_eor_pg_i64,
+        .fniv = gen_eor_pg_vec,
+        .fno = gen_helper_sve_eor_pppp,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    if (a->s) {
+        do_pppp_flags(s, a, &op);
+    } else {
+        do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+    }
+}
+
+static void gen_sel_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_and_i64(pn, pn, pg);
+    tcg_gen_andc_i64(pm, pm, pg);
+    tcg_gen_or_i64(pd, pn, pm);
+}
+
+static void gen_sel_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_and_vec(vece, pn, pn, pg);
+    tcg_gen_andc_vec(vece, pm, pm, pg);
+    tcg_gen_or_vec(vece, pd, pn, pm);
+}
+
+static void trans_SEL_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+    static const GVecGen4 op = {
+        .fni8 = gen_sel_pg_i64,
+        .fniv = gen_sel_pg_vec,
+        .fno = gen_helper_sve_sel_pppp,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    if (a->s) {
+        unallocated_encoding(s);
+    } else {
+        do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+    }
+}
+
+static void gen_orr_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_or_i64(pd, pn, pm);
+    tcg_gen_and_i64(pd, pd, pg);
+}
+
+static void gen_orr_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_or_vec(vece, pd, pn, pm);
+    tcg_gen_and_vec(vece, pd, pd, pg);
+}
+
+static void trans_ORR_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+    static const GVecGen4 op = {
+        .fni8 = gen_orr_pg_i64,
+        .fniv = gen_orr_pg_vec,
+        .fno = gen_helper_sve_orr_pppp,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    if (a->s) {
+        do_pppp_flags(s, a, &op);
+    } else if (a->pg == a->rn && a->rn == a->rm) {
+        do_mov_p(s, a->rd, a->rn);
+    } else {
+        do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+    }
+}
+
+static void gen_orn_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_orc_i64(pd, pn, pm);
+    tcg_gen_and_i64(pd, pd, pg);
+}
+
+static void gen_orn_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_orc_vec(vece, pd, pn, pm);
+    tcg_gen_and_vec(vece, pd, pd, pg);
+}
+
+static void trans_ORN_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+    static const GVecGen4 op = {
+        .fni8 = gen_orn_pg_i64,
+        .fniv = gen_orn_pg_vec,
+        .fno = gen_helper_sve_orn_pppp,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    if (a->s) {
+        do_pppp_flags(s, a, &op);
+    } else {
+        do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+    }
+}
+
+static void gen_nor_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_or_i64(pd, pn, pm);
+    tcg_gen_andc_i64(pd, pg, pd);
+}
+
+static void gen_nor_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_or_vec(vece, pd, pn, pm);
+    tcg_gen_andc_vec(vece, pd, pg, pd);
+}
+
+static void trans_NOR_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+    static const GVecGen4 op = {
+        .fni8 = gen_nor_pg_i64,
+        .fniv = gen_nor_pg_vec,
+        .fno = gen_helper_sve_nor_pppp,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    if (a->s) {
+        do_pppp_flags(s, a, &op);
+    } else {
+        do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+    }
+}
+
+static void gen_nand_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+    tcg_gen_and_i64(pd, pn, pm);
+    tcg_gen_andc_i64(pd, pg, pd);
+}
+
+static void gen_nand_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+                           TCGv_vec pm, TCGv_vec pg)
+{
+    tcg_gen_and_vec(vece, pd, pn, pm);
+    tcg_gen_andc_vec(vece, pd, pg, pd);
+}
+
+static void trans_NAND_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+    static const GVecGen4 op = {
+        .fni8 = gen_nand_pg_i64,
+        .fniv = gen_nand_pg_vec,
+        .fno = gen_helper_sve_nand_pppp,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+    if (a->s) {
+        do_pppp_flags(s, a, &op);
+    } else {
+        do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+    }
+}
+
 /*
  *** SVE Predicate Misc Group
  */
 
-void trans_PTEST(DisasContext *s, arg_PTEST *a, uint32_t insn)
+static void trans_PTEST(DisasContext *s, arg_PTEST *a, uint32_t insn)
 {
     int nofs = pred_full_reg_offset(s, a->rn);
     int gofs = pred_full_reg_offset(s, a->pg);
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 7efaa8fe8e..d92886127a 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -31,6 +31,7 @@
 
 &rri		rd rn imm
 &rrr_esz	rd rn rm esz
+&rprr_s		rd pg rn rm s
 
 ###########################################################################
 # Named instruction formats.  These are generally used to
@@ -39,6 +40,9 @@
 # Three operand with unused vector element size
 @rd_rn_rm_e0	........ ... rm:5 ... ... rn:5 rd:5		&rrr_esz esz=0
 
+# Three predicate operand, with governing predicate, flag setting
+@pd_pg_pn_pm_s	........ . s:1 .. rm:4 .. pg:4 . rn:4 . rd:4	&rprr_s
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9	........ ........ ...... rn:5 . rd:4	\
 		&rri imm=%imm9_16_10
@@ -56,6 +60,18 @@ ORR_zzz		00000100 01 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 EOR_zzz		00000100 10 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 BIC_zzz		00000100 11 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 
+### SVE Predicate Logical Operations Group
+
+# SVE predicate logical operations
+AND_pppp	00100101 0. 00 .... 01 .... 0 .... 0 ....	@pd_pg_pn_pm_s
+BIC_pppp	00100101 0. 00 .... 01 .... 0 .... 1 ....	@pd_pg_pn_pm_s
+EOR_pppp	00100101 0. 00 .... 01 .... 1 .... 0 ....	@pd_pg_pn_pm_s
+SEL_pppp	00100101 0. 00 .... 01 .... 1 .... 1 ....	@pd_pg_pn_pm_s
+ORR_pppp	00100101 1. 00 .... 01 .... 0 .... 0 ....	@pd_pg_pn_pm_s
+ORN_pppp	00100101 1. 00 .... 01 .... 0 .... 1 ....	@pd_pg_pn_pm_s
+NOR_pppp	00100101 1. 00 .... 01 .... 1 .... 0 ....	@pd_pg_pn_pm_s
+NAND_pppp	00100101 1. 00 .... 01 .... 1 .... 1 ....	@pd_pg_pn_pm_s
+
 ### SVE Predicate Misc Group
 
 # SVE predicate test
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread

* [Qemu-devel] [PATCH v2 08/67] target/arm: Implement SVE Predicate Misc Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (6 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 07/67] target/arm: Implement SVE Predicate Logical Operations Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 11:22   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 09/67] target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group Richard Henderson
                   ` (60 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h           |   3 +
 target/arm/helper-sve.h    |   3 +
 target/arm/sve_helper.c    |  86 +++++++++++++++++++++++-
 target/arm/translate-sve.c | 163 ++++++++++++++++++++++++++++++++++++++++++++-
 target/arm/sve.decode      |  41 ++++++++++++
 5 files changed, 293 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 8befe43a01..27f395183b 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2915,4 +2915,7 @@ static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno)
     return &env->vfp.zregs[regno].d[0];
 }
 
+/* Shared between translate-sve.c and sve_helper.c.  */
+extern const uint64_t pred_esz_masks[4];
+
 #endif
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 57adc4d912..0c04afff8c 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -20,6 +20,9 @@
 DEF_HELPER_FLAGS_2(sve_predtest1, TCG_CALL_NO_WG, i32, i64, i64)
 DEF_HELPER_FLAGS_3(sve_predtest, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_pfirst, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_pnext, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index b63e7cc90e..cee7d9bcf6 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -39,7 +39,7 @@
 
 static uint32_t iter_predtest_fwd(uint64_t d, uint64_t g, uint32_t flags)
 {
-    if (g) {
+    if (likely(g)) {
         /* Compute N from first D & G.
            Use bit 2 to signal first G bit seen.  */
         if (!(flags & 4)) {
@@ -114,3 +114,87 @@ LOGICAL_PPPP(sve_nand_pppp, DO_NAND)
 #undef DO_NAND
 #undef DO_SEL
 #undef LOGICAL_PPPP
+
+/* Similar to the ARM LastActiveElement pseudocode function, except the
+   result is multiplied by the element size.  This includes the not found
+   indication; e.g. not found for esz=3 is -8.  */
+static intptr_t last_active_element(uint64_t *g, intptr_t words, intptr_t esz)
+{
+    uint64_t mask = pred_esz_masks[esz];
+    intptr_t i = words;
+
+    do {
+        uint64_t this_g = g[--i] & mask;
+        if (this_g) {
+            return i * 64 + (63 - clz64(this_g));
+        }
+    } while (i > 0);
+    return (intptr_t)-1 << esz;
+}
+
+uint32_t HELPER(sve_pfirst)(void *vd, void *vg, uint32_t words)
+{
+    uint32_t flags = PREDTEST_INIT;
+    uint64_t *d = vd, *g = vg;
+    intptr_t i = 0;
+
+    do {
+        uint64_t this_d = d[i];
+        uint64_t this_g = g[i];
+
+        if (this_g) {
+            if (!(flags & 4)) {
+                /* Set in D the first bit of G.  */
+                this_d |= this_g & -this_g;
+                d[i] = this_d;
+            }
+            flags = iter_predtest_fwd(this_d, this_g, flags);
+        }
+    } while (++i < words);
+
+    return flags;
+}
+
+uint32_t HELPER(sve_pnext)(void *vd, void *vg, uint32_t pred_desc)
+{
+    intptr_t words = extract32(pred_desc, 0, SIMD_OPRSZ_BITS);
+    intptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
+    uint32_t flags = PREDTEST_INIT;
+    uint64_t *d = vd, *g = vg, esz_mask;
+    intptr_t i, next;
+
+    next = last_active_element(vd, words, esz) + (1 << esz);
+    esz_mask = pred_esz_masks[esz];
+
+    /* Similar to the pseudocode for pnext, but scaled by ESZ
+       so that we find the correct bit.  */
+    if (next < words * 64) {
+        uint64_t mask = -1;
+
+        if (next & 63) {
+            mask = ~((1ull << (next & 63)) - 1);
+            next &= -64;
+        }
+        do {
+            uint64_t this_g = g[next / 64] & esz_mask & mask;
+            if (this_g != 0) {
+                next = (next & -64) + ctz64(this_g);
+                break;
+            }
+            next += 64;
+            mask = -1;
+        } while (next < words * 64);
+    }
+
+    i = 0;
+    do {
+        uint64_t this_d = 0;
+        if (i == next / 64) {
+            this_d = 1ull << (next & 63);
+        }
+        d[i] = this_d;
+        flags = iter_predtest_fwd(this_d, g[i] & esz_mask, flags);
+    } while (++i < words);
+
+    return flags;
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 405f9397a1..a9b6ae046d 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -22,6 +22,7 @@
 #include "exec/exec-all.h"
 #include "tcg-op.h"
 #include "tcg-op-gvec.h"
+#include "tcg-gvec-desc.h"
 #include "qemu/log.h"
 #include "arm_ldst.h"
 #include "translate.h"
@@ -67,9 +68,8 @@ static inline int pred_full_reg_size(DisasContext *s)
  * Note that this is not needed for the vector registers as they
  * are always properly sized for tcg vectors.
  */
-static int pred_gvec_reg_size(DisasContext *s)
+static int size_for_gvec(int size)
 {
-    int size = pred_full_reg_size(s);
     if (size <= 8) {
         return 8;
     } else {
@@ -77,6 +77,11 @@ static int pred_gvec_reg_size(DisasContext *s)
     }
 }
 
+static int pred_gvec_reg_size(DisasContext *s)
+{
+    return size_for_gvec(pred_full_reg_size(s));
+}
+
 /* Invoke a vector expander on two Zregs.  */
 static void do_vector2_z(DisasContext *s, GVecGen2Fn *gvec_fn,
                          int esz, int rd, int rn)
@@ -172,6 +177,12 @@ static void do_predtest(DisasContext *s, int dofs, int gofs, int words)
     tcg_temp_free_i32(t);
 }
 
+/* For each element size, the bits within a predicate word that are active.  */
+const uint64_t pred_esz_masks[4] = {
+    0xffffffffffffffffull, 0x5555555555555555ull,
+    0x1111111111111111ull, 0x0101010101010101ull
+};
+
 /*
  *** SVE Logical - Unpredicated Group
  */
@@ -509,6 +520,154 @@ static void trans_PTEST(DisasContext *s, arg_PTEST *a, uint32_t insn)
     }
 }
 
+/* See the ARM pseudocode DecodePredCount.  */
+static unsigned decode_pred_count(unsigned fullsz, int pattern, int esz)
+{
+    unsigned elements = fullsz >> esz;
+    unsigned bound;
+
+    switch (pattern) {
+    case 0x0: /* POW2 */
+        return pow2floor(elements);
+    case 0x1: /* VL1 */
+    case 0x2: /* VL2 */
+    case 0x3: /* VL3 */
+    case 0x4: /* VL4 */
+    case 0x5: /* VL5 */
+    case 0x6: /* VL6 */
+    case 0x7: /* VL7 */
+    case 0x8: /* VL8 */
+        bound = pattern;
+        break;
+    case 0x9: /* VL16 */
+    case 0xa: /* VL32 */
+    case 0xb: /* VL64 */
+    case 0xc: /* VL128 */
+    case 0xd: /* VL256 */
+        bound = 16 << (pattern - 9);
+        break;
+    case 0x1d: /* MUL4 */
+        return elements - elements % 4;
+    case 0x1e: /* MUL3 */
+        return elements - elements % 3;
+    case 0x1f: /* ALL */
+        return elements;
+    default:   /* #uimm5 */
+        return 0;
+    }
+    return elements >= bound ? bound : 0;
+}
+
+static void trans_PTRUE(DisasContext *s, arg_PTRUE *a, uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned ofs = pred_full_reg_offset(s, a->rd);
+    unsigned numelem, setsz, i;
+    uint64_t word, lastword;
+    TCGv_i64 t;
+
+    numelem = decode_pred_count(fullsz, a->pat, a->esz);
+
+    /* Determine what we must store into each bit, and how many.  */
+    if (numelem == 0) {
+        lastword = word = 0;
+        setsz = fullsz;
+    } else {
+        setsz = numelem << a->esz;
+        lastword = word = pred_esz_masks[a->esz];
+        if (setsz % 64) {
+            lastword &= ~(-1ull << (setsz % 64));
+        }
+    }
+
+    t = tcg_temp_new_i64();
+    if (fullsz <= 64) {
+        tcg_gen_movi_i64(t, lastword);
+        tcg_gen_st_i64(t, cpu_env, ofs);
+        goto done;
+    }
+
+    if (word == lastword) {
+        unsigned maxsz = size_for_gvec(fullsz / 8);
+        unsigned oprsz = size_for_gvec(setsz / 8);
+
+        if (oprsz * 8 == setsz) {
+            tcg_gen_gvec_dup64i(ofs, oprsz, maxsz, word);
+            goto done;
+        }
+        if (oprsz * 8 == setsz + 8) {
+            tcg_gen_gvec_dup64i(ofs, oprsz, maxsz, word);
+            tcg_gen_movi_i64(t, 0);
+            tcg_gen_st_i64(t, cpu_env, ofs + oprsz - 8);
+            goto done;
+        }
+    }
+
+    setsz /= 8;
+    fullsz /= 8;
+
+    tcg_gen_movi_i64(t, word);
+    for (i = 0; i < setsz; i += 8) {
+        tcg_gen_st_i64(t, cpu_env, ofs + i);
+    }
+    if (lastword != word) {
+        tcg_gen_movi_i64(t, lastword);
+        tcg_gen_st_i64(t, cpu_env, ofs + i);
+        i += 8;
+    }
+    if (i < fullsz) {
+        tcg_gen_movi_i64(t, 0);
+        for (; i < fullsz; i += 8) {
+            tcg_gen_st_i64(t, cpu_env, ofs + i);
+        }
+    }
+
+ done:
+    tcg_temp_free_i64(t);
+
+    /* PTRUES */
+    if (a->s) {
+        tcg_gen_movi_i32(cpu_NF, -(word != 0));
+        tcg_gen_movi_i32(cpu_CF, word == 0);
+        tcg_gen_movi_i32(cpu_VF, 0);
+        tcg_gen_mov_i32(cpu_ZF, cpu_NF);
+    }
+}
+
+static void do_pfirst_pnext(DisasContext *s, arg_rr_esz *a,
+                            void (*gen_fn)(TCGv_i32, TCGv_ptr,
+                                           TCGv_ptr, TCGv_i32))
+{
+    TCGv_ptr t_pd = tcg_temp_new_ptr();
+    TCGv_ptr t_pg = tcg_temp_new_ptr();
+    TCGv_i32 t;
+    unsigned desc;
+
+    desc = DIV_ROUND_UP(pred_full_reg_size(s), 8);
+    desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz);
+
+    tcg_gen_addi_ptr(t_pd, cpu_env, pred_full_reg_offset(s, a->rd));
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, a->rn));
+    t = tcg_const_i32(desc);
+
+    gen_fn(t, t_pd, t_pg, t);
+    tcg_temp_free_ptr(t_pd);
+    tcg_temp_free_ptr(t_pg);
+
+    do_pred_flags(t);
+    tcg_temp_free_i32(t);
+}
+
+static void trans_PFIRST(DisasContext *s, arg_rr_esz *a, uint32_t insn)
+{
+    do_pfirst_pnext(s, a, gen_helper_sve_pfirst);
+}
+
+static void trans_PNEXT(DisasContext *s, arg_rr_esz *a, uint32_t insn)
+{
+    do_pfirst_pnext(s, a, gen_helper_sve_pnext);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index d92886127a..2e27ef41cd 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -23,20 +23,30 @@
 # Named fields.  These are primarily for disjoint fields.
 
 %imm9_16_10	16:s6 10:3
+%preg4_5	5:4
 
 ###########################################################################
 # Named attribute sets.  These are used to make nice(er) names
 # when creating helpers common to those for the individual
 # instruction patterns.
 
+&rr_esz		rd rn esz
 &rri		rd rn imm
 &rrr_esz	rd rn rm esz
 &rprr_s		rd pg rn rm s
 
+&ptrue		rd esz pat s
+
 ###########################################################################
 # Named instruction formats.  These are generally used to
 # reduce the amount of duplication between instruction patterns.
 
+# Two operand with unused vector element size
+@pd_pn_e0	........ ........ ....... rn:4 . rd:4		&rr_esz esz=0
+
+# Two operand
+@pd_pn		........ esz:2 .. .... ....... rn:4 . rd:4	&rr_esz
+
 # Three operand with unused vector element size
 @rd_rn_rm_e0	........ ... rm:5 ... ... rn:5 rd:5		&rrr_esz esz=0
 
@@ -77,6 +87,37 @@ NAND_pppp	00100101 1. 00 .... 01 .... 1 .... 1 ....	@pd_pg_pn_pm_s
 # SVE predicate test
 PTEST		00100101 01010000 11 pg:4 0 rn:4 00000
 
+# SVE predicate initialize
+PTRUE		00100101 esz:2 01100 s:1 111000 pat:5 0 rd:4	&ptrue
+
+# SVE initialize FFR (SETFFR)
+PTRUE		00100101 0010 1100 1001 0000 0000 0000 \
+		&ptrue rd=16 esz=0 pat=31 s=0
+
+# SVE zero predicate register (PFALSE)
+# Note that pat=32 is outside of the natural 0..31, and will
+# always hit the default #uimm5 case of decode_pred_count.
+PTRUE		00100101 0001 1000 1110 0100 0000 rd:4 \
+		&ptrue esz=0 pat=32 s=0
+
+# SVE predicate read from FFR (predicated) (RDFFR)
+ORR_pppp	00100101 0 s:1 0110001111000 pg:4 0 rd:4 \
+		&rprr_s rn=16 rm=16
+
+# SVE predicate read from FFR (unpredicated) (RDFFR)
+ORR_pppp	00100101 0001 1001 1111 0000 0000 rd:4 \
+		&rprr_s rn=16 rm=16 pg=16 s=0
+
+# SVE FFR write from predicate (WRFFR)
+ORR_pppp	00100101 0010 1000 1001 000 rn:4 00000 \
+		&rprr_s rd=16 rm=%preg4_5 pg=%preg4_5 s=0
+
+# SVE predicate first active
+PFIRST		00100101 01 011 000 11000 00 .... 0 ....	@pd_pn_e0
+
+# SVE predicate next active
+PNEXT		00100101 .. 011 001 11000 10 .... 0 ....	@pd_pn
+
 ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
 
 # SVE load predicate register
-- 
2.14.3


* [Qemu-devel] [PATCH v2 09/67] target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (7 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 08/67] target/arm: Implement SVE Predicate Misc Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 11:35   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 10/67] target/arm: Implement SVE Integer Reduction Group Richard Henderson
                   ` (59 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 145 +++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 196 ++++++++++++++++++++++++++++++++++++++++++++-
 target/arm/translate-sve.c |  65 +++++++++++++++
 target/arm/sve.decode      |  42 ++++++++++
 4 files changed, 447 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 0c04afff8c..5b82ba1501 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -23,6 +23,151 @@ DEF_HELPER_FLAGS_3(sve_predtest, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_pfirst, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_pnext, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_and_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_and_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_and_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_and_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_add_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_add_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_add_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_add_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_sdiv_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sdiv_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_udiv_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_udiv_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index cee7d9bcf6..26c177c2fd 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -25,6 +25,22 @@
 #include "tcg/tcg-gvec-desc.h"
 
 
+/* Note that vector data is stored in host-endian 64-bit chunks,
+   so addressing units smaller than that needs a host-endian fixup.  */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
+#else
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
+#endif
+
 /* Return a value for NZCV as per the ARM PredTest pseudofunction.
  *
  * The return value has bit 31 set if N is set, bit 1 set if Z is clear,
@@ -105,7 +121,7 @@ LOGICAL_PPPP(sve_orn_pppp, DO_ORN)
 LOGICAL_PPPP(sve_nor_pppp, DO_NOR)
 LOGICAL_PPPP(sve_nand_pppp, DO_NAND)
 
-#undef DO_ADD
+#undef DO_AND
 #undef DO_BIC
 #undef DO_EOR
 #undef DO_ORR
@@ -115,6 +131,184 @@ LOGICAL_PPPP(sve_nand_pppp, DO_NAND)
 #undef DO_SEL
 #undef LOGICAL_PPPP
 
+/* Fully general three-operand expander, controlled by a predicate.
+ * This is complicated by the host-endian storage of the register file.
+ */
+/* ??? I don't expect the compiler could ever vectorize this itself.
+ * With some tables we can convert bit masks to byte masks, and with
+ * extra care wrt byte/word ordering we could use gcc generic vectors
+ * and do 16 bytes at a time.
+ */
+#define DO_ZPZZ(NAME, TYPE, H, OP)                                       \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
+{                                                                       \
+    intptr_t i, opr_sz = simd_oprsz(desc);                              \
+    for (i = 0; i < opr_sz; ) {                                         \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                 \
+        do {                                                            \
+            if (pg & 1) {                                               \
+                TYPE nn = *(TYPE *)(vn + H(i));                         \
+                TYPE mm = *(TYPE *)(vm + H(i));                         \
+                *(TYPE *)(vd + H(i)) = OP(nn, mm);                      \
+            }                                                           \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);                     \
+        } while (i & 15);                                               \
+    }                                                                   \
+}
+
+/* Similarly, specialized for 64-bit operands.  */
+#define DO_ZPZZ_D(NAME, TYPE, OP)                                \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;                  \
+    TYPE *d = vd, *n = vn, *m = vm;                             \
+    uint8_t *pg = vg;                                           \
+    for (i = 0; i < opr_sz; i += 1) {                           \
+        if (pg[H1(i)] & 1) {                                    \
+            TYPE nn = n[i], mm = m[i];                          \
+            d[i] = OP(nn, mm);                                  \
+        }                                                       \
+    }                                                           \
+}
+
+#define DO_AND(N, M)  (N & M)
+#define DO_EOR(N, M)  (N ^ M)
+#define DO_ORR(N, M)  (N | M)
+#define DO_BIC(N, M)  (N & ~M)
+#define DO_ADD(N, M)  (N + M)
+#define DO_SUB(N, M)  (N - M)
+#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
+#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
+#define DO_ABD(N, M)  ((N) >= (M) ? (N) - (M) : (M) - (N))
+#define DO_MUL(N, M)  (N * M)
+#define DO_DIV(N, M)  (M ? N / M : 0)
+
+DO_ZPZZ(sve_and_zpzz_b, uint8_t, H1, DO_AND)
+DO_ZPZZ(sve_and_zpzz_h, uint16_t, H1_2, DO_AND)
+DO_ZPZZ(sve_and_zpzz_s, uint32_t, H1_4, DO_AND)
+DO_ZPZZ_D(sve_and_zpzz_d, uint64_t, DO_AND)
+
+DO_ZPZZ(sve_orr_zpzz_b, uint8_t, H1, DO_ORR)
+DO_ZPZZ(sve_orr_zpzz_h, uint16_t, H1_2, DO_ORR)
+DO_ZPZZ(sve_orr_zpzz_s, uint32_t, H1_4, DO_ORR)
+DO_ZPZZ_D(sve_orr_zpzz_d, uint64_t, DO_ORR)
+
+DO_ZPZZ(sve_eor_zpzz_b, uint8_t, H1, DO_EOR)
+DO_ZPZZ(sve_eor_zpzz_h, uint16_t, H1_2, DO_EOR)
+DO_ZPZZ(sve_eor_zpzz_s, uint32_t, H1_4, DO_EOR)
+DO_ZPZZ_D(sve_eor_zpzz_d, uint64_t, DO_EOR)
+
+DO_ZPZZ(sve_bic_zpzz_b, uint8_t, H1, DO_BIC)
+DO_ZPZZ(sve_bic_zpzz_h, uint16_t, H1_2, DO_BIC)
+DO_ZPZZ(sve_bic_zpzz_s, uint32_t, H1_4, DO_BIC)
+DO_ZPZZ_D(sve_bic_zpzz_d, uint64_t, DO_BIC)
+
+DO_ZPZZ(sve_add_zpzz_b, uint8_t, H1, DO_ADD)
+DO_ZPZZ(sve_add_zpzz_h, uint16_t, H1_2, DO_ADD)
+DO_ZPZZ(sve_add_zpzz_s, uint32_t, H1_4, DO_ADD)
+DO_ZPZZ_D(sve_add_zpzz_d, uint64_t, DO_ADD)
+
+DO_ZPZZ(sve_sub_zpzz_b, uint8_t, H1, DO_SUB)
+DO_ZPZZ(sve_sub_zpzz_h, uint16_t, H1_2, DO_SUB)
+DO_ZPZZ(sve_sub_zpzz_s, uint32_t, H1_4, DO_SUB)
+DO_ZPZZ_D(sve_sub_zpzz_d, uint64_t, DO_SUB)
+
+DO_ZPZZ(sve_smax_zpzz_b, int8_t, H1, DO_MAX)
+DO_ZPZZ(sve_smax_zpzz_h, int16_t, H1_2, DO_MAX)
+DO_ZPZZ(sve_smax_zpzz_s, int32_t, H1_4, DO_MAX)
+DO_ZPZZ_D(sve_smax_zpzz_d, int64_t, DO_MAX)
+
+DO_ZPZZ(sve_umax_zpzz_b, uint8_t, H1, DO_MAX)
+DO_ZPZZ(sve_umax_zpzz_h, uint16_t, H1_2, DO_MAX)
+DO_ZPZZ(sve_umax_zpzz_s, uint32_t, H1_4, DO_MAX)
+DO_ZPZZ_D(sve_umax_zpzz_d, uint64_t, DO_MAX)
+
+DO_ZPZZ(sve_smin_zpzz_b, int8_t,  H1, DO_MIN)
+DO_ZPZZ(sve_smin_zpzz_h, int16_t,  H1_2, DO_MIN)
+DO_ZPZZ(sve_smin_zpzz_s, int32_t,  H1_4, DO_MIN)
+DO_ZPZZ_D(sve_smin_zpzz_d, int64_t,  DO_MIN)
+
+DO_ZPZZ(sve_umin_zpzz_b, uint8_t, H1, DO_MIN)
+DO_ZPZZ(sve_umin_zpzz_h, uint16_t, H1_2, DO_MIN)
+DO_ZPZZ(sve_umin_zpzz_s, uint32_t, H1_4, DO_MIN)
+DO_ZPZZ_D(sve_umin_zpzz_d, uint64_t, DO_MIN)
+
+DO_ZPZZ(sve_sabd_zpzz_b, int8_t,  H1, DO_ABD)
+DO_ZPZZ(sve_sabd_zpzz_h, int16_t,  H1_2, DO_ABD)
+DO_ZPZZ(sve_sabd_zpzz_s, int32_t,  H1_4, DO_ABD)
+DO_ZPZZ_D(sve_sabd_zpzz_d, int64_t,  DO_ABD)
+
+DO_ZPZZ(sve_uabd_zpzz_b, uint8_t, H1, DO_ABD)
+DO_ZPZZ(sve_uabd_zpzz_h, uint16_t, H1_2, DO_ABD)
+DO_ZPZZ(sve_uabd_zpzz_s, uint32_t, H1_4, DO_ABD)
+DO_ZPZZ_D(sve_uabd_zpzz_d, uint64_t, DO_ABD)
+
+/* Because the computation type is at least twice as large as required,
+   these work for both signed and unsigned source types.  */
+static inline uint8_t do_mulh_b(int32_t n, int32_t m)
+{
+    return (n * m) >> 8;
+}
+
+static inline uint16_t do_mulh_h(int32_t n, int32_t m)
+{
+    return (n * m) >> 16;
+}
+
+static inline uint32_t do_mulh_s(int64_t n, int64_t m)
+{
+    return (n * m) >> 32;
+}
+
+static inline uint64_t do_smulh_d(uint64_t n, uint64_t m)
+{
+    uint64_t lo, hi;
+    muls64(&lo, &hi, n, m);
+    return hi;
+}
+
+static inline uint64_t do_umulh_d(uint64_t n, uint64_t m)
+{
+    uint64_t lo, hi;
+    mulu64(&lo, &hi, n, m);
+    return hi;
+}
+
+DO_ZPZZ(sve_mul_zpzz_b, uint8_t, H1, DO_MUL)
+DO_ZPZZ(sve_mul_zpzz_h, uint16_t, H1_2, DO_MUL)
+DO_ZPZZ(sve_mul_zpzz_s, uint32_t, H1_4, DO_MUL)
+DO_ZPZZ_D(sve_mul_zpzz_d, uint64_t, DO_MUL)
+
+DO_ZPZZ(sve_smulh_zpzz_b, int8_t, H1, do_mulh_b)
+DO_ZPZZ(sve_smulh_zpzz_h, int16_t, H1_2, do_mulh_h)
+DO_ZPZZ(sve_smulh_zpzz_s, int32_t, H1_4, do_mulh_s)
+DO_ZPZZ_D(sve_smulh_zpzz_d, uint64_t, do_smulh_d)
+
+DO_ZPZZ(sve_umulh_zpzz_b, uint8_t, H1, do_mulh_b)
+DO_ZPZZ(sve_umulh_zpzz_h, uint16_t, H1_2, do_mulh_h)
+DO_ZPZZ(sve_umulh_zpzz_s, uint32_t, H1_4, do_mulh_s)
+DO_ZPZZ_D(sve_umulh_zpzz_d, uint64_t, do_umulh_d)
+
+DO_ZPZZ(sve_sdiv_zpzz_s, int32_t, H1_4, DO_DIV)
+DO_ZPZZ_D(sve_sdiv_zpzz_d, int64_t, DO_DIV)
+
+DO_ZPZZ(sve_udiv_zpzz_s, uint32_t, H1_4, DO_DIV)
+DO_ZPZZ_D(sve_udiv_zpzz_d, uint64_t, DO_DIV)
+
+#undef DO_AND
+#undef DO_ORR
+#undef DO_EOR
+#undef DO_BIC
+#undef DO_ADD
+#undef DO_SUB
+#undef DO_MAX
+#undef DO_MIN
+#undef DO_ABD
+#undef DO_MUL
+#undef DO_DIV
+#undef DO_ZPZZ
+#undef DO_ZPZZ_D
+
 /* Similar to the ARM LastActiveElement pseudocode function, except the
    result is multiplied by the element size.  This includes the not found
    indication; e.g. not found for esz=3 is -8.  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index a9b6ae046d..116002792a 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -211,6 +211,71 @@ static void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
     do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
 }
 
+/*
+ *** SVE Integer Arithmetic - Binary Predicated Group
+ */
+
+static void do_zpzz_ool(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_4 *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    if (fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+    tcg_gen_gvec_4_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       pred_full_reg_offset(s, a->pg),
+                       vsz, vsz, 0, fn);
+}
+
+#define DO_ZPZZ(NAME, name) \
+void trans_##NAME##_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn) \
+{                                                                         \
+    static gen_helper_gvec_4 * const fns[4] = {                           \
+        gen_helper_sve_##name##_zpzz_b, gen_helper_sve_##name##_zpzz_h,   \
+        gen_helper_sve_##name##_zpzz_s, gen_helper_sve_##name##_zpzz_d,   \
+    };                                                                    \
+    do_zpzz_ool(s, a, fns[a->esz]);                                       \
+}
+
+DO_ZPZZ(AND, and)
+DO_ZPZZ(EOR, eor)
+DO_ZPZZ(ORR, orr)
+DO_ZPZZ(BIC, bic)
+
+DO_ZPZZ(ADD, add)
+DO_ZPZZ(SUB, sub)
+
+DO_ZPZZ(SMAX, smax)
+DO_ZPZZ(UMAX, umax)
+DO_ZPZZ(SMIN, smin)
+DO_ZPZZ(UMIN, umin)
+DO_ZPZZ(SABD, sabd)
+DO_ZPZZ(UABD, uabd)
+
+DO_ZPZZ(MUL, mul)
+DO_ZPZZ(SMULH, smulh)
+DO_ZPZZ(UMULH, umulh)
+
+void trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_4 * const fns[4] = {
+        NULL, NULL, gen_helper_sve_sdiv_zpzz_s, gen_helper_sve_sdiv_zpzz_d
+    };
+    do_zpzz_ool(s, a, fns[a->esz]);
+}
+
+void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_4 * const fns[4] = {
+        NULL, NULL, gen_helper_sve_udiv_zpzz_s, gen_helper_sve_udiv_zpzz_d
+    };
+    do_zpzz_ool(s, a, fns[a->esz]);
+}
+
+#undef DO_ZPZZ
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 2e27ef41cd..5fafe02575 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -25,6 +25,10 @@
 %imm9_16_10	16:s6 10:3
 %preg4_5	5:4
 
+# Either a copy of rd (at bit 0), or a different source
+# as propagated via the MOVPRFX instruction.
+%reg_movprfx		0:5
+
 ###########################################################################
 # Named attribute sets.  These are used to make nice(er) names
 # when creating helpers common to those for the individual
@@ -34,6 +38,7 @@
 &rri		rd rn imm
 &rrr_esz	rd rn rm esz
 &rprr_s		rd pg rn rm s
+&rprr_esz	rd pg rn rm esz
 
 &ptrue		rd esz pat s
 
@@ -53,6 +58,12 @@
 # Three prediate operand, with governing predicate, flag setting
 @pd_pg_pn_pm_s	........ . s:1 .. rm:4 .. pg:4 . rn:4 . rd:4	&rprr_s
 
+# Two register operand, with governing predicate, vector element size
+@rdn_pg_rm	........ esz:2 ... ... ... pg:3 rm:5 rd:5 \
+		&rprr_esz rn=%reg_movprfx
+@rdm_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5 \
+		&rprr_esz rm=%reg_movprfx
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9	........ ........ ...... rn:5 . rd:4	\
 		&rri imm=%imm9_16_10
@@ -62,6 +73,37 @@
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
 
+### SVE Integer Arithmetic - Binary Predicated Group
+
+# SVE bitwise logical vector operations (predicated)
+ORR_zpzz	00000100 .. 011 000 000 ... ..... .....   @rdn_pg_rm
+EOR_zpzz	00000100 .. 011 001 000 ... ..... .....   @rdn_pg_rm
+AND_zpzz	00000100 .. 011 010 000 ... ..... .....   @rdn_pg_rm
+BIC_zpzz	00000100 .. 011 011 000 ... ..... .....   @rdn_pg_rm
+
+# SVE integer add/subtract vectors (predicated)
+ADD_zpzz	00000100 .. 000 000 000 ... ..... .....   @rdn_pg_rm
+SUB_zpzz	00000100 .. 000 001 000 ... ..... .....   @rdn_pg_rm
+SUB_zpzz	00000100 .. 000 011 000 ... ..... .....   @rdm_pg_rn # SUBR
+
+# SVE integer min/max/difference (predicated)
+SMAX_zpzz	00000100 .. 001 000 000 ... ..... .....   @rdn_pg_rm
+UMAX_zpzz	00000100 .. 001 001 000 ... ..... .....   @rdn_pg_rm
+SMIN_zpzz	00000100 .. 001 010 000 ... ..... .....   @rdn_pg_rm
+UMIN_zpzz	00000100 .. 001 011 000 ... ..... .....   @rdn_pg_rm
+SABD_zpzz	00000100 .. 001 100 000 ... ..... .....   @rdn_pg_rm
+UABD_zpzz	00000100 .. 001 101 000 ... ..... .....   @rdn_pg_rm
+
+# SVE integer multiply/divide (predicated)
+MUL_zpzz	00000100 .. 010 000 000 ... ..... .....   @rdn_pg_rm
+SMULH_zpzz	00000100 .. 010 010 000 ... ..... .....   @rdn_pg_rm
+UMULH_zpzz	00000100 .. 010 011 000 ... ..... .....   @rdn_pg_rm
+# Note that divide requires size >= 2; below 2 is unallocated.
+SDIV_zpzz	00000100 .. 010 100 000 ... ..... .....   @rdn_pg_rm
+UDIV_zpzz	00000100 .. 010 101 000 ... ..... .....   @rdn_pg_rm
+SDIV_zpzz	00000100 .. 010 110 000 ... ..... .....   @rdm_pg_rn # SDIVR
+UDIV_zpzz	00000100 .. 010 111 000 ... ..... .....   @rdm_pg_rn # UDIVR
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread

* [Qemu-devel] [PATCH v2 10/67] target/arm: Implement SVE Integer Reduction Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (8 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 09/67] target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 11:50   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 11/67] target/arm: Implement SVE bitwise shift by immediate (predicated) Richard Henderson
                   ` (58 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Excepting MOVPRFX, which isn't a reduction.  Presumably it is
placed within the group because of its encoding.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 44 +++++++++++++++++++++
 target/arm/sve_helper.c    | 95 +++++++++++++++++++++++++++++++++++++++++++++-
 target/arm/translate-sve.c | 65 +++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 22 +++++++++++
 4 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 5b82ba1501..6b6bbeb272 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -168,6 +168,50 @@ DEF_HELPER_FLAGS_5(sve_udiv_zpzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_udiv_zpzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_orv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_orv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_orv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_orv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_eorv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_eorv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_eorv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_eorv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_andv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_andv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_andv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_andv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_saddv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_saddv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_saddv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_uaddv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uaddv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uaddv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uaddv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_smaxv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_smaxv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_smaxv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_smaxv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_umaxv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_umaxv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_umaxv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_umaxv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_sminv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_sminv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_sminv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_sminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_uminv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uminv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uminv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 26c177c2fd..18fb27805e 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -295,6 +295,99 @@ DO_ZPZZ_D(sve_sdiv_zpzz_d, int64_t, DO_DIV)
 DO_ZPZZ(sve_udiv_zpzz_s, uint32_t, H1_4, DO_DIV)
 DO_ZPZZ_D(sve_udiv_zpzz_d, uint64_t, DO_DIV)
 
+#undef DO_ZPZZ
+#undef DO_ZPZZ_D
+
+/* Two-operand reduction expander, controlled by a predicate.
+ * The difference between TYPERED and TYPERET has to do with
+ * sign-extension.  E.g. for SMAX, TYPERED must be signed,
+ * but TYPERET must be unsigned so that e.g. a 32-bit value
+ * is not sign-extended to the ABI uint64_t return type.
+ */
+/* ??? If we were to vectorize this by hand the reduction ordering
+ * would change.  For integer operands, this is perfectly fine.
+ */
+#define DO_VPZ(NAME, TYPEELT, TYPERED, TYPERET, H, INIT, OP) \
+uint64_t HELPER(NAME)(void *vn, void *vg, uint32_t desc)   \
+{                                                          \
+    intptr_t i, opr_sz = simd_oprsz(desc);                 \
+    TYPERED ret = INIT;                                    \
+    for (i = 0; i < opr_sz; ) {                            \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
+        do {                                               \
+            if (pg & 1) {                                  \
+                TYPEELT nn = *(TYPEELT *)(vn + H(i));      \
+                ret = OP(ret, nn);                         \
+            }                                              \
+            i += sizeof(TYPEELT), pg >>= sizeof(TYPEELT);  \
+        } while (i & 15);                                  \
+    }                                                      \
+    return (TYPERET)ret;                                   \
+}
+
+#define DO_VPZ_D(NAME, TYPEE, TYPER, INIT, OP)             \
+uint64_t HELPER(NAME)(void *vn, void *vg, uint32_t desc)   \
+{                                                          \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;             \
+    TYPEE *n = vn;                                         \
+    uint8_t *pg = vg;                                      \
+    TYPER ret = INIT;                                      \
+    for (i = 0; i < opr_sz; i += 1) {                      \
+        if (pg[H1(i)] & 1) {                               \
+            TYPEE nn = n[i];                               \
+            ret = OP(ret, nn);                             \
+        }                                                  \
+    }                                                      \
+    return ret;                                            \
+}
+
+DO_VPZ(sve_orv_b, uint8_t, uint8_t, uint8_t, H1, 0, DO_ORR)
+DO_VPZ(sve_orv_h, uint16_t, uint16_t, uint16_t, H1_2, 0, DO_ORR)
+DO_VPZ(sve_orv_s, uint32_t, uint32_t, uint32_t, H1_4, 0, DO_ORR)
+DO_VPZ_D(sve_orv_d, uint64_t, uint64_t, 0, DO_ORR)
+
+DO_VPZ(sve_eorv_b, uint8_t, uint8_t, uint8_t, H1, 0, DO_EOR)
+DO_VPZ(sve_eorv_h, uint16_t, uint16_t, uint16_t, H1_2, 0, DO_EOR)
+DO_VPZ(sve_eorv_s, uint32_t, uint32_t, uint32_t, H1_4, 0, DO_EOR)
+DO_VPZ_D(sve_eorv_d, uint64_t, uint64_t, 0, DO_EOR)
+
+DO_VPZ(sve_andv_b, uint8_t, uint8_t, uint8_t, H1, -1, DO_AND)
+DO_VPZ(sve_andv_h, uint16_t, uint16_t, uint16_t, H1_2, -1, DO_AND)
+DO_VPZ(sve_andv_s, uint32_t, uint32_t, uint32_t, H1_4, -1, DO_AND)
+DO_VPZ_D(sve_andv_d, uint64_t, uint64_t, -1, DO_AND)
+
+DO_VPZ(sve_saddv_b, int8_t, uint64_t, uint64_t, H1, 0, DO_ADD)
+DO_VPZ(sve_saddv_h, int16_t, uint64_t, uint64_t, H1_2, 0, DO_ADD)
+DO_VPZ(sve_saddv_s, int32_t, uint64_t, uint64_t, H1_4, 0, DO_ADD)
+
+DO_VPZ(sve_uaddv_b, uint8_t, uint64_t, uint64_t, H1, 0, DO_ADD)
+DO_VPZ(sve_uaddv_h, uint16_t, uint64_t, uint64_t, H1_2, 0, DO_ADD)
+DO_VPZ(sve_uaddv_s, uint32_t, uint64_t, uint64_t, H1_4, 0, DO_ADD)
+DO_VPZ_D(sve_uaddv_d, uint64_t, uint64_t, 0, DO_ADD)
+
+DO_VPZ(sve_smaxv_b, int8_t, int8_t, uint8_t, H1, INT8_MIN, DO_MAX)
+DO_VPZ(sve_smaxv_h, int16_t, int16_t, uint16_t, H1_2, INT16_MIN, DO_MAX)
+DO_VPZ(sve_smaxv_s, int32_t, int32_t, uint32_t, H1_4, INT32_MIN, DO_MAX)
+DO_VPZ_D(sve_smaxv_d, int64_t, int64_t, INT64_MIN, DO_MAX)
+
+DO_VPZ(sve_umaxv_b, uint8_t, uint8_t, uint8_t, H1, 0, DO_MAX)
+DO_VPZ(sve_umaxv_h, uint16_t, uint16_t, uint16_t, H1_2, 0, DO_MAX)
+DO_VPZ(sve_umaxv_s, uint32_t, uint32_t, uint32_t, H1_4, 0, DO_MAX)
+DO_VPZ_D(sve_umaxv_d, uint64_t, uint64_t, 0, DO_MAX)
+
+DO_VPZ(sve_sminv_b, int8_t, int8_t, uint8_t, H1, INT8_MAX, DO_MIN)
+DO_VPZ(sve_sminv_h, int16_t, int16_t, uint16_t, H1_2, INT16_MAX, DO_MIN)
+DO_VPZ(sve_sminv_s, int32_t, int32_t, uint32_t, H1_4, INT32_MAX, DO_MIN)
+DO_VPZ_D(sve_sminv_d, int64_t, int64_t, INT64_MAX, DO_MIN)
+
+DO_VPZ(sve_uminv_b, uint8_t, uint8_t, uint8_t, H1, -1, DO_MIN)
+DO_VPZ(sve_uminv_h, uint16_t, uint16_t, uint16_t, H1_2, -1, DO_MIN)
+DO_VPZ(sve_uminv_s, uint32_t, uint32_t, uint32_t, H1_4, -1, DO_MIN)
+DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN)
+
+#undef DO_VPZ
+#undef DO_VPZ_D
+
 #undef DO_AND
 #undef DO_ORR
 #undef DO_EOR
@@ -306,8 +399,6 @@ DO_ZPZZ_D(sve_udiv_zpzz_d, uint64_t, DO_DIV)
 #undef DO_ABD
 #undef DO_MUL
 #undef DO_DIV
-#undef DO_ZPZZ
-#undef DO_ZPZZ_D
 
 /* Similar to the ARM LastActiveElement pseudocode function, except the
    result is multiplied by the element size.  This includes the not found
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 116002792a..49251a53c1 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -276,6 +276,71 @@ void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 
 #undef DO_ZPZZ
 
+/*
+ *** SVE Integer Reduction Group
+ */
+
+typedef void gen_helper_gvec_reduc(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_i32);
+static void do_vpz_ool(DisasContext *s, arg_rpr_esz *a,
+                       gen_helper_gvec_reduc *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr t_zn, t_pg;
+    TCGv_i32 desc;
+    TCGv_i64 temp;
+
+    if (fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+    temp = tcg_temp_new_i64();
+    t_zn = tcg_temp_new_ptr();
+    t_pg = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_zn, cpu_env, vec_full_reg_offset(s, a->rn));
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, a->pg));
+    fn(temp, t_zn, t_pg, desc);
+    tcg_temp_free_ptr(t_zn);
+    tcg_temp_free_ptr(t_pg);
+    tcg_temp_free_i32(desc);
+
+    write_fp_dreg(s, a->rd, temp);
+    tcg_temp_free_i64(temp);
+}
+
+#define DO_VPZ(NAME, name) \
+static void trans_##NAME(DisasContext *s, arg_rpr_esz *a, uint32_t insn) \
+{                                                                        \
+    static gen_helper_gvec_reduc * const fns[4] = {                      \
+        gen_helper_sve_##name##_b, gen_helper_sve_##name##_h,            \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,            \
+    };                                                                   \
+    do_vpz_ool(s, a, fns[a->esz]);                                       \
+}
+
+DO_VPZ(ORV, orv)
+DO_VPZ(ANDV, andv)
+DO_VPZ(EORV, eorv)
+
+DO_VPZ(UADDV, uaddv)
+DO_VPZ(SMAXV, smaxv)
+DO_VPZ(UMAXV, umaxv)
+DO_VPZ(SMINV, sminv)
+DO_VPZ(UMINV, uminv)
+
+static void trans_SADDV(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_reduc * const fns[4] = {
+        gen_helper_sve_saddv_b, gen_helper_sve_saddv_h,
+        gen_helper_sve_saddv_s, NULL
+    };
+    do_vpz_ool(s, a, fns[a->esz]);
+}
+
+#undef DO_VPZ
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 5fafe02575..b390d8f398 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -37,6 +37,7 @@
 &rr_esz		rd rn esz
 &rri		rd rn imm
 &rrr_esz	rd rn rm esz
+&rpr_esz	rd pg rn esz
 &rprr_s		rd pg rn rm s
 &rprr_esz	rd pg rn rm esz
 
@@ -64,6 +65,9 @@
 @rdm_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5 \
 		&rprr_esz rm=%reg_movprfx
 
+# One register operand, with governing predicate, vector element size
+@rd_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9	........ ........ ...... rn:5 . rd:4	\
 		&rri imm=%imm9_16_10
@@ -104,6 +108,24 @@ UDIV_zpzz	00000100 .. 010 101 000 ... ..... .....   @rdn_pg_rm
 SDIV_zpzz	00000100 .. 010 110 000 ... ..... .....   @rdm_pg_rn # SDIVR
 UDIV_zpzz	00000100 .. 010 111 000 ... ..... .....   @rdm_pg_rn # UDIVR
 
+### SVE Integer Reduction Group
+
+# SVE bitwise logical reduction (predicated)
+ORV		00000100 .. 011 000 001 ... ..... .....		@rd_pg_rn
+EORV		00000100 .. 011 001 001 ... ..... .....		@rd_pg_rn
+ANDV		00000100 .. 011 010 001 ... ..... .....		@rd_pg_rn
+
+# SVE integer add reduction (predicated)
+# Note that saddv requires size != 3.
+UADDV		00000100 .. 000 001 001 ... ..... .....		@rd_pg_rn
+SADDV		00000100 .. 000 000 001 ... ..... .....		@rd_pg_rn
+
+# SVE integer min/max reduction (predicated)
+SMAXV		00000100 .. 001 000 001 ... ..... .....		@rd_pg_rn
+UMAXV		00000100 .. 001 001 001 ... ..... .....		@rd_pg_rn
+SMINV		00000100 .. 001 010 001 ... ..... .....		@rd_pg_rn
+UMINV		00000100 .. 001 011 001 ... ..... .....		@rd_pg_rn
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3


* [Qemu-devel] [PATCH v2 11/67] target/arm: Implement SVE bitwise shift by immediate (predicated)
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (9 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 10/67] target/arm: Implement SVE Integer Reduction Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 12:03   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 12/67] target/arm: Implement SVE bitwise shift by vector (predicated) Richard Henderson
                   ` (57 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  25 +++++
 target/arm/sve_helper.c    | 265 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 128 ++++++++++++++++++++++
 target/arm/sve.decode      |  29 ++++-
 4 files changed, 445 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 6b6bbeb272..b3c89579af 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -212,6 +212,31 @@ DEF_HELPER_FLAGS_3(sve_uminv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_uminv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_uminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_clr_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_clr_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_clr_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_clr_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_asrd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asrd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asrd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asrd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 18fb27805e..b1a170fd70 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -92,6 +92,150 @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
     return flags;
 }
 
+/* Expand active predicate bits to bytes, for byte elements.
+ *  for (i = 0; i < 256; ++i) {
+ *      unsigned long m = 0;
+ *      for (j = 0; j < 8; j++) {
+ *          if ((i >> j) & 1) {
+ *              m |= 0xfful << (j << 3);
+ *          }
+ *      }
+ *      printf("0x%016lx,\n", m);
+ *  }
+ */
+static inline uint64_t expand_pred_b(uint8_t byte)
+{
+    static const uint64_t word[256] = {
+        0x0000000000000000, 0x00000000000000ff, 0x000000000000ff00,
+        0x000000000000ffff, 0x0000000000ff0000, 0x0000000000ff00ff,
+        0x0000000000ffff00, 0x0000000000ffffff, 0x00000000ff000000,
+        0x00000000ff0000ff, 0x00000000ff00ff00, 0x00000000ff00ffff,
+        0x00000000ffff0000, 0x00000000ffff00ff, 0x00000000ffffff00,
+        0x00000000ffffffff, 0x000000ff00000000, 0x000000ff000000ff,
+        0x000000ff0000ff00, 0x000000ff0000ffff, 0x000000ff00ff0000,
+        0x000000ff00ff00ff, 0x000000ff00ffff00, 0x000000ff00ffffff,
+        0x000000ffff000000, 0x000000ffff0000ff, 0x000000ffff00ff00,
+        0x000000ffff00ffff, 0x000000ffffff0000, 0x000000ffffff00ff,
+        0x000000ffffffff00, 0x000000ffffffffff, 0x0000ff0000000000,
+        0x0000ff00000000ff, 0x0000ff000000ff00, 0x0000ff000000ffff,
+        0x0000ff0000ff0000, 0x0000ff0000ff00ff, 0x0000ff0000ffff00,
+        0x0000ff0000ffffff, 0x0000ff00ff000000, 0x0000ff00ff0000ff,
+        0x0000ff00ff00ff00, 0x0000ff00ff00ffff, 0x0000ff00ffff0000,
+        0x0000ff00ffff00ff, 0x0000ff00ffffff00, 0x0000ff00ffffffff,
+        0x0000ffff00000000, 0x0000ffff000000ff, 0x0000ffff0000ff00,
+        0x0000ffff0000ffff, 0x0000ffff00ff0000, 0x0000ffff00ff00ff,
+        0x0000ffff00ffff00, 0x0000ffff00ffffff, 0x0000ffffff000000,
+        0x0000ffffff0000ff, 0x0000ffffff00ff00, 0x0000ffffff00ffff,
+        0x0000ffffffff0000, 0x0000ffffffff00ff, 0x0000ffffffffff00,
+        0x0000ffffffffffff, 0x00ff000000000000, 0x00ff0000000000ff,
+        0x00ff00000000ff00, 0x00ff00000000ffff, 0x00ff000000ff0000,
+        0x00ff000000ff00ff, 0x00ff000000ffff00, 0x00ff000000ffffff,
+        0x00ff0000ff000000, 0x00ff0000ff0000ff, 0x00ff0000ff00ff00,
+        0x00ff0000ff00ffff, 0x00ff0000ffff0000, 0x00ff0000ffff00ff,
+        0x00ff0000ffffff00, 0x00ff0000ffffffff, 0x00ff00ff00000000,
+        0x00ff00ff000000ff, 0x00ff00ff0000ff00, 0x00ff00ff0000ffff,
+        0x00ff00ff00ff0000, 0x00ff00ff00ff00ff, 0x00ff00ff00ffff00,
+        0x00ff00ff00ffffff, 0x00ff00ffff000000, 0x00ff00ffff0000ff,
+        0x00ff00ffff00ff00, 0x00ff00ffff00ffff, 0x00ff00ffffff0000,
+        0x00ff00ffffff00ff, 0x00ff00ffffffff00, 0x00ff00ffffffffff,
+        0x00ffff0000000000, 0x00ffff00000000ff, 0x00ffff000000ff00,
+        0x00ffff000000ffff, 0x00ffff0000ff0000, 0x00ffff0000ff00ff,
+        0x00ffff0000ffff00, 0x00ffff0000ffffff, 0x00ffff00ff000000,
+        0x00ffff00ff0000ff, 0x00ffff00ff00ff00, 0x00ffff00ff00ffff,
+        0x00ffff00ffff0000, 0x00ffff00ffff00ff, 0x00ffff00ffffff00,
+        0x00ffff00ffffffff, 0x00ffffff00000000, 0x00ffffff000000ff,
+        0x00ffffff0000ff00, 0x00ffffff0000ffff, 0x00ffffff00ff0000,
+        0x00ffffff00ff00ff, 0x00ffffff00ffff00, 0x00ffffff00ffffff,
+        0x00ffffffff000000, 0x00ffffffff0000ff, 0x00ffffffff00ff00,
+        0x00ffffffff00ffff, 0x00ffffffffff0000, 0x00ffffffffff00ff,
+        0x00ffffffffffff00, 0x00ffffffffffffff, 0xff00000000000000,
+        0xff000000000000ff, 0xff0000000000ff00, 0xff0000000000ffff,
+        0xff00000000ff0000, 0xff00000000ff00ff, 0xff00000000ffff00,
+        0xff00000000ffffff, 0xff000000ff000000, 0xff000000ff0000ff,
+        0xff000000ff00ff00, 0xff000000ff00ffff, 0xff000000ffff0000,
+        0xff000000ffff00ff, 0xff000000ffffff00, 0xff000000ffffffff,
+        0xff0000ff00000000, 0xff0000ff000000ff, 0xff0000ff0000ff00,
+        0xff0000ff0000ffff, 0xff0000ff00ff0000, 0xff0000ff00ff00ff,
+        0xff0000ff00ffff00, 0xff0000ff00ffffff, 0xff0000ffff000000,
+        0xff0000ffff0000ff, 0xff0000ffff00ff00, 0xff0000ffff00ffff,
+        0xff0000ffffff0000, 0xff0000ffffff00ff, 0xff0000ffffffff00,
+        0xff0000ffffffffff, 0xff00ff0000000000, 0xff00ff00000000ff,
+        0xff00ff000000ff00, 0xff00ff000000ffff, 0xff00ff0000ff0000,
+        0xff00ff0000ff00ff, 0xff00ff0000ffff00, 0xff00ff0000ffffff,
+        0xff00ff00ff000000, 0xff00ff00ff0000ff, 0xff00ff00ff00ff00,
+        0xff00ff00ff00ffff, 0xff00ff00ffff0000, 0xff00ff00ffff00ff,
+        0xff00ff00ffffff00, 0xff00ff00ffffffff, 0xff00ffff00000000,
+        0xff00ffff000000ff, 0xff00ffff0000ff00, 0xff00ffff0000ffff,
+        0xff00ffff00ff0000, 0xff00ffff00ff00ff, 0xff00ffff00ffff00,
+        0xff00ffff00ffffff, 0xff00ffffff000000, 0xff00ffffff0000ff,
+        0xff00ffffff00ff00, 0xff00ffffff00ffff, 0xff00ffffffff0000,
+        0xff00ffffffff00ff, 0xff00ffffffffff00, 0xff00ffffffffffff,
+        0xffff000000000000, 0xffff0000000000ff, 0xffff00000000ff00,
+        0xffff00000000ffff, 0xffff000000ff0000, 0xffff000000ff00ff,
+        0xffff000000ffff00, 0xffff000000ffffff, 0xffff0000ff000000,
+        0xffff0000ff0000ff, 0xffff0000ff00ff00, 0xffff0000ff00ffff,
+        0xffff0000ffff0000, 0xffff0000ffff00ff, 0xffff0000ffffff00,
+        0xffff0000ffffffff, 0xffff00ff00000000, 0xffff00ff000000ff,
+        0xffff00ff0000ff00, 0xffff00ff0000ffff, 0xffff00ff00ff0000,
+        0xffff00ff00ff00ff, 0xffff00ff00ffff00, 0xffff00ff00ffffff,
+        0xffff00ffff000000, 0xffff00ffff0000ff, 0xffff00ffff00ff00,
+        0xffff00ffff00ffff, 0xffff00ffffff0000, 0xffff00ffffff00ff,
+        0xffff00ffffffff00, 0xffff00ffffffffff, 0xffffff0000000000,
+        0xffffff00000000ff, 0xffffff000000ff00, 0xffffff000000ffff,
+        0xffffff0000ff0000, 0xffffff0000ff00ff, 0xffffff0000ffff00,
+        0xffffff0000ffffff, 0xffffff00ff000000, 0xffffff00ff0000ff,
+        0xffffff00ff00ff00, 0xffffff00ff00ffff, 0xffffff00ffff0000,
+        0xffffff00ffff00ff, 0xffffff00ffffff00, 0xffffff00ffffffff,
+        0xffffffff00000000, 0xffffffff000000ff, 0xffffffff0000ff00,
+        0xffffffff0000ffff, 0xffffffff00ff0000, 0xffffffff00ff00ff,
+        0xffffffff00ffff00, 0xffffffff00ffffff, 0xffffffffff000000,
+        0xffffffffff0000ff, 0xffffffffff00ff00, 0xffffffffff00ffff,
+        0xffffffffffff0000, 0xffffffffffff00ff, 0xffffffffffffff00,
+        0xffffffffffffffff,
+    };
+    return word[byte];
+}
+
+/* Similarly for half-word elements.
+ *  for (i = 0; i < 256; ++i) {
+ *      unsigned long m = 0;
+ *      if (i & 0xaa) {
+ *          continue;
+ *      }
+ *      for (j = 0; j < 8; j += 2) {
+ *          if ((i >> j) & 1) {
+ *              m |= 0xfffful << (j << 3);
+ *          }
+ *      }
+ *      printf("[0x%x] = 0x%016lx,\n", i, m);
+ *  }
+ */
+static inline uint64_t expand_pred_h(uint8_t byte)
+{
+    static const uint64_t word[] = {
+        [0x01] = 0x000000000000ffff, [0x04] = 0x00000000ffff0000,
+        [0x05] = 0x00000000ffffffff, [0x10] = 0x0000ffff00000000,
+        [0x11] = 0x0000ffff0000ffff, [0x14] = 0x0000ffffffff0000,
+        [0x15] = 0x0000ffffffffffff, [0x40] = 0xffff000000000000,
+        [0x41] = 0xffff00000000ffff, [0x44] = 0xffff0000ffff0000,
+        [0x45] = 0xffff0000ffffffff, [0x50] = 0xffffffff00000000,
+        [0x51] = 0xffffffff0000ffff, [0x54] = 0xffffffffffff0000,
+        [0x55] = 0xffffffffffffffff,
+    };
+    return word[byte & 0x55];
+}
+
+/* Similarly for single word elements.  */
+static inline uint64_t expand_pred_s(uint8_t byte)
+{
+    static const uint64_t word[] = {
+        [0x01] = 0x00000000ffffffffull,
+        [0x10] = 0xffffffff00000000ull,
+        [0x11] = 0xffffffffffffffffull,
+    };
+    return word[byte & 0x11];
+}
+
 #define LOGICAL_PPPP(NAME, FUNC) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc)  \
 {                                                                         \
@@ -483,3 +627,124 @@ uint32_t HELPER(sve_pnext)(void *vd, void *vg, uint32_t pred_desc)
 
     return flags;
 }
+
+/* Store zero into every active element of Zd.  We will use this for two-
+ * and three-operand predicated instructions for which logic dictates a
+ * zero result.  In particular, logical shift by element size, which is
+ * otherwise undefined on the host.
+ *
+ * For element sizes smaller than uint64_t, we use tables to expand
+ * the N bits of the controlling predicate to a byte mask, and clear
+ * those bytes.
+ */
+void HELPER(sve_clr_b)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] &= ~expand_pred_b(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_clr_h)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] &= ~expand_pred_h(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_clr_s)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] &= ~expand_pred_s(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_clr_d)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        if (pg[H1(i)] & 1) {
+            d[i] = 0;
+        }
+    }
+}
+
+/* Three-operand expander, immediate operand, controlled by a predicate.
+ */
+#define DO_ZPZI(NAME, TYPE, H, OP)                              \
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)  \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc);                      \
+    TYPE imm = simd_data(desc);                                 \
+    for (i = 0; i < opr_sz; ) {                                 \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));         \
+        do {                                                    \
+            if (pg & 1) {                                       \
+                TYPE nn = *(TYPE *)(vn + H(i));                 \
+                *(TYPE *)(vd + H(i)) = OP(nn, imm);             \
+            }                                                   \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);             \
+        } while (i & 15);                                       \
+    }                                                           \
+}
+
+/* Similarly, specialized for 64-bit operands.  */
+#define DO_ZPZI_D(NAME, TYPE, OP)                               \
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)  \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;                  \
+    TYPE *d = vd, *n = vn;                                      \
+    TYPE imm = simd_data(desc);                                 \
+    uint8_t *pg = vg;                                           \
+    for (i = 0; i < opr_sz; i += 1) {                           \
+        if (pg[H1(i)] & 1) {                                    \
+            TYPE nn = n[i];                                     \
+            d[i] = OP(nn, imm);                                 \
+        }                                                       \
+    }                                                           \
+}
+
+#define DO_SHR(N, M)  (N >> M)
+#define DO_SHL(N, M)  (N << M)
+
+/* Arithmetic shift right for division.  This rounds negative numbers
+   toward zero as per signed division.  Therefore before shifting,
+   when N is negative, add 2**M-1.  */
+#define DO_ASRD(N, M) ((N + (N < 0 ? ((__typeof(N))1 << M) - 1 : 0)) >> M)
+
+DO_ZPZI(sve_asr_zpzi_b, int8_t, H1, DO_SHR)
+DO_ZPZI(sve_asr_zpzi_h, int16_t, H1_2, DO_SHR)
+DO_ZPZI(sve_asr_zpzi_s, int32_t, H1_4, DO_SHR)
+DO_ZPZI_D(sve_asr_zpzi_d, int64_t, DO_SHR)
+
+DO_ZPZI(sve_lsr_zpzi_b, uint8_t, H1, DO_SHR)
+DO_ZPZI(sve_lsr_zpzi_h, uint16_t, H1_2, DO_SHR)
+DO_ZPZI(sve_lsr_zpzi_s, uint32_t, H1_4, DO_SHR)
+DO_ZPZI_D(sve_lsr_zpzi_d, uint64_t, DO_SHR)
+
+DO_ZPZI(sve_lsl_zpzi_b, uint8_t, H1, DO_SHL)
+DO_ZPZI(sve_lsl_zpzi_h, uint16_t, H1_2, DO_SHL)
+DO_ZPZI(sve_lsl_zpzi_s, uint32_t, H1_4, DO_SHL)
+DO_ZPZI_D(sve_lsl_zpzi_d, uint64_t, DO_SHL)
+
+DO_ZPZI(sve_asrd_b, int8_t, H1, DO_ASRD)
+DO_ZPZI(sve_asrd_h, int16_t, H1_2, DO_ASRD)
+DO_ZPZI(sve_asrd_s, int32_t, H1_4, DO_ASRD)
+DO_ZPZI_D(sve_asrd_d, int64_t, DO_ASRD)
+
+#undef DO_SHR
+#undef DO_SHL
+#undef DO_ASRD
+
+#undef DO_ZPZI
+#undef DO_ZPZI_D
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 49251a53c1..4218300960 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -37,6 +37,30 @@ typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
 typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
 
+/*
+ * Helpers for extracting complex instruction fields.
+ */
+
+/* See e.g. ASR (immediate, predicated).
+ * Returns -1 for unallocated encoding; diagnose later.
+ */
+static int tszimm_esz(int x)
+{
+    x >>= 3;  /* discard imm3 */
+    return 31 - clz32(x);
+}
+
+static int tszimm_shr(int x)
+{
+    return (16 << tszimm_esz(x)) - x;
+}
+
+/* See e.g. LSL (immediate, predicated).  */
+static int tszimm_shl(int x)
+{
+    return x - (8 << tszimm_esz(x));
+}
+
 /*
  * Include the generated decoder.
  */
@@ -341,6 +365,110 @@ static void trans_SADDV(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
 
 #undef DO_VPZ
 
+/*
+ *** SVE Shift by Immediate - Predicated Group
+ */
+
+/* Store zero into every active element of Zd.  We will use this for
+ * two- and three-operand predicated instructions for which logic dictates a
+ * zero result.
+ */
+static void do_clr_zp(DisasContext *s, int rd, int pg, int esz)
+{
+    static gen_helper_gvec_2 * const fns[4] = {
+        gen_helper_sve_clr_b, gen_helper_sve_clr_h,
+        gen_helper_sve_clr_s, gen_helper_sve_clr_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_2_ool(vec_full_reg_offset(s, rd),
+                       pred_full_reg_offset(s, pg),
+                       vsz, vsz, 0, fns[esz]);
+}
+
+static void do_zpzi_ool(DisasContext *s, arg_rpri_esz *a,
+                        gen_helper_gvec_3 *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       pred_full_reg_offset(s, a->pg),
+                       vsz, vsz, a->imm, fn);
+}
+
+static void trans_ASR_zpzi(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_asr_zpzi_b, gen_helper_sve_asr_zpzi_h,
+        gen_helper_sve_asr_zpzi_s, gen_helper_sve_asr_zpzi_d,
+    };
+    if (a->esz < 0) {
+        /* Invalid tsz encoding -- see tszimm_esz. */
+        unallocated_encoding(s);
+        return;
+    }
+    /* Shift by element size is architecturally valid.  For
+       arithmetic right-shift, it's the same as by one less. */
+    a->imm = MIN(a->imm, (8 << a->esz) - 1);
+    do_zpzi_ool(s, a, fns[a->esz]);
+}
+
+static void trans_LSR_zpzi(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_lsr_zpzi_b, gen_helper_sve_lsr_zpzi_h,
+        gen_helper_sve_lsr_zpzi_s, gen_helper_sve_lsr_zpzi_d,
+    };
+    if (a->esz < 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    /* Shift by element size is architecturally valid.
+       For logical shifts, it is a zeroing operation.  */
+    if (a->imm >= (8 << a->esz)) {
+        do_clr_zp(s, a->rd, a->pg, a->esz);
+    } else {
+        do_zpzi_ool(s, a, fns[a->esz]);
+    }
+}
+
+static void trans_LSL_zpzi(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_lsl_zpzi_b, gen_helper_sve_lsl_zpzi_h,
+        gen_helper_sve_lsl_zpzi_s, gen_helper_sve_lsl_zpzi_d,
+    };
+    if (a->esz < 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    /* Shift by element size is architecturally valid.
+       For logical shifts, it is a zeroing operation.  */
+    if (a->imm >= (8 << a->esz)) {
+        do_clr_zp(s, a->rd, a->pg, a->esz);
+    } else {
+        do_zpzi_ool(s, a, fns[a->esz]);
+    }
+}
+
+static void trans_ASRD(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_asrd_b, gen_helper_sve_asrd_h,
+        gen_helper_sve_asrd_s, gen_helper_sve_asrd_d,
+    };
+    if (a->esz < 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    /* Shift by element size is architecturally valid.  For arithmetic
+       right shift for division, it is a zeroing operation.  */
+    if (a->imm >= (8 << a->esz)) {
+        do_clr_zp(s, a->rd, a->pg, a->esz);
+    } else {
+        do_zpzi_ool(s, a, fns[a->esz]);
+    }
+}
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index b390d8f398..c265ff9899 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -22,12 +22,20 @@
 ###########################################################################
 # Named fields.  These are primarily for disjoint fields.
 
+%imm6_22_5	22:1 5:5
 %imm9_16_10	16:s6 10:3
 %preg4_5	5:4
 
+# A combination of tsz:imm3 -- extract esize.
+%tszimm_esz	22:2 5:5 !function=tszimm_esz
+# A combination of tsz:imm3 -- extract (2 * esize) - (tsz:imm3)
+%tszimm_shr	22:2 5:5 !function=tszimm_shr
+# A combination of tsz:imm3 -- extract (tsz:imm3) - esize
+%tszimm_shl	22:2 5:5 !function=tszimm_shl
+
 # Either a copy of rd (at bit 0), or a different source
 # as propagated via the MOVPRFX instruction.
-%reg_movprfx		0:5
+%reg_movprfx	0:5
 
 ###########################################################################
 # Named attribute sets.  These are used to make nice(er) names
@@ -40,7 +48,7 @@
 &rpr_esz	rd pg rn esz
 &rprr_s		rd pg rn rm s
 &rprr_esz	rd pg rn rm esz
-
+&rpri_esz	rd pg rn imm esz
 &ptrue		rd esz pat s
 
 ###########################################################################
@@ -68,6 +76,11 @@
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
 
+# Two register operands, one immediate operand, with governing predicate,
+# element size encoded as TSZHL.  User must fill in imm.
+@rdn_pg_tszimm	........ .. ... ... ... pg:3 ..... rd:5 \
+		&rpri_esz rn=%reg_movprfx esz=%tszimm_esz
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9	........ ........ ...... rn:5 . rd:4	\
 		&rri imm=%imm9_16_10
@@ -126,6 +139,18 @@ UMAXV		00000100 .. 001 001 001 ... ..... .....		@rd_pg_rn
 SMINV		00000100 .. 001 010 001 ... ..... .....		@rd_pg_rn
 UMINV		00000100 .. 001 011 001 ... ..... .....		@rd_pg_rn
 
+### SVE Shift by Immediate - Predicated Group
+
+# SVE bitwise shift by immediate (predicated)
+ASR_zpzi	00000100 .. 000 000 100 ... .. ... ..... \
+		@rdn_pg_tszimm imm=%tszimm_shr
+LSR_zpzi	00000100 .. 000 001 100 ... .. ... ..... \
+		@rdn_pg_tszimm imm=%tszimm_shr
+LSL_zpzi	00000100 .. 000 011 100 ... .. ... ..... \
+		@rdn_pg_tszimm imm=%tszimm_shl
+ASRD		00000100 .. 000 100 100 ... .. ... ..... \
+		@rdn_pg_tszimm imm=%tszimm_shr
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread

* [Qemu-devel] [PATCH v2 12/67] target/arm: Implement SVE bitwise shift by vector (predicated)
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (10 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 11/67] target/arm: Implement SVE bitwise shift by immediate (predicated) Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 12:50   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 13/67] target/arm: Implement SVE bitwise shift by wide elements (predicated) Richard Henderson
                   ` (56 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 27 +++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 25 +++++++++++++++++++++++++
 target/arm/translate-sve.c |  4 ++++
 target/arm/sve.decode      |  8 ++++++++
 4 files changed, 64 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index b3c89579af..0cc02ee59e 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -168,6 +168,33 @@ DEF_HELPER_FLAGS_5(sve_udiv_zpzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_udiv_zpzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(sve_orv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index b1a170fd70..6ea806d12b 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -439,6 +439,28 @@ DO_ZPZZ_D(sve_sdiv_zpzz_d, int64_t, DO_DIV)
 DO_ZPZZ(sve_udiv_zpzz_s, uint32_t, H1_4, DO_DIV)
 DO_ZPZZ_D(sve_udiv_zpzz_d, uint64_t, DO_DIV)
 
+/* Note that all bits of the shift are significant
+   and not modulo the element size.  */
+#define DO_ASR(N, M)  (N >> MIN(M, sizeof(N) * 8 - 1))
+#define DO_LSR(N, M)  (M < sizeof(N) * 8 ? N >> M : 0)
+#define DO_LSL(N, M)  (M < sizeof(N) * 8 ? N << M : 0)
+
+DO_ZPZZ(sve_asr_zpzz_b, int8_t, H1, DO_ASR)
+DO_ZPZZ(sve_lsr_zpzz_b, uint8_t, H1, DO_LSR)
+DO_ZPZZ(sve_lsl_zpzz_b, uint8_t, H1, DO_LSL)
+
+DO_ZPZZ(sve_asr_zpzz_h, int16_t, H1_2, DO_ASR)
+DO_ZPZZ(sve_lsr_zpzz_h, uint16_t, H1_2, DO_LSR)
+DO_ZPZZ(sve_lsl_zpzz_h, uint16_t, H1_2, DO_LSL)
+
+DO_ZPZZ(sve_asr_zpzz_s, int32_t, H1_4, DO_ASR)
+DO_ZPZZ(sve_lsr_zpzz_s, uint32_t, H1_4, DO_LSR)
+DO_ZPZZ(sve_lsl_zpzz_s, uint32_t, H1_4, DO_LSL)
+
+DO_ZPZZ_D(sve_asr_zpzz_d, int64_t, DO_ASR)
+DO_ZPZZ_D(sve_lsr_zpzz_d, uint64_t, DO_LSR)
+DO_ZPZZ_D(sve_lsl_zpzz_d, uint64_t, DO_LSL)
+
 #undef DO_ZPZZ
 #undef DO_ZPZZ_D
 
@@ -543,6 +565,9 @@ DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN)
 #undef DO_ABD
 #undef DO_MUL
 #undef DO_DIV
+#undef DO_ASR
+#undef DO_LSR
+#undef DO_LSL
 
 /* Similar to the ARM LastActiveElement pseudocode function, except the
    result is multiplied by the element size.  This includes the not found
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 4218300960..08c56e55a0 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -282,6 +282,10 @@ DO_ZPZZ(MUL, mul)
 DO_ZPZZ(SMULH, smulh)
 DO_ZPZZ(UMULH, umulh)
 
+DO_ZPZZ(ASR, asr)
+DO_ZPZZ(LSR, lsr)
+DO_ZPZZ(LSL, lsl)
+
 void trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 {
     static gen_helper_gvec_4 * const fns[4] = {
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index c265ff9899..7ddff8e6bb 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -151,6 +151,14 @@ LSL_zpzi	00000100 .. 000 011 100 ... .. ... ..... \
 ASRD		00000100 .. 000 100 100 ... .. ... ..... \
 		@rdn_pg_tszimm imm=%tszimm_shr
 
+# SVE bitwise shift by vector (predicated)
+ASR_zpzz	00000100 .. 010 000 100 ... ..... .....   @rdn_pg_rm
+LSR_zpzz	00000100 .. 010 001 100 ... ..... .....   @rdn_pg_rm
+LSL_zpzz	00000100 .. 010 011 100 ... ..... .....   @rdn_pg_rm
+ASR_zpzz	00000100 .. 010 100 100 ... ..... .....   @rdm_pg_rn # ASRR
+LSR_zpzz	00000100 .. 010 101 100 ... ..... .....   @rdm_pg_rn # LSRR
+LSL_zpzz	00000100 .. 010 111 100 ... ..... .....   @rdm_pg_rn # LSLR
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3


* [Qemu-devel] [PATCH v2 13/67] target/arm: Implement SVE bitwise shift by wide elements (predicated)
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (11 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 12/67] target/arm: Implement SVE bitwise shift by vector (predicated) Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 12:57   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 14/67] target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group Richard Henderson
                   ` (55 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 21 +++++++++++++++++++++
 target/arm/sve_helper.c    | 35 +++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 25 +++++++++++++++++++++++++
 target/arm/sve.decode      |  6 ++++++
 4 files changed, 87 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 0cc02ee59e..d516580134 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -195,6 +195,27 @@ DEF_HELPER_FLAGS_5(sve_lsl_zpzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_lsl_zpzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_asr_zpzw_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzw_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzw_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsr_zpzw_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzw_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzw_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsl_zpzw_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzw_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzw_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(sve_orv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 6ea806d12b..3054b3cc99 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -464,6 +464,41 @@ DO_ZPZZ_D(sve_lsl_zpzz_d, uint64_t, DO_LSL)
 #undef DO_ZPZZ
 #undef DO_ZPZZ_D
 
+/* Three-operand expander, controlled by a predicate, in which the
+ * third operand is "wide".  That is, for D = N op M, the same 64-bit
+ * value of M is used with all of the narrower values of N.
+ */
+#define DO_ZPZW(NAME, TYPE, TYPEW, H, OP)                               \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
+{                                                                       \
+    intptr_t i, opr_sz = simd_oprsz(desc);                              \
+    for (i = 0; i < opr_sz; ) {                                         \
+        uint8_t pg = *(uint8_t *)(vg + H1(i >> 3));                     \
+        TYPEW mm = *(TYPEW *)(vm + i);                                  \
+        do {                                                            \
+            if (pg & 1) {                                               \
+                TYPE nn = *(TYPE *)(vn + H(i));                         \
+                *(TYPE *)(vd + H(i)) = OP(nn, mm);                      \
+            }                                                           \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);                     \
+        } while (i & 7);                                                \
+    }                                                                   \
+}
+
+DO_ZPZW(sve_asr_zpzw_b, int8_t, uint64_t, H1, DO_ASR)
+DO_ZPZW(sve_lsr_zpzw_b, uint8_t, uint64_t, H1, DO_LSR)
+DO_ZPZW(sve_lsl_zpzw_b, uint8_t, uint64_t, H1, DO_LSL)
+
+DO_ZPZW(sve_asr_zpzw_h, int16_t, uint64_t, H1_2, DO_ASR)
+DO_ZPZW(sve_lsr_zpzw_h, uint16_t, uint64_t, H1_2, DO_LSR)
+DO_ZPZW(sve_lsl_zpzw_h, uint16_t, uint64_t, H1_2, DO_LSL)
+
+DO_ZPZW(sve_asr_zpzw_s, int32_t, uint64_t, H1_4, DO_ASR)
+DO_ZPZW(sve_lsr_zpzw_s, uint32_t, uint64_t, H1_4, DO_LSR)
+DO_ZPZW(sve_lsl_zpzw_s, uint32_t, uint64_t, H1_4, DO_LSL)
+
+#undef DO_ZPZW
+
 /* Two-operand reduction expander, controlled by a predicate.
  * The difference between TYPERED and TYPERET has to do with
  * sign-extension.  E.g. for SMAX, TYPERED must be signed,
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 08c56e55a0..35bcd9229d 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -473,6 +473,31 @@ static void trans_ASRD(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
     }
 }
 
+/*
+ *** SVE Bitwise Shift - Predicated Group
+ */
+
+#define DO_ZPZW(NAME, name) \
+static void trans_##NAME##_zpzw(DisasContext *s, arg_rprr_esz *a,         \
+                                uint32_t insn)                            \
+{                                                                         \
+    static gen_helper_gvec_4 * const fns[3] = {                           \
+        gen_helper_sve_##name##_zpzw_b, gen_helper_sve_##name##_zpzw_h,   \
+        gen_helper_sve_##name##_zpzw_s,                                   \
+    };                                                                    \
+    if (a->esz >= 0 && a->esz < 3) {                                      \
+        do_zpzz_ool(s, a, fns[a->esz]);                                   \
+    } else {                                                              \
+        unallocated_encoding(s);                                          \
+    }                                                                     \
+}
+
+DO_ZPZW(ASR, asr)
+DO_ZPZW(LSR, lsr)
+DO_ZPZW(LSL, lsl)
+
+#undef DO_ZPZW
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 7ddff8e6bb..177f338fed 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -159,6 +159,12 @@ ASR_zpzz	00000100 .. 010 100 100 ... ..... .....   @rdm_pg_rn # ASRR
 LSR_zpzz	00000100 .. 010 101 100 ... ..... .....   @rdm_pg_rn # LSRR
 LSL_zpzz	00000100 .. 010 111 100 ... ..... .....   @rdm_pg_rn # LSLR
 
+# SVE bitwise shift by wide elements (predicated)
+# Note these require size != 3.
+ASR_zpzw	00000100 .. 011 000 100 ... ..... .....		@rdn_pg_rm
+LSR_zpzw	00000100 .. 011 001 100 ... ..... .....		@rdn_pg_rm
+LSL_zpzw	00000100 .. 011 011 100 ... ..... .....		@rdn_pg_rm
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3


* [Qemu-devel] [PATCH v2 14/67] target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (12 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 13/67] target/arm: Implement SVE bitwise shift by wide elements (predicated) Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 13:08   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 15/67] target/arm: Implement SVE Integer Multiply-Add Group Richard Henderson
                   ` (54 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  60 +++++++++++++++++++++
 target/arm/sve_helper.c    | 127 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 111 +++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  23 ++++++++
 4 files changed, 321 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index d516580134..11644125d1 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -285,6 +285,66 @@ DEF_HELPER_FLAGS_4(sve_asrd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_asrd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_asrd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_cls_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cls_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cls_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cls_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_clz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_clz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_clz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_clz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_cnot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnot_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnot_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_fabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_not_zpz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_not_zpz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_not_zpz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_not_zpz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_sxtb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_sxtb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_sxtb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_uxtb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxtb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxtb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_sxth_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_sxth_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_uxth_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxth_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_sxtw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxtw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_abs_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_abs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_abs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_abs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_neg_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_neg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_neg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_neg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 3054b3cc99..e11823a727 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -499,6 +499,133 @@ DO_ZPZW(sve_lsl_zpzw_s, uint32_t, uint64_t, H1_4, DO_LSL)
 
 #undef DO_ZPZW
 
+/* Fully general two-operand expander, controlled by a predicate.
+ */
+#define DO_ZPZ(NAME, TYPE, H, OP)                               \
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)  \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc);                      \
+    for (i = 0; i < opr_sz; ) {                                 \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));         \
+        do {                                                    \
+            if (pg & 1) {                                       \
+                TYPE nn = *(TYPE *)(vn + H(i));                 \
+                *(TYPE *)(vd + H(i)) = OP(nn);                  \
+            }                                                   \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);             \
+        } while (i & 15);                                       \
+    }                                                           \
+}
+
+/* Similarly, specialized for 64-bit operands.  */
+#define DO_ZPZ_D(NAME, TYPE, OP)                                \
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)  \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;                  \
+    TYPE *d = vd, *n = vn;                                      \
+    uint8_t *pg = vg;                                           \
+    for (i = 0; i < opr_sz; i += 1) {                           \
+        if (pg[H1(i)] & 1) {                                    \
+            TYPE nn = n[i];                                     \
+            d[i] = OP(nn);                                      \
+        }                                                       \
+    }                                                           \
+}
+
+#define DO_CLS_B(N)   (clrsb32(N) - 24)
+#define DO_CLS_H(N)   (clrsb32(N) - 16)
+
+DO_ZPZ(sve_cls_b, int8_t, H1, DO_CLS_B)
+DO_ZPZ(sve_cls_h, int16_t, H1_2, DO_CLS_H)
+DO_ZPZ(sve_cls_s, int32_t, H1_4, clrsb32)
+DO_ZPZ_D(sve_cls_d, int64_t, clrsb64)
+
+#define DO_CLZ_B(N)   (clz32(N) - 24)
+#define DO_CLZ_H(N)   (clz32(N) - 16)
+
+DO_ZPZ(sve_clz_b, uint8_t, H1, DO_CLZ_B)
+DO_ZPZ(sve_clz_h, uint16_t, H1_2, DO_CLZ_H)
+DO_ZPZ(sve_clz_s, uint32_t, H1_4, clz32)
+DO_ZPZ_D(sve_clz_d, uint64_t, clz64)
+
+DO_ZPZ(sve_cnt_zpz_b, uint8_t, H1, ctpop8)
+DO_ZPZ(sve_cnt_zpz_h, uint16_t, H1_2, ctpop16)
+DO_ZPZ(sve_cnt_zpz_s, uint32_t, H1_4, ctpop32)
+DO_ZPZ_D(sve_cnt_zpz_d, uint64_t, ctpop64)
+
+#define DO_CNOT(N)    (N == 0)
+
+DO_ZPZ(sve_cnot_b, uint8_t, H1, DO_CNOT)
+DO_ZPZ(sve_cnot_h, uint16_t, H1_2, DO_CNOT)
+DO_ZPZ(sve_cnot_s, uint32_t, H1_4, DO_CNOT)
+DO_ZPZ_D(sve_cnot_d, uint64_t, DO_CNOT)
+
+#define DO_FABS(N)    (N & ((__typeof(N))-1 >> 1))
+
+DO_ZPZ(sve_fabs_h, uint16_t, H1_2, DO_FABS)
+DO_ZPZ(sve_fabs_s, uint32_t, H1_4, DO_FABS)
+DO_ZPZ_D(sve_fabs_d, uint64_t, DO_FABS)
+
+#define DO_FNEG(N)    (N ^ ~((__typeof(N))-1 >> 1))
+
+DO_ZPZ(sve_fneg_h, uint16_t, H1_2, DO_FNEG)
+DO_ZPZ(sve_fneg_s, uint32_t, H1_4, DO_FNEG)
+DO_ZPZ_D(sve_fneg_d, uint64_t, DO_FNEG)
+
+#define DO_NOT(N)    (~N)
+
+DO_ZPZ(sve_not_zpz_b, uint8_t, H1, DO_NOT)
+DO_ZPZ(sve_not_zpz_h, uint16_t, H1_2, DO_NOT)
+DO_ZPZ(sve_not_zpz_s, uint32_t, H1_4, DO_NOT)
+DO_ZPZ_D(sve_not_zpz_d, uint64_t, DO_NOT)
+
+#define DO_SXTB(N)    ((int8_t)N)
+#define DO_SXTH(N)    ((int16_t)N)
+#define DO_SXTS(N)    ((int32_t)N)
+#define DO_UXTB(N)    ((uint8_t)N)
+#define DO_UXTH(N)    ((uint16_t)N)
+#define DO_UXTS(N)    ((uint32_t)N)
+
+DO_ZPZ(sve_sxtb_h, uint16_t, H1_2, DO_SXTB)
+DO_ZPZ(sve_sxtb_s, uint32_t, H1_4, DO_SXTB)
+DO_ZPZ(sve_sxth_s, uint32_t, H1_4, DO_SXTH)
+DO_ZPZ_D(sve_sxtb_d, uint64_t, DO_SXTB)
+DO_ZPZ_D(sve_sxth_d, uint64_t, DO_SXTH)
+DO_ZPZ_D(sve_sxtw_d, uint64_t, DO_SXTS)
+
+DO_ZPZ(sve_uxtb_h, uint16_t, H1_2, DO_UXTB)
+DO_ZPZ(sve_uxtb_s, uint32_t, H1_4, DO_UXTB)
+DO_ZPZ(sve_uxth_s, uint32_t, H1_4, DO_UXTH)
+DO_ZPZ_D(sve_uxtb_d, uint64_t, DO_UXTB)
+DO_ZPZ_D(sve_uxth_d, uint64_t, DO_UXTH)
+DO_ZPZ_D(sve_uxtw_d, uint64_t, DO_UXTS)
+
+#define DO_ABS(N)    (N < 0 ? -N : N)
+
+DO_ZPZ(sve_abs_b, int8_t, H1, DO_ABS)
+DO_ZPZ(sve_abs_h, int16_t, H1_2, DO_ABS)
+DO_ZPZ(sve_abs_s, int32_t, H1_4, DO_ABS)
+DO_ZPZ_D(sve_abs_d, int64_t, DO_ABS)
+
+#define DO_NEG(N)    (-N)
+
+DO_ZPZ(sve_neg_b, uint8_t, H1, DO_NEG)
+DO_ZPZ(sve_neg_h, uint16_t, H1_2, DO_NEG)
+DO_ZPZ(sve_neg_s, uint32_t, H1_4, DO_NEG)
+DO_ZPZ_D(sve_neg_d, uint64_t, DO_NEG)
+
+#undef DO_CLS_B
+#undef DO_CLS_H
+#undef DO_CLZ_B
+#undef DO_CLZ_H
+#undef DO_CNOT
+#undef DO_FABS
+#undef DO_FNEG
+#undef DO_ABS
+#undef DO_NEG
+#undef DO_ZPZ
+#undef DO_ZPZ_D
+
 /* Two-operand reduction expander, controlled by a predicate.
  * The difference between TYPERED and TYPERET has to do with
  * sign-extension.  E.g. for SMAX, TYPERED must be signed,
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 35bcd9229d..dce8ba8dc0 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -304,6 +304,117 @@ void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 
 #undef DO_ZPZZ
 
+/*
+ *** SVE Integer Arithmetic - Unary Predicated Group
+ */
+
+static void do_zpz_ool(DisasContext *s, arg_rpr_esz *a, gen_helper_gvec_3 *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    if (fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       pred_full_reg_offset(s, a->pg),
+                       vsz, vsz, 0, fn);
+}
+
+#define DO_ZPZ(NAME, name) \
+static void trans_##NAME(DisasContext *s, arg_rpr_esz *a, uint32_t insn) \
+{                                                                   \
+    static gen_helper_gvec_3 * const fns[4] = {                     \
+        gen_helper_sve_##name##_b, gen_helper_sve_##name##_h,       \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,       \
+    };                                                              \
+    do_zpz_ool(s, a, fns[a->esz]);                                  \
+}
+
+DO_ZPZ(CLS, cls)
+DO_ZPZ(CLZ, clz)
+DO_ZPZ(CNT_zpz, cnt_zpz)
+DO_ZPZ(CNOT, cnot)
+DO_ZPZ(NOT_zpz, not_zpz)
+DO_ZPZ(ABS, abs)
+DO_ZPZ(NEG, neg)
+
+static void trans_FABS(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_fabs_h,
+        gen_helper_sve_fabs_s,
+        gen_helper_sve_fabs_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static void trans_FNEG(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_fneg_h,
+        gen_helper_sve_fneg_s,
+        gen_helper_sve_fneg_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static void trans_SXTB(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_sxtb_h,
+        gen_helper_sve_sxtb_s,
+        gen_helper_sve_sxtb_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static void trans_UXTB(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_uxtb_h,
+        gen_helper_sve_uxtb_s,
+        gen_helper_sve_uxtb_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static void trans_SXTH(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL, NULL,
+        gen_helper_sve_sxth_s,
+        gen_helper_sve_sxth_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static void trans_UXTH(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL, NULL,
+        gen_helper_sve_uxth_s,
+        gen_helper_sve_uxth_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static void trans_SXTW(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ool(s, a, a->esz == 3 ? gen_helper_sve_sxtw_d : NULL);
+}
+
+static void trans_UXTW(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ool(s, a, a->esz == 3 ? gen_helper_sve_uxtw_d : NULL);
+}
+
+#undef DO_ZPZ
+
 /*
  *** SVE Integer Reduction Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 177f338fed..b875501475 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -165,6 +165,29 @@ ASR_zpzw	00000100 .. 011 000 100 ... ..... .....		@rdn_pg_rm
 LSR_zpzw	00000100 .. 011 001 100 ... ..... .....		@rdn_pg_rm
 LSL_zpzw	00000100 .. 011 011 100 ... ..... .....		@rdn_pg_rm
 
+### SVE Integer Arithmetic - Unary Predicated Group
+
+# SVE unary bit operations (predicated)
+# Note esz != 0 for FABS and FNEG.
+CLS		00000100 .. 011 000 101 ... ..... .....		@rd_pg_rn
+CLZ		00000100 .. 011 001 101 ... ..... .....		@rd_pg_rn
+CNT_zpz		00000100 .. 011 010 101 ... ..... .....		@rd_pg_rn
+CNOT		00000100 .. 011 011 101 ... ..... .....		@rd_pg_rn
+NOT_zpz		00000100 .. 011 110 101 ... ..... .....		@rd_pg_rn
+FABS		00000100 .. 011 100 101 ... ..... .....		@rd_pg_rn
+FNEG		00000100 .. 011 101 101 ... ..... .....		@rd_pg_rn
+
+# SVE integer unary operations (predicated)
+# Note esz > original size for extensions.
+ABS		00000100 .. 010 110 101 ... ..... .....		@rd_pg_rn
+NEG		00000100 .. 010 111 101 ... ..... .....		@rd_pg_rn
+SXTB		00000100 .. 010 000 101 ... ..... .....		@rd_pg_rn
+UXTB		00000100 .. 010 001 101 ... ..... .....		@rd_pg_rn
+SXTH		00000100 .. 010 010 101 ... ..... .....		@rd_pg_rn
+UXTH		00000100 .. 010 011 101 ... ..... .....		@rd_pg_rn
+SXTW		00000100 .. 010 100 101 ... ..... .....		@rd_pg_rn
+UXTW		00000100 .. 010 101 101 ... ..... .....		@rd_pg_rn
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
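For readers new to the helper macros in patch 14: the DO_ZPZ/DO_ZPZ_D expanders reduce to a simple per-element loop in which the low bit of each predicate byte decides whether the operation executes; inactive elements leave the destination untouched (the merging behavior of predicated SVE ops). The following is an editorial scalar sketch, not QEMU code; the function name is invented and the one-predicate-byte-per-element layout is a simplification of the real H-macro addressing:

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of a predicated unary byte op (here CNOT: d = (n == 0)).
 * The low bit of pg[i] governs element i; inactive elements are merged,
 * i.e. d[i] keeps its previous contents.  Sketch only, not QEMU code. */
static void model_cnot_b(uint8_t *d, const uint8_t *n,
                         const uint8_t *pg, size_t oprsz)
{
    for (size_t i = 0; i < oprsz; i++) {
        if (pg[i] & 1) {
            d[i] = (n[i] == 0);   /* DO_CNOT */
        }
        /* else: destination element unchanged */
    }
}
```

The real helpers walk the predicate 16 bytes at a time (`pg >>= sizeof(TYPE)` inside the `do { } while (i & 15)` loop) so that one 16-bit predicate load covers a whole 128-bit chunk of the vector.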

* [Qemu-devel] [PATCH v2 15/67] target/arm: Implement SVE Integer Multiply-Add Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (13 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 14/67] target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 13:12   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 16/67] target/arm: Implement SVE Integer Arithmetic - Unpredicated Group Richard Henderson
                   ` (53 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 18 ++++++++++++++
 target/arm/sve_helper.c    | 58 +++++++++++++++++++++++++++++++++++++++++++++-
 target/arm/translate-sve.c | 31 +++++++++++++++++++++++++
 target/arm/sve.decode      | 17 ++++++++++++++
 4 files changed, 123 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 11644125d1..b31d497f31 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -345,6 +345,24 @@ DEF_HELPER_FLAGS_4(sve_neg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_neg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_neg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_6(sve_mla_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mla_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mla_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mla_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_mls_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mls_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mls_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_mls_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index e11823a727..4b08a38ce8 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -932,6 +932,62 @@ DO_ZPZI_D(sve_asrd_d, int64_t, DO_ASRD)
 #undef DO_SHR
 #undef DO_SHL
 #undef DO_ASRD
-
 #undef DO_ZPZI
 #undef DO_ZPZI_D
+
+/* Fully general four-operand expander, controlled by a predicate.
+ */
+#define DO_ZPZZZ(NAME, TYPE, H, OP)                           \
+void HELPER(NAME)(void *vd, void *va, void *vn, void *vm,     \
+                  void *vg, uint32_t desc)                    \
+{                                                             \
+    intptr_t i, opr_sz = simd_oprsz(desc);                    \
+    for (i = 0; i < opr_sz; ) {                               \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));       \
+        do {                                                  \
+            if (pg & 1) {                                     \
+                TYPE nn = *(TYPE *)(vn + H(i));               \
+                TYPE mm = *(TYPE *)(vm + H(i));               \
+                TYPE aa = *(TYPE *)(va + H(i));               \
+                *(TYPE *)(vd + H(i)) = OP(aa, nn, mm);        \
+            }                                                 \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);           \
+        } while (i & 15);                                     \
+    }                                                         \
+}
+
+/* Similarly, specialized for 64-bit operands.  */
+#define DO_ZPZZZ_D(NAME, TYPE, OP)                            \
+void HELPER(NAME)(void *vd, void *va, void *vn, void *vm,     \
+                  void *vg, uint32_t desc)                    \
+{                                                             \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;                \
+    TYPE *d = vd, *a = va, *n = vn, *m = vm;                  \
+    uint8_t *pg = vg;                                         \
+    for (i = 0; i < opr_sz; i += 1) {                         \
+        if (pg[H1(i)] & 1) {                                  \
+            TYPE aa = a[i], nn = n[i], mm = m[i];             \
+            d[i] = OP(aa, nn, mm);                            \
+        }                                                     \
+    }                                                         \
+}
+
+#define DO_MLA(A, N, M)  (A + N * M)
+#define DO_MLS(A, N, M)  (A - N * M)
+
+DO_ZPZZZ(sve_mla_b, uint8_t, H1, DO_MLA)
+DO_ZPZZZ(sve_mls_b, uint8_t, H1, DO_MLS)
+
+DO_ZPZZZ(sve_mla_h, uint16_t, H1_2, DO_MLA)
+DO_ZPZZZ(sve_mls_h, uint16_t, H1_2, DO_MLS)
+
+DO_ZPZZZ(sve_mla_s, uint32_t, H1_4, DO_MLA)
+DO_ZPZZZ(sve_mls_s, uint32_t, H1_4, DO_MLS)
+
+DO_ZPZZZ_D(sve_mla_d, uint64_t, DO_MLA)
+DO_ZPZZZ_D(sve_mls_d, uint64_t, DO_MLS)
+
+#undef DO_MLA
+#undef DO_MLS
+#undef DO_ZPZZZ
+#undef DO_ZPZZZ_D
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index dce8ba8dc0..b956d87636 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -609,6 +609,37 @@ DO_ZPZW(LSL, lsl)
 
 #undef DO_ZPZW
 
+/*
+ *** SVE Integer Multiply-Add Group
+ */
+
+static void do_zpzzz_ool(DisasContext *s, arg_rprrr_esz *a,
+                         gen_helper_gvec_5 *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_5_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->ra),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       pred_full_reg_offset(s, a->pg),
+                       vsz, vsz, 0, fn);
+}
+
+#define DO_ZPZZZ(NAME, name) \
+static void trans_##NAME(DisasContext *s, arg_rprrr_esz *a, uint32_t insn) \
+{                                                                    \
+    static gen_helper_gvec_5 * const fns[4] = {                      \
+        gen_helper_sve_##name##_b, gen_helper_sve_##name##_h,        \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,        \
+    };                                                               \
+    do_zpzzz_ool(s, a, fns[a->esz]);                                 \
+}
+
+DO_ZPZZZ(MLA, mla)
+DO_ZPZZZ(MLS, mls)
+
+#undef DO_ZPZZZ
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index b875501475..68a1823b72 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -48,6 +48,7 @@
 &rpr_esz	rd pg rn esz
 &rprr_s		rd pg rn rm s
 &rprr_esz	rd pg rn rm esz
+&rprrr_esz	rd pg rn rm ra esz
 &rpri_esz	rd pg rn imm esz
 &ptrue		rd esz pat s
 
@@ -73,6 +74,12 @@
 @rdm_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5 \
 		&rprr_esz rm=%reg_movprfx
 
+# Three register operand, with governing predicate, vector element size
+@rda_pg_rn_rm	........ esz:2 . rm:5  ... pg:3 rn:5 rd:5 \
+		&rprrr_esz ra=%reg_movprfx
+@rdn_pg_ra_rm	........ esz:2 . rm:5  ... pg:3 ra:5 rd:5 \
+		&rprrr_esz rn=%reg_movprfx
+
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
 
@@ -188,6 +195,16 @@ UXTH		00000100 .. 010 011 101 ... ..... .....		@rd_pg_rn
 SXTW		00000100 .. 010 100 101 ... ..... .....		@rd_pg_rn
 UXTW		00000100 .. 010 101 101 ... ..... .....		@rd_pg_rn
 
+### SVE Integer Multiply-Add Group
+
+# SVE integer multiply-add writing addend (predicated)
+MLA		00000100 .. 0 ..... 010 ... ..... .....   @rda_pg_rn_rm
+MLS		00000100 .. 0 ..... 011 ... ..... .....   @rda_pg_rn_rm
+
+# SVE integer multiply-add writing multiplicand (predicated)
+MLA		00000100 .. 0 ..... 110 ... ..... .....   @rdn_pg_ra_rm # MAD
+MLS		00000100 .. 0 ..... 111 ... ..... .....   @rdn_pg_ra_rm # MSB
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
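The four-operand DO_ZPZZZ expander in patch 15 has the same merging-predicate shape, with the accumulator read from a separate register. An editorial scalar sketch (invented name, simplified one-predicate-flag-per-element layout; in the real instruction `rd` and the addend register may be the same, which is why the helper reads `aa` before writing `d`):

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of predicated MLA on 32-bit elements: d = a + n * m
 * where the predicate is active, merged elsewhere.  Sketch only. */
static void model_mla_s(uint32_t *d, const uint32_t *a, const uint32_t *n,
                        const uint32_t *m, const uint8_t *pg, size_t elems)
{
    for (size_t i = 0; i < elems; i++) {
        if (pg[i] & 1) {
            d[i] = a[i] + n[i] * m[i];   /* DO_MLA; DO_MLS subtracts */
        }
    }
}
```

Note how the decode handles the MAD/MSB encodings: they reuse the MLA/MLS translators, with the @rdn_pg_ra_rm format swapping which operand is the destination.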

* [Qemu-devel] [PATCH v2 16/67] target/arm: Implement SVE Integer Arithmetic - Unpredicated Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (14 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 15/67] target/arm: Implement SVE Integer Multiply-Add Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 13:16   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 17/67] target/arm: Implement SVE Index Generation Group Richard Henderson
                   ` (52 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 41 ++++++++++++++++++++++++++++++++++++++---
 target/arm/sve.decode      | 13 +++++++++++++
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b956d87636..8baec6c674 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -235,6 +235,40 @@ static void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
     do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
 }
 
+/*
+ *** SVE Integer Arithmetic - Unpredicated Group
+ */
+
+static void trans_ADD_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_vector3_z(s, tcg_gen_gvec_add, a->esz, a->rd, a->rn, a->rm);
+}
+
+static void trans_SUB_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_vector3_z(s, tcg_gen_gvec_sub, a->esz, a->rd, a->rn, a->rm);
+}
+
+static void trans_SQADD_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_vector3_z(s, tcg_gen_gvec_ssadd, a->esz, a->rd, a->rn, a->rm);
+}
+
+static void trans_SQSUB_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_vector3_z(s, tcg_gen_gvec_sssub, a->esz, a->rd, a->rn, a->rm);
+}
+
+static void trans_UQADD_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_vector3_z(s, tcg_gen_gvec_usadd, a->esz, a->rd, a->rn, a->rm);
+}
+
+static void trans_UQSUB_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_vector3_z(s, tcg_gen_gvec_ussub, a->esz, a->rd, a->rn, a->rm);
+}
+
 /*
  *** SVE Integer Arithmetic - Binary Predicated Group
  */
@@ -254,7 +288,8 @@ static void do_zpzz_ool(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_4 *fn)
 }
 
 #define DO_ZPZZ(NAME, name) \
-void trans_##NAME##_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn) \
+static void trans_##NAME##_zpzz(DisasContext *s, arg_rprr_esz *a,         \
+                                uint32_t insn)                            \
 {                                                                         \
     static gen_helper_gvec_4 * const fns[4] = {                           \
         gen_helper_sve_##name##_zpzz_b, gen_helper_sve_##name##_zpzz_h,   \
@@ -286,7 +321,7 @@ DO_ZPZZ(ASR, asr)
 DO_ZPZZ(LSR, lsr)
 DO_ZPZZ(LSL, lsl)
 
-void trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+static void trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 {
     static gen_helper_gvec_4 * const fns[4] = {
         NULL, NULL, gen_helper_sve_sdiv_zpzz_s, gen_helper_sve_sdiv_zpzz_d
@@ -294,7 +329,7 @@ void trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
     do_zpzz_ool(s, a, fns[a->esz]);
 }
 
-void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+static void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 {
     static gen_helper_gvec_4 * const fns[4] = {
         NULL, NULL, gen_helper_sve_udiv_zpzz_s, gen_helper_sve_udiv_zpzz_d
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 68a1823b72..b40d7dc9a2 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -68,6 +68,9 @@
 # Three predicate operand, with governing predicate, flag setting
 @pd_pg_pn_pm_s	........ . s:1 .. rm:4 .. pg:4 . rn:4 . rd:4	&rprr_s
 
+# Three operand, vector element size
+@rd_rn_rm	........ esz:2 . rm:5  ... ...  rn:5 rd:5	&rrr_esz
+
 # Two register operand, with governing predicate, vector element size
 @rdn_pg_rm	........ esz:2 ... ... ... pg:3 rm:5 rd:5 \
 		&rprr_esz rn=%reg_movprfx
@@ -205,6 +208,16 @@ MLS		00000100 .. 0 ..... 011 ... ..... .....   @rda_pg_rn_rm
 MLA		00000100 .. 0 ..... 110 ... ..... .....   @rdn_pg_ra_rm # MAD
 MLS		00000100 .. 0 ..... 111 ... ..... .....   @rdn_pg_ra_rm # MSB
 
+### SVE Integer Arithmetic - Unpredicated Group
+
+# SVE integer add/subtract vectors (unpredicated)
+ADD_zzz		00000100 .. 1 ..... 000 000 ..... .....		@rd_rn_rm
+SUB_zzz		00000100 .. 1 ..... 000 001 ..... .....		@rd_rn_rm
+SQADD_zzz	00000100 .. 1 ..... 000 100 ..... .....		@rd_rn_rm
+UQADD_zzz	00000100 .. 1 ..... 000 101 ..... .....		@rd_rn_rm
+SQSUB_zzz	00000100 .. 1 ..... 000 110 ..... .....		@rd_rn_rm
+UQSUB_zzz	00000100 .. 1 ..... 000 111 ..... .....		@rd_rn_rm
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
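Patch 16 maps SQADD/UQADD/SQSUB/UQSUB onto the generic gvec saturating helpers (tcg_gen_gvec_ssadd and friends). The per-element saturation those helpers implement can be sketched like this for byte elements (editorial model with invented names, not the gvec code):

```c
#include <stdint.h>

/* Unsigned saturating add: clamp to the top of the type's range. */
static uint8_t model_uqadd_b(uint8_t n, uint8_t m)
{
    unsigned r = (unsigned)n + m;
    return r > UINT8_MAX ? UINT8_MAX : (uint8_t)r;
}

/* Signed saturating add: clamp to [INT8_MIN, INT8_MAX] on overflow. */
static int8_t model_sqadd_b(int8_t n, int8_t m)
{
    int r = (int)n + m;          /* widen so the sum cannot overflow */
    if (r > INT8_MAX) {
        return INT8_MAX;
    }
    if (r < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)r;
}
```

Because these operations are unpredicated, no helper-function table indexed by element size is needed on the translate side; a single do_vector3_z call with the right gvec expander suffices.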

* [Qemu-devel] [PATCH v2 17/67] target/arm: Implement SVE Index Generation Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (15 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 16/67] target/arm: Implement SVE Integer Arithmetic - Unpredicated Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 13:22   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 18/67] target/arm: Implement SVE Stack Allocation Group Richard Henderson
                   ` (51 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  5 ++++
 target/arm/sve_helper.c    | 40 +++++++++++++++++++++++++++
 target/arm/translate-sve.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 14 ++++++++++
 4 files changed, 126 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index b31d497f31..2a2dbe98dd 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -363,6 +363,11 @@ DEF_HELPER_FLAGS_6(sve_mls_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_mls_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_index_b, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_4(sve_index_h, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_4(sve_index_s, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_4(sve_index_d, TCG_CALL_NO_RWG, void, ptr, i64, i64, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4b08a38ce8..950012e70a 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -991,3 +991,43 @@ DO_ZPZZZ_D(sve_mls_d, uint64_t, DO_MLS)
 #undef DO_MLS
 #undef DO_ZPZZZ
 #undef DO_ZPZZZ_D
+
+void HELPER(sve_index_b)(void *vd, uint32_t start,
+                         uint32_t incr, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint8_t *d = vd;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[H1(i)] = start + i * incr;
+    }
+}
+
+void HELPER(sve_index_h)(void *vd, uint32_t start,
+                         uint32_t incr, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 2;
+    uint16_t *d = vd;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[H2(i)] = start + i * incr;
+    }
+}
+
+void HELPER(sve_index_s)(void *vd, uint32_t start,
+                         uint32_t incr, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t *d = vd;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[H4(i)] = start + i * incr;
+    }
+}
+
+void HELPER(sve_index_d)(void *vd, uint64_t start,
+                         uint64_t incr, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = start + i * incr;
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 8baec6c674..773f0bfded 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -675,6 +675,73 @@ DO_ZPZZZ(MLS, mls)
 
 #undef DO_ZPZZZ
 
+/*
+ *** SVE Index Generation Group
+ */
+
+static void do_index(DisasContext *s, int esz, int rd,
+                     TCGv_i64 start, TCGv_i64 incr)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_i32 desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+    TCGv_ptr t_zd = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_zd, cpu_env, vec_full_reg_offset(s, rd));
+    if (esz == 3) {
+        gen_helper_sve_index_d(t_zd, start, incr, desc);
+    } else {
+        typedef void index_fn(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
+        static index_fn * const fns[3] = {
+            gen_helper_sve_index_b,
+            gen_helper_sve_index_h,
+            gen_helper_sve_index_s,
+        };
+        TCGv_i32 s32 = tcg_temp_new_i32();
+        TCGv_i32 i32 = tcg_temp_new_i32();
+
+        tcg_gen_extrl_i64_i32(s32, start);
+        tcg_gen_extrl_i64_i32(i32, incr);
+        fns[esz](t_zd, s32, i32, desc);
+
+        tcg_temp_free_i32(s32);
+        tcg_temp_free_i32(i32);
+    }
+    tcg_temp_free_ptr(t_zd);
+    tcg_temp_free_i32(desc);
+}
+
+static void trans_INDEX_ii(DisasContext *s, arg_INDEX_ii *a, uint32_t insn)
+{
+    TCGv_i64 start = tcg_const_i64(a->imm1);
+    TCGv_i64 incr = tcg_const_i64(a->imm2);
+    do_index(s, a->esz, a->rd, start, incr);
+    tcg_temp_free_i64(start);
+    tcg_temp_free_i64(incr);
+}
+
+static void trans_INDEX_ir(DisasContext *s, arg_INDEX_ir *a, uint32_t insn)
+{
+    TCGv_i64 start = tcg_const_i64(a->imm);
+    TCGv_i64 incr = cpu_reg(s, a->rm);
+    do_index(s, a->esz, a->rd, start, incr);
+    tcg_temp_free_i64(start);
+}
+
+static void trans_INDEX_ri(DisasContext *s, arg_INDEX_ri *a, uint32_t insn)
+{
+    TCGv_i64 start = cpu_reg(s, a->rn);
+    TCGv_i64 incr = tcg_const_i64(a->imm);
+    do_index(s, a->esz, a->rd, start, incr);
+    tcg_temp_free_i64(incr);
+}
+
+static void trans_INDEX_rr(DisasContext *s, arg_INDEX_rr *a, uint32_t insn)
+{
+    TCGv_i64 start = cpu_reg(s, a->rn);
+    TCGv_i64 incr = cpu_reg(s, a->rm);
+    do_index(s, a->esz, a->rd, start, incr);
+}
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index b40d7dc9a2..d7b078e92f 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -226,6 +226,20 @@ ORR_zzz		00000100 01 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 EOR_zzz		00000100 10 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 BIC_zzz		00000100 11 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
 
+### SVE Index Generation Group
+
+# SVE index generation (immediate start, immediate increment)
+INDEX_ii	00000100 esz:2 1 imm2:s5 010000 imm1:s5 rd:5
+
+# SVE index generation (immediate start, register increment)
+INDEX_ir	00000100 esz:2 1 rm:5 010010 imm:s5 rd:5
+
+# SVE index generation (register start, immediate increment)
+INDEX_ri	00000100 esz:2 1 imm:s5 010001 rn:5 rd:5
+
+# SVE index generation (register start, register increment)
+INDEX_rr	00000100 .. 1 ..... 010011 ..... .....		@rd_rn_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
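The INDEX helpers in patch 17 all compute the same arithmetic sequence, d[i] = start + i * incr, in the element type, so intermediate overflow wraps modulo the element width. A minimal editorial sketch of the 32-bit case (invented name, host-endian only, ignoring the H4 byte-swizzling macro):

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of sve_index_s: an arithmetic sequence in uint32_t,
 * wrapping modulo 2^32.  Sketch only, not the QEMU helper. */
static void model_index_s(uint32_t *d, uint32_t start, uint32_t incr,
                          size_t elems)
{
    for (size_t i = 0; i < elems; i++) {
        d[i] = start + (uint32_t)i * incr;
    }
}
```

The four encodings (INDEX_ii/ir/ri/rr) differ only in whether start and increment come from immediates or scalar registers, which is why do_index takes both as TCGv_i64 and truncates to 32 bits for the sub-doubleword element sizes.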

* [Qemu-devel] [PATCH v2 18/67] target/arm: Implement SVE Stack Allocation Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (16 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 17/67] target/arm: Implement SVE Index Generation Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 13:25   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 19/67] target/arm: Implement SVE Bitwise Shift - Unpredicated Group Richard Henderson
                   ` (50 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 24 ++++++++++++++++++++++++
 target/arm/sve.decode      | 12 ++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 773f0bfded..4a38020c8a 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -742,6 +742,30 @@ static void trans_INDEX_rr(DisasContext *s, arg_INDEX_rr *a, uint32_t insn)
     do_index(s, a->esz, a->rd, start, incr);
 }
 
+/*
+ *** SVE Stack Allocation Group
+ */
+
+static void trans_ADDVL(DisasContext *s, arg_ADDVL *a, uint32_t insn)
+{
+    TCGv_i64 rd = cpu_reg_sp(s, a->rd);
+    TCGv_i64 rn = cpu_reg_sp(s, a->rn);
+    tcg_gen_addi_i64(rd, rn, a->imm * vec_full_reg_size(s));
+}
+
+static void trans_ADDPL(DisasContext *s, arg_ADDPL *a, uint32_t insn)
+{
+    TCGv_i64 rd = cpu_reg_sp(s, a->rd);
+    TCGv_i64 rn = cpu_reg_sp(s, a->rn);
+    tcg_gen_addi_i64(rd, rn, a->imm * pred_full_reg_size(s));
+}
+
+static void trans_RDVL(DisasContext *s, arg_RDVL *a, uint32_t insn)
+{
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+    tcg_gen_movi_i64(reg, a->imm * vec_full_reg_size(s));
+}
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index d7b078e92f..0b47869dcd 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -86,6 +86,9 @@
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
 
+# Two register operands with a 6-bit signed immediate.
+@rd_rn_i6	........ ... rn:5 ..... imm:s6 rd:5		&rri
+
 # Two register operand, one immediate operand, with predicate,
 # element size encoded as TSZHL.  User must fill in imm.
 @rdn_pg_tszimm	........ .. ... ... ... pg:3 ..... rd:5 \
@@ -240,6 +243,15 @@ INDEX_ri	00000100 esz:2 1 imm:s5 010001 rn:5 rd:5
 # SVE index generation (register start, register increment)
 INDEX_rr	00000100 .. 1 ..... 010011 ..... .....		@rd_rn_rm
 
+### SVE Stack Allocation Group
+
+# SVE stack frame adjustment
+ADDVL		00000100 001 ..... 01010 ...... .....		@rd_rn_i6
+ADDPL		00000100 011 ..... 01010 ...... .....		@rd_rn_i6
+
+# SVE stack frame size
+RDVL		00000100 101 11111 01010 imm:s6 rd:5
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
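The stack-allocation instructions in patch 18 are plain scalar arithmetic on the current vector length: ADDVL scales its immediate by the vector length in bytes (VL), ADDPL by the predicate length (PL = VL/8), and RDVL materializes imm * VL. An editorial sketch, parameterized by vq, the vector length in 128-bit quadwords (names invented; in QEMU the scaling comes from vec_full_reg_size/pred_full_reg_size):

```c
#include <stdint.h>

/* rd = rn + imm * VL, where VL = vq * 16 bytes.  Sketch only. */
static int64_t model_addvl(int64_t rn, int64_t imm, unsigned vq)
{
    return rn + imm * (int64_t)(vq * 16);
}

/* rd = rn + imm * PL, where PL = VL / 8 = vq * 2 bytes. */
static int64_t model_addpl(int64_t rn, int64_t imm, unsigned vq)
{
    return rn + imm * (int64_t)(vq * 2);
}
```

Since the vector length is fixed for a given translation, the translator can fold the multiply into a single tcg_gen_addi_i64, as the patch does.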

* [Qemu-devel] [PATCH v2 19/67] target/arm: Implement SVE Bitwise Shift - Unpredicated Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (17 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 18/67] target/arm: Implement SVE Stack Allocation Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 13:28   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 20/67] target/arm: Implement SVE Compute Vector Address Group Richard Henderson
                   ` (49 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 12 +++++++
 target/arm/sve_helper.c    | 30 +++++++++++++++++
 target/arm/translate-sve.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 26 +++++++++++++++
 4 files changed, 149 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 2a2dbe98dd..00e3cd48bb 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -368,6 +368,18 @@ DEF_HELPER_FLAGS_4(sve_index_h, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
 DEF_HELPER_FLAGS_4(sve_index_s, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
 DEF_HELPER_FLAGS_4(sve_index_d, TCG_CALL_NO_RWG, void, ptr, i64, i64, i32)
 
+DEF_HELPER_FLAGS_4(sve_asr_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_lsr_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_lsl_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 950012e70a..4c6e2713fa 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -614,6 +614,36 @@ DO_ZPZ(sve_neg_h, uint16_t, H1_2, DO_NEG)
 DO_ZPZ(sve_neg_s, uint32_t, H1_4, DO_NEG)
 DO_ZPZ_D(sve_neg_d, uint64_t, DO_NEG)
 
+/* Three-operand expander, unpredicated, in which the third operand is "wide".
+ */
+#define DO_ZZW(NAME, TYPE, TYPEW, H, OP)                       \
+void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
+{                                                              \
+    intptr_t i, opr_sz = simd_oprsz(desc);                     \
+    for (i = 0; i < opr_sz; ) {                                \
+        TYPEW mm = *(TYPEW *)(vm + i);                         \
+        do {                                                   \
+            TYPE nn = *(TYPE *)(vn + H(i));                    \
+            *(TYPE *)(vd + H(i)) = OP(nn, mm);                 \
+            i += sizeof(TYPE);                                 \
+        } while (i & 7);                                       \
+    }                                                          \
+}
+
+DO_ZZW(sve_asr_zzw_b, int8_t, uint64_t, H1, DO_ASR)
+DO_ZZW(sve_lsr_zzw_b, uint8_t, uint64_t, H1, DO_LSR)
+DO_ZZW(sve_lsl_zzw_b, uint8_t, uint64_t, H1, DO_LSL)
+
+DO_ZZW(sve_asr_zzw_h, int16_t, uint64_t, H1_2, DO_ASR)
+DO_ZZW(sve_lsr_zzw_h, uint16_t, uint64_t, H1_2, DO_LSR)
+DO_ZZW(sve_lsl_zzw_h, uint16_t, uint64_t, H1_2, DO_LSL)
+
+DO_ZZW(sve_asr_zzw_s, int32_t, uint64_t, H1_4, DO_ASR)
+DO_ZZW(sve_lsr_zzw_s, uint32_t, uint64_t, H1_4, DO_LSR)
+DO_ZZW(sve_lsl_zzw_s, uint32_t, uint64_t, H1_4, DO_LSL)
+
+#undef DO_ZZW
+
 #undef DO_CLS_B
 #undef DO_CLS_H
 #undef DO_CLZ_B
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 4a38020c8a..43e9f1ad08 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -130,6 +130,13 @@ static void do_mov_z(DisasContext *s, int rd, int rn)
     do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
 }
 
+/* Initialize a Zreg with replications of a 64-bit immediate.  */
+static void do_dupi_z(DisasContext *s, int rd, uint64_t word)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_dup64i(vec_full_reg_offset(s, rd), vsz, vsz, word);
+}
+
 /* Invoke a vector expander on two Pregs.  */
 static void do_vector2_p(DisasContext *s, GVecGen2Fn *gvec_fn,
                          int esz, int rd, int rn)
@@ -644,6 +651,80 @@ DO_ZPZW(LSL, lsl)
 
 #undef DO_ZPZW
 
+/*
+ *** SVE Bitwise Shift - Unpredicated Group
+ */
+
+static void do_shift_imm(DisasContext *s, arg_rri_esz *a, bool asr,
+                         void (*gvec_fn)(unsigned, uint32_t, uint32_t,
+                                         int64_t, uint32_t, uint32_t))
+{
+    unsigned vsz = vec_full_reg_size(s);
+    if (a->esz < 0) {
+        /* Invalid tsz encoding -- see tszimm_esz. */
+        unallocated_encoding(s);
+        return;
+    }
+    /* Shift by element size is architecturally valid.  For
+       arithmetic right-shift, it's the same as by one less.
+       Otherwise it is a zeroing operation.  */
+    if (a->imm >= 8 << a->esz) {
+        if (asr) {
+            a->imm = (8 << a->esz) - 1;
+        } else {
+            do_dupi_z(s, a->rd, 0);
+            return;
+        }
+    }
+    gvec_fn(a->esz, vec_full_reg_offset(s, a->rd),
+            vec_full_reg_offset(s, a->rn), a->imm, vsz, vsz);
+}
+
+static void trans_ASR_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    do_shift_imm(s, a, true, tcg_gen_gvec_sari);
+}
+
+static void trans_LSR_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    do_shift_imm(s, a, false, tcg_gen_gvec_shri);
+}
+
+static void trans_LSL_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    do_shift_imm(s, a, false, tcg_gen_gvec_shli);
+}
+
+static void do_zzw_ool(DisasContext *s, arg_rrr_esz *a, gen_helper_gvec_3 *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    if (fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       vsz, vsz, 0, fn);
+}
+
+#define DO_ZZW(NAME, name) \
+static void trans_##NAME##_zzw(DisasContext *s, arg_rrr_esz *a,           \
+                               uint32_t insn)                             \
+{                                                                         \
+    static gen_helper_gvec_3 * const fns[4] = {                           \
+        gen_helper_sve_##name##_zzw_b, gen_helper_sve_##name##_zzw_h,     \
+        gen_helper_sve_##name##_zzw_s, NULL                               \
+    };                                                                    \
+    do_zzw_ool(s, a, fns[a->esz]);                                        \
+}
+
+DO_ZZW(ASR, asr)
+DO_ZZW(LSR, lsr)
+DO_ZZW(LSL, lsl)
+
+#undef DO_ZZW
+
 /*
  *** SVE Integer Multiply-Add Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 0b47869dcd..f71ea1b60d 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -33,6 +33,11 @@
 # A combination of tsz:imm3 -- extract (tsz:imm3) - esize
 %tszimm_shl	22:2 5:5 !function=tszimm_shl
 
+# Similarly for the tszh/tszl pair at 22/16 for zzi
+%tszimm16_esz	22:2 16:5 !function=tszimm_esz
+%tszimm16_shr	22:2 16:5 !function=tszimm_shr
+%tszimm16_shl	22:2 16:5 !function=tszimm_shl
+
 # Either a copy of rd (at bit 0), or a different source
 # as propagated via the MOVPRFX instruction.
 %reg_movprfx	0:5
@@ -44,6 +49,7 @@
 
 &rr_esz		rd rn esz
 &rri		rd rn imm
+&rri_esz	rd rn imm esz
 &rrr_esz	rd rn rm esz
 &rpr_esz	rd pg rn esz
 &rprr_s		rd pg rn rm s
@@ -94,6 +100,10 @@
 @rdn_pg_tszimm	........ .. ... ... ... pg:3 ..... rd:5 \
 		&rpri_esz rn=%reg_movprfx esz=%tszimm_esz
 
+# Similarly without predicate.
+@rd_rn_tszimm	........ .. ... ... ...... rn:5 rd:5 \
+		&rri_esz esz=%tszimm16_esz
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9	........ ........ ...... rn:5 . rd:4	\
 		&rri imm=%imm9_16_10
@@ -252,6 +262,22 @@ ADDPL		00000100 011 ..... 01010 ...... .....		@rd_rn_i6
 # SVE stack frame size
 RDVL		00000100 101 11111 01010 imm:s6 rd:5
 
+### SVE Bitwise Shift - Unpredicated Group
+
+# SVE bitwise shift by immediate (unpredicated)
+ASR_zzi		00000100 .. 1 ..... 1001 00 ..... ..... \
+		@rd_rn_tszimm imm=%tszimm16_shr
+LSR_zzi		00000100 .. 1 ..... 1001 01 ..... ..... \
+		@rd_rn_tszimm imm=%tszimm16_shr
+LSL_zzi		00000100 .. 1 ..... 1001 11 ..... ..... \
+		@rd_rn_tszimm imm=%tszimm16_shl
+
+# SVE bitwise shift by wide elements (unpredicated)
+# Note esz != 3
+ASR_zzw		00000100 .. 1 ..... 1000 00 ..... .....		@rd_rn_rm
+LSR_zzw		00000100 .. 1 ..... 1000 01 ..... .....		@rd_rn_rm
+LSL_zzw		00000100 .. 1 ..... 1000 11 ..... .....		@rd_rn_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3


* [Qemu-devel] [PATCH v2 20/67] target/arm: Implement SVE Compute Vector Address Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (18 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 19/67] target/arm: Implement SVE Bitwise Shift - Unpredicated Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 13:34   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 21/67] target/arm: Implement SVE floating-point exponential accelerator Richard Henderson
                   ` (48 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  5 +++++
 target/arm/sve_helper.c    | 40 ++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 33 +++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 12 ++++++++++++
 4 files changed, 90 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 00e3cd48bb..5280d375f9 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -380,6 +380,11 @@ DEF_HELPER_FLAGS_4(sve_lsl_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_lsl_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_lsl_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_adr_p32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_adr_p64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_adr_s32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_adr_u32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4c6e2713fa..a290a58c02 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1061,3 +1061,43 @@ void HELPER(sve_index_d)(void *vd, uint64_t start,
         d[i] = start + i * incr;
     }
 }
+
+void HELPER(sve_adr_p32)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t sh = simd_data(desc);
+    uint32_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + (m[i] << sh);
+    }
+}
+
+void HELPER(sve_adr_p64)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t sh = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + (m[i] << sh);
+    }
+}
+
+void HELPER(sve_adr_s32)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t sh = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + ((uint64_t)(int32_t)m[i] << sh);
+    }
+}
+
+void HELPER(sve_adr_u32)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t sh = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + ((uint64_t)(uint32_t)m[i] << sh);
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 43e9f1ad08..34cc8c2773 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -847,6 +847,39 @@ static void trans_RDVL(DisasContext *s, arg_RDVL *a, uint32_t insn)
     tcg_gen_movi_i64(reg, a->imm * vec_full_reg_size(s));
 }
 
+/*
+ *** SVE Compute Vector Address Group
+ */
+
+static void do_adr(DisasContext *s, arg_rrri *a, gen_helper_gvec_3 *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       vsz, vsz, a->imm, fn);
+}
+
+static void trans_ADR_p32(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    do_adr(s, a, gen_helper_sve_adr_p32);
+}
+
+static void trans_ADR_p64(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    do_adr(s, a, gen_helper_sve_adr_p64);
+}
+
+static void trans_ADR_s32(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    do_adr(s, a, gen_helper_sve_adr_s32);
+}
+
+static void trans_ADR_u32(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    do_adr(s, a, gen_helper_sve_adr_u32);
+}
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index f71ea1b60d..6ec1f94832 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -49,6 +49,7 @@
 
 &rr_esz		rd rn esz
 &rri		rd rn imm
+&rrri		rd rn rm imm
 &rri_esz	rd rn imm esz
 &rrr_esz	rd rn rm esz
 &rpr_esz	rd pg rn esz
@@ -77,6 +78,9 @@
 # Three operand, vector element size
 @rd_rn_rm	........ esz:2 . rm:5  ... ...  rn:5 rd:5	&rrr_esz
 
+# Three operand with "memory" size, aka immediate left shift
+@rd_rn_msz_rm	........ ... rm:5 .... imm:2 rn:5 rd:5		&rrri
+
 # Two register operand, with governing predicate, vector element size
 @rdn_pg_rm	........ esz:2 ... ... ... pg:3 rm:5 rd:5 \
 		&rprr_esz rn=%reg_movprfx
@@ -278,6 +282,14 @@ ASR_zzw		00000100 .. 1 ..... 1000 00 ..... .....		@rd_rn_rm
 LSR_zzw		00000100 .. 1 ..... 1000 01 ..... .....		@rd_rn_rm
 LSL_zzw		00000100 .. 1 ..... 1000 11 ..... .....		@rd_rn_rm
 
+### SVE Compute Vector Address Group
+
+# SVE vector address generation
+ADR_s32		00000100 00 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
+ADR_u32		00000100 01 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
+ADR_p32		00000100 10 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
+ADR_p64		00000100 11 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3


* [Qemu-devel] [PATCH v2 21/67] target/arm: Implement SVE floating-point exponential accelerator
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (19 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 20/67] target/arm: Implement SVE Compute Vector Address Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 13:48   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 22/67] target/arm: Implement SVE floating-point trig select coefficient Richard Henderson
                   ` (47 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  4 +++
 target/arm/sve_helper.c    | 81 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 22 +++++++++++++
 target/arm/sve.decode      |  7 ++++
 4 files changed, 114 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 5280d375f9..e2925ff8ec 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -385,6 +385,10 @@ DEF_HELPER_FLAGS_4(sve_adr_p64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_adr_s32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_adr_u32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_fexpa_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fexpa_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fexpa_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index a290a58c02..4d42653eef 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1101,3 +1101,84 @@ void HELPER(sve_adr_u32)(void *vd, void *vn, void *vm, uint32_t desc)
         d[i] = n[i] + ((uint64_t)(uint32_t)m[i] << sh);
     }
 }
+
+void HELPER(sve_fexpa_h)(void *vd, void *vn, uint32_t desc)
+{
+    static const uint16_t coeff[] = {
+        0x0000, 0x0016, 0x002d, 0x0045, 0x005d, 0x0075, 0x008e, 0x00a8,
+        0x00c2, 0x00dc, 0x00f8, 0x0114, 0x0130, 0x014d, 0x016b, 0x0189,
+        0x01a8, 0x01c8, 0x01e8, 0x0209, 0x022b, 0x024e, 0x0271, 0x0295,
+        0x02ba, 0x02e0, 0x0306, 0x032e, 0x0356, 0x037f, 0x03a9, 0x03d4,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / 2;
+    uint16_t *d = vd, *n = vn;
+
+    for (i = 0; i < opr_sz; i++) {
+        uint16_t nn = n[i];
+        intptr_t idx = extract32(nn, 0, 5);
+        uint16_t exp = extract32(nn, 5, 5);
+        d[i] = coeff[idx] | (exp << 10);
+    }
+}
+
+void HELPER(sve_fexpa_s)(void *vd, void *vn, uint32_t desc)
+{
+    static const uint32_t coeff[] = {
+        0x000000, 0x0164d2, 0x02cd87, 0x043a29,
+        0x05aac3, 0x071f62, 0x08980f, 0x0a14d5,
+        0x0b95c2, 0x0d1adf, 0x0ea43a, 0x1031dc,
+        0x11c3d3, 0x135a2b, 0x14f4f0, 0x16942d,
+        0x1837f0, 0x19e046, 0x1b8d3a, 0x1d3eda,
+        0x1ef532, 0x20b051, 0x227043, 0x243516,
+        0x25fed7, 0x27cd94, 0x29a15b, 0x2b7a3a,
+        0x2d583f, 0x2f3b79, 0x3123f6, 0x3311c4,
+        0x3504f3, 0x36fd92, 0x38fbaf, 0x3aff5b,
+        0x3d08a4, 0x3f179a, 0x412c4d, 0x4346cd,
+        0x45672a, 0x478d75, 0x49b9be, 0x4bec15,
+        0x4e248c, 0x506334, 0x52a81e, 0x54f35b,
+        0x5744fd, 0x599d16, 0x5bfbb8, 0x5e60f5,
+        0x60ccdf, 0x633f89, 0x65b907, 0x68396a,
+        0x6ac0c7, 0x6d4f30, 0x6fe4ba, 0x728177,
+        0x75257d, 0x77d0df, 0x7a83b3, 0x7d3e0c,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t *d = vd, *n = vn;
+
+    for (i = 0; i < opr_sz; i++) {
+        uint32_t nn = n[i];
+        intptr_t idx = extract32(nn, 0, 6);
+        uint32_t exp = extract32(nn, 6, 8);
+        d[i] = coeff[idx] | (exp << 23);
+    }
+}
+
+void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
+{
+    static const uint64_t coeff[] = {
+        0x0000000000000, 0x02C9A3E778061, 0x059B0D3158574, 0x0874518759BC8,
+        0x0B5586CF9890F, 0x0E3EC32D3D1A2, 0x11301D0125B51, 0x1429AAEA92DE0,
+        0x172B83C7D517B, 0x1A35BEB6FCB75, 0x1D4873168B9AA, 0x2063B88628CD6,
+        0x2387A6E756238, 0x26B4565E27CDD, 0x29E9DF51FDEE1, 0x2D285A6E4030B,
+        0x306FE0A31B715, 0x33C08B26416FF, 0x371A7373AA9CB, 0x3A7DB34E59FF7,
+        0x3DEA64C123422, 0x4160A21F72E2A, 0x44E086061892D, 0x486A2B5C13CD0,
+        0x4BFDAD5362A27, 0x4F9B2769D2CA7, 0x5342B569D4F82, 0x56F4736B527DA,
+        0x5AB07DD485429, 0x5E76F15AD2148, 0x6247EB03A5585, 0x6623882552225,
+        0x6A09E667F3BCD, 0x6DFB23C651A2F, 0x71F75E8EC5F74, 0x75FEB564267C9,
+        0x7A11473EB0187, 0x7E2F336CF4E62, 0x82589994CCE13, 0x868D99B4492ED,
+        0x8ACE5422AA0DB, 0x8F1AE99157736, 0x93737B0CDC5E5, 0x97D829FDE4E50,
+        0x9C49182A3F090, 0xA0C667B5DE565, 0xA5503B23E255D, 0xA9E6B5579FDBF,
+        0xAE89F995AD3AD, 0xB33A2B84F15FB, 0xB7F76F2FB5E47, 0xBCC1E904BC1D2,
+        0xC199BDD85529C, 0xC67F12E57D14B, 0xCB720DCEF9069, 0xD072D4A07897C,
+        0xD5818DCFBA487, 0xDA9E603DB3285, 0xDFC97337B9B5F, 0xE502EE78B3FF6,
+        0xEA4AFA2A490DA, 0xEFA1BEE615A27, 0xF50765B6E4540, 0xFA7C1819E90D8,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+
+    for (i = 0; i < opr_sz; i++) {
+        uint64_t nn = n[i];
+        intptr_t idx = extract32(nn, 0, 6);
+        uint64_t exp = extract32(nn, 6, 11);
+        d[i] = coeff[idx] | (exp << 52);
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 34cc8c2773..2f23f1b192 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -880,6 +880,28 @@ static void trans_ADR_u32(DisasContext *s, arg_rrri *a, uint32_t insn)
     do_adr(s, a, gen_helper_sve_adr_u32);
 }
 
+/*
+ *** SVE Integer Misc - Unpredicated Group
+ */
+
+static void trans_FEXPA(DisasContext *s, arg_rr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_2 * const fns[4] = {
+        NULL,
+        gen_helper_sve_fexpa_h,
+        gen_helper_sve_fexpa_s,
+        gen_helper_sve_fexpa_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vsz, vsz, 0, fns[a->esz]);
+}
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 6ec1f94832..e791fe8031 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -68,6 +68,7 @@
 
 # Two operand
 @pd_pn		........ esz:2 .. .... ....... rn:4 . rd:4	&rr_esz
+@rd_rn		........ esz:2 ...... ...... rn:5 rd:5		&rr_esz
 
 # Three operand with unused vector element size
 @rd_rn_rm_e0	........ ... rm:5 ... ... rn:5 rd:5		&rrr_esz esz=0
@@ -290,6 +291,12 @@ ADR_u32		00000100 01 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
 ADR_p32		00000100 10 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
 ADR_p64		00000100 11 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
 
+### SVE Integer Misc - Unpredicated Group
+
+# SVE floating-point exponential accelerator
+# Note esz != 0
+FEXPA		00000100 .. 1 00000 101110 ..... .....		@rd_rn
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3


* [Qemu-devel] [PATCH v2 22/67] target/arm: Implement SVE floating-point trig select coefficient
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (20 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 21/67] target/arm: Implement SVE floating-point exponential accelerator Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 13:54   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 23/67] target/arm: Implement SVE Element Count Group Richard Henderson
                   ` (46 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  4 ++++
 target/arm/sve_helper.c    | 43 +++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 19 +++++++++++++++++++
 target/arm/sve.decode      |  4 ++++
 4 files changed, 70 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index e2925ff8ec..4f1bd5a62f 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -389,6 +389,10 @@ DEF_HELPER_FLAGS_3(sve_fexpa_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_fexpa_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_fexpa_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_ftssel_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_ftssel_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_ftssel_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4d42653eef..b4f70af23f 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -23,6 +23,7 @@
 #include "exec/cpu_ldst.h"
 #include "exec/helper-proto.h"
 #include "tcg/tcg-gvec-desc.h"
+#include "fpu/softfloat.h"
 
 
 /* Note that vector data is stored in host-endian 64-bit chunks,
@@ -1182,3 +1183,45 @@ void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
         d[i] = coeff[idx] | (exp << 52);
     }
 }
+
+void HELPER(sve_ftssel_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 2;
+    uint16_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        uint16_t nn = n[i];
+        uint16_t mm = m[i];
+        if (mm & 1) {
+            nn = float16_one;
+        }
+        d[i] = nn ^ (mm & 2) << 14;
+    }
+}
+
+void HELPER(sve_ftssel_s)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        uint32_t nn = n[i];
+        uint32_t mm = m[i];
+        if (mm & 1) {
+            nn = float32_one;
+        }
+        d[i] = nn ^ (mm & 2) << 30;
+    }
+}
+
+void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i];
+        uint64_t mm = m[i];
+        if (mm & 1) {
+            nn = float64_one;
+        }
+        d[i] = nn ^ (mm & 2) << 62;
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 2f23f1b192..e32be385fd 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -902,6 +902,25 @@ static void trans_FEXPA(DisasContext *s, arg_rr_esz *a, uint32_t insn)
                        vsz, vsz, 0, fns[a->esz]);
 }
 
+static void trans_FTSSEL(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_ftssel_h,
+        gen_helper_sve_ftssel_s,
+        gen_helper_sve_ftssel_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       vsz, vsz, 0, fns[a->esz]);
+}
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index e791fe8031..4ea3f33919 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -297,6 +297,10 @@ ADR_p64		00000100 11 1 ..... 1010 .. ..... .....		@rd_rn_msz_rm
 # Note esz != 0
 FEXPA		00000100 .. 1 00000 101110 ..... .....		@rd_rn
 
+# SVE floating-point trig select coefficient
+# Note esz != 0
+FTSSEL		00000100 .. 1 ..... 101100 ..... .....		@rd_rn_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3


* [Qemu-devel] [PATCH v2 23/67] target/arm: Implement SVE Element Count Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (21 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 22/67] target/arm: Implement SVE floating-point trig select coefficient Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 14:06   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 24/67] target/arm: Implement SVE Bitwise Immediate Group Richard Henderson
                   ` (45 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  11 ++
 target/arm/sve_helper.c    | 136 ++++++++++++++++++++++
 target/arm/translate-sve.c | 274 ++++++++++++++++++++++++++++++++++++++++++++-
 target/arm/sve.decode      |  30 ++++-
 4 files changed, 448 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 4f1bd5a62f..2831e1643b 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -393,6 +393,17 @@ DEF_HELPER_FLAGS_4(sve_ftssel_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_ftssel_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_ftssel_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_sqaddi_b, TCG_CALL_NO_RWG, void, ptr, ptr, s32, i32)
+DEF_HELPER_FLAGS_4(sve_sqaddi_h, TCG_CALL_NO_RWG, void, ptr, ptr, s32, i32)
+DEF_HELPER_FLAGS_4(sve_sqaddi_s, TCG_CALL_NO_RWG, void, ptr, ptr, s64, i32)
+DEF_HELPER_FLAGS_4(sve_sqaddi_d, TCG_CALL_NO_RWG, void, ptr, ptr, s64, i32)
+
+DEF_HELPER_FLAGS_4(sve_uqaddi_b, TCG_CALL_NO_RWG, void, ptr, ptr, s32, i32)
+DEF_HELPER_FLAGS_4(sve_uqaddi_h, TCG_CALL_NO_RWG, void, ptr, ptr, s32, i32)
+DEF_HELPER_FLAGS_4(sve_uqaddi_s, TCG_CALL_NO_RWG, void, ptr, ptr, s64, i32)
+DEF_HELPER_FLAGS_4(sve_uqaddi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_uqsubi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index b4f70af23f..cfda16d520 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1225,3 +1225,139 @@ void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
         d[i] = nn ^ (mm & 2) << 62;
     }
 }
+
+/*
+ * Signed saturating addition with scalar operand.
+ */
+
+void HELPER(sve_sqaddi_b)(void *d, void *a, int32_t b, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+
+    for (i = 0; i < oprsz; i += sizeof(int8_t)) {
+        int r = *(int8_t *)(a + i) + b;
+        if (r > INT8_MAX) {
+            r = INT8_MAX;
+        } else if (r < INT8_MIN) {
+            r = INT8_MIN;
+        }
+        *(int8_t *)(d + i) = r;
+    }
+}
+
+void HELPER(sve_sqaddi_h)(void *d, void *a, int32_t b, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+
+    for (i = 0; i < oprsz; i += sizeof(int16_t)) {
+        int r = *(int16_t *)(a + i) + b;
+        if (r > INT16_MAX) {
+            r = INT16_MAX;
+        } else if (r < INT16_MIN) {
+            r = INT16_MIN;
+        }
+        *(int16_t *)(d + i) = r;
+    }
+}
+
+void HELPER(sve_sqaddi_s)(void *d, void *a, int64_t b, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+
+    for (i = 0; i < oprsz; i += sizeof(int32_t)) {
+        int64_t r = *(int32_t *)(a + i) + b;
+        if (r > INT32_MAX) {
+            r = INT32_MAX;
+        } else if (r < INT32_MIN) {
+            r = INT32_MIN;
+        }
+        *(int32_t *)(d + i) = r;
+    }
+}
+
+void HELPER(sve_sqaddi_d)(void *d, void *a, int64_t b, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+
+    for (i = 0; i < oprsz; i += sizeof(int64_t)) {
+        int64_t ai = *(int64_t *)(a + i);
+        int64_t r = ai + b;
+        if (((r ^ ai) & ~(ai ^ b)) < 0) {
+            /* Signed overflow.  */
+            r = (r < 0 ? INT64_MAX : INT64_MIN);
+        }
+        *(int64_t *)(d + i) = r;
+    }
+}
+
+/*
+ * Unsigned saturating addition with scalar operand.
+ */
+
+void HELPER(sve_uqaddi_b)(void *d, void *a, int32_t b, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+
+    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
+        int r = *(uint8_t *)(a + i) + b;
+        if (r > UINT8_MAX) {
+            r = UINT8_MAX;
+        } else if (r < 0) {
+            r = 0;
+        }
+        *(uint8_t *)(d + i) = r;
+    }
+}
+
+void HELPER(sve_uqaddi_h)(void *d, void *a, int32_t b, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+
+    for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
+        int r = *(uint16_t *)(a + i) + b;
+        if (r > UINT16_MAX) {
+            r = UINT16_MAX;
+        } else if (r < 0) {
+            r = 0;
+        }
+        *(uint16_t *)(d + i) = r;
+    }
+}
+
+void HELPER(sve_uqaddi_s)(void *d, void *a, int64_t b, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+
+    for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
+        int64_t r = *(uint32_t *)(a + i) + b;
+        if (r > UINT32_MAX) {
+            r = UINT32_MAX;
+        } else if (r < 0) {
+            r = 0;
+        }
+        *(uint32_t *)(d + i) = r;
+    }
+}
+
+void HELPER(sve_uqaddi_d)(void *d, void *a, uint64_t b, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+
+    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
+        uint64_t r = *(uint64_t *)(a + i) + b;
+        if (r < b) {
+            r = UINT64_MAX;
+        }
+        *(uint64_t *)(d + i) = r;
+    }
+}
+
+void HELPER(sve_uqsubi_d)(void *d, void *a, uint64_t b, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+
+    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
+        uint64_t ai = *(uint64_t *)(a + i);
+        *(uint64_t *)(d + i) = (ai < b ? 0 : ai - b);
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index e32be385fd..702f20e97b 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -61,6 +61,11 @@ static int tszimm_shl(int x)
     return x - (8 << tszimm_esz(x));
 }
 
+static inline int plus1(int x)
+{
+    return x + 1;
+}
+
 /*
  * Include the generated decoder.
  */
@@ -127,7 +132,9 @@ static void do_vector3_z(DisasContext *s, GVecGen3Fn *gvec_fn,
 /* Invoke a vector move on two Zregs.  */
 static void do_mov_z(DisasContext *s, int rd, int rn)
 {
-    do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
+    if (rd != rn) {
+        do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
+    }
 }
 
 /* Initialize a Zreg with replications of a 64-bit immediate.  */
@@ -168,7 +175,9 @@ static void do_vecop4_p(DisasContext *s, const GVecGen4 *gvec_op,
 /* Invoke a vector move on two Pregs.  */
 static void do_mov_p(DisasContext *s, int rd, int rn)
 {
-    do_vector2_p(s, tcg_gen_gvec_mov, 0, rd, rn);
+    if (rd != rn) {
+        do_vector2_p(s, tcg_gen_gvec_mov, 0, rd, rn);
+    }
 }
 
 /* Set the cpu flags as per a return from an SVE helper.  */
@@ -1378,6 +1387,267 @@ static void trans_PNEXT(DisasContext *s, arg_rr_esz *a, uint32_t insn)
     do_pfirst_pnext(s, a, gen_helper_sve_pnext);
 }
 
+/*
+ *** SVE Element Count Group
+ */
+
+/* Perform an inline saturating addition of a 32-bit value within
+ * a 64-bit register.  The second operand is known to be positive,
+ * which halves the comparisons we must perform to bound the result.
+ */
+static void do_sat_addsub_32(TCGv_i64 reg, TCGv_i64 val, bool u, bool d)
+{
+    int64_t ibound;
+    TCGv_i64 bound;
+    TCGCond cond;
+
+    /* Use normal 64-bit arithmetic to detect 32-bit overflow.  */
+    if (u) {
+        tcg_gen_ext32u_i64(reg, reg);
+    } else {
+        tcg_gen_ext32s_i64(reg, reg);
+    }
+    if (d) {
+        tcg_gen_sub_i64(reg, reg, val);
+        ibound = (u ? 0 : INT32_MIN);
+        cond = TCG_COND_LT;
+    } else {
+        tcg_gen_add_i64(reg, reg, val);
+        ibound = (u ? UINT32_MAX : INT32_MAX);
+        cond = TCG_COND_GT;
+    }
+    bound = tcg_const_i64(ibound);
+    tcg_gen_movcond_i64(cond, reg, reg, bound, bound, reg);
+    tcg_temp_free_i64(bound);
+}
+
+/* Similarly with 64-bit values.  */
+static void do_sat_addsub_64(TCGv_i64 reg, TCGv_i64 val, bool u, bool d)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2;
+
+    if (u) {
+        if (d) {
+            tcg_gen_sub_i64(t0, reg, val);
+            tcg_gen_movi_i64(t1, 0);
+            tcg_gen_movcond_i64(TCG_COND_LTU, reg, reg, val, t1, t0);
+        } else {
+            tcg_gen_add_i64(t0, reg, val);
+            tcg_gen_movi_i64(t1, -1);
+            tcg_gen_movcond_i64(TCG_COND_LTU, reg, t0, reg, t1, t0);
+        }
+    } else {
+        if (d) {
+            /* Detect signed overflow for subtraction.  */
+            tcg_gen_xor_i64(t0, reg, val);
+            tcg_gen_sub_i64(t1, reg, val);
+            tcg_gen_xor_i64(reg, reg, t0);
+            tcg_gen_and_i64(t0, t0, reg);
+
+            /* Bound the result.  */
+            tcg_gen_movi_i64(reg, INT64_MIN);
+            t2 = tcg_const_i64(0);
+            tcg_gen_movcond_i64(TCG_COND_LT, reg, t0, t2, reg, t1);
+        } else {
+            /* Detect signed overflow for addition.  */
+            tcg_gen_xor_i64(t0, reg, val);
+            tcg_gen_add_i64(reg, reg, val);
+            tcg_gen_xor_i64(t1, reg, val);
+            tcg_gen_andc_i64(t0, t1, t0);
+
+            /* Bound the result.  */
+            tcg_gen_movi_i64(t1, INT64_MAX);
+            t2 = tcg_const_i64(0);
+            tcg_gen_movcond_i64(TCG_COND_LT, reg, t0, t2, t1, reg);
+        }
+        tcg_temp_free_i64(t2);
+    }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
+/* Similarly with a vector and a scalar operand.  */
+static void do_sat_addsub_vec(DisasContext *s, int esz, int rd, int rn,
+                              TCGv_i64 val, bool u, bool d)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr dptr, nptr;
+    TCGv_i32 t32, desc;
+    TCGv_i64 t64;
+
+    dptr = tcg_temp_new_ptr();
+    nptr = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(dptr, cpu_env, vec_full_reg_offset(s, rd));
+    tcg_gen_addi_ptr(nptr, cpu_env, vec_full_reg_offset(s, rn));
+    desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+
+    switch (esz) {
+    case MO_8:
+        t32 = tcg_temp_new_i32();
+        tcg_gen_extrl_i64_i32(t32, val);
+        if (d) {
+            tcg_gen_neg_i32(t32, t32);
+        }
+        if (u) {
+            gen_helper_sve_uqaddi_b(dptr, nptr, t32, desc);
+        } else {
+            gen_helper_sve_sqaddi_b(dptr, nptr, t32, desc);
+        }
+        tcg_temp_free_i32(t32);
+        break;
+
+    case MO_16:
+        t32 = tcg_temp_new_i32();
+        tcg_gen_extrl_i64_i32(t32, val);
+        if (d) {
+            tcg_gen_neg_i32(t32, t32);
+        }
+        if (u) {
+            gen_helper_sve_uqaddi_h(dptr, nptr, t32, desc);
+        } else {
+            gen_helper_sve_sqaddi_h(dptr, nptr, t32, desc);
+        }
+        tcg_temp_free_i32(t32);
+        break;
+
+    case MO_32:
+        t64 = tcg_temp_new_i64();
+        if (d) {
+            tcg_gen_neg_i64(t64, val);
+        } else {
+            tcg_gen_mov_i64(t64, val);
+        }
+        if (u) {
+            gen_helper_sve_uqaddi_s(dptr, nptr, t64, desc);
+        } else {
+            gen_helper_sve_sqaddi_s(dptr, nptr, t64, desc);
+        }
+        tcg_temp_free_i64(t64);
+        break;
+
+    case MO_64:
+        if (u) {
+            if (d) {
+                gen_helper_sve_uqsubi_d(dptr, nptr, val, desc);
+            } else {
+                gen_helper_sve_uqaddi_d(dptr, nptr, val, desc);
+            }
+        } else if (d) {
+            t64 = tcg_temp_new_i64();
+            tcg_gen_neg_i64(t64, val);
+            gen_helper_sve_sqaddi_d(dptr, nptr, t64, desc);
+            tcg_temp_free_i64(t64);
+        } else {
+            gen_helper_sve_sqaddi_d(dptr, nptr, val, desc);
+        }
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    tcg_temp_free_ptr(dptr);
+    tcg_temp_free_ptr(nptr);
+    tcg_temp_free_i32(desc);
+}
+
+static void trans_CNT_r(DisasContext *s, arg_CNT_r *a, uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
+
+    tcg_gen_movi_i64(cpu_reg(s, a->rd), numelem * a->imm);
+}
+
+static void trans_INCDEC_r(DisasContext *s, arg_incdec_cnt *a, uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
+    int inc = numelem * a->imm * (a->d ? -1 : 1);
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+
+    tcg_gen_addi_i64(reg, reg, inc);
+}
+
+static void trans_SINCDEC_r_32(DisasContext *s, arg_incdec_cnt *a,
+                               uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
+    int inc = numelem * a->imm;
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+
+    /* Use normal 64-bit arithmetic to detect 32-bit overflow.  */
+    if (inc == 0) {
+        if (a->u) {
+            tcg_gen_ext32u_i64(reg, reg);
+        } else {
+            tcg_gen_ext32s_i64(reg, reg);
+        }
+    } else {
+        TCGv_i64 t = tcg_const_i64(inc);
+        do_sat_addsub_32(reg, t, a->u, a->d);
+        tcg_temp_free_i64(t);
+    }
+}
+
+static void trans_SINCDEC_r_64(DisasContext *s, arg_incdec_cnt *a,
+                               uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
+    int inc = numelem * a->imm;
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+
+    if (inc != 0) {
+        TCGv_i64 t = tcg_const_i64(inc);
+        do_sat_addsub_64(reg, t, a->u, a->d);
+        tcg_temp_free_i64(t);
+    }
+}
+
+static void trans_INCDEC_v(DisasContext *s, arg_incdec2_cnt *a, uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
+    int inc = numelem * a->imm;
+
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    if (inc != 0) {
+        TCGv_i64 t = tcg_const_i64(a->d ? -inc : inc);
+        tcg_gen_gvec_adds(a->esz, vec_full_reg_offset(s, a->rd),
+                          vec_full_reg_offset(s, a->rn), t, fullsz, fullsz);
+        tcg_temp_free_i64(t);
+    } else {
+        do_mov_z(s, a->rd, a->rn);
+    }
+}
+
+static void trans_SINCDEC_v(DisasContext *s, arg_incdec2_cnt *a,
+                            uint32_t insn)
+{
+    unsigned fullsz = vec_full_reg_size(s);
+    unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
+    int inc = numelem * a->imm;
+
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    if (inc != 0) {
+        TCGv_i64 t = tcg_const_i64(inc);
+        do_sat_addsub_vec(s, a->esz, a->rd, a->rn, t, a->u, a->d);
+        tcg_temp_free_i64(t);
+    } else {
+        do_mov_z(s, a->rd, a->rn);
+    }
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 4ea3f33919..5690b5fcb9 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -22,6 +22,7 @@
 ###########################################################################
 # Named fields.  These are primarily for disjoint fields.
 
+%imm4_16_p1             16:4 !function=plus1
 %imm6_22_5	22:1 5:5
 %imm9_16_10	16:s6 10:3
 %preg4_5	5:4
@@ -58,6 +59,8 @@
 &rprrr_esz	rd pg rn rm ra esz
 &rpri_esz	rd pg rn imm esz
 &ptrue		rd esz pat s
+&incdec_cnt	rd pat esz imm d u
+&incdec2_cnt	rd rn pat esz imm d u
 
 ###########################################################################
 # Named instruction formats.  These are generally used to
@@ -115,6 +118,13 @@
 @rd_rn_i9	........ ........ ...... rn:5 rd:5	\
 		&rri imm=%imm9_16_10
 
+# One register, pattern, and uint4+1.
+# User must fill in U and D.
+@incdec_cnt	........ esz:2 .. .... ...... pat:5 rd:5 \
+		&incdec_cnt imm=%imm4_16_p1
+@incdec2_cnt	........ esz:2 .. .... ...... pat:5 rd:5 \
+		&incdec2_cnt imm=%imm4_16_p1 rn=%reg_movprfx
+
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
 
@@ -301,7 +311,25 @@ FEXPA		00000100 .. 1 00000 101110 ..... .....		@rd_rn
 # Note esz != 0
 FTSSEL		00000100 .. 1 ..... 101100 ..... .....		@rd_rn_rm
 
-### SVE Predicate Logical Operations Group
+### SVE Element Count Group
+
+# SVE element count
+CNT_r		00000100 .. 10 .... 1110 0 0 ..... .....    @incdec_cnt d=0 u=1
+
+# SVE inc/dec register by element count
+INCDEC_r	00000100 .. 11 .... 1110 0 d:1 ..... .....	@incdec_cnt u=1
+
+# SVE saturating inc/dec register by element count
+SINCDEC_r_32	00000100 .. 10 .... 1111 d:1 u:1 ..... .....	@incdec_cnt
+SINCDEC_r_64	00000100 .. 11 .... 1111 d:1 u:1 ..... .....	@incdec_cnt
+
+# SVE inc/dec vector by element count
+# Note this requires esz != 0.
+INCDEC_v	00000100 .. 1 1 .... 1100 0 d:1 ..... .....    @incdec2_cnt u=1
+
+# SVE saturating inc/dec vector by element count
+# Note these require esz != 0.
+SINCDEC_v	00000100 .. 1 0 .... 1100 d:1 u:1 ..... .....	@incdec2_cnt
 
 # SVE predicate logical operations
 AND_pppp	00100101 0. 00 .... 01 .... 0 .... 0 ....	@pd_pg_pn_pm_s
-- 
2.14.3
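An aside for reviewers: the branchless overflow tests used in sve_sqaddi_d and
sve_uqaddi_d above can be distilled into standalone helpers.  This is a sketch
for illustration, not code from the patch; the helper names are made up here:

```c
#include <stdint.h>

/* Signed case: overflow occurred iff the operands share a sign that the
   result lacks.  This mirrors the (r ^ ai) & ~(ai ^ b) test in
   sve_sqaddi_d; the sum is computed in unsigned arithmetic to avoid
   undefined behavior on wraparound. */
static int64_t sat_sadd64(int64_t a, int64_t b)
{
    int64_t r = (int64_t)((uint64_t)a + (uint64_t)b);
    if (((r ^ a) & ~(a ^ b)) < 0) {
        /* A negative wrapped result means the true sum was positive.  */
        r = (r < 0 ? INT64_MAX : INT64_MIN);
    }
    return r;
}

/* Unsigned case: wraparound occurred iff the sum is smaller than an
   addend, as in sve_uqaddi_d. */
static uint64_t sat_uadd64(uint64_t a, uint64_t b)
{
    uint64_t r = a + b;
    return r < b ? UINT64_MAX : r;
}
```

The same sign-comparison trick is what the xor/andc sequence in
do_sat_addsub_64 computes with TCG ops at translation time.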

^ permalink raw reply related	[flat|nested] 167+ messages in thread

* [Qemu-devel] [PATCH v2 24/67] target/arm: Implement SVE Bitwise Immediate Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (22 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 23/67] target/arm: Implement SVE Element Count Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 14:10   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 25/67] target/arm: Implement SVE Integer Wide Immediate - Predicated Group Richard Henderson
                   ` (44 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 17 ++++++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 702f20e97b..21b1e4df85 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -34,6 +34,8 @@
 #include "translate-a64.h"
 
 typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
+typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t,
+                         int64_t, uint32_t, uint32_t);
 typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
 
@@ -1648,6 +1650,54 @@ static void trans_SINCDEC_v(DisasContext *s, arg_incdec2_cnt *a,
     }
 }
 
+/*
+ *** SVE Bitwise Immediate Group
+ */
+
+static void do_zz_dbm(DisasContext *s, arg_rr_dbm *a, GVecGen2iFn *gvec_fn)
+{
+    unsigned vsz;
+    uint64_t imm;
+
+    if (!logic_imm_decode_wmask(&imm, extract32(a->dbm, 12, 1),
+                                extract32(a->dbm, 0, 6),
+                                extract32(a->dbm, 6, 6))) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    vsz = vec_full_reg_size(s);
+    gvec_fn(MO_64, vec_full_reg_offset(s, a->rd),
+            vec_full_reg_offset(s, a->rn), imm, vsz, vsz);
+}
+
+static void trans_AND_zzi(DisasContext *s, arg_rr_dbm *a, uint32_t insn)
+{
+    do_zz_dbm(s, a, tcg_gen_gvec_andi);
+}
+
+static void trans_ORR_zzi(DisasContext *s, arg_rr_dbm *a, uint32_t insn)
+{
+    do_zz_dbm(s, a, tcg_gen_gvec_ori);
+}
+
+static void trans_EOR_zzi(DisasContext *s, arg_rr_dbm *a, uint32_t insn)
+{
+    do_zz_dbm(s, a, tcg_gen_gvec_xori);
+}
+
+static void trans_DUPM(DisasContext *s, arg_DUPM *a, uint32_t insn)
+{
+    uint64_t imm;
+    if (!logic_imm_decode_wmask(&imm, extract32(a->dbm, 12, 1),
+                                extract32(a->dbm, 0, 6),
+                                extract32(a->dbm, 6, 6))) {
+        unallocated_encoding(s);
+        return;
+    }
+    do_dupi_z(s, a->rd, imm);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 5690b5fcb9..0990d135f4 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -50,6 +50,7 @@
 
 &rr_esz		rd rn esz
 &rri		rd rn imm
+&rr_dbm		rd rn dbm
 &rrri		rd rn rm imm
 &rri_esz	rd rn imm esz
 &rrr_esz	rd rn rm esz
@@ -112,6 +113,10 @@
 @rd_rn_tszimm	........ .. ... ... ...... rn:5 rd:5 \
 		&rri_esz esz=%tszimm16_esz
 
+# Two register operand, one encoded bitmask.
+@rdn_dbm	........ .. .... dbm:13 rd:5 \
+		&rr_dbm rn=%reg_movprfx
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9	........ ........ ...... rn:5 . rd:4	\
 		&rri imm=%imm9_16_10
@@ -331,6 +336,18 @@ INCDEC_v	00000100 .. 1 1 .... 1100 0 d:1 ..... .....    @incdec2_cnt u=1
 # Note these require esz != 0.
 SINCDEC_v	00000100 .. 1 0 .... 1100 d:1 u:1 ..... .....	@incdec2_cnt
 
+### SVE Bitwise Immediate Group
+
+# SVE bitwise logical with immediate (unpredicated)
+ORR_zzi		00000101 00 0000 ............. .....		@rdn_dbm
+EOR_zzi		00000101 01 0000 ............. .....		@rdn_dbm
+AND_zzi		00000101 10 0000 ............. .....		@rdn_dbm
+
+# SVE broadcast bitmask immediate
+DUPM		00000101 11 0000 dbm:13 rd:5
+
+### SVE Predicate Logical Operations Group
+
 # SVE predicate logical operations
 AND_pppp	00100101 0. 00 .... 01 .... 0 .... 0 ....	@pd_pg_pn_pm_s
 BIC_pppp	00100101 0. 00 .... 01 .... 0 .... 1 ....	@pd_pg_pn_pm_s
-- 
2.14.3
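For readers unfamiliar with the A64 "encoded bitmask" format that do_zz_dbm
hands to logic_imm_decode_wmask: the 13-bit N:immr:imms field selects an
element size, a run length of ones, and a rotation.  A simplified standalone
expansion looks roughly like this (the function name is illustrative; the real
decoder lives in translate-a64.c and also covers the 32-bit variants):

```c
#include <stdbool.h>
#include <stdint.h>

/* Expand a 13-bit encoded bitmask (N:immr:imms) to a 64-bit value.
   Returns false for reserved encodings. */
static bool expand_bitmask64(uint64_t *result, unsigned n,
                             unsigned immr, unsigned imms)
{
    unsigned combined = (n << 6) | (~imms & 0x3f);
    if (combined == 0) {
        return false;
    }
    unsigned len = 31 - __builtin_clz(combined);
    unsigned esize = 1u << len;          /* element size: 2..64 bits */
    unsigned levels = esize - 1;
    unsigned s = imms & levels;          /* run of s+1 ones */
    unsigned r = immr & levels;          /* rotated right by r */
    if (s == levels) {
        return false;                    /* all-ones element is reserved */
    }
    uint64_t elem = (1ull << (s + 1)) - 1;
    if (r) {
        elem = (elem >> r) | (elem << (esize - r));
    }
    if (esize < 64) {
        elem &= (1ull << esize) - 1;
        /* Replicate the element across all 64 bits.  */
        for (unsigned i = esize; i < 64; i *= 2) {
            elem |= elem << i;
        }
    }
    *result = elem;
    return true;
}
```

E.g. N=0, imms=0b111100 selects 2-bit elements with a single one each,
giving the repeating mask 0x5555555555555555.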


* [Qemu-devel] [PATCH v2 25/67] target/arm: Implement SVE Integer Wide Immediate - Predicated Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (23 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 24/67] target/arm: Implement SVE Bitwise Immediate Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 14:18   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 26/67] target/arm: Implement SVE Permute - Extract Group Richard Henderson
                   ` (43 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  10 +++++
 target/arm/sve_helper.c    | 108 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c |  92 ++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  17 +++++++
 4 files changed, 227 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 2831e1643b..79493ab647 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -404,6 +404,16 @@ DEF_HELPER_FLAGS_4(sve_uqaddi_s, TCG_CALL_NO_RWG, void, ptr, ptr, s64, i32)
 DEF_HELPER_FLAGS_4(sve_uqaddi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(sve_uqsubi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 
+DEF_HELPER_FLAGS_5(sve_cpy_m_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_5(sve_cpy_m_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_5(sve_cpy_m_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_5(sve_cpy_m_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(sve_cpy_z_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_cpy_z_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_cpy_z_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_cpy_z_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index cfda16d520..6a95d1ec48 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1361,3 +1361,111 @@ void HELPER(sve_uqsubi_d)(void *d, void *a, uint64_t b, uint32_t desc)
         *(uint64_t *)(d + i) = (ai < b ? 0 : ai - b);
     }
 }
+
+/* Two operand predicated copy immediate with merge.  All valid immediates
+ * can fit within 17 signed bits in the simd_data field.
+ */
+void HELPER(sve_cpy_m_b)(void *vd, void *vn, void *vg,
+                         uint64_t mm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+    uint8_t *pg = vg;
+
+    mm = (mm & 0xff) * (-1ull / 0xff);
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i];
+        uint64_t pp = expand_pred_b(pg[H1(i)]);
+        d[i] = (mm & pp) | (nn & ~pp);
+    }
+}
+
+void HELPER(sve_cpy_m_h)(void *vd, void *vn, void *vg,
+                         uint64_t mm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+    uint8_t *pg = vg;
+
+    mm = (mm & 0xffff) * (-1ull / 0xffff);
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i];
+        uint64_t pp = expand_pred_h(pg[H1(i)]);
+        d[i] = (mm & pp) | (nn & ~pp);
+    }
+}
+
+void HELPER(sve_cpy_m_s)(void *vd, void *vn, void *vg,
+                         uint64_t mm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+    uint8_t *pg = vg;
+
+    mm = deposit64(mm, 32, 32, mm);
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i];
+        uint64_t pp = expand_pred_s(pg[H1(i)]);
+        d[i] = (mm & pp) | (nn & ~pp);
+    }
+}
+
+void HELPER(sve_cpy_m_d)(void *vd, void *vn, void *vg,
+                         uint64_t mm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+    uint8_t *pg = vg;
+
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i];
+        d[i] = (pg[H1(i)] & 1 ? mm : nn);
+    }
+}
+
+void HELPER(sve_cpy_z_b)(void *vd, void *vg, uint64_t val, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+
+    val = (val & 0xff) * (-1ull / 0xff);
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = val & expand_pred_b(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_cpy_z_h)(void *vd, void *vg, uint64_t val, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+
+    val = (val & 0xffff) * (-1ull / 0xffff);
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = val & expand_pred_h(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_cpy_z_s)(void *vd, void *vg, uint64_t val, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+
+    val = deposit64(val, 32, 32, val);
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = val & expand_pred_s(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_cpy_z_d)(void *vd, void *vg, uint64_t val, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = (pg[H1(i)] & 1 ? val : 0);
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 21b1e4df85..dd085b084b 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -68,6 +68,12 @@ static inline int plus1(int x)
     return x + 1;
 }
 
+/* The SH bit is in bit 8.  Extract the low 8 and shift.  */
+static inline int expand_imm_sh8s(int x)
+{
+    return (int8_t)x << (x & 0x100 ? 8 : 0);
+}
+
 /*
  * Include the generated decoder.
  */
@@ -1698,6 +1704,92 @@ static void trans_DUPM(DisasContext *s, arg_DUPM *a, uint32_t insn)
     do_dupi_z(s, a->rd, imm);
 }
 
+/*
+ *** SVE Integer Wide Immediate - Predicated Group
+ */
+
+/* Implement all merging copies.  This is used for CPY (immediate),
+ * FCPY, CPY (scalar), CPY (SIMD&FP scalar).
+ */
+static void do_cpy_m(DisasContext *s, int esz, int rd, int rn, int pg,
+                     TCGv_i64 val)
+{
+    typedef void gen_cpy(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64, TCGv_i32);
+    static gen_cpy * const fns[4] = {
+        gen_helper_sve_cpy_m_b, gen_helper_sve_cpy_m_h,
+        gen_helper_sve_cpy_m_s, gen_helper_sve_cpy_m_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_i32 desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+    TCGv_ptr t_zd = tcg_temp_new_ptr();
+    TCGv_ptr t_zn = tcg_temp_new_ptr();
+    TCGv_ptr t_pg = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_zd, cpu_env, vec_full_reg_offset(s, rd));
+    tcg_gen_addi_ptr(t_zn, cpu_env, vec_full_reg_offset(s, rn));
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
+
+    fns[esz](t_zd, t_zn, t_pg, val, desc);
+
+    tcg_temp_free_ptr(t_zd);
+    tcg_temp_free_ptr(t_zn);
+    tcg_temp_free_ptr(t_pg);
+    tcg_temp_free_i32(desc);
+}
+
+static void trans_FCPY(DisasContext *s, arg_FCPY *a, uint32_t insn)
+{
+    uint64_t imm;
+    TCGv_i64 t_imm;
+
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    /* Decode the VFP immediate.  */
+    imm = vfp_expand_imm(a->esz, a->imm);
+
+    t_imm = tcg_const_i64(imm);
+    do_cpy_m(s, a->esz, a->rd, a->rn, a->pg, t_imm);
+    tcg_temp_free_i64(t_imm);
+}
+
+static void trans_CPY_m_i(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    TCGv_i64 t_imm;
+
+    if (a->esz == 0 && extract32(insn, 13, 1)) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    t_imm = tcg_const_i64(a->imm);
+    do_cpy_m(s, a->esz, a->rd, a->rn, a->pg, t_imm);
+    tcg_temp_free_i64(t_imm);
+}
+
+static void trans_CPY_z_i(DisasContext *s, arg_CPY_z_i *a, uint32_t insn)
+{
+    static gen_helper_gvec_2i * const fns[4] = {
+        gen_helper_sve_cpy_z_b, gen_helper_sve_cpy_z_h,
+        gen_helper_sve_cpy_z_s, gen_helper_sve_cpy_z_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_i64 t_imm;
+
+    if (a->esz == 0 && extract32(insn, 13, 1)) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    t_imm = tcg_const_i64(a->imm);
+    tcg_gen_gvec_2i_ool(vec_full_reg_offset(s, a->rd),
+                        pred_full_reg_offset(s, a->pg),
+                        t_imm, vsz, vsz, 0, fns[a->esz]);
+    tcg_temp_free_i64(t_imm);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 0990d135f4..e6e10a4f84 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -39,6 +39,9 @@
 %tszimm16_shr	22:2 16:5 !function=tszimm_shr
 %tszimm16_shl	22:2 16:5 !function=tszimm_shl
 
+# Signed 8-bit immediate, optionally shifted left by 8.
+%sh8_i8s		5:9 !function=expand_imm_sh8s
+
 # Either a copy of rd (at bit 0), or a different source
 # as propagated via the MOVPRFX instruction.
 %reg_movprfx	0:5
@@ -113,6 +116,11 @@
 @rd_rn_tszimm	........ .. ... ... ...... rn:5 rd:5 \
 		&rri_esz esz=%tszimm16_esz
 
+# Two register operand, one immediate operand, with 4-bit predicate.
+# User must fill in imm.
+@rdn_pg4	........ esz:2 .. pg:4 ... ........ rd:5 \
+		&rpri_esz rn=%reg_movprfx
+
 # Two register operand, one encoded bitmask.
 @rdn_dbm	........ .. .... dbm:13 rd:5 \
 		&rr_dbm rn=%reg_movprfx
@@ -346,6 +354,15 @@ AND_zzi		00000101 10 0000 ............. .....		@rdn_dbm
 # SVE broadcast bitmask immediate
 DUPM		00000101 11 0000 dbm:13 rd:5
 
+### SVE Integer Wide Immediate - Predicated Group
+
+# SVE copy floating-point immediate (predicated)
+FCPY		00000101 .. 01 .... 110 imm:8 .....		@rdn_pg4
+
+# SVE copy integer immediate (predicated)
+CPY_m_i		00000101 .. 01 .... 01 . ........ .....   @rdn_pg4 imm=%sh8_i8s
+CPY_z_i		00000101 .. 01 .... 00 . ........ .....   @rdn_pg4 imm=%sh8_i8s
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3
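The merging copy in sve_cpy_m_b above boils down to a byte-lane broadcast plus
a bitwise select.  In isolation (a sketch for one 64-bit chunk, not the
patch's code; pp is the predicate already expanded so each active byte lane
is all-ones):

```c
#include <stdint.h>

/* Broadcast the low byte of imm to all eight lanes with a multiply by
   0x0101010101010101 (i.e. -1ull / 0xff), then select between the
   immediate and the old value nn lane-by-lane via the mask pp. */
static uint64_t cpy_m_b_lane(uint64_t nn, uint64_t pp, uint64_t imm)
{
    uint64_t mm = (imm & 0xff) * (-1ull / 0xff);
    return (mm & pp) | (nn & ~pp);
}
```

The zeroing variant sve_cpy_z_b is the same broadcast with the nn term
dropped, which is why it needs no source vector operand.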


* [Qemu-devel] [PATCH v2 26/67] target/arm: Implement SVE Permute - Extract Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (24 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 25/67] target/arm: Implement SVE Integer Wide Immediate - Predicated Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 14:24   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 27/67] target/arm: Implement SVE Permute - Unpredicated Group Richard Henderson
                   ` (42 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  2 ++
 target/arm/sve_helper.c    | 81 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 29 +++++++++++++++++
 target/arm/sve.decode      |  9 +++++-
 4 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 79493ab647..94f4356ce9 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -414,6 +414,8 @@ DEF_HELPER_FLAGS_4(sve_cpy_z_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(sve_cpy_z_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(sve_cpy_z_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 
+DEF_HELPER_FLAGS_4(sve_ext, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 6a95d1ec48..fb3f54300b 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1469,3 +1469,84 @@ void HELPER(sve_cpy_z_d)(void *vd, void *vg, uint64_t val, uint32_t desc)
         d[i] = (pg[H1(i)] & 1 ? val : 0);
     }
 }
+
+/* Big-endian hosts need to frob the byte indices.  If the copy
+ * happens to be 8-byte aligned, then no frobbing necessary.
+ */
+static void swap_memmove(void *vd, void *vs, size_t n)
+{
+    uintptr_t d = (uintptr_t)vd;
+    uintptr_t s = (uintptr_t)vs;
+    uintptr_t o = (d | s | n) & 7;
+    size_t i;
+
+#ifndef HOST_WORDS_BIGENDIAN
+    o = 0;
+#endif
+    switch (o) {
+    case 0:
+        memmove(vd, vs, n);
+        break;
+
+    case 4:
+        if (d < s || d >= s + n) {
+            for (i = 0; i < n; i += 4) {
+                *(uint32_t *)H1_4(d + i) = *(uint32_t *)H1_4(s + i);
+            }
+        } else {
+            for (i = n; i > 0; ) {
+                i -= 4;
+                *(uint32_t *)H1_4(d + i) = *(uint32_t *)H1_4(s + i);
+            }
+        }
+        break;
+
+    case 2:
+    case 6:
+        if (d < s || d >= s + n) {
+            for (i = 0; i < n; i += 2) {
+                *(uint16_t *)H1_2(d + i) = *(uint16_t *)H1_2(s + i);
+            }
+        } else {
+            for (i = n; i > 0; ) {
+                i -= 2;
+                *(uint16_t *)H1_2(d + i) = *(uint16_t *)H1_2(s + i);
+            }
+        }
+        break;
+
+    default:
+        if (d < s || d >= s + n) {
+            for (i = 0; i < n; i++) {
+                *(uint8_t *)H1(d + i) = *(uint8_t *)H1(s + i);
+            }
+        } else {
+            for (i = n; i > 0; ) {
+                i -= 1;
+                *(uint8_t *)H1(d + i) = *(uint8_t *)H1(s + i);
+            }
+        }
+        break;
+    }
+}
+
+void HELPER(sve_ext)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t opr_sz = simd_oprsz(desc);
+    size_t n_ofs = simd_data(desc);
+    size_t n_siz = opr_sz - n_ofs;
+
+    if (vd != vm) {
+        swap_memmove(vd, vn + n_ofs, n_siz);
+        swap_memmove(vd + n_siz, vm, n_ofs);
+    } else if (vd != vn) {
+        swap_memmove(vd + n_siz, vd, n_ofs);
+        swap_memmove(vd, vn + n_ofs, n_siz);
+    } else {
+        /* vd == vn == vm.  Need temp space.  */
+        ARMVectorReg tmp;
+        swap_memmove(&tmp, vm, n_ofs);
+        swap_memmove(vd, vd + n_ofs, n_siz);
+        memcpy(vd + n_siz, &tmp, n_ofs);
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index dd085b084b..07a5eac092 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1790,6 +1790,35 @@ static void trans_CPY_z_i(DisasContext *s, arg_CPY_z_i *a, uint32_t insn)
     tcg_temp_free_i64(t_imm);
 }
 
+/*
+ *** SVE Permute Extract Group
+ */
+
+static void trans_EXT(DisasContext *s, arg_EXT *a, uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    unsigned n_ofs = a->imm >= vsz ? 0 : a->imm;
+    unsigned n_siz = vsz - n_ofs;
+    unsigned d = vec_full_reg_offset(s, a->rd);
+    unsigned n = vec_full_reg_offset(s, a->rn);
+    unsigned m = vec_full_reg_offset(s, a->rm);
+
+    /* Use host vector move insns if we have appropriate sizes
+       and no unfortunate overlap.  */
+    if (m != d
+        && n_ofs == size_for_gvec(n_ofs)
+        && n_siz == size_for_gvec(n_siz)
+        && (d != n || n_siz <= n_ofs)) {
+        tcg_gen_gvec_mov(0, d, n + n_ofs, n_siz, n_siz);
+        if (n_ofs != 0) {
+            tcg_gen_gvec_mov(0, d + n_siz, m, n_ofs, n_ofs);
+        }
+        return;
+    }
+
+    tcg_gen_gvec_3_ool(d, n, m, vsz, vsz, n_ofs, gen_helper_sve_ext);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index e6e10a4f84..5e3a9839d4 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -22,8 +22,9 @@
 ###########################################################################
 # Named fields.  These are primarily for disjoint fields.
 
-%imm4_16_p1             16:4 !function=plus1
+%imm4_16_p1	16:4 !function=plus1
 %imm6_22_5	22:1 5:5
+%imm8_16_10	16:5 10:3
 %imm9_16_10	16:s6 10:3
 %preg4_5	5:4
 
@@ -363,6 +364,12 @@ FCPY		00000101 .. 01 .... 110 imm:8 .....		@rdn_pg4
 CPY_m_i		00000101 .. 01 .... 01 . ........ .....   @rdn_pg4 imm=%sh8_i8s
 CPY_z_i		00000101 .. 01 .... 00 . ........ .....   @rdn_pg4 imm=%sh8_i8s
 
+### SVE Permute - Extract Group
+
+# SVE extract vector (immediate offset)
+EXT		00000101 001 ..... 000 ... rm:5 rd:5 \
+		&rrri rn=%reg_movprfx imm=%imm8_16_10
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
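As an aside for readers following along: the EXT semantics that HELPER(sve_ext) and trans_EXT implement above can be sketched as a stand-alone scalar model. This is a hypothetical illustration, not QEMU code; it ignores the big-endian byte frobbing done by swap_memmove and the vd == vn == vm overlap case, and the names ext_model/tmp are invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of SVE EXT: the result is bytes [imm, vl) of Zn
 * followed by bytes [0, imm) of Zm, i.e. a byte-wise window into
 * the concatenation Zm:Zn.  An out-of-range imm selects Zn whole,
 * matching the n_ofs clamp in trans_EXT.
 */
static void ext_model(uint8_t *d, const uint8_t *n, const uint8_t *m,
                      size_t vl, size_t imm)
{
    size_t n_ofs = imm >= vl ? 0 : imm;
    size_t n_siz = vl - n_ofs;
    uint8_t tmp[256];

    memcpy(tmp, n + n_ofs, n_siz);      /* high part of Zn first */
    memcpy(tmp + n_siz, m, n_ofs);      /* then low part of Zm */
    memcpy(d, tmp, vl);
}
```

Using a temporary buffer sidesteps the overlap analysis that the real helper performs with the three swap_memmove orderings.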

* [Qemu-devel] [PATCH v2 27/67] target/arm: Implement SVE Permute - Unpredicated Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (25 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 26/67] target/arm: Implement SVE Permute - Extract Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 14:34   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 28/67] target/arm: Implement SVE Permute - Predicates Group Richard Henderson
                   ` (41 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  23 +++++++++
 target/arm/translate-a64.h |  14 +++---
 target/arm/sve_helper.c    | 114 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 113 ++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  29 +++++++++++-
 5 files changed, 285 insertions(+), 8 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 94f4356ce9..0c9aad575e 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -416,6 +416,29 @@ DEF_HELPER_FLAGS_4(sve_cpy_z_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 
 DEF_HELPER_FLAGS_4(sve_ext, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_insr_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_insr_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_insr_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_insr_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_3(sve_rev_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_rev_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_rev_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_rev_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_tbl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_tbl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_tbl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_tbl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_sunpk_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_sunpk_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_sunpk_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_uunpk_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uunpk_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uunpk_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index e519aee314..328aa7fce1 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -66,18 +66,18 @@ static inline void assert_fp_access_checked(DisasContext *s)
 static inline int vec_reg_offset(DisasContext *s, int regno,
                                  int element, TCGMemOp size)
 {
-    int offs = 0;
+    int element_size = 1 << size;
+    int offs = element * element_size;
 #ifdef HOST_WORDS_BIGENDIAN
     /* This is complicated slightly because vfp.zregs[n].d[0] is
      * still the low half and vfp.zregs[n].d[1] the high half
      * of the 128 bit vector, even on big endian systems.
-     * Calculate the offset assuming a fully bigendian 128 bits,
-     * then XOR to account for the order of the two 64 bit halves.
+     * Calculate the offset assuming a fully little-endian 128 bits,
+     * then XOR to account for the order of the 64 bit units.
      */
-    offs += (16 - ((element + 1) * (1 << size)));
-    offs ^= 8;
-#else
-    offs += element * (1 << size);
+    if (element_size < 8) {
+        offs ^= 8 - element_size;
+    }
 #endif
     offs += offsetof(CPUARMState, vfp.zregs[regno]);
     assert_fp_access_checked(s);
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index fb3f54300b..466a209c1e 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1550,3 +1550,117 @@ void HELPER(sve_ext)(void *vd, void *vn, void *vm, uint32_t desc)
         memcpy(vd + n_siz, &tmp, n_ofs);
     }
 }
+
+#define DO_INSR(NAME, TYPE, H) \
+void HELPER(NAME)(void *vd, void *vn, uint64_t val, uint32_t desc) \
+{                                                                  \
+    intptr_t opr_sz = simd_oprsz(desc);                            \
+    swap_memmove(vd + sizeof(TYPE), vn, opr_sz - sizeof(TYPE));    \
+    *(TYPE *)(vd + H(0)) = val;                                    \
+}
+
+DO_INSR(sve_insr_b, uint8_t, H1)
+DO_INSR(sve_insr_h, uint16_t, H1_2)
+DO_INSR(sve_insr_s, uint32_t, H1_4)
+DO_INSR(sve_insr_d, uint64_t, )
+
+#undef DO_INSR
+
+void HELPER(sve_rev_b)(void *vd, void *vn, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc);
+    for (i = 0, j = opr_sz - 8; i < opr_sz / 2; i += 8, j -= 8) {
+        uint64_t f = *(uint64_t *)(vn + i);
+        uint64_t b = *(uint64_t *)(vn + j);
+        *(uint64_t *)(vd + i) = bswap64(b);
+        *(uint64_t *)(vd + j) = bswap64(f);
+    }
+}
+
+static inline uint64_t hswap64(uint64_t h)
+{
+    uint64_t m = 0x0000ffff0000ffffull;
+    h = rol64(h, 32);
+    return ((h & m) << 16) | ((h >> 16) & m);
+}
+
+void HELPER(sve_rev_h)(void *vd, void *vn, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc);
+    for (i = 0, j = opr_sz - 8; i < opr_sz / 2; i += 8, j -= 8) {
+        uint64_t f = *(uint64_t *)(vn + i);
+        uint64_t b = *(uint64_t *)(vn + j);
+        *(uint64_t *)(vd + i) = hswap64(b);
+        *(uint64_t *)(vd + j) = hswap64(f);
+    }
+}
+
+void HELPER(sve_rev_s)(void *vd, void *vn, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc);
+    for (i = 0, j = opr_sz - 8; i < opr_sz / 2; i += 8, j -= 8) {
+        uint64_t f = *(uint64_t *)(vn + i);
+        uint64_t b = *(uint64_t *)(vn + j);
+        *(uint64_t *)(vd + i) = rol64(b, 32);
+        *(uint64_t *)(vd + j) = rol64(f, 32);
+    }
+}
+
+void HELPER(sve_rev_d)(void *vd, void *vn, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc);
+    for (i = 0, j = opr_sz - 8; i < opr_sz / 2; i += 8, j -= 8) {
+        uint64_t f = *(uint64_t *)(vn + i);
+        uint64_t b = *(uint64_t *)(vn + j);
+        *(uint64_t *)(vd + i) = b;
+        *(uint64_t *)(vd + j) = f;
+    }
+}
+
+#define DO_TBL(NAME, TYPE, H) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
+{                                                              \
+    intptr_t i, opr_sz = simd_oprsz(desc);                     \
+    uintptr_t elem = opr_sz / sizeof(TYPE);                    \
+    TYPE *d = vd, *n = vn, *m = vm;                            \
+    ARMVectorReg tmp;                                          \
+    if (unlikely(vd == vn)) {                                  \
+        n = memcpy(&tmp, vn, opr_sz);                          \
+    }                                                          \
+    for (i = 0; i < elem; i++) {                               \
+        TYPE j = m[H(i)];                                      \
+        d[H(i)] = j < elem ? n[H(j)] : 0;                      \
+    }                                                          \
+}
+
+DO_TBL(sve_tbl_b, uint8_t, H1)
+DO_TBL(sve_tbl_h, uint16_t, H2)
+DO_TBL(sve_tbl_s, uint32_t, H4)
+DO_TBL(sve_tbl_d, uint64_t, )
+
+#undef DO_TBL
+
+#define DO_UNPK(NAME, TYPED, TYPES, HD, HS) \
+void HELPER(NAME)(void *vd, void *vn, uint32_t desc)           \
+{                                                              \
+    intptr_t i, opr_sz = simd_oprsz(desc);                     \
+    TYPED *d = vd;                                             \
+    TYPES *n = vn;                                             \
+    ARMVectorReg tmp;                                          \
+    if (unlikely(vn - vd < opr_sz)) {                          \
+        n = memcpy(&tmp, n, opr_sz / 2);                       \
+    }                                                          \
+    for (i = 0; i < opr_sz / sizeof(TYPED); i++) {             \
+        d[HD(i)] = n[HS(i)];                                   \
+    }                                                          \
+}
+
+DO_UNPK(sve_sunpk_h, int16_t, int8_t, H2, H1)
+DO_UNPK(sve_sunpk_s, int32_t, int16_t, H4, H2)
+DO_UNPK(sve_sunpk_d, int64_t, int32_t, , H4)
+
+DO_UNPK(sve_uunpk_h, uint16_t, uint8_t, H2, H1)
+DO_UNPK(sve_uunpk_s, uint32_t, uint16_t, H4, H2)
+DO_UNPK(sve_uunpk_d, uint64_t, uint32_t, , H4)
+
+#undef DO_UNPK
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 07a5eac092..3724f6290c 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1819,6 +1819,119 @@ static void trans_EXT(DisasContext *s, arg_EXT *a, uint32_t insn)
     tcg_gen_gvec_3_ool(d, n, m, vsz, vsz, n_ofs, gen_helper_sve_ext);
 }
 
+/*
+ *** SVE Permute - Unpredicated Group
+ */
+
+static void trans_DUP_s(DisasContext *s, arg_DUP_s *a, uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_dup_i64(a->esz, vec_full_reg_offset(s, a->rd),
+                         vsz, vsz, cpu_reg_sp(s, a->rn));
+}
+
+static void trans_DUP_x(DisasContext *s, arg_DUP_x *a, uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    unsigned dofs = vec_full_reg_offset(s, a->rd);
+    unsigned esz, index;
+
+    if ((a->imm & 0x1f) == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    esz = ctz32(a->imm);
+    index = a->imm >> (esz + 1);
+
+    if ((index << esz) < vsz) {
+        unsigned nofs = vec_reg_offset(s, a->rn, index, esz);
+        tcg_gen_gvec_dup_mem(esz, dofs, nofs, vsz, vsz);
+    } else {
+        tcg_gen_gvec_dup64i(dofs, vsz, vsz, 0);
+    }
+}
+
+static void do_insr_i64(DisasContext *s, arg_rrr_esz *a, TCGv_i64 val)
+{
+    typedef void gen_insr(TCGv_ptr, TCGv_ptr, TCGv_i64, TCGv_i32);
+    static gen_insr * const fns[4] = {
+        gen_helper_sve_insr_b, gen_helper_sve_insr_h,
+        gen_helper_sve_insr_s, gen_helper_sve_insr_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_i32 desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+    TCGv_ptr t_zd = tcg_temp_new_ptr();
+    TCGv_ptr t_zn = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_zd, cpu_env, vec_full_reg_offset(s, a->rd));
+    tcg_gen_addi_ptr(t_zn, cpu_env, vec_full_reg_offset(s, a->rn));
+
+    fns[a->esz](t_zd, t_zn, val, desc);
+
+    tcg_temp_free_ptr(t_zd);
+    tcg_temp_free_ptr(t_zn);
+    tcg_temp_free_i32(desc);
+}
+
+static void trans_INSR_f(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+    tcg_gen_ld_i64(t, cpu_env, vec_reg_offset(s, a->rm, 0, MO_64));
+    do_insr_i64(s, a, t);
+    tcg_temp_free_i64(t);
+}
+
+static void trans_INSR_r(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_insr_i64(s, a, cpu_reg(s, a->rm));
+}
+
+static void trans_REV_v(DisasContext *s, arg_rr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_2 * const fns[4] = {
+        gen_helper_sve_rev_b, gen_helper_sve_rev_h,
+        gen_helper_sve_rev_s, gen_helper_sve_rev_d
+    };
+    unsigned vsz = vec_full_reg_size(s);
+
+    tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vsz, vsz, 0, fns[a->esz]);
+}
+
+static void trans_TBL(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_tbl_b, gen_helper_sve_tbl_h,
+        gen_helper_sve_tbl_s, gen_helper_sve_tbl_d
+    };
+    unsigned vsz = vec_full_reg_size(s);
+
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       vsz, vsz, 0, fns[a->esz]);
+}
+
+static void trans_UNPK(DisasContext *s, arg_UNPK *a, uint32_t insn)
+{
+    static gen_helper_gvec_2 * const fns[4][2] = {
+        { NULL, NULL },
+        { gen_helper_sve_sunpk_h, gen_helper_sve_uunpk_h },
+        { gen_helper_sve_sunpk_s, gen_helper_sve_uunpk_s },
+        { gen_helper_sve_sunpk_d, gen_helper_sve_uunpk_d },
+    };
+    unsigned vsz = vec_full_reg_size(s);
+
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn) + (a->h ? vsz / 2 : 0),
+                       vsz, vsz, 0, fns[a->esz][a->u]);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 5e3a9839d4..8af47ad27b 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -24,6 +24,7 @@
 
 %imm4_16_p1	16:4 !function=plus1
 %imm6_22_5	22:1 5:5
+%imm7_22_16	22:2 16:5
 %imm8_16_10	16:5 10:3
 %imm9_16_10	16:s6 10:3
 %preg4_5	5:4
@@ -85,7 +86,9 @@
 @pd_pg_pn_pm_s	........ . s:1 .. rm:4 .. pg:4 . rn:4 . rd:4	&rprr_s
 
 # Three operand, vector element size
-@rd_rn_rm	........ esz:2 . rm:5  ... ...  rn:5 rd:5	&rrr_esz
+@rd_rn_rm	........ esz:2 . rm:5 ... ... rn:5 rd:5		&rrr_esz
+@rdn_rm		........ esz:2 ...... ...... rm:5 rd:5 \
+		&rrr_esz rn=%reg_movprfx
 
 # Three operand with "memory" size, aka immediate left shift
 @rd_rn_msz_rm	........ ... rm:5 .... imm:2 rn:5 rd:5		&rrri
@@ -370,6 +373,30 @@ CPY_z_i		00000101 .. 01 .... 00 . ........ .....   @rdn_pg4 imm=%sh8_i8s
 EXT		00000101 001 ..... 000 ... rm:5 rd:5 \
 		&rrri rn=%reg_movprfx imm=%imm8_16_10
 
+### SVE Permute - Unpredicated Group
+
+# SVE broadcast general register
+DUP_s		00000101 .. 1 00000 001110 ..... .....		@rd_rn
+
+# SVE broadcast indexed element
+DUP_x		00000101 .. 1 ..... 001000 rn:5 rd:5 \
+		&rri imm=%imm7_22_16
+
+# SVE insert SIMD&FP scalar register
+INSR_f		00000101 .. 1 10100 001110 ..... .....		@rdn_rm
+
+# SVE insert general register
+INSR_r		00000101 .. 1 00100 001110 ..... .....		@rdn_rm
+
+# SVE reverse vector elements
+REV_v		00000101 .. 1 11000 001110 ..... .....		@rd_rn
+
+# SVE vector table lookup
+TBL		00000101 .. 1 ..... 001100 ..... .....		@rd_rn_rm
+
+# SVE unpack vector elements
+UNPK		00000101 esz:2 1100 u:1 h:1 001110 rn:5 rd:5
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
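A quick sanity check on the halfword-reversal trick used by sve_rev_h above: hswap64 reverses the four 16-bit units of a 64-bit value by first rotating the 32-bit halves into place and then swapping the 16-bit halves within each. The snippet below restates it as a self-contained function (rol64_local is a local stand-in for QEMU's rol64, defined here so the example compiles on its own).

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits; only called with 0 < n < 64 here. */
static uint64_t rol64_local(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64 - n));
}

/* Reverse the four 16-bit units within a 64-bit value: swap the two
 * 32-bit halves, then swap the 16-bit halves within each of them.
 */
static uint64_t hswap64_model(uint64_t h)
{
    uint64_t m = 0x0000ffff0000ffffull;
    h = rol64_local(h, 32);
    return ((h & m) << 16) | ((h >> 16) & m);
}
```

sve_rev_h then only has to exchange 64-bit words end-for-end, applying this within-word reversal to each; the same structure explains sve_rev_b (bswap64) and sve_rev_s (rol64 by 32).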

* [Qemu-devel] [PATCH v2 28/67] target/arm: Implement SVE Permute - Predicates Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (26 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 27/67] target/arm: Implement SVE Permute - Unpredicated Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 15:15   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 29/67] target/arm: Implement SVE Permute - Interleaving Group Richard Henderson
                   ` (40 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |   6 +
 target/arm/sve_helper.c    | 280 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 110 ++++++++++++++++++
 target/arm/sve.decode      |  18 +++
 4 files changed, 414 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 0c9aad575e..ff958fcebd 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -439,6 +439,12 @@ DEF_HELPER_FLAGS_3(sve_uunpk_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_uunpk_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_uunpk_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_zip_p, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uzp_p, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_trn_p, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_rev_p, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_punpk_p, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 466a209c1e..c3a2706a16 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1664,3 +1664,283 @@ DO_UNPK(sve_uunpk_s, uint32_t, uint16_t, H4, H2)
 DO_UNPK(sve_uunpk_d, uint64_t, uint32_t, , H4)
 
 #undef DO_UNPK
+
+static const uint64_t expand_bit_data[5][2] = {
+    { 0x1111111111111111ull, 0x2222222222222222ull },
+    { 0x0303030303030303ull, 0x0c0c0c0c0c0c0c0cull },
+    { 0x000f000f000f000full, 0x00f000f000f000f0ull },
+    { 0x000000ff000000ffull, 0x0000ff000000ff00ull },
+    { 0x000000000000ffffull, 0x00000000ffff0000ull }
+};
+
+/* Expand units of 2**N bits to units of 2**(N+1) bits,
+   with the higher bits zero.  */
+static uint64_t expand_bits(uint64_t x, int n)
+{
+    int i, sh;
+    for (i = 4, sh = 16; i >= n; i--, sh >>= 1) {
+        x = ((x & expand_bit_data[i][1]) << sh) | (x & expand_bit_data[i][0]);
+    }
+    return x;
+}
+
+/* Compress units of 2**(N+1) bits to units of 2**N bits.  */
+static uint64_t compress_bits(uint64_t x, int n)
+{
+    int i, sh;
+    for (i = n, sh = 1 << n; i <= 4; i++, sh <<= 1) {
+        x = ((x >> sh) & expand_bit_data[i][1]) | (x & expand_bit_data[i][0]);
+    }
+    return x;
+}
+
+void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    int esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
+    intptr_t high = extract32(pred_desc, SIMD_DATA_SHIFT + 2, 1);
+    uint64_t *d = vd;
+    intptr_t i;
+
+    if (oprsz <= 8) {
+        uint64_t nn = *(uint64_t *)vn;
+        uint64_t mm = *(uint64_t *)vm;
+        int half = 4 * oprsz;
+
+        nn = extract64(nn, high * half, half);
+        mm = extract64(mm, high * half, half);
+        nn = expand_bits(nn, esz);
+        mm = expand_bits(mm, esz);
+        d[0] = nn + (mm << (1 << esz));
+    } else {
+        ARMPredicateReg tmp_n, tmp_m;
+
+        /* We produce output faster than we consume input.
+           Therefore we must be mindful of possible overlap.  */
+        if ((vn - vd) < (uintptr_t)oprsz) {
+            vn = memcpy(&tmp_n, vn, oprsz);
+        }
+        if ((vm - vd) < (uintptr_t)oprsz) {
+            vm = memcpy(&tmp_m, vm, oprsz);
+        }
+        if (high) {
+            high = oprsz >> 1;
+        }
+
+        if ((high & 3) == 0) {
+            uint32_t *n = vn, *m = vm;
+            high >>= 2;
+
+            for (i = 0; i < DIV_ROUND_UP(oprsz, 8); i++) {
+                uint64_t nn = n[H4(high + i)];
+                uint64_t mm = m[H4(high + i)];
+
+                nn = expand_bits(nn, esz);
+                mm = expand_bits(mm, esz);
+                d[i] = nn + (mm << (1 << esz));
+            }
+        } else {
+            uint8_t *n = vn, *m = vm;
+            uint16_t *d16 = vd;
+
+            for (i = 0; i < oprsz / 2; i++) {
+                uint16_t nn = n[H1(high + i)];
+                uint16_t mm = m[H1(high + i)];
+
+                nn = expand_bits(nn, esz);
+                mm = expand_bits(mm, esz);
+                d16[H2(i)] = nn + (mm << (1 << esz));
+            }
+        }
+    }
+}
+
+void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    int esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
+    int odd = extract32(pred_desc, SIMD_DATA_SHIFT + 2, 1) << esz;
+    uint64_t *d = vd, *n = vn, *m = vm;
+    uint64_t l, h;
+    intptr_t i;
+
+    if (oprsz <= 8) {
+        l = compress_bits(n[0] >> odd, esz);
+        h = compress_bits(m[0] >> odd, esz);
+        d[0] = extract64(l + (h << (4 * oprsz)), 0, 8 * oprsz);
+    } else {
+        ARMPredicateReg tmp_m;
+        intptr_t oprsz_16 = oprsz / 16;
+
+        if ((vm - vd) < (uintptr_t)oprsz) {
+            m = memcpy(&tmp_m, vm, oprsz);
+        }
+
+        for (i = 0; i < oprsz_16; i++) {
+            l = n[2 * i + 0];
+            h = n[2 * i + 1];
+            l = compress_bits(l >> odd, esz);
+            h = compress_bits(h >> odd, esz);
+            d[i] = l + (h << 32);
+        }
+
+        /* For VL which is not a power of 2, the results from M do not
+           align nicely with the uint64_t for D.  Put the aligned results
+           from M into TMP_M and then copy it into place afterward.  */
+        if (oprsz & 15) {
+            d[i] = compress_bits(n[2 * i] >> odd, esz);
+
+            for (i = 0; i < oprsz_16; i++) {
+                l = m[2 * i + 0];
+                h = m[2 * i + 1];
+                l = compress_bits(l >> odd, esz);
+                h = compress_bits(h >> odd, esz);
+                tmp_m.p[i] = l + (h << 32);
+            }
+            tmp_m.p[i] = compress_bits(m[2 * i] >> odd, esz);
+
+            swap_memmove(vd + oprsz / 2, &tmp_m, oprsz / 2);
+        } else {
+            for (i = 0; i < oprsz_16; i++) {
+                l = m[2 * i + 0];
+                h = m[2 * i + 1];
+                l = compress_bits(l >> odd, esz);
+                h = compress_bits(h >> odd, esz);
+                d[oprsz_16 + i] = l + (h << 32);
+            }
+        }
+    }
+}
+
+static const uint64_t even_bit_esz_masks[4] = {
+    0x5555555555555555ull,
+    0x3333333333333333ull,
+    0x0f0f0f0f0f0f0f0full,
+    0x00ff00ff00ff00ffull
+};
+
+void HELPER(sve_trn_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    uintptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
+    bool odd = extract32(pred_desc, SIMD_DATA_SHIFT + 2, 1);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    uint64_t mask;
+    int shr, shl;
+    intptr_t i;
+
+    shl = 1 << esz;
+    shr = 0;
+    mask = even_bit_esz_masks[esz];
+    if (odd) {
+        mask <<= shl;
+        shr = shl;
+        shl = 0;
+    }
+
+    for (i = 0; i < DIV_ROUND_UP(oprsz, 8); i++) {
+        uint64_t nn = (n[i] & mask) >> shr;
+        uint64_t mm = (m[i] & mask) << shl;
+        d[i] = nn + mm;
+    }
+}
+
+/* Reverse units of 2**N bits.  */
+static uint64_t reverse_bits_64(uint64_t x, int n)
+{
+    int i, sh;
+
+    x = bswap64(x);
+    for (i = 2, sh = 4; i >= n; i--, sh >>= 1) {
+        uint64_t mask = even_bit_esz_masks[i];
+        x = ((x & mask) << sh) | ((x >> sh) & mask);
+    }
+    return x;
+}
+
+static uint8_t reverse_bits_8(uint8_t x, int n)
+{
+    static const uint8_t mask[3] = { 0x55, 0x33, 0x0f };
+    int i, sh;
+
+    for (i = 2, sh = 4; i >= n; i--, sh >>= 1) {
+        x = ((x & mask[i]) << sh) | ((x >> sh) & mask[i]);
+    }
+    return x;
+}
+
+void HELPER(sve_rev_p)(void *vd, void *vn, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    int esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
+    intptr_t i, oprsz_2 = oprsz / 2;
+
+    if (oprsz <= 8) {
+        uint64_t l = *(uint64_t *)vn;
+        l = reverse_bits_64(l << (64 - 8 * oprsz), esz);
+        *(uint64_t *)vd = l;
+    } else if ((oprsz & 15) == 0) {
+        for (i = 0; i < oprsz_2; i += 8) {
+            intptr_t ih = oprsz - 8 - i;
+            uint64_t l = reverse_bits_64(*(uint64_t *)(vn + i), esz);
+            uint64_t h = reverse_bits_64(*(uint64_t *)(vn + ih), esz);
+            *(uint64_t *)(vd + i) = h;
+            *(uint64_t *)(vd + ih) = l;
+        }
+    } else {
+        for (i = 0; i < oprsz_2; i += 1) {
+            intptr_t il = H1(i);
+            intptr_t ih = H1(oprsz - 1 - i);
+            uint8_t l = reverse_bits_8(*(uint8_t *)(vn + il), esz);
+            uint8_t h = reverse_bits_8(*(uint8_t *)(vn + ih), esz);
+            *(uint8_t *)(vd + il) = h;
+            *(uint8_t *)(vd + ih) = l;
+        }
+    }
+}
+
+void HELPER(sve_punpk_p)(void *vd, void *vn, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    intptr_t high = extract32(pred_desc, SIMD_DATA_SHIFT + 2, 1);
+    uint64_t *d = vd;
+    intptr_t i;
+
+    if (oprsz <= 8) {
+        uint64_t nn = *(uint64_t *)vn;
+        int half = 4 * oprsz;
+
+        nn = extract64(nn, high * half, half);
+        nn = expand_bits(nn, 0);
+        d[0] = nn;
+    } else {
+        ARMPredicateReg tmp_n;
+
+        /* We produce output faster than we consume input.
+           Therefore we must be mindful of possible overlap.  */
+        if ((vn - vd) < (uintptr_t)oprsz) {
+            vn = memcpy(&tmp_n, vn, oprsz);
+        }
+        if (high) {
+            high = oprsz >> 1;
+        }
+
+        if ((high & 3) == 0) {
+            uint32_t *n = vn;
+            high >>= 2;
+
+            for (i = 0; i < DIV_ROUND_UP(oprsz, 8); i++) {
+                uint64_t nn = n[H4(high + i)];
+                d[i] = expand_bits(nn, 0);
+            }
+        } else {
+            uint16_t *d16 = vd;
+            uint8_t *n = vn;
+
+            for (i = 0; i < oprsz / 2; i++) {
+                uint16_t nn = n[H1(high + i)];
+                d16[H2(i)] = expand_bits(nn, 0);
+            }
+        }
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 3724f6290c..45e1ea87bf 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1932,6 +1932,116 @@ static void trans_UNPK(DisasContext *s, arg_UNPK *a, uint32_t insn)
                        vsz, vsz, 0, fns[a->esz][a->u]);
 }
 
+/*
+ *** SVE Permute - Predicates Group
+ */
+
+static void do_perm_pred3(DisasContext *s, arg_rrr_esz *a, bool high_odd,
+                          gen_helper_gvec_3 *fn)
+{
+    unsigned vsz = pred_full_reg_size(s);
+
+    /* Predicate sizes may be smaller and cannot use simd_desc.
+       We cannot round up, as we do elsewhere, because we need
+       the exact size for ZIP2 and REV.  We retain the style for
+       the other helpers for consistency.  */
+    TCGv_ptr t_d = tcg_temp_new_ptr();
+    TCGv_ptr t_n = tcg_temp_new_ptr();
+    TCGv_ptr t_m = tcg_temp_new_ptr();
+    TCGv_i32 t_desc;
+    int desc;
+
+    desc = vsz - 2;
+    desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz);
+    desc = deposit32(desc, SIMD_DATA_SHIFT + 2, 2, high_odd);
+
+    tcg_gen_addi_ptr(t_d, cpu_env, pred_full_reg_offset(s, a->rd));
+    tcg_gen_addi_ptr(t_n, cpu_env, pred_full_reg_offset(s, a->rn));
+    tcg_gen_addi_ptr(t_m, cpu_env, pred_full_reg_offset(s, a->rm));
+    t_desc = tcg_const_i32(desc);
+
+    fn(t_d, t_n, t_m, t_desc);
+
+    tcg_temp_free_ptr(t_d);
+    tcg_temp_free_ptr(t_n);
+    tcg_temp_free_ptr(t_m);
+    tcg_temp_free_i32(t_desc);
+}
+
+static void do_perm_pred2(DisasContext *s, arg_rr_esz *a, bool high_odd,
+                          gen_helper_gvec_2 *fn)
+{
+    unsigned vsz = pred_full_reg_size(s);
+    TCGv_ptr t_d = tcg_temp_new_ptr();
+    TCGv_ptr t_n = tcg_temp_new_ptr();
+    TCGv_i32 t_desc;
+    int desc;
+
+    tcg_gen_addi_ptr(t_d, cpu_env, pred_full_reg_offset(s, a->rd));
+    tcg_gen_addi_ptr(t_n, cpu_env, pred_full_reg_offset(s, a->rn));
+
+    /* Predicate sizes may be smaller and cannot use simd_desc.
+       We cannot round up, as we do elsewhere, because we need
+       the exact size for ZIP2 and REV.  We retain the style for
+       the other helpers for consistency.  */
+
+    desc = vsz - 2;
+    desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz);
+    desc = deposit32(desc, SIMD_DATA_SHIFT + 2, 2, high_odd);
+    t_desc = tcg_const_i32(desc);
+
+    fn(t_d, t_n, t_desc);
+
+    tcg_temp_free_i32(t_desc);
+    tcg_temp_free_ptr(t_d);
+    tcg_temp_free_ptr(t_n);
+}
+
+static void trans_ZIP1_p(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_perm_pred3(s, a, 0, gen_helper_sve_zip_p);
+}
+
+static void trans_ZIP2_p(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_perm_pred3(s, a, 1, gen_helper_sve_zip_p);
+}
+
+static void trans_UZP1_p(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_perm_pred3(s, a, 0, gen_helper_sve_uzp_p);
+}
+
+static void trans_UZP2_p(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_perm_pred3(s, a, 1, gen_helper_sve_uzp_p);
+}
+
+static void trans_TRN1_p(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_perm_pred3(s, a, 0, gen_helper_sve_trn_p);
+}
+
+static void trans_TRN2_p(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_perm_pred3(s, a, 1, gen_helper_sve_trn_p);
+}
+
+static void trans_REV_p(DisasContext *s, arg_rr_esz *a, uint32_t insn)
+{
+    do_perm_pred2(s, a, 0, gen_helper_sve_rev_p);
+}
+
+static void trans_PUNPKLO(DisasContext *s, arg_PUNPKLO *a, uint32_t insn)
+{
+    do_perm_pred2(s, a, 0, gen_helper_sve_punpk_p);
+}
+
+static void trans_PUNPKHI(DisasContext *s, arg_PUNPKHI *a, uint32_t insn)
+{
+    do_perm_pred2(s, a, 1, gen_helper_sve_punpk_p);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 8af47ad27b..bcbe84c3a6 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -87,6 +87,7 @@
 
 # Three operand, vector element size
 @rd_rn_rm	........ esz:2 . rm:5 ... ... rn:5 rd:5		&rrr_esz
+@pd_pn_pm	........ esz:2 .. rm:4 ....... rn:4 . rd:4	&rrr_esz
 @rdn_rm		........ esz:2 ...... ...... rm:5 rd:5 \
 		&rrr_esz rn=%reg_movprfx
 
@@ -397,6 +398,23 @@ TBL		00000101 .. 1 ..... 001100 ..... .....		@rd_rn_rm
 # SVE unpack vector elements
 UNPK		00000101 esz:2 1100 u:1 h:1 001110 rn:5 rd:5
 
+### SVE Permute - Predicates Group
+
+# SVE permute predicate elements
+ZIP1_p		00000101 .. 10 .... 010 000 0 .... 0 ....	@pd_pn_pm
+ZIP2_p		00000101 .. 10 .... 010 001 0 .... 0 ....	@pd_pn_pm
+UZP1_p		00000101 .. 10 .... 010 010 0 .... 0 ....	@pd_pn_pm
+UZP2_p		00000101 .. 10 .... 010 011 0 .... 0 ....	@pd_pn_pm
+TRN1_p		00000101 .. 10 .... 010 100 0 .... 0 ....	@pd_pn_pm
+TRN2_p		00000101 .. 10 .... 010 101 0 .... 0 ....	@pd_pn_pm
+
+# SVE reverse predicate elements
+REV_p		00000101 .. 11 0100 010 000 0 .... 0 ....	@pd_pn
+
+# SVE unpack predicate elements
+PUNPKLO		00000101 00 11 0000 010 000 0 .... 0 ....	@pd_pn_e0
+PUNPKHI		00000101 00 11 0001 010 000 0 .... 0 ....	@pd_pn_e0
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
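[Editor's note: the bit-packed predicate permute helpers in the patch above are easiest to sanity-check against an element-level model. The Python sketch below is illustrative only — the names `zip_p` and `punpk` are not from the patch, and predicates are modeled as one boolean per element rather than as packed bits, so the byte-swizzling done by the real helpers does not appear here.]

```python
def zip_p(pn, pm, high):
    """ZIP1_p/ZIP2_p: interleave the low (high) halves of two predicates."""
    half = len(pn) // 2
    base = half if high else 0
    out = []
    for i in range(half):
        out.append(pn[base + i])
        out.append(pm[base + i])
    return out

def punpk(pg, high):
    """PUNPKLO/PUNPKHI: unpack half of a predicate to double-width elements.

    At element granularity the result is simply the low or high half of
    the source; the real helper additionally spreads each predicate bit
    across twice as many bytes.
    """
    half = len(pg) // 2
    return pg[half:] if high else pg[:half]
```

For example, `zip_p([1, 0, 1, 0], [0, 0, 1, 1], high=False)` yields `[1, 0, 0, 0]`.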

* [Qemu-devel] [PATCH v2 29/67] target/arm: Implement SVE Permute - Interleaving Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (27 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 28/67] target/arm: Implement SVE Permute - Predicates Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 15:22   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 30/67] target/arm: Implement SVE compress active elements Richard Henderson
                   ` (39 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 15 ++++++++++
 target/arm/sve_helper.c    | 72 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 10 +++++++
 4 files changed, 166 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index ff958fcebd..bab20345c6 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -445,6 +445,21 @@ DEF_HELPER_FLAGS_4(sve_trn_p, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_rev_p, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_punpk_p, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_zip_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_zip_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_zip_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_zip_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_uzp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uzp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uzp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uzp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_trn_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_trn_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_trn_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_trn_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index c3a2706a16..62982bd099 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1944,3 +1944,75 @@ void HELPER(sve_punpk_p)(void *vd, void *vn, uint32_t pred_desc)
         }
     }
 }
+
+#define DO_ZIP(NAME, TYPE, H) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)       \
+{                                                                    \
+    intptr_t oprsz = simd_oprsz(desc);                               \
+    intptr_t i, oprsz_2 = oprsz / 2;                                 \
+    ARMVectorReg tmp_n, tmp_m;                                       \
+    /* We produce output faster than we consume input.               \
+       Therefore we must be mindful of possible overlap.  */         \
+    if (unlikely((vn - vd) < (uintptr_t)oprsz)) {                    \
+        vn = memcpy(&tmp_n, vn, oprsz_2);                            \
+    }                                                                \
+    if (unlikely((vm - vd) < (uintptr_t)oprsz)) {                    \
+        vm = memcpy(&tmp_m, vm, oprsz_2);                            \
+    }                                                                \
+    for (i = 0; i < oprsz_2; i += sizeof(TYPE)) {                    \
+        *(TYPE *)(vd + H(2 * i + 0)) = *(TYPE *)(vn + H(i));         \
+        *(TYPE *)(vd + H(2 * i + sizeof(TYPE))) = *(TYPE *)(vm + H(i)); \
+    }                                                                \
+}
+
+DO_ZIP(sve_zip_b, uint8_t, H1)
+DO_ZIP(sve_zip_h, uint16_t, H1_2)
+DO_ZIP(sve_zip_s, uint32_t, H1_4)
+DO_ZIP(sve_zip_d, uint64_t, )
+
+#define DO_UZP(NAME, TYPE, H) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)         \
+{                                                                      \
+    intptr_t oprsz = simd_oprsz(desc);                                 \
+    intptr_t oprsz_2 = oprsz / 2;                                      \
+    intptr_t odd_ofs = simd_data(desc);                                \
+    intptr_t i;                                                        \
+    ARMVectorReg tmp_m;                                                \
+    if (unlikely((vm - vd) < (uintptr_t)oprsz)) {                      \
+        vm = memcpy(&tmp_m, vm, oprsz);                                \
+    }                                                                  \
+    for (i = 0; i < oprsz_2; i += sizeof(TYPE)) {                      \
+        *(TYPE *)(vd + H(i)) = *(TYPE *)(vn + H(2 * i + odd_ofs));     \
+    }                                                                  \
+    for (i = 0; i < oprsz_2; i += sizeof(TYPE)) {                      \
+        *(TYPE *)(vd + H(oprsz_2 + i)) = *(TYPE *)(vm + H(2 * i + odd_ofs)); \
+    }                                                                  \
+}
+
+DO_UZP(sve_uzp_b, uint8_t, H1)
+DO_UZP(sve_uzp_h, uint16_t, H1_2)
+DO_UZP(sve_uzp_s, uint32_t, H1_4)
+DO_UZP(sve_uzp_d, uint64_t, )
+
+#define DO_TRN(NAME, TYPE, H) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)         \
+{                                                                      \
+    intptr_t oprsz = simd_oprsz(desc);                                 \
+    intptr_t odd_ofs = simd_data(desc);                                \
+    intptr_t i;                                                        \
+    for (i = 0; i < oprsz; i += 2 * sizeof(TYPE)) {                    \
+        TYPE ae = *(TYPE *)(vn + H(i + odd_ofs));                      \
+        TYPE be = *(TYPE *)(vm + H(i + odd_ofs));                      \
+        *(TYPE *)(vd + H(i + 0)) = ae;                                 \
+        *(TYPE *)(vd + H(i + sizeof(TYPE))) = be;                      \
+    }                                                                  \
+}
+
+DO_TRN(sve_trn_b, uint8_t, H1)
+DO_TRN(sve_trn_h, uint16_t, H1_2)
+DO_TRN(sve_trn_s, uint32_t, H1_4)
+DO_TRN(sve_trn_d, uint64_t, )
+
+#undef DO_ZIP
+#undef DO_UZP
+#undef DO_TRN
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 45e1ea87bf..09ac955a36 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2042,6 +2042,75 @@ static void trans_PUNPKHI(DisasContext *s, arg_PUNPKHI *a, uint32_t insn)
     do_perm_pred2(s, a, 1, gen_helper_sve_punpk_p);
 }
 
+/*
+ *** SVE Permute - Interleaving Group
+ */
+
+static void do_zip(DisasContext *s, arg_rrr_esz *a, bool high)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_zip_b, gen_helper_sve_zip_h,
+        gen_helper_sve_zip_s, gen_helper_sve_zip_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    unsigned high_ofs = high ? vsz / 2 : 0;
+
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn) + high_ofs,
+                       vec_full_reg_offset(s, a->rm) + high_ofs,
+                       vsz, vsz, 0, fns[a->esz]);
+}
+
+static void do_zzz_data_ool(DisasContext *s, arg_rrr_esz *a, int data,
+                            gen_helper_gvec_3 *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       vsz, vsz, data, fn);
+}
+
+static void trans_ZIP1_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zip(s, a, false);
+}
+
+static void trans_ZIP2_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zip(s, a, true);
+}
+
+static gen_helper_gvec_3 * const uzp_fns[4] = {
+    gen_helper_sve_uzp_b, gen_helper_sve_uzp_h,
+    gen_helper_sve_uzp_s, gen_helper_sve_uzp_d,
+};
+
+static void trans_UZP1_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_data_ool(s, a, 0, uzp_fns[a->esz]);
+}
+
+static void trans_UZP2_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_data_ool(s, a, 1 << a->esz, uzp_fns[a->esz]);
+}
+
+static gen_helper_gvec_3 * const trn_fns[4] = {
+    gen_helper_sve_trn_b, gen_helper_sve_trn_h,
+    gen_helper_sve_trn_s, gen_helper_sve_trn_d,
+};
+
+static void trans_TRN1_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_data_ool(s, a, 0, trn_fns[a->esz]);
+}
+
+static void trans_TRN2_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_data_ool(s, a, 1 << a->esz, trn_fns[a->esz]);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index bcbe84c3a6..2efa3773fc 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -415,6 +415,16 @@ REV_p		00000101 .. 11 0100 010 000 0 .... 0 ....	@pd_pn
 PUNPKLO		00000101 00 11 0000 010 000 0 .... 0 ....	@pd_pn_e0
 PUNPKHI		00000101 00 11 0001 010 000 0 .... 0 ....	@pd_pn_e0
 
+### SVE Permute - Interleaving Group
+
+# SVE permute vector elements
+ZIP1_z		00000101 .. 1 ..... 011 000 ..... .....		@rd_rn_rm
+ZIP2_z		00000101 .. 1 ..... 011 001 ..... .....		@rd_rn_rm
+UZP1_z		00000101 .. 1 ..... 011 010 ..... .....		@rd_rn_rm
+UZP2_z		00000101 .. 1 ..... 011 011 ..... .....		@rd_rn_rm
+TRN1_z		00000101 .. 1 ..... 011 100 ..... .....		@rd_rn_rm
+TRN2_z		00000101 .. 1 ..... 011 101 ..... .....		@rd_rn_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

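[Editor's note: as a cross-check on the DO_ZIP/DO_UZP/DO_TRN macros in the patch above, here is an element-level reference model in Python (illustrative only; `n` and `m` stand for the Zn/Zm element lists). Note that ZIP2 is obtained in the translator by offsetting the source pointers to the high halves, not by a separate helper.]

```python
def zip1(n, m):
    """DO_ZIP: interleave the low halves of n and m."""
    half = len(n) // 2
    out = []
    for i in range(half):
        out += [n[i], m[i]]
    return out

def uzp(n, m, odd):
    """DO_UZP: concatenate n and m, then take every second element,
    starting at the even (odd=0) or odd (odd=1) position."""
    return (n + m)[odd::2]

def trn(n, m, odd):
    """DO_TRN: from each pair of positions, take the even (or odd)
    element of n and of m, interleaved."""
    out = []
    for i in range(0, len(n), 2):
        out += [n[i + odd], m[i + odd]]
    return out
```

With `n = [0, 1, 2, 3]` and `m = [10, 11, 12, 13]`: `zip1` gives `[0, 10, 1, 11]`, `uzp(n, m, 0)` gives `[0, 2, 10, 12]`, and `trn(n, m, 1)` gives `[1, 11, 3, 13]` — matching the `odd_ofs = 1 << esz` byte offset the translator passes via `simd_data`.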

* [Qemu-devel] [PATCH v2 30/67] target/arm: Implement SVE compress active elements
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (28 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 29/67] target/arm: Implement SVE Permute - Interleaving Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 15:25   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 31/67] target/arm: Implement SVE conditionally broadcast/extract element Richard Henderson
                   ` (38 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  3 +++
 target/arm/sve_helper.c    | 34 ++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 12 ++++++++++++
 target/arm/sve.decode      |  6 ++++++
 4 files changed, 55 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index bab20345c6..d977aea00d 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -460,6 +460,9 @@ DEF_HELPER_FLAGS_4(sve_trn_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_trn_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_trn_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_compact_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_compact_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 62982bd099..87a1a32232 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2016,3 +2016,37 @@ DO_TRN(sve_trn_d, uint64_t, )
 #undef DO_ZIP
 #undef DO_UZP
 #undef DO_TRN
+
+void HELPER(sve_compact_s)(void *vd, void *vn, void *vg, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t *d = vd, *n = vn;
+    uint8_t *pg = vg;
+
+    for (i = j = 0; i < opr_sz; i++) {
+        if (pg[H1(i / 2)] & (i & 1 ? 0x10 : 0x01)) {
+            d[H4(j)] = n[H4(i)];
+            j++;
+        }
+    }
+    for (; j < opr_sz; j++) {
+        d[H4(j)] = 0;
+    }
+}
+
+void HELPER(sve_compact_d)(void *vd, void *vn, void *vg, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+    uint8_t *pg = vg;
+
+    for (i = j = 0; i < opr_sz; i++) {
+        if (pg[H1(i)] & 1) {
+            d[j] = n[i];
+            j++;
+        }
+    }
+    for (; j < opr_sz; j++) {
+        d[j] = 0;
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 09ac955a36..21531b259c 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2111,6 +2111,18 @@ static void trans_TRN2_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
     do_zzz_data_ool(s, a, 1 << a->esz, trn_fns[a->esz]);
 }
 
+/*
+ *** SVE Permute Vector - Predicated Group
+ */
+
+static void trans_COMPACT(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL, NULL, gen_helper_sve_compact_s, gen_helper_sve_compact_d
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 2efa3773fc..a89bd37eeb 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -425,6 +425,12 @@ UZP2_z		00000101 .. 1 ..... 011 011 ..... .....		@rd_rn_rm
 TRN1_z		00000101 .. 1 ..... 011 100 ..... .....		@rd_rn_rm
 TRN2_z		00000101 .. 1 ..... 011 101 ..... .....		@rd_rn_rm
 
+### SVE Permute - Predicated Group
+
+# SVE compress active elements
+# Note esz >= 2
+COMPACT		00000101 .. 100001 100 ... ..... .....		@rd_pg_rn
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

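[Editor's note: the semantics of `sve_compact_s`/`sve_compact_d` above reduce to a very small element-level model. The Python sketch below is illustrative, not from the patch; `n` is the source element list and `pg` the per-element predicate.]

```python
def compact(n, pg):
    """COMPACT: gather active elements to the low end; zero-fill the rest."""
    active = [x for x, p in zip(n, pg) if p]
    return active + [0] * (len(n) - len(active))
```

For example, `compact([5, 6, 7, 8], [0, 1, 1, 0])` yields `[6, 7, 0, 0]`.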

* [Qemu-devel] [PATCH v2 31/67] target/arm: Implement SVE conditionally broadcast/extract element
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (29 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 30/67] target/arm: Implement SVE compress active elements Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 15:44   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 32/67] target/arm: Implement SVE copy to vector (predicated) Richard Henderson
                   ` (37 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |   2 +
 target/arm/sve_helper.c    |  11 ++
 target/arm/translate-sve.c | 299 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  20 +++
 4 files changed, 332 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index d977aea00d..a58fb4ba01 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -463,6 +463,8 @@ DEF_HELPER_FLAGS_4(sve_trn_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_compact_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_compact_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_2(sve_last_active_element, TCG_CALL_NO_RWG, s32, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 87a1a32232..ee289be642 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2050,3 +2050,14 @@ void HELPER(sve_compact_d)(void *vd, void *vn, void *vg, uint32_t desc)
         d[j] = 0;
     }
 }
+
+/* Similar to the ARM LastActiveElement pseudocode function, except the
+   result is multiplied by the element size.  This includes the not-found
+   indication; e.g. not found for esz=3 is -8.  */
+int32_t HELPER(sve_last_active_element)(void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    intptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
+
+    return last_active_element(vg, DIV_ROUND_UP(oprsz, 8), esz);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 21531b259c..207a22a0bc 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2123,6 +2123,305 @@ static void trans_COMPACT(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
     do_zpz_ool(s, a, fns[a->esz]);
 }
 
+/* Call the helper that computes the ARM LastActiveElement pseudocode
+   function, scaled by the element size.  This includes the not-found
+   indication; e.g. not found for esz=3 is -8.  */
+static void find_last_active(DisasContext *s, TCGv_i32 ret, int esz, int pg)
+{
+    /* Predicate sizes may be smaller than the vector size and cannot
+       use simd_desc; we need the exact size, so we cannot round up.  */
+    TCGv_ptr t_p = tcg_temp_new_ptr();
+    TCGv_i32 t_desc;
+    unsigned vsz = pred_full_reg_size(s);
+    unsigned desc;
+
+    desc = vsz - 2;
+    desc = deposit32(desc, SIMD_DATA_SHIFT, 2, esz);
+
+    tcg_gen_addi_ptr(t_p, cpu_env, pred_full_reg_offset(s, pg));
+    t_desc = tcg_const_i32(desc);
+
+    gen_helper_sve_last_active_element(ret, t_p, t_desc);
+
+    tcg_temp_free_i32(t_desc);
+    tcg_temp_free_ptr(t_p);
+}
+
+/* Increment LAST to the offset of the next element in the vector,
+   wrapping around to 0.  */
+static void incr_last_active(DisasContext *s, TCGv_i32 last, int esz)
+{
+    unsigned vsz = vec_full_reg_size(s);
+
+    tcg_gen_addi_i32(last, last, 1 << esz);
+    if (is_power_of_2(vsz)) {
+        tcg_gen_andi_i32(last, last, vsz - 1);
+    } else {
+        TCGv_i32 max = tcg_const_i32(vsz);
+        TCGv_i32 zero = tcg_const_i32(0);
+        tcg_gen_movcond_i32(TCG_COND_GEU, last, last, max, zero, last);
+        tcg_temp_free_i32(max);
+        tcg_temp_free_i32(zero);
+    }
+}
+
+/* If LAST < 0, set LAST to the offset of the last element in the vector.  */
+static void wrap_last_active(DisasContext *s, TCGv_i32 last, int esz)
+{
+    unsigned vsz = vec_full_reg_size(s);
+
+    if (is_power_of_2(vsz)) {
+        tcg_gen_andi_i32(last, last, vsz - 1);
+    } else {
+        TCGv_i32 max = tcg_const_i32(vsz - (1 << esz));
+        TCGv_i32 zero = tcg_const_i32(0);
+        tcg_gen_movcond_i32(TCG_COND_LT, last, last, zero, max, last);
+        tcg_temp_free_i32(max);
+        tcg_temp_free_i32(zero);
+    }
+}
+
+/* Load an unsigned element of ESZ from BASE+OFS.  */
+static TCGv_i64 load_esz(TCGv_ptr base, int ofs, int esz)
+{
+    TCGv_i64 r = tcg_temp_new_i64();
+
+    switch (esz) {
+    case 0:
+        tcg_gen_ld8u_i64(r, base, ofs);
+        break;
+    case 1:
+        tcg_gen_ld16u_i64(r, base, ofs);
+        break;
+    case 2:
+        tcg_gen_ld32u_i64(r, base, ofs);
+        break;
+    case 3:
+        tcg_gen_ld_i64(r, base, ofs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return r;
+}
+
+/* Load an unsigned element of ESZ from RM[LAST].  */
+static TCGv_i64 load_last_active(DisasContext *s, TCGv_i32 last,
+                                 int rm, int esz)
+{
+    TCGv_ptr p = tcg_temp_new_ptr();
+    TCGv_i64 r;
+
+    /* Convert the offset within the vector into an offset within ENV.
+       The final adjustment for the vector register base is folded
+       into the load as a constant offset.  */
+#ifdef HOST_WORDS_BIGENDIAN
+    /* Adjust for element ordering.  See vec_reg_offset.  */
+    if (esz < 3) {
+        tcg_gen_xori_i32(last, last, 8 - (1 << esz));
+    }
+#endif
+    tcg_gen_ext_i32_ptr(p, last);
+    tcg_gen_add_ptr(p, p, cpu_env);
+
+    r = load_esz(p, vec_full_reg_offset(s, rm), esz);
+    tcg_temp_free_ptr(p);
+
+    return r;
+}
+
+/* Compute CLAST for a Zreg.  */
+static void do_clast_vector(DisasContext *s, arg_rprr_esz *a, bool before)
+{
+    TCGv_i32 last = tcg_temp_local_new_i32();
+    TCGLabel *over = gen_new_label();
+    TCGv_i64 ele;
+    unsigned vsz, esz = a->esz;
+
+    find_last_active(s, last, esz, a->pg);
+
+    /* There is of course no movcond for a 2048-bit vector,
+       so we must branch over the actual store.  */
+    tcg_gen_brcondi_i32(TCG_COND_LT, last, 0, over);
+
+    if (!before) {
+        incr_last_active(s, last, esz);
+    }
+
+    ele = load_last_active(s, last, a->rm, esz);
+    tcg_temp_free_i32(last);
+
+    vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_dup_i64(esz, vec_full_reg_offset(s, a->rd), vsz, vsz, ele);
+    tcg_temp_free_i64(ele);
+
+    /* If this insn used MOVPRFX, we may need a second move.  */
+    if (a->rd != a->rn) {
+        TCGLabel *done = gen_new_label();
+        tcg_gen_br(done);
+
+        gen_set_label(over);
+        do_mov_z(s, a->rd, a->rn);
+
+        gen_set_label(done);
+    } else {
+        gen_set_label(over);
+    }
+}
+
+static void trans_CLASTA_z(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    do_clast_vector(s, a, false);
+}
+
+static void trans_CLASTB_z(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    do_clast_vector(s, a, true);
+}
+
+/* Compute CLAST for a scalar.  */
+static void do_clast_scalar(DisasContext *s, int esz, int pg, int rm,
+                            bool before, TCGv_i64 reg_val)
+{
+    TCGv_i32 last = tcg_temp_new_i32();
+    TCGv_i64 ele, cmp, zero;
+
+    find_last_active(s, last, esz, pg);
+
+    /* Extend the original value of last prior to incrementing.  */
+    cmp = tcg_temp_new_i64();
+    tcg_gen_ext_i32_i64(cmp, last);
+
+    if (!before) {
+        incr_last_active(s, last, esz);
+    }
+
+    /* The conceit here is that while last < 0 indicates not found, after
+       adjusting for cpu_env->vfp.zregs[rm], it is still a valid address
+       from which we can load garbage.  We then discard the garbage with
+       a conditional move.  */
+    ele = load_last_active(s, last, rm, esz);
+    tcg_temp_free_i32(last);
+
+    zero = tcg_const_i64(0);
+    tcg_gen_movcond_i64(TCG_COND_GE, reg_val, cmp, zero, ele, reg_val);
+
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(cmp);
+    tcg_temp_free_i64(ele);
+}
+
+/* Compute CLAST for a Vreg.  */
+static void do_clast_fp(DisasContext *s, arg_rpr_esz *a, bool before)
+{
+    int esz = a->esz;
+    int ofs = vec_reg_offset(s, a->rd, 0, esz);
+    TCGv_i64 reg = load_esz(cpu_env, ofs, esz);
+
+    do_clast_scalar(s, esz, a->pg, a->rn, before, reg);
+    write_fp_dreg(s, a->rd, reg);
+    tcg_temp_free_i64(reg);
+}
+
+static void trans_CLASTA_v(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_clast_fp(s, a, false);
+}
+
+static void trans_CLASTB_v(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_clast_fp(s, a, true);
+}
+
+/* Compute CLAST for a Xreg.  */
+static void do_clast_general(DisasContext *s, arg_rpr_esz *a, bool before)
+{
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+
+    switch (a->esz) {
+    case 0:
+        tcg_gen_ext8u_i64(reg, reg);
+        break;
+    case 1:
+        tcg_gen_ext16u_i64(reg, reg);
+        break;
+    case 2:
+        tcg_gen_ext32u_i64(reg, reg);
+        break;
+    case 3:
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    do_clast_scalar(s, a->esz, a->pg, a->rn, before, cpu_reg(s, a->rd));
+}
+
+static void trans_CLASTA_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_clast_general(s, a, false);
+}
+
+static void trans_CLASTB_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_clast_general(s, a, true);
+}
+
+/* Compute LAST for a scalar.  */
+static TCGv_i64 do_last_scalar(DisasContext *s, int esz,
+                               int pg, int rm, bool before)
+{
+    TCGv_i32 last = tcg_temp_new_i32();
+    TCGv_i64 ret;
+
+    find_last_active(s, last, esz, pg);
+    if (before) {
+        wrap_last_active(s, last, esz);
+    } else {
+        incr_last_active(s, last, esz);
+    }
+
+    ret = load_last_active(s, last, rm, esz);
+    tcg_temp_free_i32(last);
+    return ret;
+}
+
+/* Compute LAST for a Vreg.  */
+static void do_last_fp(DisasContext *s, arg_rpr_esz *a, bool before)
+{
+    TCGv_i64 val = do_last_scalar(s, a->esz, a->pg, a->rn, before);
+    write_fp_dreg(s, a->rd, val);
+    tcg_temp_free_i64(val);
+}
+
+static void trans_LASTA_v(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_last_fp(s, a, false);
+}
+
+static void trans_LASTB_v(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_last_fp(s, a, true);
+}
+
+/* Compute LAST for a Xreg.  */
+static void do_last_general(DisasContext *s, arg_rpr_esz *a, bool before)
+{
+    TCGv_i64 val = do_last_scalar(s, a->esz, a->pg, a->rn, before);
+    tcg_gen_mov_i64(cpu_reg(s, a->rd), val);
+    tcg_temp_free_i64(val);
+}
+
+static void trans_LASTA_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_last_general(s, a, false);
+}
+
+static void trans_LASTB_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_last_general(s, a, true);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index a89bd37eeb..1370802c12 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -431,6 +431,26 @@ TRN2_z		00000101 .. 1 ..... 011 101 ..... .....		@rd_rn_rm
 # Note esz >= 2
 COMPACT		00000101 .. 100001 100 ... ..... .....		@rd_pg_rn
 
+# SVE conditionally broadcast element to vector
+CLASTA_z	00000101 .. 10100 0 100 ... ..... .....		@rdn_pg_rm
+CLASTB_z	00000101 .. 10100 1 100 ... ..... .....		@rdn_pg_rm
+
+# SVE conditionally copy element to SIMD&FP scalar
+CLASTA_v	00000101 .. 10101 0 100 ... ..... .....		@rd_pg_rn
+CLASTB_v	00000101 .. 10101 1 100 ... ..... .....		@rd_pg_rn
+
+# SVE conditionally copy element to general register
+CLASTA_r	00000101 .. 11000 0 101 ... ..... .....		@rd_pg_rn
+CLASTB_r	00000101 .. 11000 1 101 ... ..... .....		@rd_pg_rn
+
+# SVE copy element to SIMD&FP scalar register
+LASTA_v		00000101 .. 10001 0 100 ... ..... .....		@rd_pg_rn
+LASTB_v		00000101 .. 10001 1 100 ... ..... .....		@rd_pg_rn
+
+# SVE copy element to general register
+LASTA_r		00000101 .. 10000 0 101 ... ..... .....		@rd_pg_rn
+LASTB_r		00000101 .. 10000 1 101 ... ..... .....		@rd_pg_rn
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

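[Editor's note: the CLASTA/CLASTB machinery above (find_last_active, incr_last_active, the conditional move) can be summarized by an element-level model. The Python sketch below is illustrative only; unlike `sve_last_active_element`, which returns the index scaled by the element size (with `-(1 << esz)` as the not-found value), this model uses plain element indices and `-1`.]

```python
def last_active_element(pg):
    """LastActiveElement: index of the last active element, or -1 if none."""
    for i in reversed(range(len(pg))):
        if pg[i]:
            return i
    return -1

def clast(rdn, pg, rm, before):
    """CLASTA (before=False) / CLASTB (before=True) with scalar destination."""
    i = last_active_element(pg)
    if i < 0:
        return rdn              # no active element: destination unchanged
    if not before:
        # CLASTA: the element after the last active one, wrapping to
        # element 0 at the end of the vector (cf. incr_last_active).
        i = (i + 1) % len(rm)
    return rm[i]
```

For example, `clast(99, [0, 1, 1, 0], [10, 11, 12, 13], before=True)` yields `12`, while an all-false predicate leaves the destination value `99` unchanged.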

* [Qemu-devel] [PATCH v2 32/67] target/arm: Implement SVE copy to vector (predicated)
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (30 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 31/67] target/arm: Implement SVE conditionally broadcast/extract element Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 15:45   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 33/67] target/arm: Implement SVE reverse within elements Richard Henderson
                   ` (36 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 13 +++++++++++++
 target/arm/sve.decode      |  6 ++++++
 2 files changed, 19 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 207a22a0bc..fc2a295ab7 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2422,6 +2422,19 @@ static void trans_LASTB_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
     do_last_general(s, a, true);
 }
 
+static void trans_CPY_m_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_cpy_m(s, a->esz, a->rd, a->rd, a->pg, cpu_reg_sp(s, a->rn));
+}
+
+static void trans_CPY_m_v(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    int ofs = vec_reg_offset(s, a->rn, 0, a->esz);
+    TCGv_i64 t = load_esz(cpu_env, ofs, a->esz);
+    do_cpy_m(s, a->esz, a->rd, a->rd, a->pg, t);
+    tcg_temp_free_i64(t);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 1370802c12..5e127de88c 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -451,6 +451,12 @@ LASTB_v		00000101 .. 10001 1 100 ... ..... .....		@rd_pg_rn
 LASTA_r		00000101 .. 10000 0 101 ... ..... .....		@rd_pg_rn
 LASTB_r		00000101 .. 10000 1 101 ... ..... .....		@rd_pg_rn
 
+# SVE copy element from SIMD&FP scalar register
+CPY_m_v		00000101 .. 100000 100 ... ..... .....		@rd_pg_rn
+
+# SVE copy element from general register to vector (predicated)
+CPY_m_r		00000101 .. 101000 101 ... ..... .....		@rd_pg_rn
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

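[Editor's note: the predicated copy implemented via `do_cpy_m` above has a one-line element-level model. The Python sketch below is illustrative, not from the patch; `zd` is the destination element list, `pg` the per-element predicate, and `val` the scalar being broadcast.]

```python
def cpy_m(zd, pg, val):
    """CPY (predicated, merging): write val to active elements,
    leaving inactive elements unchanged."""
    return [val if p else d for d, p in zip(zd, pg)]
```

For example, `cpy_m([1, 2, 3, 4], [1, 0, 1, 0], 9)` yields `[9, 2, 9, 4]`.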

* [Qemu-devel] [PATCH v2 33/67] target/arm: Implement SVE reverse within elements
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (31 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 32/67] target/arm: Implement SVE copy to vector (predicated) Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 15:50   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 34/67] target/arm: Implement SVE vector splice (predicated) Richard Henderson
                   ` (35 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 14 ++++++++++++++
 target/arm/sve_helper.c    | 41 ++++++++++++++++++++++++++++++++++-------
 target/arm/translate-sve.c | 38 ++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  7 +++++++
 4 files changed, 93 insertions(+), 7 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index a58fb4ba01..3b7c54905d 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -465,6 +465,20 @@ DEF_HELPER_FLAGS_4(sve_compact_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_2(sve_last_active_element, TCG_CALL_NO_RWG, s32, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_revb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_revb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_revb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_revh_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_revh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_revw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_rbit_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_rbit_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_rbit_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_rbit_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index ee289be642..a67bb579b8 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -237,6 +237,26 @@ static inline uint64_t expand_pred_s(uint8_t byte)
     return word[byte & 0x11];
 }
 
+/* Swap 16-bit words within a 32-bit word.  */
+static inline uint32_t hswap32(uint32_t h)
+{
+    return rol32(h, 16);
+}
+
+/* Swap 16-bit words within a 64-bit word.  */
+static inline uint64_t hswap64(uint64_t h)
+{
+    uint64_t m = 0x0000ffff0000ffffull;
+    h = rol64(h, 32);
+    return ((h & m) << 16) | ((h >> 16) & m);
+}
+
+/* Swap 32-bit words within a 64-bit word.  */
+static inline uint64_t wswap64(uint64_t h)
+{
+    return rol64(h, 32);
+}
+
 #define LOGICAL_PPPP(NAME, FUNC) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc)  \
 {                                                                         \
@@ -615,6 +635,20 @@ DO_ZPZ(sve_neg_h, uint16_t, H1_2, DO_NEG)
 DO_ZPZ(sve_neg_s, uint32_t, H1_4, DO_NEG)
 DO_ZPZ_D(sve_neg_d, uint64_t, DO_NEG)
 
+DO_ZPZ(sve_revb_h, uint16_t, H1_2, bswap16)
+DO_ZPZ(sve_revb_s, uint32_t, H1_4, bswap32)
+DO_ZPZ_D(sve_revb_d, uint64_t, bswap64)
+
+DO_ZPZ(sve_revh_s, uint32_t, H1_4, hswap32)
+DO_ZPZ_D(sve_revh_d, uint64_t, hswap64)
+
+DO_ZPZ_D(sve_revw_d, uint64_t, wswap64)
+
+DO_ZPZ(sve_rbit_b, uint8_t, H1, revbit8)
+DO_ZPZ(sve_rbit_h, uint16_t, H1_2, revbit16)
+DO_ZPZ(sve_rbit_s, uint32_t, H1_4, revbit32)
+DO_ZPZ_D(sve_rbit_d, uint64_t, revbit64)
+
 /* Three-operand expander, unpredicated, in which the third operand is "wide".
  */
 #define DO_ZZW(NAME, TYPE, TYPEW, H, OP)                       \
@@ -1577,13 +1611,6 @@ void HELPER(sve_rev_b)(void *vd, void *vn, uint32_t desc)
     }
 }
 
-static inline uint64_t hswap64(uint64_t h)
-{
-    uint64_t m = 0x0000ffff0000ffffull;
-    h = rol64(h, 32);
-    return ((h & m) << 16) | ((h >> 16) & m);
-}
-
 void HELPER(sve_rev_h)(void *vd, void *vn, uint32_t desc)
 {
     intptr_t i, j, opr_sz = simd_oprsz(desc);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index fc2a295ab7..5a1ed379ad 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2435,6 +2435,44 @@ static void trans_CPY_m_v(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
     tcg_temp_free_i64(t);
 }
 
+static void trans_REVB(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_revb_h,
+        gen_helper_sve_revb_s,
+        gen_helper_sve_revb_d,
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static void trans_REVH(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        NULL,
+        gen_helper_sve_revh_s,
+        gen_helper_sve_revh_d,
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static void trans_REVW(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ool(s, a, a->esz == 3 ? gen_helper_sve_revw_d : NULL);
+}
+
+static void trans_RBIT(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_rbit_b,
+        gen_helper_sve_rbit_h,
+        gen_helper_sve_rbit_s,
+        gen_helper_sve_rbit_d,
+    };
+    do_zpz_ool(s, a, fns[a->esz]);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 5e127de88c..8903fb6592 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -457,6 +457,13 @@ CPY_m_v		00000101 .. 100000 100 ... ..... .....		@rd_pg_rn
 # SVE copy element from general register to vector (predicated)
 CPY_m_r		00000101 .. 101000 101 ... ..... .....		@rd_pg_rn
 
+# SVE reverse within elements
+# Note esz >= operation size
+REVB		00000101 .. 1001 00 100 ... ..... .....		@rd_pg_rn
+REVH		00000101 .. 1001 01 100 ... ..... .....		@rd_pg_rn
+REVW		00000101 .. 1001 10 100 ... ..... .....		@rd_pg_rn
+RBIT		00000101 .. 1001 11 100 ... ..... .....		@rd_pg_rn
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

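[Archive note: a quick sanity check on the reversal helpers added above — hswap64 reverses the four 16-bit lanes of a 64-bit value, and wswap64 swaps its two 32-bit halves. The snippet mirrors the patch's definitions, with rol64 re-implemented locally so it stands alone:]

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for QEMU's rol64 (only called with 0 < r < 64 here). */
static inline uint64_t rol64(uint64_t x, unsigned r)
{
    return (x << r) | (x >> (64 - r));
}

/* Rotate the 32-bit halves, then swap 16-bit lanes within each half:
 * the net effect reverses the four 16-bit lanes of the 64-bit word. */
static inline uint64_t hswap64(uint64_t h)
{
    uint64_t m = 0x0000ffff0000ffffull;
    h = rol64(h, 32);
    return ((h & m) << 16) | ((h >> 16) & m);
}

/* Swap the two 32-bit halves of a 64-bit word. */
static inline uint64_t wswap64(uint64_t h)
{
    return rol64(h, 32);
}
```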

* [Qemu-devel] [PATCH v2 34/67] target/arm: Implement SVE vector splice (predicated)
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (32 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 33/67] target/arm: Implement SVE reverse within elements Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 15:52   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 35/67] target/arm: Implement SVE Select Vectors Group Richard Henderson
                   ` (34 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  2 ++
 target/arm/sve_helper.c    | 37 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 10 ++++++++++
 target/arm/sve.decode      |  3 +++
 4 files changed, 52 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 3b7c54905d..c3f8a2b502 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -479,6 +479,8 @@ DEF_HELPER_FLAGS_4(sve_rbit_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_rbit_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_rbit_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_splice, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index a67bb579b8..f524a1ddce 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2088,3 +2088,40 @@ int32_t HELPER(sve_last_active_element)(void *vg, uint32_t pred_desc)
 
     return last_active_element(vg, DIV_ROUND_UP(oprsz, 8), esz);
 }
+
+void HELPER(sve_splice)(void *vd, void *vn, void *vm, void *vg, uint32_t desc)
+{
+    intptr_t opr_sz = simd_oprsz(desc) / 8;
+    int esz = simd_data(desc);
+    uint64_t pg, first_g, last_g, len, mask = pred_esz_masks[esz];
+    intptr_t i, first_i, last_i;
+    ARMVectorReg tmp;
+
+    first_i = last_i = 0;
+    first_g = last_g = 0;
+
+    /* Find the extent of the active elements within VG.  */
+    for (i = QEMU_ALIGN_UP(opr_sz, 8) - 8; i >= 0; i -= 8) {
+        pg = *(uint64_t *)(vg + i) & mask;
+        if (pg) {
+            if (last_g == 0) {
+                last_g = pg;
+                last_i = i;
+            }
+            first_g = pg;
+            first_i = i;
+        }
+    }
+
+    len = 0;
+    if (first_g != 0) {
+        first_i = first_i * 8 + ctz64(first_g);
+        last_i = last_i * 8 + 63 - clz64(last_g);
+        len = last_i - first_i + (1 << esz);
+        if (vd == vm) {
+            vm = memcpy(&tmp, vm, opr_sz * 8);
+        }
+        swap_memmove(vd, vn + first_i, len);
+    }
+    swap_memmove(vd + len, vm, opr_sz * 8 - len);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 5a1ed379ad..559fb41fd6 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2473,6 +2473,16 @@ static void trans_RBIT(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
     do_zpz_ool(s, a, fns[a->esz]);
 }
 
+static void trans_SPLICE(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_4_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       pred_full_reg_offset(s, a->pg),
+                       vsz, vsz, a->esz, gen_helper_sve_splice);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 8903fb6592..70feb448e6 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -464,6 +464,9 @@ REVH		00000101 .. 1001 01 100 ... ..... .....		@rd_pg_rn
 REVW		00000101 .. 1001 10 100 ... ..... .....		@rd_pg_rn
 RBIT		00000101 .. 1001 11 100 ... ..... .....		@rd_pg_rn
 
+# SVE vector splice (predicated)
+SPLICE		00000101 .. 101 100 100 ... ..... .....		@rdn_pg_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

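[Archive note: the SPLICE semantics implemented by sve_splice above — copy Zn from the first active element through the last active element, then fill the remainder of the destination from the start of Zm — can be modelled in scalar C. The model below is illustrative (one predicate byte per element, no in-place aliasing handling), unlike the real helper, which scans packed 64-bit predicate words and uses swap_memmove:]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative model of SVE SPLICE for 64-bit elements.  zd must not
 * alias zn or zm here; the QEMU helper copies vm to a temporary when
 * vd == vm to handle that case. */
static void splice_d_model(uint64_t *zd, const uint64_t *zn,
                           const uint64_t *zm, const uint8_t *pg,
                           size_t nelem)
{
    size_t first = 0, last = 0, len = 0;
    int any = 0;

    /* Find the extent of the active elements. */
    for (size_t i = 0; i < nelem; i++) {
        if (pg[i] & 1) {
            if (!any) {
                first = i;
                any = 1;
            }
            last = i;
        }
    }
    if (any) {
        len = last - first + 1;
        for (size_t i = 0; i < len; i++) {
            zd[i] = zn[first + i];
        }
    }
    /* Top up from the beginning of zm. */
    for (size_t i = len; i < nelem; i++) {
        zd[i] = zm[i - len];
    }
}
```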

* [Qemu-devel] [PATCH v2 35/67] target/arm: Implement SVE Select Vectors Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (33 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 34/67] target/arm: Implement SVE vector splice (predicated) Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 16:21   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 36/67] target/arm: Implement SVE Integer Compare - " Richard Henderson
                   ` (33 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  9 ++++++++
 target/arm/sve_helper.c    | 55 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c |  2 ++
 target/arm/sve.decode      |  6 +++++
 4 files changed, 72 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index c3f8a2b502..0f57f64895 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -195,6 +195,15 @@ DEF_HELPER_FLAGS_5(sve_lsl_zpzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_lsl_zpzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_sel_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sel_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sel_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sel_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_asr_zpzw_b, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_asr_zpzw_h, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index f524a1ddce..86cd792cdf 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2125,3 +2125,58 @@ void HELPER(sve_splice)(void *vd, void *vn, void *vm, void *vg, uint32_t desc)
     }
     swap_memmove(vd + len, vm, opr_sz * 8 - len);
 }
+
+void HELPER(sve_sel_zpzz_b)(void *vd, void *vn, void *vm,
+                            void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn, *m = vm;
+    uint8_t *pg = vg;
+
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i], mm = m[i];
+        uint64_t pp = expand_pred_b(pg[H1(i)]);
+        d[i] = (nn & pp) | (mm & ~pp);
+    }
+}
+
+void HELPER(sve_sel_zpzz_h)(void *vd, void *vn, void *vm,
+                            void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn, *m = vm;
+    uint8_t *pg = vg;
+
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i], mm = m[i];
+        uint64_t pp = expand_pred_h(pg[H1(i)]);
+        d[i] = (nn & pp) | (mm & ~pp);
+    }
+}
+
+void HELPER(sve_sel_zpzz_s)(void *vd, void *vn, void *vm,
+                            void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn, *m = vm;
+    uint8_t *pg = vg;
+
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i], mm = m[i];
+        uint64_t pp = expand_pred_s(pg[H1(i)]);
+        d[i] = (nn & pp) | (mm & ~pp);
+    }
+}
+
+void HELPER(sve_sel_zpzz_d)(void *vd, void *vn, void *vm,
+                            void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn, *m = vm;
+    uint8_t *pg = vg;
+
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i], mm = m[i];
+        d[i] = (pg[H1(i)] & 1 ? nn : mm);
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 559fb41fd6..021b33ced9 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -361,6 +361,8 @@ static void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
     do_zpzz_ool(s, a, fns[a->esz]);
 }
 
+DO_ZPZZ(SEL, sel)
+
 #undef DO_ZPZZ
 
 /*
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 70feb448e6..7ec84fdd80 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -99,6 +99,7 @@
 		&rprr_esz rn=%reg_movprfx
 @rdm_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5 \
 		&rprr_esz rm=%reg_movprfx
+@rd_pg4_rn_rm	........ esz:2 . rm:5  .. pg:4  rn:5 rd:5	&rprr_esz
 
 # Three register operand, with governing predicate, vector element size
 @rda_pg_rn_rm	........ esz:2 . rm:5  ... pg:3 rn:5 rd:5 \
@@ -467,6 +468,11 @@ RBIT		00000101 .. 1001 11 100 ... ..... .....		@rd_pg_rn
 # SVE vector splice (predicated)
 SPLICE		00000101 .. 101 100 100 ... ..... .....		@rdn_pg_rm
 
+### SVE Select Vectors Group
+
+# SVE select vector elements (predicated)
+SEL_zpzz	00000101 .. 1 ..... 11 .... ..... .....		@rd_pg4_rn_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

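[Archive note: the byte-lane trick in sve_sel_zpzz_b above relies on expand_pred_b turning 8 predicate bits into a 64-bit mask of 0x00/0xff bytes, so the select reduces to bitwise masking. A reference expansion written as a loop — QEMU instead uses a lookup table — makes the select easy to check:]

```c
#include <assert.h>
#include <stdint.h>

/* Reference expansion: each set predicate bit becomes an 0xff byte
 * in the corresponding (little-endian) byte lane. */
static uint64_t expand_pred_b_ref(uint8_t byte)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (byte & (1u << i)) {
            r |= 0xffull << (i * 8);
        }
    }
    return r;
}

/* Byte-granular select: active bytes come from nn, inactive from mm,
 * exactly the (nn & pp) | (mm & ~pp) form used by the helpers. */
static uint64_t sel_b_model(uint64_t nn, uint64_t mm, uint8_t pg)
{
    uint64_t pp = expand_pred_b_ref(pg);
    return (nn & pp) | (mm & ~pp);
}
```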

* [Qemu-devel] [PATCH v2 36/67] target/arm: Implement SVE Integer Compare - Vectors Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (34 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 35/67] target/arm: Implement SVE Select Vectors Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 16:29   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 37/67] target/arm: Implement SVE Integer Compare - Immediate Group Richard Henderson
                   ` (32 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 115 +++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 193 ++++++++++++++++++++++++++++++++++++++++++++-
 target/arm/translate-sve.c |  87 ++++++++++++++++++++
 target/arm/sve.decode      |  24 ++++++
 4 files changed, 416 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 0f57f64895..6ffd1fbe8e 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -490,6 +490,121 @@ DEF_HELPER_FLAGS_4(sve_rbit_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(sve_splice, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_cmpeq_ppzz_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpne_ppzz_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpge_ppzz_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpgt_ppzz_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphi_ppzz_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphs_ppzz_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_cmpeq_ppzz_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpne_ppzz_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpge_ppzz_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpgt_ppzz_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphi_ppzz_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphs_ppzz_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_cmpeq_ppzz_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpne_ppzz_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpge_ppzz_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpgt_ppzz_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphi_ppzz_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphs_ppzz_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_cmpeq_ppzz_d, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpne_ppzz_d, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpge_ppzz_d, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpgt_ppzz_d, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphi_ppzz_d, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphs_ppzz_d, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_cmpeq_ppzw_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpne_ppzw_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpge_ppzw_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpgt_ppzw_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphi_ppzw_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphs_ppzw_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmple_ppzw_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmplt_ppzw_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmplo_ppzw_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpls_ppzw_b, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_cmpeq_ppzw_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpne_ppzw_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpge_ppzw_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpgt_ppzw_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphi_ppzw_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphs_ppzw_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmple_ppzw_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmplt_ppzw_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmplo_ppzw_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpls_ppzw_h, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_cmpeq_ppzw_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpne_ppzw_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpge_ppzw_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpgt_ppzw_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphi_ppzw_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmphs_ppzw_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmple_ppzw_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmplt_ppzw_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmplo_ppzw_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_cmpls_ppzw_s, TCG_CALL_NO_RWG,
+                   i32, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 86cd792cdf..ae433861f8 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -46,14 +46,14 @@
  *
  * The return value has bit 31 set if N is set, bit 1 set if Z is clear,
  * and bit 0 set if C is set.
- *
- * This is an iterative function, called for each Pd and Pg word
- * moving forward.
  */
 
 /* For no G bits set, NZCV = C.  */
 #define PREDTEST_INIT  1
 
+/* This is an iterative function, called for each Pd and Pg word
+ * moving forward.
+ */
 static uint32_t iter_predtest_fwd(uint64_t d, uint64_t g, uint32_t flags)
 {
     if (likely(g)) {
@@ -73,6 +73,28 @@ static uint32_t iter_predtest_fwd(uint64_t d, uint64_t g, uint32_t flags)
     return flags;
 }
 
+/* This is an iterative function, called for each Pd and Pg word
+ * moving backward.
+ */
+static uint32_t iter_predtest_bwd(uint64_t d, uint64_t g, uint32_t flags)
+{
+    if (likely(g)) {
+        /* Compute C from first (i.e last) !(D & G).
+           Use bit 2 to signal first G bit seen.  */
+        if (!(flags & 4)) {
+            flags += 4 - 1; /* add bit 2, subtract C from PREDTEST_INIT */
+            flags |= (d & pow2floor(g)) == 0;
+        }
+
+        /* Accumulate Z from each D & G.  */
+        flags |= ((d & g) != 0) << 1;
+
+        /* Compute N from last (i.e first) D & G.  Replace previous.  */
+        flags = deposit32(flags, 31, 1, (d & (g & -g)) != 0);
+    }
+    return flags;
+}
+
 /* The same for a single word predicate.  */
 uint32_t HELPER(sve_predtest1)(uint64_t d, uint64_t g)
 {
@@ -2180,3 +2202,168 @@ void HELPER(sve_sel_zpzz_d)(void *vd, void *vn, void *vm,
         d[i] = (pg[H1(i)] & 1 ? nn : mm);
     }
 }
+
+/* Two operand comparison controlled by a predicate.
+ * ??? It is very tempting to want to be able to expand this inline
+ * with x86 instructions, e.g.
+ *
+ *    vcmpeqw    zm, zn, %ymm0
+ *    vpmovmskb  %ymm0, %eax
+ *    and        $0x5555, %eax
+ *    and        pg, %eax
+ *
+ * or even aarch64, e.g.
+ *
+ *    // mask = 4000 1000 0400 0100 0040 0010 0004 0001
+ *    cmeq       v0.8h, zn, zm
+ *    and        v0.8h, v0.8h, mask
+ *    addv       h0, v0.8h
+ *    and        v0.8b, pg
+ *
+ * However, coming up with an abstraction that allows vector inputs and
+ * a scalar output, and also handles the byte-ordering of sub-uint64_t
+ * scalar outputs, is tricky.
+ */
+#define DO_CMP_PPZZ(NAME, TYPE, OP, H, MASK)                                 \
+uint32_t HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
+{                                                                            \
+    intptr_t opr_sz = simd_oprsz(desc);                                      \
+    uint32_t flags = PREDTEST_INIT;                                          \
+    intptr_t i = opr_sz;                                                     \
+    do {                                                                     \
+        uint64_t out = 0, pg;                                                \
+        do {                                                                 \
+            i -= sizeof(TYPE), out <<= sizeof(TYPE);                         \
+            TYPE nn = *(TYPE *)(vn + H(i));                                  \
+            TYPE mm = *(TYPE *)(vm + H(i));                                  \
+            out |= nn OP mm;                                                 \
+        } while (i & 63);                                                    \
+        pg = *(uint64_t *)(vg + (i >> 3)) & MASK;                            \
+        out &= pg;                                                           \
+        *(uint64_t *)(vd + (i >> 3)) = out;                                  \
+        flags = iter_predtest_bwd(out, pg, flags);                           \
+    } while (i > 0);                                                         \
+    return flags;                                                            \
+}
+
+#define DO_CMP_PPZZ_B(NAME, TYPE, OP) \
+    DO_CMP_PPZZ(NAME, TYPE, OP, H1,   0xffffffffffffffffull)
+#define DO_CMP_PPZZ_H(NAME, TYPE, OP) \
+    DO_CMP_PPZZ(NAME, TYPE, OP, H1_2, 0x5555555555555555ull)
+#define DO_CMP_PPZZ_S(NAME, TYPE, OP) \
+    DO_CMP_PPZZ(NAME, TYPE, OP, H1_4, 0x1111111111111111ull)
+#define DO_CMP_PPZZ_D(NAME, TYPE, OP) \
+    DO_CMP_PPZZ(NAME, TYPE, OP,     , 0x0101010101010101ull)
+
+DO_CMP_PPZZ_B(sve_cmpeq_ppzz_b, uint8_t,  ==)
+DO_CMP_PPZZ_H(sve_cmpeq_ppzz_h, uint16_t, ==)
+DO_CMP_PPZZ_S(sve_cmpeq_ppzz_s, uint32_t, ==)
+DO_CMP_PPZZ_D(sve_cmpeq_ppzz_d, uint64_t, ==)
+
+DO_CMP_PPZZ_B(sve_cmpne_ppzz_b, uint8_t,  !=)
+DO_CMP_PPZZ_H(sve_cmpne_ppzz_h, uint16_t, !=)
+DO_CMP_PPZZ_S(sve_cmpne_ppzz_s, uint32_t, !=)
+DO_CMP_PPZZ_D(sve_cmpne_ppzz_d, uint64_t, !=)
+
+DO_CMP_PPZZ_B(sve_cmpgt_ppzz_b, int8_t,  >)
+DO_CMP_PPZZ_H(sve_cmpgt_ppzz_h, int16_t, >)
+DO_CMP_PPZZ_S(sve_cmpgt_ppzz_s, int32_t, >)
+DO_CMP_PPZZ_D(sve_cmpgt_ppzz_d, int64_t, >)
+
+DO_CMP_PPZZ_B(sve_cmpge_ppzz_b, int8_t,  >=)
+DO_CMP_PPZZ_H(sve_cmpge_ppzz_h, int16_t, >=)
+DO_CMP_PPZZ_S(sve_cmpge_ppzz_s, int32_t, >=)
+DO_CMP_PPZZ_D(sve_cmpge_ppzz_d, int64_t, >=)
+
+DO_CMP_PPZZ_B(sve_cmphi_ppzz_b, uint8_t,  >)
+DO_CMP_PPZZ_H(sve_cmphi_ppzz_h, uint16_t, >)
+DO_CMP_PPZZ_S(sve_cmphi_ppzz_s, uint32_t, >)
+DO_CMP_PPZZ_D(sve_cmphi_ppzz_d, uint64_t, >)
+
+DO_CMP_PPZZ_B(sve_cmphs_ppzz_b, uint8_t,  >=)
+DO_CMP_PPZZ_H(sve_cmphs_ppzz_h, uint16_t, >=)
+DO_CMP_PPZZ_S(sve_cmphs_ppzz_s, uint32_t, >=)
+DO_CMP_PPZZ_D(sve_cmphs_ppzz_d, uint64_t, >=)
+
+#undef DO_CMP_PPZZ_B
+#undef DO_CMP_PPZZ_H
+#undef DO_CMP_PPZZ_S
+#undef DO_CMP_PPZZ_D
+#undef DO_CMP_PPZZ
+
+/* Similar, but the second source is "wide".  */
+#define DO_CMP_PPZW(NAME, TYPE, TYPEW, OP, H, MASK)                     \
+uint32_t HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
+{                                                                            \
+    intptr_t opr_sz = simd_oprsz(desc);                                      \
+    uint32_t flags = PREDTEST_INIT;                                          \
+    intptr_t i = opr_sz;                                                     \
+    do {                                                                     \
+        uint64_t out = 0, pg;                                                \
+        do {                                                                 \
+            TYPEW mm = *(TYPEW *)(vm + i - 8);                               \
+            do {                                                             \
+                i -= sizeof(TYPE), out <<= sizeof(TYPE);                     \
+                TYPE nn = *(TYPE *)(vn + H(i));                              \
+                out |= nn OP mm;                                             \
+            } while (i & 7);                                                 \
+        } while (i & 63);                                                    \
+        pg = *(uint64_t *)(vg + (i >> 3)) & MASK;                            \
+        out &= pg;                                                           \
+        *(uint64_t *)(vd + (i >> 3)) = out;                                  \
+        flags = iter_predtest_bwd(out, pg, flags);                           \
+    } while (i > 0);                                                         \
+    return flags;                                                            \
+}
+
+#define DO_CMP_PPZW_B(NAME, TYPE, TYPEW, OP) \
+    DO_CMP_PPZW(NAME, TYPE, TYPEW, OP, H1,   0xffffffffffffffffull)
+#define DO_CMP_PPZW_H(NAME, TYPE, TYPEW, OP) \
+    DO_CMP_PPZW(NAME, TYPE, TYPEW, OP, H1_2, 0x5555555555555555ull)
+#define DO_CMP_PPZW_S(NAME, TYPE, TYPEW, OP) \
+    DO_CMP_PPZW(NAME, TYPE, TYPEW, OP, H1_4, 0x1111111111111111ull)
+
+DO_CMP_PPZW_B(sve_cmpeq_ppzw_b, uint8_t,  uint64_t, ==)
+DO_CMP_PPZW_H(sve_cmpeq_ppzw_h, uint16_t, uint64_t, ==)
+DO_CMP_PPZW_S(sve_cmpeq_ppzw_s, uint32_t, uint64_t, ==)
+
+DO_CMP_PPZW_B(sve_cmpne_ppzw_b, uint8_t,  uint64_t, !=)
+DO_CMP_PPZW_H(sve_cmpne_ppzw_h, uint16_t, uint64_t, !=)
+DO_CMP_PPZW_S(sve_cmpne_ppzw_s, uint32_t, uint64_t, !=)
+
+DO_CMP_PPZW_B(sve_cmpgt_ppzw_b, int8_t,   int64_t, >)
+DO_CMP_PPZW_H(sve_cmpgt_ppzw_h, int16_t,  int64_t, >)
+DO_CMP_PPZW_S(sve_cmpgt_ppzw_s, int32_t,  int64_t, >)
+
+DO_CMP_PPZW_B(sve_cmpge_ppzw_b, int8_t,   int64_t, >=)
+DO_CMP_PPZW_H(sve_cmpge_ppzw_h, int16_t,  int64_t, >=)
+DO_CMP_PPZW_S(sve_cmpge_ppzw_s, int32_t,  int64_t, >=)
+
+DO_CMP_PPZW_B(sve_cmphi_ppzw_b, uint8_t,  uint64_t, >)
+DO_CMP_PPZW_H(sve_cmphi_ppzw_h, uint16_t, uint64_t, >)
+DO_CMP_PPZW_S(sve_cmphi_ppzw_s, uint32_t, uint64_t, >)
+
+DO_CMP_PPZW_B(sve_cmphs_ppzw_b, uint8_t,  uint64_t, >=)
+DO_CMP_PPZW_H(sve_cmphs_ppzw_h, uint16_t, uint64_t, >=)
+DO_CMP_PPZW_S(sve_cmphs_ppzw_s, uint32_t, uint64_t, >=)
+
+DO_CMP_PPZW_B(sve_cmplt_ppzw_b, int8_t,   int64_t, <)
+DO_CMP_PPZW_H(sve_cmplt_ppzw_h, int16_t,  int64_t, <)
+DO_CMP_PPZW_S(sve_cmplt_ppzw_s, int32_t,  int64_t, <)
+
+DO_CMP_PPZW_B(sve_cmple_ppzw_b, int8_t,   int64_t, <=)
+DO_CMP_PPZW_H(sve_cmple_ppzw_h, int16_t,  int64_t, <=)
+DO_CMP_PPZW_S(sve_cmple_ppzw_s, int32_t,  int64_t, <=)
+
+DO_CMP_PPZW_B(sve_cmplo_ppzw_b, uint8_t,  uint64_t, <)
+DO_CMP_PPZW_H(sve_cmplo_ppzw_h, uint16_t, uint64_t, <)
+DO_CMP_PPZW_S(sve_cmplo_ppzw_s, uint32_t, uint64_t, <)
+
+DO_CMP_PPZW_B(sve_cmpls_ppzw_b, uint8_t,  uint64_t, <=)
+DO_CMP_PPZW_H(sve_cmpls_ppzw_h, uint16_t, uint64_t, <=)
+DO_CMP_PPZW_S(sve_cmpls_ppzw_s, uint32_t, uint64_t, <=)
+
+#undef DO_CMP_PPZW_B
+#undef DO_CMP_PPZW_H
+#undef DO_CMP_PPZW_S
+#undef DO_CMP_PPZW
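For readers puzzling over the MASK constants above: each predicate bit
corresponds to one vector byte, so for elements wider than a byte only
every sizeof(TYPE)-th predicate bit is significant (hence 0x5555... for
halfwords and 0x1111... for words). A minimal standalone sketch of the
halfword inner loop over one 64-byte chunk; cmp_ne_h_sketch is an
illustrative name, not a helper in this patch:

```c
#include <stdint.h>

/* Sketch of the DO_CMP_PPZW inner loop for 16-bit elements: the chunk
 * is walked backwards and the accumulator is shifted up by
 * sizeof(uint16_t) bits per element, so each result flag lands in the
 * low bit of its two-bit lane; the 0x5555... mask keeps only those. */
static uint64_t cmp_ne_h_sketch(const uint16_t *n, uint64_t m)
{
    uint64_t out = 0;
    for (int i = 32; i-- > 0; ) {
        out <<= 2;                            /* one lane = 2 pred bits */
        out |= (uint64_t)(n[i] != (uint16_t)m);
    }
    return out & 0x5555555555555555ull;
}
```

So a mismatch at element index k sets predicate bit 2*k, matching the
per-byte predicate layout the real helpers write back.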
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 021b33ced9..cb54777108 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -39,6 +39,9 @@ typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t,
 typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
 
+typedef void gen_helper_gvec_flags_4(TCGv_i32, TCGv_ptr, TCGv_ptr,
+                                     TCGv_ptr, TCGv_ptr, TCGv_i32);
+
 /*
  * Helpers for extracting complex instruction fields.
  */
@@ -2485,6 +2488,90 @@ static void trans_SPLICE(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
                        vsz, vsz, a->esz, gen_helper_sve_splice);
 }
 
+/*
+ *** SVE Integer Compare - Vectors Group
+ */
+
+static void do_ppzz_flags(DisasContext *s, arg_rprr_esz *a,
+                          gen_helper_gvec_flags_4 *gen_fn)
+{
+    TCGv_ptr pd, zn, zm, pg;
+    unsigned vsz;
+    TCGv_i32 t;
+
+    if (gen_fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    vsz = vec_full_reg_size(s);
+    t = tcg_const_i32(simd_desc(vsz, vsz, 0));
+    pd = tcg_temp_new_ptr();
+    zn = tcg_temp_new_ptr();
+    zm = tcg_temp_new_ptr();
+    pg = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(pd, cpu_env, pred_full_reg_offset(s, a->rd));
+    tcg_gen_addi_ptr(zn, cpu_env, vec_full_reg_offset(s, a->rn));
+    tcg_gen_addi_ptr(zm, cpu_env, vec_full_reg_offset(s, a->rm));
+    tcg_gen_addi_ptr(pg, cpu_env, pred_full_reg_offset(s, a->pg));
+
+    gen_fn(t, pd, zn, zm, pg, t);
+
+    tcg_temp_free_ptr(pd);
+    tcg_temp_free_ptr(zn);
+    tcg_temp_free_ptr(zm);
+    tcg_temp_free_ptr(pg);
+
+    do_pred_flags(t);
+
+    tcg_temp_free_i32(t);
+}
+
+#define DO_PPZZ(NAME, name) \
+static void trans_##NAME##_ppzz(DisasContext *s, arg_rprr_esz *a,         \
+                                uint32_t insn)                            \
+{                                                                         \
+    static gen_helper_gvec_flags_4 * const fns[4] = {                     \
+        gen_helper_sve_##name##_ppzz_b, gen_helper_sve_##name##_ppzz_h,   \
+        gen_helper_sve_##name##_ppzz_s, gen_helper_sve_##name##_ppzz_d,   \
+    };                                                                    \
+    do_ppzz_flags(s, a, fns[a->esz]);                                     \
+}
+
+DO_PPZZ(CMPEQ, cmpeq)
+DO_PPZZ(CMPNE, cmpne)
+DO_PPZZ(CMPGT, cmpgt)
+DO_PPZZ(CMPGE, cmpge)
+DO_PPZZ(CMPHI, cmphi)
+DO_PPZZ(CMPHS, cmphs)
+
+#undef DO_PPZZ
+
+#define DO_PPZW(NAME, name) \
+static void trans_##NAME##_ppzw(DisasContext *s, arg_rprr_esz *a,         \
+                                uint32_t insn)                            \
+{                                                                         \
+    static gen_helper_gvec_flags_4 * const fns[4] = {                     \
+        gen_helper_sve_##name##_ppzw_b, gen_helper_sve_##name##_ppzw_h,   \
+        gen_helper_sve_##name##_ppzw_s, NULL                              \
+    };                                                                    \
+    do_ppzz_flags(s, a, fns[a->esz]);                                     \
+}
+
+DO_PPZW(CMPEQ, cmpeq)
+DO_PPZW(CMPNE, cmpne)
+DO_PPZW(CMPGT, cmpgt)
+DO_PPZW(CMPGE, cmpge)
+DO_PPZW(CMPHI, cmphi)
+DO_PPZW(CMPHS, cmphs)
+DO_PPZW(CMPLT, cmplt)
+DO_PPZW(CMPLE, cmple)
+DO_PPZW(CMPLO, cmplo)
+DO_PPZW(CMPLS, cmpls)
+
+#undef DO_PPZW
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 7ec84fdd80..deedc9163b 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -100,6 +100,7 @@
 @rdm_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5 \
 		&rprr_esz rm=%reg_movprfx
 @rd_pg4_rn_rm	........ esz:2 . rm:5  .. pg:4  rn:5 rd:5	&rprr_esz
+@pd_pg_rn_rm	........ esz:2 . rm:5 ... pg:3 rn:5 . rd:4	&rprr_esz
 
 # Three register operand, with governing predicate, vector element size
 @rda_pg_rn_rm	........ esz:2 . rm:5  ... pg:3 rn:5 rd:5 \
@@ -473,6 +474,29 @@ SPLICE		00000101 .. 101 100 100 ... ..... .....		@rdn_pg_rm
 # SVE select vector elements (predicated)
 SEL_zpzz	00000101 .. 1 ..... 11 .... ..... .....		@rd_pg4_rn_rm
 
+### SVE Integer Compare - Vectors Group
+
# SVE integer compare vectors
+CMPHS_ppzz	00100100 .. 0 ..... 000 ... ..... 0 ....	@pd_pg_rn_rm
+CMPHI_ppzz	00100100 .. 0 ..... 000 ... ..... 1 ....	@pd_pg_rn_rm
+CMPGE_ppzz	00100100 .. 0 ..... 100 ... ..... 0 ....	@pd_pg_rn_rm
+CMPGT_ppzz	00100100 .. 0 ..... 100 ... ..... 1 ....	@pd_pg_rn_rm
+CMPEQ_ppzz	00100100 .. 0 ..... 101 ... ..... 0 ....	@pd_pg_rn_rm
+CMPNE_ppzz	00100100 .. 0 ..... 101 ... ..... 1 ....	@pd_pg_rn_rm
+
+# SVE integer compare with wide elements
+# Note these require esz != 3.
+CMPEQ_ppzw	00100100 .. 0 ..... 001 ... ..... 0 ....	@pd_pg_rn_rm
+CMPNE_ppzw	00100100 .. 0 ..... 001 ... ..... 1 ....	@pd_pg_rn_rm
+CMPGE_ppzw	00100100 .. 0 ..... 010 ... ..... 0 ....	@pd_pg_rn_rm
+CMPGT_ppzw	00100100 .. 0 ..... 010 ... ..... 1 ....	@pd_pg_rn_rm
+CMPLT_ppzw	00100100 .. 0 ..... 011 ... ..... 0 ....	@pd_pg_rn_rm
+CMPLE_ppzw	00100100 .. 0 ..... 011 ... ..... 1 ....	@pd_pg_rn_rm
+CMPHS_ppzw	00100100 .. 0 ..... 110 ... ..... 0 ....	@pd_pg_rn_rm
+CMPHI_ppzw	00100100 .. 0 ..... 110 ... ..... 1 ....	@pd_pg_rn_rm
+CMPLO_ppzw	00100100 .. 0 ..... 111 ... ..... 0 ....	@pd_pg_rn_rm
+CMPLS_ppzw	00100100 .. 0 ..... 111 ... ..... 1 ....	@pd_pg_rn_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread

* [Qemu-devel] [PATCH v2 37/67] target/arm: Implement SVE Integer Compare - Immediate Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (35 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 36/67] target/arm: Implement SVE Integer Compare - " Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 16:32   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 38/67] target/arm: Implement SVE Partition Break Group Richard Henderson
                   ` (31 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 44 +++++++++++++++++++++++
 target/arm/sve_helper.c    | 88 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 63 +++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 23 ++++++++++++
 4 files changed, 218 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 6ffd1fbe8e..ae38c0a4be 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -605,6 +605,50 @@ DEF_HELPER_FLAGS_5(sve_cmplo_ppzw_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_cmpls_ppzw_s, TCG_CALL_NO_RWG,
                    i32, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_cmpeq_ppzi_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpne_ppzi_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpgt_ppzi_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpge_ppzi_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmplt_ppzi_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmple_ppzi_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmphs_ppzi_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmphi_ppzi_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmplo_ppzi_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpls_ppzi_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_cmpeq_ppzi_h, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpne_ppzi_h, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpgt_ppzi_h, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpge_ppzi_h, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmplt_ppzi_h, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmple_ppzi_h, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmphs_ppzi_h, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmphi_ppzi_h, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmplo_ppzi_h, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpls_ppzi_h, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_cmpeq_ppzi_s, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpne_ppzi_s, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpgt_ppzi_s, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpge_ppzi_s, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmplt_ppzi_s, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmple_ppzi_s, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmphs_ppzi_s, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmphi_ppzi_s, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmplo_ppzi_s, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpls_ppzi_s, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_cmpeq_ppzi_d, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpne_ppzi_d, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpgt_ppzi_d, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpge_ppzi_d, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmplt_ppzi_d, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmple_ppzi_d, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmphs_ppzi_d, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmphi_ppzi_d, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmplo_ppzi_d, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cmpls_ppzi_d, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index ae433861f8..b74db681f2 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2367,3 +2367,91 @@ DO_CMP_PPZW_S(sve_cmpls_ppzw_s, uint32_t, uint64_t, <=)
 #undef DO_CMP_PPZW_H
 #undef DO_CMP_PPZW_S
 #undef DO_CMP_PPZW
+
+/* Similar, but the second source is immediate.  */
+#define DO_CMP_PPZI(NAME, TYPE, OP, H, MASK)                         \
+uint32_t HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)   \
+{                                                                    \
+    intptr_t opr_sz = simd_oprsz(desc);                              \
+    uint32_t flags = PREDTEST_INIT;                                  \
+    TYPE mm = simd_data(desc);                                       \
+    intptr_t i = opr_sz;                                             \
+    do {                                                             \
+        uint64_t out = 0, pg;                                        \
+        do {                                                         \
+            i -= sizeof(TYPE), out <<= sizeof(TYPE);                 \
+            TYPE nn = *(TYPE *)(vn + H(i));                          \
+            out |= nn OP mm;                                         \
+        } while (i & 63);                                            \
+        pg = *(uint64_t *)(vg + (i >> 3)) & MASK;                    \
+        out &= pg;                                                   \
+        *(uint64_t *)(vd + (i >> 3)) = out;                          \
+        flags = iter_predtest_bwd(out, pg, flags);                   \
+    } while (i > 0);                                                 \
+    return flags;                                                    \
+}
+
+#define DO_CMP_PPZI_B(NAME, TYPE, OP) \
+    DO_CMP_PPZI(NAME, TYPE, OP, H1,   0xffffffffffffffffull)
+#define DO_CMP_PPZI_H(NAME, TYPE, OP) \
+    DO_CMP_PPZI(NAME, TYPE, OP, H1_2, 0x5555555555555555ull)
+#define DO_CMP_PPZI_S(NAME, TYPE, OP) \
+    DO_CMP_PPZI(NAME, TYPE, OP, H1_4, 0x1111111111111111ull)
+#define DO_CMP_PPZI_D(NAME, TYPE, OP) \
+    DO_CMP_PPZI(NAME, TYPE, OP,     , 0x0101010101010101ull)
+
+DO_CMP_PPZI_B(sve_cmpeq_ppzi_b, uint8_t,  ==)
+DO_CMP_PPZI_H(sve_cmpeq_ppzi_h, uint16_t, ==)
+DO_CMP_PPZI_S(sve_cmpeq_ppzi_s, uint32_t, ==)
+DO_CMP_PPZI_D(sve_cmpeq_ppzi_d, uint64_t, ==)
+
+DO_CMP_PPZI_B(sve_cmpne_ppzi_b, uint8_t,  !=)
+DO_CMP_PPZI_H(sve_cmpne_ppzi_h, uint16_t, !=)
+DO_CMP_PPZI_S(sve_cmpne_ppzi_s, uint32_t, !=)
+DO_CMP_PPZI_D(sve_cmpne_ppzi_d, uint64_t, !=)
+
+DO_CMP_PPZI_B(sve_cmpgt_ppzi_b, int8_t,  >)
+DO_CMP_PPZI_H(sve_cmpgt_ppzi_h, int16_t, >)
+DO_CMP_PPZI_S(sve_cmpgt_ppzi_s, int32_t, >)
+DO_CMP_PPZI_D(sve_cmpgt_ppzi_d, int64_t, >)
+
+DO_CMP_PPZI_B(sve_cmpge_ppzi_b, int8_t,  >=)
+DO_CMP_PPZI_H(sve_cmpge_ppzi_h, int16_t, >=)
+DO_CMP_PPZI_S(sve_cmpge_ppzi_s, int32_t, >=)
+DO_CMP_PPZI_D(sve_cmpge_ppzi_d, int64_t, >=)
+
+DO_CMP_PPZI_B(sve_cmphi_ppzi_b, uint8_t,  >)
+DO_CMP_PPZI_H(sve_cmphi_ppzi_h, uint16_t, >)
+DO_CMP_PPZI_S(sve_cmphi_ppzi_s, uint32_t, >)
+DO_CMP_PPZI_D(sve_cmphi_ppzi_d, uint64_t, >)
+
+DO_CMP_PPZI_B(sve_cmphs_ppzi_b, uint8_t,  >=)
+DO_CMP_PPZI_H(sve_cmphs_ppzi_h, uint16_t, >=)
+DO_CMP_PPZI_S(sve_cmphs_ppzi_s, uint32_t, >=)
+DO_CMP_PPZI_D(sve_cmphs_ppzi_d, uint64_t, >=)
+
+DO_CMP_PPZI_B(sve_cmplt_ppzi_b, int8_t,  <)
+DO_CMP_PPZI_H(sve_cmplt_ppzi_h, int16_t, <)
+DO_CMP_PPZI_S(sve_cmplt_ppzi_s, int32_t, <)
+DO_CMP_PPZI_D(sve_cmplt_ppzi_d, int64_t, <)
+
+DO_CMP_PPZI_B(sve_cmple_ppzi_b, int8_t,  <=)
+DO_CMP_PPZI_H(sve_cmple_ppzi_h, int16_t, <=)
+DO_CMP_PPZI_S(sve_cmple_ppzi_s, int32_t, <=)
+DO_CMP_PPZI_D(sve_cmple_ppzi_d, int64_t, <=)
+
+DO_CMP_PPZI_B(sve_cmplo_ppzi_b, uint8_t,  <)
+DO_CMP_PPZI_H(sve_cmplo_ppzi_h, uint16_t, <)
+DO_CMP_PPZI_S(sve_cmplo_ppzi_s, uint32_t, <)
+DO_CMP_PPZI_D(sve_cmplo_ppzi_d, uint64_t, <)
+
+DO_CMP_PPZI_B(sve_cmpls_ppzi_b, uint8_t,  <=)
+DO_CMP_PPZI_H(sve_cmpls_ppzi_h, uint16_t, <=)
+DO_CMP_PPZI_S(sve_cmpls_ppzi_s, uint32_t, <=)
+DO_CMP_PPZI_D(sve_cmpls_ppzi_d, uint64_t, <=)
+
+#undef DO_CMP_PPZI_B
+#undef DO_CMP_PPZI_H
+#undef DO_CMP_PPZI_S
+#undef DO_CMP_PPZI_D
+#undef DO_CMP_PPZI
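Because the immediate travels in the descriptor and is assigned
straight to TYPE (`TYPE mm = simd_data(desc)`), a signed 5-bit
immediate of -1 compares as 0xff for byte elements, 0xffff for
halfwords, and so on. A hedged standalone sketch of that truncation
for byte equality; count_eq_b is an illustrative name, not part of the
patch:

```c
#include <stdint.h>

/* Mirrors "TYPE mm = simd_data(desc)" for TYPE == uint8_t: the wider
 * immediate is truncated to the element type before the compare. */
static int count_eq_b(const uint8_t *n, int len, int32_t imm)
{
    uint8_t mm = (uint8_t)imm;   /* -1 becomes 0xff here */
    int count = 0;
    for (int i = 0; i < len; i++) {
        count += (n[i] == mm);
    }
    return count;
}
```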
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index cb54777108..a7eeb122e3 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -39,6 +39,8 @@ typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t,
 typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
 
+typedef void gen_helper_gvec_flags_3(TCGv_i32, TCGv_ptr, TCGv_ptr,
+                                     TCGv_ptr, TCGv_i32);
 typedef void gen_helper_gvec_flags_4(TCGv_i32, TCGv_ptr, TCGv_ptr,
                                      TCGv_ptr, TCGv_ptr, TCGv_i32);
 
@@ -2572,6 +2574,67 @@ DO_PPZW(CMPLS, cmpls)
 
 #undef DO_PPZW
 
+/*
+ *** SVE Integer Compare - Immediate Groups
+ */
+
+static void do_ppzi_flags(DisasContext *s, arg_rpri_esz *a,
+                          gen_helper_gvec_flags_3 *gen_fn)
+{
+    TCGv_ptr pd, zn, pg;
+    unsigned vsz;
+    TCGv_i32 t;
+
+    if (gen_fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    vsz = vec_full_reg_size(s);
+    t = tcg_const_i32(simd_desc(vsz, vsz, a->imm));
+    pd = tcg_temp_new_ptr();
+    zn = tcg_temp_new_ptr();
+    pg = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(pd, cpu_env, pred_full_reg_offset(s, a->rd));
+    tcg_gen_addi_ptr(zn, cpu_env, vec_full_reg_offset(s, a->rn));
+    tcg_gen_addi_ptr(pg, cpu_env, pred_full_reg_offset(s, a->pg));
+
+    gen_fn(t, pd, zn, pg, t);
+
+    tcg_temp_free_ptr(pd);
+    tcg_temp_free_ptr(zn);
+    tcg_temp_free_ptr(pg);
+
+    do_pred_flags(t);
+
+    tcg_temp_free_i32(t);
+}
+
+#define DO_PPZI(NAME, name) \
+static void trans_##NAME##_ppzi(DisasContext *s, arg_rpri_esz *a,         \
+                                uint32_t insn)                            \
+{                                                                         \
+    static gen_helper_gvec_flags_3 * const fns[4] = {                     \
+        gen_helper_sve_##name##_ppzi_b, gen_helper_sve_##name##_ppzi_h,   \
+        gen_helper_sve_##name##_ppzi_s, gen_helper_sve_##name##_ppzi_d,   \
+    };                                                                    \
+    do_ppzi_flags(s, a, fns[a->esz]);                                     \
+}
+
+DO_PPZI(CMPEQ, cmpeq)
+DO_PPZI(CMPNE, cmpne)
+DO_PPZI(CMPGT, cmpgt)
+DO_PPZI(CMPGE, cmpge)
+DO_PPZI(CMPHI, cmphi)
+DO_PPZI(CMPHS, cmphs)
+DO_PPZI(CMPLT, cmplt)
+DO_PPZI(CMPLE, cmple)
+DO_PPZI(CMPLO, cmplo)
+DO_PPZI(CMPLS, cmpls)
+
+#undef DO_PPZI
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index deedc9163b..0e317d7d48 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -132,6 +132,11 @@
 @rdn_dbm	........ .. .... dbm:13 rd:5 \
 		&rr_dbm rn=%reg_movprfx
 
+# Predicate output, vector and immediate input,
+# controlling predicate, element size.
+@pd_pg_rn_i7	........ esz:2 . imm:7 . pg:3 rn:5 . rd:4	&rpri_esz
+@pd_pg_rn_i5	........ esz:2 . imm:s5 ... pg:3 rn:5 . rd:4	&rpri_esz
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9	........ ........ ...... rn:5 . rd:4	\
 		&rri imm=%imm9_16_10
@@ -497,6 +502,24 @@ CMPHI_ppzw	00100100 .. 0 ..... 110 ... ..... 1 ....	@pd_pg_rn_rm
 CMPLO_ppzw	00100100 .. 0 ..... 111 ... ..... 0 ....	@pd_pg_rn_rm
 CMPLS_ppzw	00100100 .. 0 ..... 111 ... ..... 1 ....	@pd_pg_rn_rm
 
+### SVE Integer Compare - Unsigned Immediate Group
+
+# SVE integer compare with unsigned immediate
+CMPHS_ppzi	00100100 .. 1 ....... 0 ... ..... 0 ....      @pd_pg_rn_i7
+CMPHI_ppzi	00100100 .. 1 ....... 0 ... ..... 1 ....      @pd_pg_rn_i7
+CMPLO_ppzi	00100100 .. 1 ....... 1 ... ..... 0 ....      @pd_pg_rn_i7
+CMPLS_ppzi	00100100 .. 1 ....... 1 ... ..... 1 ....      @pd_pg_rn_i7
+
+### SVE Integer Compare - Signed Immediate Group
+
+# SVE integer compare with signed immediate
+CMPGE_ppzi	00100101 .. 0 ..... 000 ... ..... 0 ....      @pd_pg_rn_i5
+CMPGT_ppzi	00100101 .. 0 ..... 000 ... ..... 1 ....      @pd_pg_rn_i5
+CMPLT_ppzi	00100101 .. 0 ..... 001 ... ..... 0 ....      @pd_pg_rn_i5
+CMPLE_ppzi	00100101 .. 0 ..... 001 ... ..... 1 ....      @pd_pg_rn_i5
+CMPEQ_ppzi	00100101 .. 0 ..... 100 ... ..... 0 ....      @pd_pg_rn_i5
+CMPNE_ppzi	00100101 .. 0 ..... 100 ... ..... 1 ....      @pd_pg_rn_i5
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.14.3


* [Qemu-devel] [PATCH v2 38/67] target/arm: Implement SVE Partition Break Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (36 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 37/67] target/arm: Implement SVE Integer Compare - Immediate Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 16:41   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 39/67] target/arm: Implement SVE Predicate Count Group Richard Henderson
                   ` (30 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  18 ++++
 target/arm/sve_helper.c    | 247 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c |  96 ++++++++++++++++++
 target/arm/sve.decode      |  19 ++++
 4 files changed, 380 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index ae38c0a4be..f0a3ed3414 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -658,3 +658,21 @@ DEF_HELPER_FLAGS_5(sve_orn_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_nor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_nand_pppp, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_brkpa, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_brkpb, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_brkpas, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_brkpbs, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_brka_z, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_brkb_z, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_brka_m, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_brkb_m, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_brkas_z, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_brkbs_z, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_brkas_m, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_brkbs_m, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_brkn, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_brkns, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index b74db681f2..d6d2220f8b 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2455,3 +2455,250 @@ DO_CMP_PPZI_D(sve_cmpls_ppzi_d, uint64_t, <=)
 #undef DO_CMP_PPZI_S
 #undef DO_CMP_PPZI_D
 #undef DO_CMP_PPZI
+
+/* Similar to the ARM LastActive pseudocode function.  */
+static bool last_active_pred(void *vd, void *vg, intptr_t oprsz)
+{
+    intptr_t i;
+
+    for (i = QEMU_ALIGN_UP(oprsz, 8) - 8; i >= 0; i -= 8) {
+        uint64_t pg = *(uint64_t *)(vg + i);
+        if (pg) {
+            return (pow2floor(pg) & *(uint64_t *)(vd + i)) != 0;
+        }
+    }
+    return false;
+}
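+
+/* Here pow2floor is QEMU's host-utils helper that keeps only the
+ * highest set bit, so pow2floor(pg) & *vd tests the destination bit
+ * at the position of the last active guard bit in this chunk.  A
+ * loop-based sketch of the same computation (illustrative, not the
+ * QEMU implementation) is:

```c
#include <stdint.h>

/* Clear the lowest set bit until one remains; the survivor is the
 * highest set bit.  pow2floor_sketch(0) is 0. */
static uint64_t pow2floor_sketch(uint64_t x)
{
    while (x & (x - 1)) {
        x &= x - 1;
    }
    return x;
}
```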
+
+/* Compute a mask into RETB that is true for all G, up to and including
+ * (if after) or excluding (if !after) the first G & N.
+ * Return true if BRK found.
+ */
+static bool compute_brk(uint64_t *retb, uint64_t n, uint64_t g,
+                        bool brk, bool after)
+{
+    uint64_t b;
+
+    if (brk) {
+        b = 0;
+    } else if ((g & n) == 0) {
+        /* For all G, no N are set; break not found.  */
+        b = g;
+    } else {
+        /* Break somewhere in N.  Locate it.  */
+        b = g & n;            /* guard true, pred true */
+        b = b & -b;           /* first such */
+        if (after) {
+            b = b | (b - 1);  /* break after same */
+        } else {
+            b = b - 1;        /* break before same */
+        }
+        brk = true;
+    }
+
+    *retb = b;
+    return brk;
+}
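+
+/* compute_brk leans on two classic bit tricks: b & -b isolates the
+ * lowest set bit, and b | (b - 1) (resp. b - 1) extends the mask
+ * through (resp. up to) that bit.  A self-contained sketch of the
+ * single-word case, with brk_mask as an illustrative name:

```c
#include <stdint.h>

/* Return the guard bits that remain active before/after the first
 * position where both guard (g) and source (n) are set. */
static uint64_t brk_mask(uint64_t g, uint64_t n, int after)
{
    uint64_t b = g & n;          /* candidate break positions */
    if (b == 0) {
        return g;                /* no break in this word */
    }
    b &= -b;                     /* lowest candidate */
    return after ? (b | (b - 1)) : (b - 1);
}
```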
+
+/* Compute a zeroing BRK.  */
+static void compute_brk_z(uint64_t *d, uint64_t *n, uint64_t *g,
+                          intptr_t oprsz, bool after)
+{
+    bool brk = false;
+    intptr_t i;
+
+    for (i = 0; i < DIV_ROUND_UP(oprsz, 8); ++i) {
+        uint64_t this_b, this_g = g[i];
+
+        brk = compute_brk(&this_b, n[i], this_g, brk, after);
+        d[i] = this_b & this_g;
+    }
+}
+
+/* Likewise, but also compute flags.  */
+static uint32_t compute_brks_z(uint64_t *d, uint64_t *n, uint64_t *g,
+                               intptr_t oprsz, bool after)
+{
+    uint32_t flags = PREDTEST_INIT;
+    bool brk = false;
+    intptr_t i;
+
+    for (i = 0; i < DIV_ROUND_UP(oprsz, 8); ++i) {
+        uint64_t this_b, this_d, this_g = g[i];
+
+        brk = compute_brk(&this_b, n[i], this_g, brk, after);
+        d[i] = this_d = this_b & this_g;
+        flags = iter_predtest_fwd(this_d, this_g, flags);
+    }
+    return flags;
+}
+
+/* Compute a merging BRK.  */
+static void compute_brk_m(uint64_t *d, uint64_t *n, uint64_t *g,
+                          intptr_t oprsz, bool after)
+{
+    bool brk = false;
+    intptr_t i;
+
+    for (i = 0; i < DIV_ROUND_UP(oprsz, 8); ++i) {
+        uint64_t this_b, this_g = g[i];
+
+        brk = compute_brk(&this_b, n[i], this_g, brk, after);
+        d[i] = (this_b & this_g) | (d[i] & ~this_g);
+    }
+}
+
+/* Likewise, but also compute flags.  */
+static uint32_t compute_brks_m(uint64_t *d, uint64_t *n, uint64_t *g,
+                               intptr_t oprsz, bool after)
+{
+    uint32_t flags = PREDTEST_INIT;
+    bool brk = false;
+    intptr_t i;
+
+    for (i = 0; i < DIV_ROUND_UP(oprsz, 8); ++i) {
+        uint64_t this_b, this_d = d[i], this_g = g[i];
+
+        brk = compute_brk(&this_b, n[i], this_g, brk, after);
+        d[i] = this_d = (this_b & this_g) | (this_d & ~this_g);
+        flags = iter_predtest_fwd(this_d, this_g, flags);
+    }
+    return flags;
+}
+
+static uint32_t do_zero(ARMPredicateReg *d, intptr_t oprsz)
+{
+    /* It is quicker to zero the whole predicate than loop on OPRSZ.
+       The compiler should turn this into 4 64-bit integer stores.  */
+    memset(d, 0, sizeof(ARMPredicateReg));
+    return PREDTEST_INIT;
+}
+
+void HELPER(sve_brkpa)(void *vd, void *vn, void *vm, void *vg,
+                       uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    if (last_active_pred(vn, vg, oprsz)) {
+        compute_brk_z(vd, vm, vg, oprsz, true);
+    } else {
+        do_zero(vd, oprsz);
+    }
+}
+
+uint32_t HELPER(sve_brkpas)(void *vd, void *vn, void *vm, void *vg,
+                            uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    if (last_active_pred(vn, vg, oprsz)) {
+        return compute_brks_z(vd, vm, vg, oprsz, true);
+    } else {
+        return do_zero(vd, oprsz);
+    }
+}
+
+void HELPER(sve_brkpb)(void *vd, void *vn, void *vm, void *vg,
+                       uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    if (last_active_pred(vn, vg, oprsz)) {
+        compute_brk_z(vd, vm, vg, oprsz, false);
+    } else {
+        do_zero(vd, oprsz);
+    }
+}
+
+uint32_t HELPER(sve_brkpbs)(void *vd, void *vn, void *vm, void *vg,
+                            uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    if (last_active_pred(vn, vg, oprsz)) {
+        return compute_brks_z(vd, vm, vg, oprsz, false);
+    } else {
+        return do_zero(vd, oprsz);
+    }
+}
+
+void HELPER(sve_brka_z)(void *vd, void *vn, void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    compute_brk_z(vd, vn, vg, oprsz, true);
+}
+
+uint32_t HELPER(sve_brkas_z)(void *vd, void *vn, void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    return compute_brks_z(vd, vn, vg, oprsz, true);
+}
+
+void HELPER(sve_brkb_z)(void *vd, void *vn, void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    compute_brk_z(vd, vn, vg, oprsz, false);
+}
+
+uint32_t HELPER(sve_brkbs_z)(void *vd, void *vn, void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    return compute_brks_z(vd, vn, vg, oprsz, false);
+}
+
+void HELPER(sve_brka_m)(void *vd, void *vn, void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    compute_brk_m(vd, vn, vg, oprsz, true);
+}
+
+uint32_t HELPER(sve_brkas_m)(void *vd, void *vn, void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    return compute_brks_m(vd, vn, vg, oprsz, true);
+}
+
+void HELPER(sve_brkb_m)(void *vd, void *vn, void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    compute_brk_m(vd, vn, vg, oprsz, false);
+}
+
+uint32_t HELPER(sve_brkbs_m)(void *vd, void *vn, void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    return compute_brks_m(vd, vn, vg, oprsz, false);
+}
+
+void HELPER(sve_brkn)(void *vd, void *vn, void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+
+    if (!last_active_pred(vn, vg, oprsz)) {
+        do_zero(vd, oprsz);
+    }
+}
+
+/* As if PredTest(Ones(PL), D, esz).  */
+static uint32_t predtest_ones(ARMPredicateReg *d, intptr_t oprsz,
+                              uint64_t esz_mask)
+{
+    uint32_t flags = PREDTEST_INIT;
+    intptr_t i;
+
+    for (i = 0; i < oprsz / 8; i++) {
+        flags = iter_predtest_fwd(d->p[i], esz_mask, flags);
+    }
+    if (oprsz & 7) {
+        uint64_t mask = ~(-1ULL << (8 * (oprsz & 7)));
+        flags = iter_predtest_fwd(d->p[i], esz_mask & mask, flags);
+    }
+    return flags;
+}
+
+uint32_t HELPER(sve_brkns)(void *vd, void *vn, void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+
+    if (last_active_pred(vn, vg, oprsz)) {
+        return predtest_ones(vd, oprsz, -1);
+    } else {
+        return do_zero(vd, oprsz);
+    }
+}
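Taken together, BRKN's semantics are: if the last active element of
the first source predicate is false, zero the destination; otherwise
leave it unchanged. A byte-per-element sketch of that behaviour, with
illustrative names that do not appear in the patch:

```c
#include <stdint.h>
#include <string.h>

/* One flag byte per element: g is the governing predicate, n the
 * first source, d the destination (modified in place). */
static void brkn_sketch(uint8_t *d, const uint8_t *n, const uint8_t *g,
                        int nelem)
{
    int last = -1;
    for (int i = 0; i < nelem; i++) {
        if (g[i]) {
            last = i;            /* index of last governed element */
        }
    }
    if (last < 0 || !n[last]) {
        memset(d, 0, nelem);     /* last active n element inactive */
    }
}
```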
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index a7eeb122e3..dc95d68867 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2635,6 +2635,102 @@ DO_PPZI(CMPLS, cmpls)
 
 #undef DO_PPZI
 
+/*
+ *** SVE Partition Break Group
+ */
+
+static void do_brk3(DisasContext *s, arg_rprr_s *a,
+                    gen_helper_gvec_4 *fn, gen_helper_gvec_flags_4 *fn_s)
+{
+    unsigned vsz = pred_full_reg_size(s);
+
+    /* Predicate sizes may be smaller and cannot use simd_desc.  */
+    TCGv_ptr d = tcg_temp_new_ptr();
+    TCGv_ptr n = tcg_temp_new_ptr();
+    TCGv_ptr m = tcg_temp_new_ptr();
+    TCGv_ptr g = tcg_temp_new_ptr();
+    TCGv_i32 t = tcg_const_i32(vsz - 2);
+
+    tcg_gen_addi_ptr(d, cpu_env, pred_full_reg_offset(s, a->rd));
+    tcg_gen_addi_ptr(n, cpu_env, pred_full_reg_offset(s, a->rn));
+    tcg_gen_addi_ptr(m, cpu_env, pred_full_reg_offset(s, a->rm));
+    tcg_gen_addi_ptr(g, cpu_env, pred_full_reg_offset(s, a->pg));
+
+    if (a->s) {
+        fn_s(t, d, n, m, g, t);
+        do_pred_flags(t);
+    } else {
+        fn(d, n, m, g, t);
+    }
+    tcg_temp_free_ptr(d);
+    tcg_temp_free_ptr(n);
+    tcg_temp_free_ptr(m);
+    tcg_temp_free_ptr(g);
+    tcg_temp_free_i32(t);
+}
+
+static void do_brk2(DisasContext *s, arg_rpr_s *a,
+                    gen_helper_gvec_3 *fn, gen_helper_gvec_flags_3 *fn_s)
+{
+    unsigned vsz = pred_full_reg_size(s);
+
+    /* Predicate sizes may be smaller and cannot use simd_desc.  */
+    TCGv_ptr d = tcg_temp_new_ptr();
+    TCGv_ptr n = tcg_temp_new_ptr();
+    TCGv_ptr g = tcg_temp_new_ptr();
+    TCGv_i32 t = tcg_const_i32(vsz - 2);
+
+    tcg_gen_addi_ptr(d, cpu_env, pred_full_reg_offset(s, a->rd));
+    tcg_gen_addi_ptr(n, cpu_env, pred_full_reg_offset(s, a->rn));
+    tcg_gen_addi_ptr(g, cpu_env, pred_full_reg_offset(s, a->pg));
+
+    if (a->s) {
+        fn_s(t, d, n, g, t);
+        do_pred_flags(t);
+    } else {
+        fn(d, n, g, t);
+    }
+    tcg_temp_free_ptr(d);
+    tcg_temp_free_ptr(n);
+    tcg_temp_free_ptr(g);
+    tcg_temp_free_i32(t);
+}
+
+void trans_BRKPA(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+    do_brk3(s, a, gen_helper_sve_brkpa, gen_helper_sve_brkpas);
+}
+
+void trans_BRKPB(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+    do_brk3(s, a, gen_helper_sve_brkpb, gen_helper_sve_brkpbs);
+}
+
+void trans_BRKA_m(DisasContext *s, arg_rpr_s *a, uint32_t insn)
+{
+    do_brk2(s, a, gen_helper_sve_brka_m, gen_helper_sve_brkas_m);
+}
+
+void trans_BRKB_m(DisasContext *s, arg_rpr_s *a, uint32_t insn)
+{
+    do_brk2(s, a, gen_helper_sve_brkb_m, gen_helper_sve_brkbs_m);
+}
+
+void trans_BRKA_z(DisasContext *s, arg_rpr_s *a, uint32_t insn)
+{
+    do_brk2(s, a, gen_helper_sve_brka_z, gen_helper_sve_brkas_z);
+}
+
+void trans_BRKB_z(DisasContext *s, arg_rpr_s *a, uint32_t insn)
+{
+    do_brk2(s, a, gen_helper_sve_brkb_z, gen_helper_sve_brkbs_z);
+}
+
+void trans_BRKN(DisasContext *s, arg_rpr_s *a, uint32_t insn)
+{
+    do_brk2(s, a, gen_helper_sve_brkn, gen_helper_sve_brkns);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 0e317d7d48..1c19129e55 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -60,6 +60,7 @@
 &rri_esz	rd rn imm esz
 &rrr_esz	rd rn rm esz
 &rpr_esz	rd pg rn esz
+&rpr_s		rd pg rn s
 &rprr_s		rd pg rn rm s
 &rprr_esz	rd pg rn rm esz
 &rprrr_esz	rd pg rn rm ra esz
@@ -79,6 +80,9 @@
 @pd_pn		........ esz:2 .. .... ....... rn:4 . rd:4	&rr_esz
 @rd_rn		........ esz:2 ...... ...... rn:5 rd:5		&rr_esz
 
+# Two operand with governing predicate, flags setting
+@pd_pg_pn_s	........ . s:1 ...... .. pg:4 . rn:4 . rd:4	&rpr_s
+
 # Three operand with unused vector element size
 @rd_rn_rm_e0	........ ... rm:5 ... ... rn:5 rd:5		&rrr_esz esz=0
 
@@ -568,6 +572,21 @@ PFIRST		00100101 01 011 000 11000 00 .... 0 ....	@pd_pn_e0
 # SVE predicate next active
 PNEXT		00100101 .. 011 001 11000 10 .... 0 ....	@pd_pn
 
+### SVE Partition Break Group
+
+# SVE propagate break from previous partition
+BRKPA		00100101 0. 00 .... 11 .... 0 .... 0 ....	@pd_pg_pn_pm_s
+BRKPB		00100101 0. 00 .... 11 .... 0 .... 1 ....	@pd_pg_pn_pm_s
+
+# SVE partition break condition
+BRKA_z		00100101 0. 01000001 .... 0 .... 0 ....		@pd_pg_pn_s
+BRKB_z		00100101 1. 01000001 .... 0 .... 0 ....		@pd_pg_pn_s
+BRKA_m		00100101 0. 01000001 .... 0 .... 1 ....		@pd_pg_pn_s
+BRKB_m		00100101 1. 01000001 .... 0 .... 1 ....		@pd_pg_pn_s
+
+# SVE propagate break to next partition
+BRKN		00100101 0. 01100001 .... 0 .... 0 ....		@pd_pg_pn_s
+
 ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
 
 # SVE load predicate register
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
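[Editorial sketch, not part of the patch: the partition-break helpers above implement the architectural BRKA/BRKB behaviour. A minimal standalone C restatement of that behaviour may help readers who don't have the Arm ARM pseudocode to hand. It uses element-granular bool arrays instead of QEMU's packed predicate bits, shows only the zeroing form, and omits the flag-setting variants; the function names are mine.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* BRKA (zeroing form): active elements of d are set up to and
 * including the first true element of n; everything after the
 * break point, and everything outside the governing predicate g,
 * is cleared.  */
static void brka_sketch(bool *d, const bool *n, const bool *g, size_t elems)
{
    bool brk = false;
    for (size_t i = 0; i < elems; i++) {
        if (g[i]) {
            d[i] = !brk;          /* write before testing n[i] ...  */
            brk = brk || n[i];    /* ... so the break element is included */
        } else {
            d[i] = false;
        }
    }
}

/* BRKB (zeroing form): as BRKA, but the first true element of n is
 * itself excluded from the result.  */
static void brkb_sketch(bool *d, const bool *n, const bool *g, size_t elems)
{
    bool brk = false;
    for (size_t i = 0; i < elems; i++) {
        if (g[i]) {
            brk = brk || n[i];    /* test first, so the break element is excluded */
            d[i] = !brk;
        } else {
            d[i] = false;
        }
    }
}
```

With n = {0,0,1,0} and an all-true g, BRKA yields {1,1,1,0} and BRKB yields {1,1,0,0}; the BRKP* variants apply the same idea starting from the break state propagated out of a previous predicate.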

* [Qemu-devel] [PATCH v2 39/67] target/arm: Implement SVE Predicate Count Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (37 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 38/67] target/arm: Implement SVE Partition Break Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 16:48   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 40/67] target/arm: Implement SVE Integer Compare - Scalars Group Richard Henderson
                   ` (29 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |   2 +
 target/arm/sve_helper.c    |  14 ++++++
 target/arm/translate-sve.c | 116 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  27 +++++++++++
 4 files changed, 159 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index f0a3ed3414..dd4f8f754d 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -676,3 +676,5 @@ DEF_HELPER_FLAGS_4(sve_brkbs_m, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(sve_brkn, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_brkns, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_cntp, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index d6d2220f8b..dd884bdd1c 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2702,3 +2702,17 @@ uint32_t HELPER(sve_brkns)(void *vd, void *vn, void *vg, uint32_t pred_desc)
         return do_zero(vd, oprsz);
     }
 }
+
+uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc)
+{
+    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    intptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
+    uint64_t *n = vn, *g = vg, sum = 0, mask = pred_esz_masks[esz];
+    intptr_t i;
+
+    for (i = 0; i < DIV_ROUND_UP(oprsz, 8); ++i) {
+        uint64_t t = n[i] & g[i] & mask;
+        sum += ctpop64(t);
+    }
+    return sum;
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index dc95d68867..038800cc86 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -36,6 +36,8 @@
 typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
 typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t,
                          int64_t, uint32_t, uint32_t);
+typedef void GVecGen2sFn(unsigned, uint32_t, uint32_t,
+                         TCGv_i64, uint32_t, uint32_t);
 typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
 
@@ -2731,6 +2733,120 @@ void trans_BRKN(DisasContext *s, arg_rpr_s *a, uint32_t insn)
     do_brk2(s, a, gen_helper_sve_brkn, gen_helper_sve_brkns);
 }
 
+/*
+ *** SVE Predicate Count Group
+ */
+
+static void do_cntp(DisasContext *s, TCGv_i64 val, int esz, int pn, int pg)
+{
+    unsigned psz = pred_full_reg_size(s);
+
+    if (psz <= 8) {
+        uint64_t psz_mask;
+
+        tcg_gen_ld_i64(val, cpu_env, pred_full_reg_offset(s, pn));
+        if (pn != pg) {
+            TCGv_i64 g = tcg_temp_new_i64();
+            tcg_gen_ld_i64(g, cpu_env, pred_full_reg_offset(s, pg));
+            tcg_gen_and_i64(val, val, g);
+            tcg_temp_free_i64(g);
+        }
+
+        /* Reduce the pred_esz_masks value simply to reduce the
+           size of the code generated here.  */
+        psz_mask = deposit64(0, 0, psz * 8, -1);
+        tcg_gen_andi_i64(val, val, pred_esz_masks[esz] & psz_mask);
+
+        tcg_gen_ctpop_i64(val, val);
+    } else {
+        TCGv_ptr t_pn = tcg_temp_new_ptr();
+        TCGv_ptr t_pg = tcg_temp_new_ptr();
+        unsigned desc;
+        TCGv_i32 t_desc;
+
+        desc = psz - 2;
+        desc = deposit32(desc, SIMD_DATA_SHIFT, 2, esz);
+
+        tcg_gen_addi_ptr(t_pn, cpu_env, pred_full_reg_offset(s, pn));
+        tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
+        t_desc = tcg_const_i32(desc);
+
+        gen_helper_sve_cntp(val, t_pn, t_pg, t_desc);
+        tcg_temp_free_ptr(t_pn);
+        tcg_temp_free_ptr(t_pg);
+        tcg_temp_free_i32(t_desc);
+    }
+}
+
+static void trans_CNTP(DisasContext *s, arg_CNTP *a, uint32_t insn)
+{
+    do_cntp(s, cpu_reg(s, a->rd), a->esz, a->rn, a->pg);
+}
+
+static void trans_INCDECP_r(DisasContext *s, arg_incdec_pred *a,
+                            uint32_t insn)
+{
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+    TCGv_i64 val = tcg_temp_new_i64();
+
+    do_cntp(s, val, a->esz, a->pg, a->pg);
+    if (a->d) {
+        tcg_gen_sub_i64(reg, reg, val);
+    } else {
+        tcg_gen_add_i64(reg, reg, val);
+    }
+    tcg_temp_free_i64(val);
+}
+
+static void trans_INCDECP_z(DisasContext *s, arg_incdec2_pred *a,
+                            uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_i64 val = tcg_temp_new_i64();
+    GVecGen2sFn *gvec_fn = a->d ? tcg_gen_gvec_subs : tcg_gen_gvec_adds;
+
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    do_cntp(s, val, a->esz, a->pg, a->pg);
+    gvec_fn(a->esz, vec_full_reg_offset(s, a->rd),
+            vec_full_reg_offset(s, a->rn), val, vsz, vsz);
+}
+
+static void trans_SINCDECP_r_32(DisasContext *s, arg_incdec_pred *a,
+                                uint32_t insn)
+{
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+    TCGv_i64 val = tcg_temp_new_i64();
+
+    do_cntp(s, val, a->esz, a->pg, a->pg);
+    do_sat_addsub_32(reg, val, a->u, a->d);
+}
+
+static void trans_SINCDECP_r_64(DisasContext *s, arg_incdec_pred *a,
+                                uint32_t insn)
+{
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+    TCGv_i64 val = tcg_temp_new_i64();
+
+    do_cntp(s, val, a->esz, a->pg, a->pg);
+    do_sat_addsub_64(reg, val, a->u, a->d);
+}
+
+static void trans_SINCDECP_z(DisasContext *s, arg_incdec2_pred *a,
+                             uint32_t insn)
+{
+    TCGv_i64 val = tcg_temp_new_i64();
+
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    do_cntp(s, val, a->esz, a->pg, a->pg);
+    do_sat_addsub_vec(s, a->esz, a->rd, a->rn, val, a->u, a->d);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 1c19129e55..76c084d43e 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -68,6 +68,8 @@
 &ptrue		rd esz pat s
 &incdec_cnt	rd pat esz imm d u
 &incdec2_cnt	rd rn pat esz imm d u
+&incdec_pred	rd pg esz d u
+&incdec2_pred	rd rn pg esz d u
 
 ###########################################################################
 # Named instruction formats.  These are generally used to
@@ -114,6 +116,7 @@
 
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
+@rd_pg4_pn	........ esz:2 ... ... .. pg:4 . rn:4 rd:5	&rpr_esz
 
 # Two register operands with a 6-bit signed immediate.
 @rd_rn_i6	........ ... rn:5 ..... imm:s6 rd:5		&rri
@@ -154,6 +157,12 @@
 @incdec2_cnt	........ esz:2 .. .... ...... pat:5 rd:5 \
 		&incdec2_cnt imm=%imm4_16_p1 rn=%reg_movprfx
 
+# One register, predicate.
+# User must fill in U and D.
+@incdec_pred	........ esz:2 .... .. ..... .. pg:4 rd:5	&incdec_pred
+@incdec2_pred	........ esz:2 .... .. ..... .. pg:4 rd:5 \
+		&incdec2_pred rn=%reg_movprfx
+
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
 
@@ -587,6 +596,24 @@ BRKB_m		00100101 1. 01000001 .... 0 .... 1 ....		@pd_pg_pn_s
 # SVE propagate break to next partition
 BRKN		00100101 0. 01100001 .... 0 .... 0 ....		@pd_pg_pn_s
 
+### SVE Predicate Count Group
+
+# SVE predicate count
+CNTP		00100101 .. 100 000 10 .... 0 .... .....	@rd_pg4_pn
+
+# SVE inc/dec register by predicate count
+INCDECP_r	00100101 .. 10110 d:1 10001 00 .... .....     @incdec_pred u=1
+
+# SVE inc/dec vector by predicate count
+INCDECP_z	00100101 .. 10110 d:1 10000 00 .... .....    @incdec2_pred u=1
+
+# SVE saturating inc/dec register by predicate count
+SINCDECP_r_32	00100101 .. 1010 d:1 u:1 10001 00 .... .....	@incdec_pred
+SINCDECP_r_64	00100101 .. 1010 d:1 u:1 10001 10 .... .....	@incdec_pred
+
+# SVE saturating inc/dec vector by predicate count
+SINCDECP_z	00100101 .. 1010 d:1 u:1 10000 00 .... .....	@incdec2_pred
+
 ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
 
 # SVE load predicate register
-- 
2.14.3

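[Editorial sketch, not part of the patch: sve_cntp above is a straight population count over the conjunction of the predicate, the governing predicate, and the element-size mask. A standalone C version of that core loop, using the compiler builtin where the patch uses QEMU's ctpop64(); the function name is mine:]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Count the active elements of predicate n under governing predicate g.
 * esz_mask keeps only the lowest predicate bit of each element -- e.g.
 * every 8th bit for 64-bit elements -- matching the role of
 * pred_esz_masks[esz] in the patch.  */
static uint64_t cntp_sketch(const uint64_t *n, const uint64_t *g,
                            uint64_t esz_mask, size_t words)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < words; i++) {
        sum += (uint64_t)__builtin_popcountll(n[i] & g[i] & esz_mask);
    }
    return sum;
}
```

For a fully-true 64-byte predicate word, esz = 3 (64-bit elements) masks to one bit per 8 predicate bits, so the count is 8 elements; esz = 0 counts all 64 bits.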

* [Qemu-devel] [PATCH v2 40/67] target/arm: Implement SVE Integer Compare - Scalars Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (38 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 39/67] target/arm: Implement SVE Predicate Count Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 17:00   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 41/67] target/arm: Implement FDUP/DUP Richard Henderson
                   ` (28 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  2 +
 target/arm/sve_helper.c    | 31 ++++++++++++++++
 target/arm/translate-sve.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  8 ++++
 4 files changed, 133 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index dd4f8f754d..1863106d0f 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -678,3 +678,5 @@ DEF_HELPER_FLAGS_4(sve_brkn, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_brkns, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_3(sve_cntp, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_while, TCG_CALL_NO_RWG, i32, ptr, i32, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index dd884bdd1c..80b78da834 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2716,3 +2716,34 @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc)
     }
     return sum;
 }
+
+uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
+{
+    uintptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
+    intptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
+    uint64_t esz_mask = pred_esz_masks[esz];
+    ARMPredicateReg *d = vd;
+    uint32_t flags;
+    intptr_t i;
+
+    /* Begin with a zero predicate register.  */
+    flags = do_zero(d, oprsz);
+    if (count == 0) {
+        return flags;
+    }
+
+    /* Scale from predicate element count to bits.  */
+    count <<= esz;
+    /* Bound to the bits in the predicate.  */
+    count = MIN(count, oprsz * 8);
+
+    /* Set all of the requested bits.  */
+    for (i = 0; i < count / 64; ++i) {
+        d->p[i] = esz_mask;
+    }
+    if (count & 63) {
+        d->p[i] = ~(-1ull << (count & 63)) & esz_mask;
+    }
+
+    return predtest_ones(d, oprsz, esz_mask);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 038800cc86..4b92a55c21 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2847,6 +2847,98 @@ static void trans_SINCDECP_z(DisasContext *s, arg_incdec2_pred *a,
     do_sat_addsub_vec(s, a->esz, a->rd, a->rn, val, a->u, a->d);
 }
 
+/*
+ *** SVE Integer Compare Scalars Group
+ */
+
+static void trans_CTERM(DisasContext *s, arg_CTERM *a, uint32_t insn)
+{
+    TCGCond cond = (a->ne ? TCG_COND_NE : TCG_COND_EQ);
+    TCGv_i64 rn = read_cpu_reg(s, a->rn, a->sf);
+    TCGv_i64 rm = read_cpu_reg(s, a->rm, a->sf);
+    TCGv_i64 cmp = tcg_temp_new_i64();
+
+    tcg_gen_setcond_i64(cond, cmp, rn, rm);
+    tcg_gen_extrl_i64_i32(cpu_NF, cmp);
+    tcg_temp_free_i64(cmp);
+
+    /* VF = !NF & !CF.  */
+    tcg_gen_xori_i32(cpu_VF, cpu_NF, 1);
+    tcg_gen_andc_i32(cpu_VF, cpu_VF, cpu_CF);
+
+    /* Both NF and VF actually look at bit 31.  */
+    tcg_gen_neg_i32(cpu_NF, cpu_NF);
+    tcg_gen_neg_i32(cpu_VF, cpu_VF);
+}
+
+static void trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn)
+{
+    TCGv_i64 op0 = read_cpu_reg(s, a->rn, 1);
+    TCGv_i64 op1 = read_cpu_reg(s, a->rm, 1);
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i32 t2, t3;
+    TCGv_ptr ptr;
+    unsigned desc, vsz = vec_full_reg_size(s);
+    TCGCond cond;
+
+    if (!a->sf) {
+        if (a->u) {
+            tcg_gen_ext32u_i64(op0, op0);
+            tcg_gen_ext32u_i64(op1, op1);
+        } else {
+            tcg_gen_ext32s_i64(op0, op0);
+            tcg_gen_ext32s_i64(op1, op1);
+        }
+    }
+
+    /* For the helper, compress the different conditions into a computation
+     * of how many iterations for which the condition is true.
+     *
+     * This is slightly complicated by 0 <= UINT64_MAX, which is nominally
+     * 2**64 iterations, overflowing to 0.  Of course, predicate registers
+     * aren't that large, so any value >= predicate size is sufficient.
+     */
+    tcg_gen_sub_i64(t0, op1, op0);
+
+    /* t0 = MIN(op1 - op0, vsz).  */
+    if (a->eq) {
+        /* Equality means one more iteration.  */
+        tcg_gen_movi_i64(t1, vsz - 1);
+        tcg_gen_movcond_i64(TCG_COND_LTU, t0, t0, t1, t0, t1);
+        tcg_gen_addi_i64(t0, t0, 1);
+    } else {
+        tcg_gen_movi_i64(t1, vsz);
+        tcg_gen_movcond_i64(TCG_COND_LTU, t0, t0, t1, t0, t1);
+    }
+
+    /* t0 = (condition true ? t0 : 0).  */
+    cond = (a->u
+            ? (a->eq ? TCG_COND_LEU : TCG_COND_LTU)
+            : (a->eq ? TCG_COND_LE : TCG_COND_LT));
+    tcg_gen_movi_i64(t1, 0);
+    tcg_gen_movcond_i64(cond, t0, op0, op1, t0, t1);
+
+    t2 = tcg_temp_new_i32();
+    tcg_gen_extrl_i64_i32(t2, t0);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+
+    desc = (vsz / 8) - 2;
+    desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz);
+    t3 = tcg_const_i32(desc);
+
+    ptr = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(ptr, cpu_env, pred_full_reg_offset(s, a->rd));
+
+    gen_helper_sve_while(t2, ptr, t2, t3);
+    do_pred_flags(t2);
+
+    tcg_temp_free_ptr(ptr);
+    tcg_temp_free_i32(t2);
+    tcg_temp_free_i32(t3);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 76c084d43e..b5bc7e9546 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -614,6 +614,14 @@ SINCDECP_r_64	00100101 .. 1010 d:1 u:1 10001 10 .... .....	@incdec_pred
 # SVE saturating inc/dec vector by predicate count
 SINCDECP_z	00100101 .. 1010 d:1 u:1 10000 00 .... .....	@incdec2_pred
 
+### SVE Integer Compare - Scalars Group
+
+# SVE conditionally terminate scalars
+CTERM		00100101 1 sf:1 1 rm:5 001000 rn:5 ne:1 0000
+
+# SVE integer compare scalar count and limit
+WHILE		00100101 esz:2 1 rm:5 000 sf:1 u:1 1 rn:5 eq:1 rd:4
+
 ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
 
 # SVE load predicate register
-- 
2.14.3

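[Editorial sketch, not part of the patch: trans_WHILE above compresses the various WHILE conditions into "how many iterations is the condition true for", clamped so that the nominal 2**64 iterations of the 0..UINT64_MAX case cannot overflow. A scalar C restatement of that count computation for the unsigned comparisons (WHILELO when eq is false, WHILELS when eq is true); the bound stands in for the vector size, and the function name is mine:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Trip count of "for (i = op0; i < op1 (or <=, if eq); i++)", clamped
 * to bound.  bound need only be >= the predicate size in elements, so
 * clamping loses nothing: any count >= bound fills the predicate.  */
static uint64_t while_count_sketch(uint64_t op0, uint64_t op1,
                                   bool eq, uint64_t bound)
{
    uint64_t n = op1 - op0;        /* may wrap; clamped below */
    if (eq) {
        /* Equality means one more iteration; clamp first to avoid
         * overflowing the +1, as the patch does with vsz - 1.  */
        n = (n < bound - 1 ? n : bound - 1) + 1;
    } else {
        n = (n < bound ? n : bound);
    }
    /* Count applies only if the condition holds at all.  */
    bool cond = eq ? op0 <= op1 : op0 < op1;
    return cond ? n : 0;
}
```

The helper then expands this element count into predicate bits, exactly as sve_while does with esz_mask.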

* [Qemu-devel] [PATCH v2 41/67] target/arm: Implement FDUP/DUP
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (39 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 40/67] target/arm: Implement SVE Integer Compare - Scalars Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 17:12   ` Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 42/67] target/arm: Implement SVE Integer Wide Immediate - Unpredicated Group Richard Henderson
                   ` (27 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 35 +++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  8 ++++++++
 2 files changed, 43 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 4b92a55c21..7571d02237 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2939,6 +2939,41 @@ static void trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn)
     tcg_temp_free_i32(t3);
 }
 
+/*
+ *** SVE Integer Wide Immediate - Unpredicated Group
+ */
+
+static void trans_FDUP(DisasContext *s, arg_FDUP *a, uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    int dofs = vec_full_reg_offset(s, a->rd);
+    uint64_t imm;
+
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    /* Decode the VFP immediate.  */
+    imm = vfp_expand_imm(a->esz, a->imm);
+    imm = dup_const(a->esz, imm);
+
+    tcg_gen_gvec_dup64i(dofs, vsz, vsz, imm);
+}
+
+static void trans_DUP_i(DisasContext *s, arg_DUP_i *a, uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    int dofs = vec_full_reg_offset(s, a->rd);
+
+    if (a->esz == 0 && extract32(insn, 13, 1)) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    tcg_gen_gvec_dup64i(dofs, vsz, vsz, dup_const(a->esz, a->imm));
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index b5bc7e9546..ea1bfe7579 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -622,6 +622,14 @@ CTERM		00100101 1 sf:1 1 rm:5 001000 rn:5 ne:1 0000
 # SVE integer compare scalar count and limit
 WHILE		00100101 esz:2 1 rm:5 000 sf:1 u:1 1 rn:5 eq:1 rd:4
 
+### SVE Integer Wide Immediate - Unpredicated Group
+
+# SVE broadcast floating-point immediate (unpredicated)
+FDUP		00100101 esz:2 111 00 1110 imm:8 rd:5
+
+# SVE broadcast integer immediate (unpredicated)
+DUP_i		00100101 esz:2 111 00 011 . ........ rd:5	imm=%sh8_i8s
+
 ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
 
 # SVE load predicate register
-- 
2.14.3

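[Editorial sketch, not part of the patch: both trans_FDUP and trans_DUP_i rely on dup_const() to replicate an element across a 64-bit lane before the gvec dup. A standalone illustration of that replication via multiplication by a ones pattern -- the real helper lives in TCG; this version and its name are mine:]

```c
#include <assert.h>
#include <stdint.h>

/* Replicate the low 1 << esz bytes of x across a 64-bit value,
 * mirroring what dup_const(MO_8 .. MO_64, x) produces.  */
static uint64_t dup_const_sketch(unsigned esz, uint64_t x)
{
    switch (esz) {
    case 0: /* MO_8  */ return (x & 0xff) * 0x0101010101010101ull;
    case 1: /* MO_16 */ return (x & 0xffff) * 0x0001000100010001ull;
    case 2: /* MO_32 */ return (x & 0xffffffffull) * 0x0000000100000001ull;
    default: /* MO_64 */ return x;
    }
}
```

For FDUP the 8-bit field is first expanded by vfp_expand_imm() into a full-width floating-point constant, then replicated the same way.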

* [Qemu-devel] [PATCH v2 42/67] target/arm: Implement SVE Integer Wide Immediate - Unpredicated Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (40 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 41/67] target/arm: Implement FDUP/DUP Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 17:18   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 43/67] target/arm: Implement SVE Floating Point Arithmetic " Richard Henderson
                   ` (26 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  25 +++++++++
 target/arm/sve_helper.c    |  41 ++++++++++++++
 target/arm/translate-sve.c | 135 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  26 +++++++++
 4 files changed, 227 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 1863106d0f..97bfe0f47b 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -680,3 +680,28 @@ DEF_HELPER_FLAGS_4(sve_brkns, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_cntp, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_3(sve_while, TCG_CALL_NO_RWG, i32, ptr, i32, i32)
+
+DEF_HELPER_FLAGS_4(sve_subri_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_subri_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_subri_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_subri_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(sve_smaxi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_smaxi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_smaxi_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_smaxi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(sve_smini_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_smini_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_smini_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_smini_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(sve_umaxi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_umaxi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_umaxi_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_umaxi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(sve_umini_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_umini_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_umini_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_umini_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 80b78da834..4f45f11bff 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -803,6 +803,46 @@ DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN)
 #undef DO_VPZ
 #undef DO_VPZ_D
 
+/* Two vector operand, one scalar operand, unpredicated.  */
+#define DO_ZZI(NAME, TYPE, OP)                                       \
+void HELPER(NAME)(void *vd, void *vn, uint64_t s64, uint32_t desc)   \
+{                                                                    \
+    intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(TYPE);            \
+    TYPE s = s64, *d = vd, *n = vn;                                  \
+    for (i = 0; i < opr_sz; ++i) {                                   \
+        d[i] = OP(n[i], s);                                          \
+    }                                                                \
+}
+
+#define DO_SUBR(X, Y)   (Y - X)
+
+DO_ZZI(sve_subri_b, uint8_t, DO_SUBR)
+DO_ZZI(sve_subri_h, uint16_t, DO_SUBR)
+DO_ZZI(sve_subri_s, uint32_t, DO_SUBR)
+DO_ZZI(sve_subri_d, uint64_t, DO_SUBR)
+
+DO_ZZI(sve_smaxi_b, int8_t, DO_MAX)
+DO_ZZI(sve_smaxi_h, int16_t, DO_MAX)
+DO_ZZI(sve_smaxi_s, int32_t, DO_MAX)
+DO_ZZI(sve_smaxi_d, int64_t, DO_MAX)
+
+DO_ZZI(sve_smini_b, int8_t, DO_MIN)
+DO_ZZI(sve_smini_h, int16_t, DO_MIN)
+DO_ZZI(sve_smini_s, int32_t, DO_MIN)
+DO_ZZI(sve_smini_d, int64_t, DO_MIN)
+
+DO_ZZI(sve_umaxi_b, uint8_t, DO_MAX)
+DO_ZZI(sve_umaxi_h, uint16_t, DO_MAX)
+DO_ZZI(sve_umaxi_s, uint32_t, DO_MAX)
+DO_ZZI(sve_umaxi_d, uint64_t, DO_MAX)
+
+DO_ZZI(sve_umini_b, uint8_t, DO_MIN)
+DO_ZZI(sve_umini_h, uint16_t, DO_MIN)
+DO_ZZI(sve_umini_s, uint32_t, DO_MIN)
+DO_ZZI(sve_umini_d, uint64_t, DO_MIN)
+
+#undef DO_ZZI
+
 #undef DO_AND
 #undef DO_ORR
 #undef DO_EOR
@@ -817,6 +857,7 @@ DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN)
 #undef DO_ASR
 #undef DO_LSR
 #undef DO_LSL
+#undef DO_SUBR
 
 /* Similar to the ARM LastActiveElement pseudocode function, except the
    result is multiplied by the element size.  This includes the not found
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 7571d02237..72abcb543a 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -81,6 +81,11 @@ static inline int expand_imm_sh8s(int x)
     return (int8_t)x << (x & 0x100 ? 8 : 0);
 }
 
+static inline int expand_imm_sh8u(int x)
+{
+    return (uint8_t)x << (x & 0x100 ? 8 : 0);
+}
+
 /*
  * Include the generated decoder.
  */
@@ -2974,6 +2979,136 @@ static void trans_DUP_i(DisasContext *s, arg_DUP_i *a, uint32_t insn)
     tcg_gen_gvec_dup64i(dofs, vsz, vsz, dup_const(a->esz, a->imm));
 }
 
+static void trans_ADD_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+
+    if (a->esz == 0 && extract32(insn, 13, 1)) {
+        unallocated_encoding(s);
+        return;
+    }
+    tcg_gen_gvec_addi(a->esz, vec_full_reg_offset(s, a->rd),
+                      vec_full_reg_offset(s, a->rn), a->imm, vsz, vsz);
+}
+
+static void trans_SUB_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    a->imm = -a->imm;
+    trans_ADD_zzi(s, a, insn);
+}
+
+static void trans_SUBR_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    static const GVecGen2s op[4] = {
+        { .fni8 = tcg_gen_vec_sub8_i64,
+          .fniv = tcg_gen_sub_vec,
+          .fno = gen_helper_sve_subri_b,
+          .opc = INDEX_op_sub_vec,
+          .vece = MO_8,
+          .scalar_first = true },
+        { .fni8 = tcg_gen_vec_sub16_i64,
+          .fniv = tcg_gen_sub_vec,
+          .fno = gen_helper_sve_subri_h,
+          .opc = INDEX_op_sub_vec,
+          .vece = MO_16,
+          .scalar_first = true },
+        { .fni4 = tcg_gen_sub_i32,
+          .fniv = tcg_gen_sub_vec,
+          .fno = gen_helper_sve_subri_s,
+          .opc = INDEX_op_sub_vec,
+          .vece = MO_32,
+          .scalar_first = true },
+        { .fni8 = tcg_gen_sub_i64,
+          .fniv = tcg_gen_sub_vec,
+          .fno = gen_helper_sve_subri_d,
+          .opc = INDEX_op_sub_vec,
+          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+          .vece = MO_64,
+          .scalar_first = true }
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_i64 c;
+
+    if (a->esz == 0 && extract32(insn, 13, 1)) {
+        unallocated_encoding(s);
+        return;
+    }
+    c = tcg_const_i64(a->imm);
+    tcg_gen_gvec_2s(vec_full_reg_offset(s, a->rd),
+                    vec_full_reg_offset(s, a->rn), vsz, vsz, c, &op[a->esz]);
+    tcg_temp_free_i64(c);
+}
+
+static void trans_MUL_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_muli(a->esz, vec_full_reg_offset(s, a->rd),
+                      vec_full_reg_offset(s, a->rn), a->imm, vsz, vsz);
+}
+
+static void do_zzi_sat(DisasContext *s, arg_rri_esz *a, uint32_t insn,
+                       bool u, bool d)
+{
+    TCGv_i64 val;
+
+    if (a->esz == 0 && extract32(insn, 13, 1)) {
+        unallocated_encoding(s);
+        return;
+    }
+    val = tcg_const_i64(a->imm);
+    do_sat_addsub_vec(s, a->esz, a->rd, a->rn, val, u, d);
+    tcg_temp_free_i64(val);
+}
+
+static void trans_SQADD_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    do_zzi_sat(s, a, insn, false, false);
+}
+
+static void trans_UQADD_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    do_zzi_sat(s, a, insn, true, false);
+}
+
+static void trans_SQSUB_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    do_zzi_sat(s, a, insn, false, true);
+}
+
+static void trans_UQSUB_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    do_zzi_sat(s, a, insn, true, true);
+}
+
+static void do_zzi_ool(DisasContext *s, arg_rri_esz *a, gen_helper_gvec_2i *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_i64 c = tcg_const_i64(a->imm);
+
+    tcg_gen_gvec_2i_ool(vec_full_reg_offset(s, a->rd),
+                        vec_full_reg_offset(s, a->rn),
+                        c, vsz, vsz, 0, fn);
+    tcg_temp_free_i64(c);
+}
+
+#define DO_ZZI(NAME, name) \
+static void trans_##NAME##_zzi(DisasContext *s, arg_rri_esz *a,         \
+                               uint32_t insn)                           \
+{                                                                       \
+    static gen_helper_gvec_2i * const fns[4] = {                        \
+        gen_helper_sve_##name##i_b, gen_helper_sve_##name##i_h,         \
+        gen_helper_sve_##name##i_s, gen_helper_sve_##name##i_d,         \
+    };                                                                  \
+    do_zzi_ool(s, a, fns[a->esz]);                                      \
+}
+
+DO_ZZI(SMAX, smax)
+DO_ZZI(UMAX, umax)
+DO_ZZI(SMIN, smin)
+DO_ZZI(UMIN, umin)
+
+#undef DO_ZZI
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index ea1bfe7579..1ede152360 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -43,6 +43,8 @@
 
 # Signed 8-bit immediate, optionally shifted left by 8.
 %sh8_i8s		5:9 !function=expand_imm_sh8s
+# Unsigned 8-bit immediate, optionally shifted left by 8.
+%sh8_i8u		5:9 !function=expand_imm_sh8u
 
 # Either a copy of rd (at bit 0), or a different source
 # as propagated via the MOVPRFX instruction.
@@ -96,6 +98,12 @@
 @pd_pn_pm	........ esz:2 .. rm:4 ....... rn:4 . rd:4	&rrr_esz
 @rdn_rm		........ esz:2 ...... ...... rm:5 rd:5 \
 		&rrr_esz rn=%reg_movprfx
+@rdn_sh_i8u	........ esz:2 ...... ...... ..... rd:5 \
+		&rri_esz rn=%reg_movprfx imm=%sh8_i8u
+@rdn_i8u	........ esz:2 ...... ... imm:8 rd:5 \
+		&rri_esz rn=%reg_movprfx
+@rdn_i8s	........ esz:2 ...... ... imm:s8 rd:5 \
+		&rri_esz rn=%reg_movprfx
 
 # Three operand with "memory" size, aka immediate left shift
 @rd_rn_msz_rm	........ ... rm:5 .... imm:2 rn:5 rd:5		&rrri
@@ -630,6 +638,24 @@ FDUP		00100101 esz:2 111 00 1110 imm:8 rd:5
 # SVE broadcast integer immediate (unpredicated)
 DUP_i		00100101 esz:2 111 00 011 . ........ rd:5	imm=%sh8_i8s
 
+# SVE integer add/subtract immediate (unpredicated)
+ADD_zzi		00100101 .. 100 000 11 . ........ .....		@rdn_sh_i8u
+SUB_zzi		00100101 .. 100 001 11 . ........ .....		@rdn_sh_i8u
+SUBR_zzi	00100101 .. 100 011 11 . ........ .....		@rdn_sh_i8u
+SQADD_zzi	00100101 .. 100 100 11 . ........ .....		@rdn_sh_i8u
+UQADD_zzi	00100101 .. 100 101 11 . ........ .....		@rdn_sh_i8u
+SQSUB_zzi	00100101 .. 100 110 11 . ........ .....		@rdn_sh_i8u
+UQSUB_zzi	00100101 .. 100 111 11 . ........ .....		@rdn_sh_i8u
+
+# SVE integer min/max immediate (unpredicated)
+SMAX_zzi	00100101 .. 101 000 110 ........ .....		@rdn_i8s
+UMAX_zzi	00100101 .. 101 001 110 ........ .....		@rdn_i8u
+SMIN_zzi	00100101 .. 101 010 110 ........ .....		@rdn_i8s
+UMIN_zzi	00100101 .. 101 011 110 ........ .....		@rdn_i8u
+
+# SVE integer multiply immediate (unpredicated)
+MUL_zzi		00100101 .. 110 000 110 ........ .....		@rdn_i8s
+
 ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
 
 # SVE load predicate register
-- 
2.14.3

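[Editor's note: the %sh8_i8u/%sh8_i8s decode fields above expand a 9-bit immediate whose high bit selects a left shift of 8. A standalone sketch of that expansion, modelled on the expand_imm_sh8u shown in the translate-sve.c context of the next patch (the signed variant is assumed symmetrical; this is illustrative C, not QEMU code):]

```c
#include <assert.h>
#include <stdint.h>

/* Model of the sve.decode immediate expanders: a 9-bit field where
 * bit 8 (0x100) selects "value shifted left by 8".  The low 8 bits
 * are taken unsigned or signed depending on the variant.
 */
static int expand_imm_sh8u(int x)
{
    /* Unsigned 8-bit immediate, optionally shifted left by 8.  */
    return (uint8_t)x << (x & 0x100 ? 8 : 0);
}

static int expand_imm_sh8s(int x)
{
    /* Signed 8-bit immediate, optionally shifted left by 8.  */
    return (int8_t)x << (x & 0x100 ? 8 : 0);
}
```

So DUP_i / ADD_zzi immediates can encode 0..255 and 0x100..0xff00 in steps of 256 (signed equivalents for the signed forms).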
^ permalink raw reply related	[flat|nested] 167+ messages in thread

* [Qemu-devel] [PATCH v2 43/67] target/arm: Implement SVE Floating Point Arithmetic - Unpredicated Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (41 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 42/67] target/arm: Implement SVE Integer Wide Immediate - Unpredicated Group Richard Henderson
@ 2018-02-17 18:22 ` Richard Henderson
  2018-02-23 17:25   ` Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 44/67] target/arm: Implement SVE Memory Contiguous Load Group Richard Henderson
                   ` (25 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 14 +++++++
 target/arm/helper.h        | 19 ++++++++++
 target/arm/translate-sve.c | 41 ++++++++++++++++++++
 target/arm/vec_helper.c    | 94 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/Makefile.objs   |  2 +-
 target/arm/sve.decode      | 10 +++++
 6 files changed, 179 insertions(+), 1 deletion(-)
 create mode 100644 target/arm/vec_helper.c

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 97bfe0f47b..2e76084992 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -705,3 +705,17 @@ DEF_HELPER_FLAGS_4(sve_umini_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(sve_umini_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(sve_umini_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(sve_umini_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_5(gvec_recps_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_recps_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_recps_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(gvec_rsqrts_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index be3c2fcdc0..f3ce58e276 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -565,6 +565,25 @@ DEF_HELPER_2(dc_zva, void, env, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
+DEF_HELPER_FLAGS_5(gvec_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(gvec_fsub_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fsub_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fsub_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(gvec_fmul_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fmul_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fmul_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_ftsmul_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 72abcb543a..f9a3ad1434 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3109,6 +3109,47 @@ DO_ZZI(UMIN, umin)
 
 #undef DO_ZZI
 
+/*
+ *** SVE Floating Point Arithmetic - Unpredicated Group
+ */
+
+static void do_zzz_fp(DisasContext *s, arg_rrr_esz *a,
+                      gen_helper_gvec_3_ptr *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr status;
+
+    if (fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+    status = get_fpstatus_ptr(a->esz == MO_16);
+    tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       status, vsz, vsz, 0, fn);
+    tcg_temp_free_ptr(status);
+}
+
+#define DO_FP3(NAME, name) \
+static void trans_##NAME(DisasContext *s, arg_rrr_esz *a, uint32_t insn) \
+{                                                                   \
+    static gen_helper_gvec_3_ptr * const fns[4] = {                 \
+        NULL, gen_helper_gvec_##name##_h,                           \
+        gen_helper_gvec_##name##_s, gen_helper_gvec_##name##_d      \
+    };                                                              \
+    do_zzz_fp(s, a, fns[a->esz]);                                   \
+}
+
+DO_FP3(FADD_zzz, fadd)
+DO_FP3(FSUB_zzz, fsub)
+DO_FP3(FMUL_zzz, fmul)
+DO_FP3(FTSMUL, ftsmul)
+DO_FP3(FRECPS, recps)
+DO_FP3(FRSQRTS, rsqrts)
+
+#undef DO_FP3
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
new file mode 100644
index 0000000000..ad5c29cdd5
--- /dev/null
+++ b/target/arm/vec_helper.c
@@ -0,0 +1,94 @@
+/*
+ * ARM Shared AdvSIMD / SVE Operations
+ *
+ * Copyright (c) 2018 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
+#include "fpu/softfloat.h"
+
+
+/* Floating-point trigonometric starting value.
+ * See the ARM ARM pseudocode function FPTrigSMul.
+ */
+static float16 float16_ftsmul(float16 op1, uint16_t op2, float_status *stat)
+{
+    float16 result = float16_mul(op1, op1, stat);
+    if (!float16_is_any_nan(result)) {
+        result = float16_set_sign(result, op2 & 1);
+    }
+    return result;
+}
+
+static float32 float32_ftsmul(float32 op1, uint32_t op2, float_status *stat)
+{
+    float32 result = float32_mul(op1, op1, stat);
+    if (!float32_is_any_nan(result)) {
+        result = float32_set_sign(result, op2 & 1);
+    }
+    return result;
+}
+
+static float64 float64_ftsmul(float64 op1, uint64_t op2, float_status *stat)
+{
+    float64 result = float64_mul(op1, op1, stat);
+    if (!float64_is_any_nan(result)) {
+        result = float64_set_sign(result, op2 & 1);
+    }
+    return result;
+}
+
+#define DO_3OP(NAME, FUNC, TYPE) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
+{                                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc);                                  \
+    TYPE *d = vd, *n = vn, *m = vm;                                        \
+    for (i = 0; i < oprsz / sizeof(TYPE); i++) {                           \
+        d[i] = FUNC(n[i], m[i], stat);                                     \
+    }                                                                      \
+}
+
+DO_3OP(gvec_fadd_h, float16_add, float16)
+DO_3OP(gvec_fadd_s, float32_add, float32)
+DO_3OP(gvec_fadd_d, float64_add, float64)
+
+DO_3OP(gvec_fsub_h, float16_sub, float16)
+DO_3OP(gvec_fsub_s, float32_sub, float32)
+DO_3OP(gvec_fsub_d, float64_sub, float64)
+
+DO_3OP(gvec_fmul_h, float16_mul, float16)
+DO_3OP(gvec_fmul_s, float32_mul, float32)
+DO_3OP(gvec_fmul_d, float64_mul, float64)
+
+DO_3OP(gvec_ftsmul_h, float16_ftsmul, float16)
+DO_3OP(gvec_ftsmul_s, float32_ftsmul, float32)
+DO_3OP(gvec_ftsmul_d, float64_ftsmul, float64)
+
+#ifdef TARGET_AARCH64
+
+DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
+DO_3OP(gvec_recps_s, helper_recpsf_f32, float32)
+DO_3OP(gvec_recps_d, helper_recpsf_f64, float64)
+
+DO_3OP(gvec_rsqrts_h, helper_rsqrtsf_f16, float16)
+DO_3OP(gvec_rsqrts_s, helper_rsqrtsf_f32, float32)
+DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
+
+#endif
+#undef DO_3OP
diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index 452ac6f453..50a521876d 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -8,7 +8,7 @@ obj-y += translate.o op_helper.o helper.o cpu.o
 obj-y += neon_helper.o iwmmxt_helper.o
 obj-y += gdbstub.o
 obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
-obj-y += crypto_helper.o
+obj-y += crypto_helper.o vec_helper.o
 obj-$(CONFIG_SOFTMMU) += arm-powerctl.o
 
 DECODETREE = $(SRC_PATH)/scripts/decodetree.py
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 1ede152360..42d14994a1 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -656,6 +656,16 @@ UMIN_zzi	00100101 .. 101 011 110 ........ .....		@rdn_i8u
 # SVE integer multiply immediate (unpredicated)
 MUL_zzi		00100101 .. 110 000 110 ........ .....		@rdn_i8s
 
+### SVE Floating Point Arithmetic - Unpredicated Group
+
+# SVE floating-point arithmetic (unpredicated)
+FADD_zzz	01100101 .. 0 ..... 000 000 ..... .....		@rd_rn_rm
+FSUB_zzz	01100101 .. 0 ..... 000 001 ..... .....		@rd_rn_rm
+FMUL_zzz	01100101 .. 0 ..... 000 010 ..... .....		@rd_rn_rm
+FTSMUL		01100101 .. 0 ..... 000 011 ..... .....		@rd_rn_rm
+FRECPS		01100101 .. 0 ..... 000 110 ..... .....		@rd_rn_rm
+FRSQRTS		01100101 .. 0 ..... 000 111 ..... .....		@rd_rn_rm
+
 ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
 
 # SVE load predicate register
-- 
2.14.3

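[Editor's note: the per-element FTSMUL operation added in vec_helper.c squares the first operand and then forces the result's sign from bit 0 of the second (integer) operand, unless the product is a NaN. A behavioural sketch of one double element using host floating point — ftsmul_model is a hypothetical name, and softfloat rounding modes and exception flags are deliberately ignored:]

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Behavioural model of FPTrigSMul for a single element:
 * square op1, then copy the sign from bit 0 of op2 unless
 * the product is a NaN (NaN payloads keep their own sign).
 */
static double ftsmul_model(double op1, uint64_t op2)
{
    double r = op1 * op1;
    if (!isnan(r)) {
        r = copysign(r, (op2 & 1) ? -1.0 : 1.0);
    }
    return r;
}
```

This mirrors the float16/32/64_ftsmul helpers above, which do the same with softfloat types and float16_set_sign/etc.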
^ permalink raw reply related	[flat|nested] 167+ messages in thread

* [Qemu-devel] [PATCH v2 44/67] target/arm: Implement SVE Memory Contiguous Load Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (42 preceding siblings ...)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 43/67] target/arm: Implement SVE Floating Point Arithmetic " Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 12:16   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 45/67] target/arm: Implement SVE Memory Contiguous Store Group Richard Henderson
                   ` (24 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  35 +++++++
 target/arm/sve_helper.c    | 235 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 130 +++++++++++++++++++++++++
 target/arm/sve.decode      |  44 ++++++++-
 4 files changed, 442 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 2e76084992..fcc9ba5f50 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -719,3 +719,38 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1bdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4f45f11bff..e542725113 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2788,3 +2788,238 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
 
     return predtest_ones(d, oprsz, esz_mask);
 }
+
+/*
+ * Load contiguous data, protected by a governing predicate.
+ */
+#define DO_LD1(NAME, FN, TYPEE, TYPEM, H)                  \
+void HELPER(NAME)(CPUARMState *env, void *vg,              \
+                  target_ulong addr, uint32_t desc)        \
+{                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc);                  \
+    intptr_t ra = GETPC();                                 \
+    unsigned rd = simd_data(desc);                         \
+    void *vd = &env->vfp.zregs[rd];                        \
+    for (i = 0; i < oprsz; ) {                             \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
+        do {                                               \
+            TYPEM m = 0;                                   \
+            if (pg & 1) {                                  \
+                m = FN(env, addr, ra);                     \
+            }                                              \
+            *(TYPEE *)(vd + H(i)) = m;                     \
+            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
+            addr += sizeof(TYPEM);                         \
+        } while (i & 15);                                  \
+    }                                                      \
+}
+
+#define DO_LD1_D(NAME, FN, TYPEM)                          \
+void HELPER(NAME)(CPUARMState *env, void *vg,              \
+                  target_ulong addr, uint32_t desc)        \
+{                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;              \
+    intptr_t ra = GETPC();                                 \
+    unsigned rd = simd_data(desc);                         \
+    uint64_t *d = &env->vfp.zregs[rd].d[0];                \
+    uint8_t *pg = vg;                                      \
+    for (i = 0; i < oprsz; i += 1) {                       \
+        TYPEM m = 0;                                       \
+        if (pg[H1(i)] & 1) {                               \
+            m = FN(env, addr, ra);                         \
+        }                                                  \
+        d[i] = m;                                          \
+        addr += sizeof(TYPEM);                             \
+    }                                                      \
+}
+
+#define DO_LD2(NAME, FN, TYPEE, TYPEM, H)                  \
+void HELPER(NAME)(CPUARMState *env, void *vg,              \
+                  target_ulong addr, uint32_t desc)        \
+{                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc);                  \
+    intptr_t ra = GETPC();                                 \
+    unsigned rd = simd_data(desc);                         \
+    void *d1 = &env->vfp.zregs[rd];                        \
+    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
+    for (i = 0; i < oprsz; ) {                             \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
+        do {                                               \
+            TYPEM m1 = 0, m2 = 0;                          \
+            if (pg & 1) {                                  \
+                m1 = FN(env, addr, ra);                    \
+                m2 = FN(env, addr + sizeof(TYPEM), ra);    \
+            }                                              \
+            *(TYPEE *)(d1 + H(i)) = m1;                    \
+            *(TYPEE *)(d2 + H(i)) = m2;                    \
+            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
+            addr += 2 * sizeof(TYPEM);                     \
+        } while (i & 15);                                  \
+    }                                                      \
+}
+
+#define DO_LD3(NAME, FN, TYPEE, TYPEM, H)                  \
+void HELPER(NAME)(CPUARMState *env, void *vg,              \
+                  target_ulong addr, uint32_t desc)        \
+{                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc);                  \
+    intptr_t ra = GETPC();                                 \
+    unsigned rd = simd_data(desc);                         \
+    void *d1 = &env->vfp.zregs[rd];                        \
+    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
+    void *d3 = &env->vfp.zregs[(rd + 2) & 31];             \
+    for (i = 0; i < oprsz; ) {                             \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
+        do {                                               \
+            TYPEM m1 = 0, m2 = 0, m3 = 0;                  \
+            if (pg & 1) {                                  \
+                m1 = FN(env, addr, ra);                    \
+                m2 = FN(env, addr + sizeof(TYPEM), ra);    \
+                m3 = FN(env, addr + 2 * sizeof(TYPEM), ra); \
+            }                                              \
+            *(TYPEE *)(d1 + H(i)) = m1;                    \
+            *(TYPEE *)(d2 + H(i)) = m2;                    \
+            *(TYPEE *)(d3 + H(i)) = m3;                    \
+            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
+            addr += 3 * sizeof(TYPEM);                     \
+        } while (i & 15);                                  \
+    }                                                      \
+}
+
+#define DO_LD4(NAME, FN, TYPEE, TYPEM, H)                  \
+void HELPER(NAME)(CPUARMState *env, void *vg,              \
+                  target_ulong addr, uint32_t desc)        \
+{                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc);                  \
+    intptr_t ra = GETPC();                                 \
+    unsigned rd = simd_data(desc);                         \
+    void *d1 = &env->vfp.zregs[rd];                        \
+    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
+    void *d3 = &env->vfp.zregs[(rd + 2) & 31];             \
+    void *d4 = &env->vfp.zregs[(rd + 3) & 31];             \
+    for (i = 0; i < oprsz; ) {                             \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
+        do {                                               \
+            TYPEM m1 = 0, m2 = 0, m3 = 0, m4 = 0;          \
+            if (pg & 1) {                                  \
+                m1 = FN(env, addr, ra);                    \
+                m2 = FN(env, addr + sizeof(TYPEM), ra);    \
+                m3 = FN(env, addr + 2 * sizeof(TYPEM), ra); \
+                m4 = FN(env, addr + 3 * sizeof(TYPEM), ra); \
+            }                                              \
+            *(TYPEE *)(d1 + H(i)) = m1;                    \
+            *(TYPEE *)(d2 + H(i)) = m2;                    \
+            *(TYPEE *)(d3 + H(i)) = m3;                    \
+            *(TYPEE *)(d4 + H(i)) = m4;                    \
+            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
+            addr += 4 * sizeof(TYPEM);                     \
+        } while (i & 15);                                  \
+    }                                                      \
+}
+
+DO_LD1(sve_ld1bhu_r, cpu_ldub_data_ra, uint16_t, uint8_t, H1_2)
+DO_LD1(sve_ld1bhs_r, cpu_ldsb_data_ra, uint16_t, int8_t, H1_2)
+DO_LD1(sve_ld1bsu_r, cpu_ldub_data_ra, uint32_t, uint8_t, H1_4)
+DO_LD1(sve_ld1bss_r, cpu_ldsb_data_ra, uint32_t, int8_t, H1_4)
+DO_LD1_D(sve_ld1bdu_r, cpu_ldub_data_ra, uint8_t)
+DO_LD1_D(sve_ld1bds_r, cpu_ldsb_data_ra, int8_t)
+
+DO_LD1(sve_ld1hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4)
+DO_LD1(sve_ld1hss_r, cpu_ldsw_data_ra, uint32_t, int16_t, H1_4)
+DO_LD1_D(sve_ld1hdu_r, cpu_lduw_data_ra, uint16_t)
+DO_LD1_D(sve_ld1hds_r, cpu_ldsw_data_ra, int16_t)
+
+DO_LD1_D(sve_ld1sdu_r, cpu_ldl_data_ra, uint32_t)
+DO_LD1_D(sve_ld1sds_r, cpu_ldl_data_ra, int32_t)
+
+DO_LD1(sve_ld1bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
+DO_LD2(sve_ld2bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
+DO_LD3(sve_ld3bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
+DO_LD4(sve_ld4bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
+
+DO_LD1(sve_ld1hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
+DO_LD2(sve_ld2hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
+DO_LD3(sve_ld3hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
+DO_LD4(sve_ld4hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
+
+DO_LD1(sve_ld1ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
+DO_LD2(sve_ld2ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
+DO_LD3(sve_ld3ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
+DO_LD4(sve_ld4ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
+
+DO_LD1_D(sve_ld1dd_r, cpu_ldq_data_ra, uint64_t)
+
+void HELPER(sve_ld2dd_r)(CPUARMState *env, void *vg,
+                         target_ulong addr, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;
+    intptr_t ra = GETPC();
+    unsigned rd = simd_data(desc);
+    uint64_t *d1 = &env->vfp.zregs[rd].d[0];
+    uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
+    uint8_t *pg = vg;
+
+    for (i = 0; i < oprsz; i += 1) {
+        uint64_t m1 = 0, m2 = 0;
+        if (pg[H1(i)] & 1) {
+            m1 = cpu_ldq_data_ra(env, addr, ra);
+            m2 = cpu_ldq_data_ra(env, addr + 8, ra);
+        }
+        d1[i] = m1;
+        d2[i] = m2;
+        addr += 2 * 8;
+    }
+}
+
+void HELPER(sve_ld3dd_r)(CPUARMState *env, void *vg,
+                         target_ulong addr, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;
+    intptr_t ra = GETPC();
+    unsigned rd = simd_data(desc);
+    uint64_t *d1 = &env->vfp.zregs[rd].d[0];
+    uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
+    uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0];
+    uint8_t *pg = vg;
+
+    for (i = 0; i < oprsz; i += 1) {
+        uint64_t m1 = 0, m2 = 0, m3 = 0;
+        if (pg[H1(i)] & 1) {
+            m1 = cpu_ldq_data_ra(env, addr, ra);
+            m2 = cpu_ldq_data_ra(env, addr + 8, ra);
+            m3 = cpu_ldq_data_ra(env, addr + 16, ra);
+        }
+        d1[i] = m1;
+        d2[i] = m2;
+        d3[i] = m3;
+        addr += 3 * 8;
+    }
+}
+
+void HELPER(sve_ld4dd_r)(CPUARMState *env, void *vg,
+                         target_ulong addr, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;
+    intptr_t ra = GETPC();
+    unsigned rd = simd_data(desc);
+    uint64_t *d1 = &env->vfp.zregs[rd].d[0];
+    uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
+    uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0];
+    uint64_t *d4 = &env->vfp.zregs[(rd + 3) & 31].d[0];
+    uint8_t *pg = vg;
+
+    for (i = 0; i < oprsz; i += 1) {
+        uint64_t m1 = 0, m2 = 0, m3 = 0, m4 = 0;
+        if (pg[H1(i)] & 1) {
+            m1 = cpu_ldq_data_ra(env, addr, ra);
+            m2 = cpu_ldq_data_ra(env, addr + 8, ra);
+            m3 = cpu_ldq_data_ra(env, addr + 16, ra);
+            m4 = cpu_ldq_data_ra(env, addr + 24, ra);
+        }
+        d1[i] = m1;
+        d2[i] = m2;
+        d3[i] = m3;
+        d4[i] = m4;
+        addr += 4 * 8;
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index f9a3ad1434..aa8bfd2ae7 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -46,6 +46,8 @@ typedef void gen_helper_gvec_flags_3(TCGv_i32, TCGv_ptr, TCGv_ptr,
 typedef void gen_helper_gvec_flags_4(TCGv_i32, TCGv_ptr, TCGv_ptr,
                                      TCGv_ptr, TCGv_ptr, TCGv_i32);
 
+typedef void gen_helper_gvec_mem(TCGv_env, TCGv_ptr, TCGv_i64, TCGv_i32);
+
 /*
  * Helpers for extracting complex instruction fields.
  */
@@ -86,6 +88,15 @@ static inline int expand_imm_sh8u(int x)
     return (uint8_t)x << (x & 0x100 ? 8 : 0);
 }
 
+/* Convert a 2-bit memory size (msz) to a 4-bit data type (dtype)
+ * with unsigned data.  Cf. the SVE Memory Contiguous Load Group.
+ */
+static inline int msz_dtype(int msz)
+{
+    static const uint8_t dtype[4] = { 0, 5, 10, 15 };
+    return dtype[msz];
+}
+
 /*
  * Include the generated decoder.
  */
@@ -3268,3 +3279,122 @@ static void trans_LDR_pri(DisasContext *s, arg_rri *a, uint32_t insn)
     int size = pred_full_reg_size(s);
     do_ldr(s, pred_full_reg_offset(s, a->rd), size, a->rn, a->imm * size);
 }
+
+/*
+ *** SVE Memory - Contiguous Load Group
+ */
+
+/* The memory element size of dtype.  */
+static const TCGMemOp dtype_mop[16] = {
+    MO_UB, MO_UB, MO_UB, MO_UB,
+    MO_SL, MO_UW, MO_UW, MO_UW,
+    MO_SW, MO_SW, MO_UL, MO_UL,
+    MO_SB, MO_SB, MO_SB, MO_Q
+};
+
+#define dtype_msz(x)  (dtype_mop[x] & MO_SIZE)
+
+/* The vector element size of dtype.  */
+static const uint8_t dtype_esz[16] = {
+    0, 1, 2, 3,
+    3, 1, 2, 3,
+    3, 2, 2, 3,
+    3, 2, 1, 3
+};
+
+static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
+                       gen_helper_gvec_mem *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr t_pg;
+    TCGv_i32 desc;
+
+    /* For e.g. LD4, there are not enough arguments to pass all 4
+       registers as pointers, so encode the regno into the data field.
+       For consistency, do this even for LD1.  */
+    desc = tcg_const_i32(simd_desc(vsz, vsz, zt));
+    t_pg = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
+    fn(cpu_env, t_pg, addr, desc);
+
+    tcg_temp_free_ptr(t_pg);
+    tcg_temp_free_i32(desc);
+    tcg_temp_free_i64(addr);
+}
+
+static void do_ld_zpa(DisasContext *s, int zt, int pg,
+                      TCGv_i64 addr, int dtype, int nreg)
+{
+    static gen_helper_gvec_mem * const fns[16][4] = {
+        { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
+          gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
+        { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
+        { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
+        { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
+
+        { gen_helper_sve_ld1sds_r, NULL, NULL, NULL },
+        { gen_helper_sve_ld1hh_r, gen_helper_sve_ld2hh_r,
+          gen_helper_sve_ld3hh_r, gen_helper_sve_ld4hh_r },
+        { gen_helper_sve_ld1hsu_r, NULL, NULL, NULL },
+        { gen_helper_sve_ld1hdu_r, NULL, NULL, NULL },
+
+        { gen_helper_sve_ld1hds_r, NULL, NULL, NULL },
+        { gen_helper_sve_ld1hss_r, NULL, NULL, NULL },
+        { gen_helper_sve_ld1ss_r, gen_helper_sve_ld2ss_r,
+          gen_helper_sve_ld3ss_r, gen_helper_sve_ld4ss_r },
+        { gen_helper_sve_ld1sdu_r, NULL, NULL, NULL },
+
+        { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
+        { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
+        { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
+        { gen_helper_sve_ld1dd_r, gen_helper_sve_ld2dd_r,
+          gen_helper_sve_ld3dd_r, gen_helper_sve_ld4dd_r },
+    };
+    gen_helper_gvec_mem *fn = fns[dtype][nreg];
+
+    /* While there are holes in the table, they are not
+       accessible via the instruction encoding.  */
+    assert(fn != NULL);
+    do_mem_zpa(s, zt, pg, addr, fn);
+}
+
+static void trans_LD_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
+{
+    TCGv_i64 addr;
+
+    if (a->rm == 31) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    addr = tcg_temp_new_i64();
+    tcg_gen_muli_i64(addr, cpu_reg(s, a->rm),
+                     (a->nreg + 1) << dtype_msz(a->dtype));
+    tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
+    do_ld_zpa(s, a->rd, a->pg, addr, a->dtype, a->nreg);
+}
+
+static void trans_LD_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    unsigned elements = vsz >> dtype_esz[a->dtype];
+    TCGv_i64 addr = tcg_temp_new_i64();
+
+    tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn),
+                     (a->imm * elements * (a->nreg + 1))
+                     << dtype_msz(a->dtype));
+    do_ld_zpa(s, a->rd, a->pg, addr, a->dtype, a->nreg);
+}
+
+static void trans_LDFF1_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
+{
+    /* FIXME */
+    trans_LD_zprr(s, a, insn);
+}
+
+static void trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
+{
+    /* FIXME */
+    trans_LD_zpri(s, a, insn);
+}
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 42d14994a1..d2b3869c58 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -42,9 +42,12 @@
 %tszimm16_shl	22:2 16:5 !function=tszimm_shl
 
 # Signed 8-bit immediate, optionally shifted left by 8.
-%sh8_i8s		5:9 !function=expand_imm_sh8s
+%sh8_i8s	5:9 !function=expand_imm_sh8s
 # Unsigned 8-bit immediate, optionally shifted left by 8.
-%sh8_i8u		5:9 !function=expand_imm_sh8u
+%sh8_i8u	5:9 !function=expand_imm_sh8u
+
+# Unsigned load of msz into esz=2, represented as a dtype.
+%msz_dtype	23:2 !function=msz_dtype
 
 # Either a copy of rd (at bit 0), or a different source
 # as propagated via the MOVPRFX instruction.
@@ -72,6 +75,8 @@
 &incdec2_cnt	rd rn pat esz imm d u
 &incdec_pred	rd pg esz d u
 &incdec2_pred	rd rn pg esz d u
+&rprr_load	rd pg rn rm dtype nreg
+&rpri_load	rd pg rn imm dtype nreg
 
 ###########################################################################
 # Named instruction formats.  These are generally used to
@@ -171,6 +176,15 @@
 @incdec2_pred	........ esz:2 .... .. ..... .. pg:4 rd:5 \
 		&incdec2_pred rn=%reg_movprfx
 
+# Loads; user must fill in NREG.
+@rprr_load_dt	....... dtype:4 rm:5 ... pg:3 rn:5 rd:5		&rprr_load
+@rpri_load_dt	....... dtype:4 . imm:s4 ... pg:3 rn:5 rd:5	&rpri_load
+
+@rprr_load_msz	....... .... rm:5 ... pg:3 rn:5 rd:5 \
+		&rprr_load dtype=%msz_dtype
+@rpri_load_msz	....... .... . imm:s4 ... pg:3 rn:5 rd:5 \
+		&rpri_load dtype=%msz_dtype
+
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
 
@@ -673,3 +687,29 @@ LDR_pri		10000101 10 ...... 000 ... ..... 0 ....		@pd_rn_i9
 
 # SVE load vector register
 LDR_zri		10000101 10 ...... 010 ... ..... .....		@rd_rn_i9
+
+### SVE Memory Contiguous Load Group
+
+# SVE contiguous load (scalar plus scalar)
+LD_zprr		1010010 .... ..... 010 ... ..... .....    @rprr_load_dt nreg=0
+
+# SVE contiguous first-fault load (scalar plus scalar)
+LDFF1_zprr	1010010 .... ..... 011 ... ..... .....	  @rprr_load_dt nreg=0
+
+# SVE contiguous load (scalar plus immediate)
+LD_zpri		1010010 .... 0.... 101 ... ..... .....    @rpri_load_dt nreg=0
+
+# SVE contiguous non-fault load (scalar plus immediate)
+LDNF1_zpri	1010010 .... 1.... 101 ... ..... .....    @rpri_load_dt nreg=0
+
+# SVE contiguous non-temporal load (scalar plus scalar)
+# LDNT1B, LDNT1H, LDNT1W, LDNT1D
+# SVE load multiple structures (scalar plus scalar)
+# LD2B, LD2H, LD2W, LD2D; etc.
+LD_zprr		1010010 .. nreg:2 ..... 110 ... ..... .....	@rprr_load_msz
+
+# SVE contiguous non-temporal load (scalar plus immediate)
+# LDNT1B, LDNT1H, LDNT1W, LDNT1D
+# SVE load multiple structures (scalar plus immediate)
+# LD2B, LD2H, LD2W, LD2D; etc.
+LD_zpri		1010010 .. nreg:2 0.... 111 ... ..... .....	@rpri_load_msz
-- 
2.14.3
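[Editorial sketch] The element/predicate bookkeeping in the DO_LD1-style helpers referenced by this patch is easier to follow outside the macro machinery. Below is a minimal host-memory sketch of that inner loop, specialized to halfword elements (TYPEE == TYPEM == uint16_t). The name `ld1hh_sketch` is invented; the sketch deliberately omits the H*() byte-swizzling, the env/desc plumbing, and the MMU-aware `cpu_ld*_data_ra` accessors of the real helpers, and assumes oprsz is a multiple of 16 as the real code guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Host-only sketch of the predicated contiguous-load inner loop:
 * the governing predicate holds one bit per vector byte, so the loop
 * consumes sizeof(element) predicate bits per element and refetches
 * 16 bits at each 16-byte granule boundary.  The memory cursor
 * advances even for inactive elements.
 */
static void ld1hh_sketch(uint16_t *zd, const uint8_t *vg,
                         const uint16_t *mem, size_t oprsz_bytes)
{
    size_t i = 0, m = 0;

    while (i < oprsz_bytes) {
        uint16_t pg;

        memcpy(&pg, vg + (i >> 3), sizeof(pg));   /* 16 predicate bits */
        do {
            if (pg & 1) {
                zd[i / 2] = mem[m];               /* active: load element */
            }
            i += sizeof(uint16_t);
            pg >>= sizeof(uint16_t);              /* skip this element's bits */
            m += 1;                               /* cursor always advances */
        } while (i & 15);
    }
}
```

With this layout, halfword element j is governed by predicate bit 2*j, which is why the shift count is the element size in bytes rather than 1.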

^ permalink raw reply related	[flat|nested] 167+ messages in thread

* [Qemu-devel] [PATCH v2 45/67] target/arm: Implement SVE Memory Contiguous Store Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (43 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 44/67] target/arm: Implement SVE Memory Contiguous Load Group Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 13:22   ` Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 46/67] target/arm: Implement SVE load and broadcast quadword Richard Henderson
                   ` (23 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  29 +++++++
 target/arm/sve_helper.c    | 211 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c |  68 ++++++++++++++-
 target/arm/sve.decode      |  38 ++++++++
 4 files changed, 343 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index fcc9ba5f50..74c2d642a3 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -754,3 +754,32 @@ DEF_HELPER_FLAGS_4(sve_ld1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_4(sve_ld1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1bh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1bs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1bd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1hs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1sd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index e542725113..e259e910de 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3023,3 +3023,214 @@ void HELPER(sve_ld4dd_r)(CPUARMState *env, void *vg,
         addr += 4 * 8;
     }
 }
+
+/*
+ * Store contiguous data, protected by a governing predicate.
+ */
+#define DO_ST1(NAME, FN, TYPEE, TYPEM, H)                  \
+void HELPER(NAME)(CPUARMState *env, void *vg,              \
+                  target_ulong addr, uint32_t desc)        \
+{                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc);                  \
+    intptr_t ra = GETPC();                                 \
+    unsigned rd = simd_data(desc);                         \
+    void *vd = &env->vfp.zregs[rd];                        \
+    for (i = 0; i < oprsz; ) {                             \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
+        do {                                               \
+            if (pg & 1) {                                  \
+                TYPEM m = *(TYPEE *)(vd + H(i));           \
+                FN(env, addr, m, ra);                      \
+            }                                              \
+            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
+            addr += sizeof(TYPEM);                         \
+        } while (i & 15);                                  \
+    }                                                      \
+}
+
+#define DO_ST1_D(NAME, FN, TYPEM)                          \
+void HELPER(NAME)(CPUARMState *env, void *vg,              \
+                  target_ulong addr, uint32_t desc)        \
+{                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;              \
+    intptr_t ra = GETPC();                                 \
+    unsigned rd = simd_data(desc);                         \
+    uint64_t *d = &env->vfp.zregs[rd].d[0];                \
+    uint8_t *pg = vg;                                      \
+    for (i = 0; i < oprsz; i += 1) {                       \
+        if (pg[H1(i)] & 1) {                               \
+            FN(env, addr, d[i], ra);                       \
+        }                                                  \
+        addr += sizeof(TYPEM);                             \
+    }                                                      \
+}
+
+#define DO_ST2(NAME, FN, TYPEE, TYPEM, H)                  \
+void HELPER(NAME)(CPUARMState *env, void *vg,              \
+                  target_ulong addr, uint32_t desc)        \
+{                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc);                  \
+    intptr_t ra = GETPC();                                 \
+    unsigned rd = simd_data(desc);                         \
+    void *d1 = &env->vfp.zregs[rd];                        \
+    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
+    for (i = 0; i < oprsz; ) {                             \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
+        do {                                               \
+            if (pg & 1) {                                  \
+                TYPEM m1 = *(TYPEE *)(d1 + H(i));          \
+                TYPEM m2 = *(TYPEE *)(d2 + H(i));          \
+                FN(env, addr, m1, ra);                     \
+                FN(env, addr + sizeof(TYPEM), m2, ra);     \
+            }                                              \
+            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
+            addr += 2 * sizeof(TYPEM);                     \
+        } while (i & 15);                                  \
+    }                                                      \
+}
+
+#define DO_ST3(NAME, FN, TYPEE, TYPEM, H)                  \
+void HELPER(NAME)(CPUARMState *env, void *vg,              \
+                  target_ulong addr, uint32_t desc)        \
+{                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc);                  \
+    intptr_t ra = GETPC();                                 \
+    unsigned rd = simd_data(desc);                         \
+    void *d1 = &env->vfp.zregs[rd];                        \
+    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
+    void *d3 = &env->vfp.zregs[(rd + 2) & 31];             \
+    for (i = 0; i < oprsz; ) {                             \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
+        do {                                               \
+            if (pg & 1) {                                  \
+                TYPEM m1 = *(TYPEE *)(d1 + H(i));          \
+                TYPEM m2 = *(TYPEE *)(d2 + H(i));          \
+                TYPEM m3 = *(TYPEE *)(d3 + H(i));          \
+                FN(env, addr, m1, ra);                     \
+                FN(env, addr + sizeof(TYPEM), m2, ra);     \
+                FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \
+            }                                              \
+            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
+            addr += 3 * sizeof(TYPEM);                     \
+        } while (i & 15);                                  \
+    }                                                      \
+}
+
+#define DO_ST4(NAME, FN, TYPEE, TYPEM, H)                  \
+void HELPER(NAME)(CPUARMState *env, void *vg,              \
+                  target_ulong addr, uint32_t desc)        \
+{                                                          \
+    intptr_t i, oprsz = simd_oprsz(desc);                  \
+    intptr_t ra = GETPC();                                 \
+    unsigned rd = simd_data(desc);                         \
+    void *d1 = &env->vfp.zregs[rd];                        \
+    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
+    void *d3 = &env->vfp.zregs[(rd + 2) & 31];             \
+    void *d4 = &env->vfp.zregs[(rd + 3) & 31];             \
+    for (i = 0; i < oprsz; ) {                             \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
+        do {                                               \
+            if (pg & 1) {                                  \
+                TYPEM m1 = *(TYPEE *)(d1 + H(i));          \
+                TYPEM m2 = *(TYPEE *)(d2 + H(i));          \
+                TYPEM m3 = *(TYPEE *)(d3 + H(i));          \
+                TYPEM m4 = *(TYPEE *)(d4 + H(i));          \
+                FN(env, addr, m1, ra);                     \
+                FN(env, addr + sizeof(TYPEM), m2, ra);     \
+                FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \
+                FN(env, addr + 3 * sizeof(TYPEM), m4, ra); \
+            }                                              \
+            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
+            addr += 4 * sizeof(TYPEM);                     \
+        } while (i & 15);                                  \
+    }                                                      \
+}
+
+DO_ST1(sve_st1bh_r, cpu_stb_data_ra, uint16_t, uint8_t, H1_2)
+DO_ST1(sve_st1bs_r, cpu_stb_data_ra, uint32_t, uint8_t, H1_4)
+DO_ST1_D(sve_st1bd_r, cpu_stb_data_ra, uint8_t)
+
+DO_ST1(sve_st1hs_r, cpu_stw_data_ra, uint32_t, uint16_t, H1_4)
+DO_ST1_D(sve_st1hd_r, cpu_stw_data_ra, uint16_t)
+
+DO_ST1_D(sve_st1sd_r, cpu_stl_data_ra, uint32_t)
+
+DO_ST1(sve_st1bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
+DO_ST2(sve_st2bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
+DO_ST3(sve_st3bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
+DO_ST4(sve_st4bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
+
+DO_ST1(sve_st1hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
+DO_ST2(sve_st2hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
+DO_ST3(sve_st3hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
+DO_ST4(sve_st4hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
+
+DO_ST1(sve_st1ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
+DO_ST2(sve_st2ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
+DO_ST3(sve_st3ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
+DO_ST4(sve_st4ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
+
+DO_ST1_D(sve_st1dd_r, cpu_stq_data_ra, uint64_t)
+
+void HELPER(sve_st2dd_r)(CPUARMState *env, void *vg,
+                         target_ulong addr, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;
+    intptr_t ra = GETPC();
+    unsigned rd = simd_data(desc);
+    uint64_t *d1 = &env->vfp.zregs[rd].d[0];
+    uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
+    uint8_t *pg = vg;
+
+    for (i = 0; i < oprsz; i += 1) {
+        if (pg[H1(i)] & 1) {
+            cpu_stq_data_ra(env, addr, d1[i], ra);
+            cpu_stq_data_ra(env, addr + 8, d2[i], ra);
+        }
+        addr += 2 * 8;
+    }
+}
+
+void HELPER(sve_st3dd_r)(CPUARMState *env, void *vg,
+                         target_ulong addr, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;
+    intptr_t ra = GETPC();
+    unsigned rd = simd_data(desc);
+    uint64_t *d1 = &env->vfp.zregs[rd].d[0];
+    uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
+    uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0];
+    uint8_t *pg = vg;
+
+    for (i = 0; i < oprsz; i += 1) {
+        if (pg[H1(i)] & 1) {
+            cpu_stq_data_ra(env, addr, d1[i], ra);
+            cpu_stq_data_ra(env, addr + 8, d2[i], ra);
+            cpu_stq_data_ra(env, addr + 16, d3[i], ra);
+        }
+        addr += 3 * 8;
+    }
+}
+
+void HELPER(sve_st4dd_r)(CPUARMState *env, void *vg,
+                         target_ulong addr, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;
+    intptr_t ra = GETPC();
+    unsigned rd = simd_data(desc);
+    uint64_t *d1 = &env->vfp.zregs[rd].d[0];
+    uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
+    uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0];
+    uint64_t *d4 = &env->vfp.zregs[(rd + 3) & 31].d[0];
+    uint8_t *pg = vg;
+
+    for (i = 0; i < oprsz; i += 1) {
+        if (pg[H1(i)] & 1) {
+            cpu_stq_data_ra(env, addr, d1[i], ra);
+            cpu_stq_data_ra(env, addr + 8, d2[i], ra);
+            cpu_stq_data_ra(env, addr + 16, d3[i], ra);
+            cpu_stq_data_ra(env, addr + 24, d4[i], ra);
+        }
+        addr += 4 * 8;
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index aa8bfd2ae7..fda9a56fd5 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3320,7 +3320,6 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
 
     tcg_temp_free_ptr(t_pg);
     tcg_temp_free_i32(desc);
-    tcg_temp_free_i64(addr);
 }
 
 static void do_ld_zpa(DisasContext *s, int zt, int pg,
@@ -3368,7 +3367,7 @@ static void trans_LD_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
         return;
     }
 
-    addr = tcg_temp_new_i64();
+    addr = new_tmp_a64(s);
     tcg_gen_muli_i64(addr, cpu_reg(s, a->rm),
                      (a->nreg + 1) << dtype_msz(a->dtype));
     tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
@@ -3379,7 +3378,7 @@ static void trans_LD_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
 {
     unsigned vsz = vec_full_reg_size(s);
     unsigned elements = vsz >> dtype_esz[a->dtype];
-    TCGv_i64 addr = tcg_temp_new_i64();
+    TCGv_i64 addr = new_tmp_a64(s);
 
     tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn),
                      (a->imm * elements * (a->nreg + 1))
@@ -3398,3 +3397,66 @@ static void trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
     /* FIXME */
     trans_LD_zpri(s, a, insn);
 }
+
+static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
+                      int msz, int esz, int nreg)
+{
+    static gen_helper_gvec_mem * const fn_single[4][4] = {
+        { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r,
+          gen_helper_sve_st1bs_r, gen_helper_sve_st1bd_r },
+        { NULL,                   gen_helper_sve_st1hh_r,
+          gen_helper_sve_st1hs_r, gen_helper_sve_st1hd_r },
+        { NULL, NULL,
+          gen_helper_sve_st1ss_r, gen_helper_sve_st1sd_r },
+        { NULL, NULL, NULL, gen_helper_sve_st1dd_r },
+    };
+    static gen_helper_gvec_mem * const fn_multiple[3][4] = {
+        { gen_helper_sve_st1hh_r, gen_helper_sve_st2hh_r,
+          gen_helper_sve_st3hh_r, gen_helper_sve_st4hh_r },
+        { gen_helper_sve_st1ss_r, gen_helper_sve_st2ss_r,
+          gen_helper_sve_st3ss_r, gen_helper_sve_st4ss_r },
+        { gen_helper_sve_st1dd_r, gen_helper_sve_st2dd_r,
+          gen_helper_sve_st3dd_r, gen_helper_sve_st4dd_r },
+    };
+    gen_helper_gvec_mem *fn;
+
+    if (nreg == 0) {
+        /* ST1 */
+        fn = fn_single[msz][esz];
+        if (fn == NULL) {
+            unallocated_encoding(s);
+            return;
+        }
+    } else {
+        /* ST2, ST3, ST4 -- msz == esz, enforced by encoding */
+        assert(msz == esz);
+        fn = fn_multiple[msz][nreg - 1];
+    }
+    do_mem_zpa(s, zt, pg, addr, fn);
+}
+
+static void trans_ST_zprr(DisasContext *s, arg_rprr_store *a, uint32_t insn)
+{
+    TCGv_i64 addr;
+
+    if (a->rm == 31) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    addr = new_tmp_a64(s);
+    tcg_gen_muli_i64(addr, cpu_reg(s, a->rm), (a->nreg + 1) << a->msz);
+    tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
+    do_st_zpa(s, a->rd, a->pg, addr, a->msz, a->esz, a->nreg);
+}
+
+static void trans_ST_zpri(DisasContext *s, arg_rpri_store *a, uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    unsigned elements = vsz >> a->esz;
+    TCGv_i64 addr = new_tmp_a64(s);
+
+    tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn),
+                     (a->imm * elements * (a->nreg + 1)) << a->msz);
+    do_st_zpa(s, a->rd, a->pg, addr, a->msz, a->esz, a->nreg);
+}
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index d2b3869c58..41b8cd8746 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -28,6 +28,7 @@
 %imm8_16_10	16:5 10:3
 %imm9_16_10	16:s6 10:3
 %preg4_5	5:4
+%size_23	23:2
 
 # A combination of tsz:imm3 -- extract esize.
 %tszimm_esz	22:2 5:5 !function=tszimm_esz
@@ -77,6 +78,8 @@
 &incdec2_pred	rd rn pg esz d u
 &rprr_load	rd pg rn rm dtype nreg
 &rpri_load	rd pg rn imm dtype nreg
+&rprr_store	rd pg rn rm msz esz nreg
+&rpri_store	rd pg rn imm msz esz nreg
 
 ###########################################################################
 # Named instruction formats.  These are generally used to
@@ -185,6 +188,12 @@
 @rpri_load_msz	....... .... . imm:s4 ... pg:3 rn:5 rd:5 \
 		&rpri_load dtype=%msz_dtype
 
+# Stores; user must fill in ESZ, MSZ, NREG as needed.
+@rprr_store	    ....... ..    ..     rm:5 ... pg:3 rn:5 rd:5    &rprr_store
+@rpri_store_msz     ....... msz:2 .. . imm:s4 ... pg:3 rn:5 rd:5    &rpri_store
+@rprr_store_esz_n0  ....... ..    esz:2  rm:5 ... pg:3 rn:5 rd:5 \
+		    &rprr_store nreg=0
+
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
 
@@ -713,3 +722,32 @@ LD_zprr		1010010 .. nreg:2 ..... 110 ... ..... .....	@rprr_load_msz
 # SVE load multiple structures (scalar plus immediate)
 # LD2B, LD2H, LD2W, LD2D; etc.
 LD_zpri		1010010 .. nreg:2 0.... 111 ... ..... .....	@rpri_load_msz
+
+### SVE Memory Store Group
+
+# SVE contiguous store (scalar plus immediate)
+# ST1B, ST1H, ST1W, ST1D; require msz <= esz
+ST_zpri		1110010 .. esz:2  0.... 111 ... ..... ..... \
+		@rpri_store_msz nreg=0
+
+# SVE contiguous store (scalar plus scalar)
+# ST1B, ST1H, ST1W, ST1D; require msz <= esz
+# Enumerate msz lest we conflict with STR_zri.
+ST_zprr		1110010 00 ..     ..... 010 ... ..... ..... \
+		@rprr_store_esz_n0 msz=0
+ST_zprr		1110010 01 ..     ..... 010 ... ..... ..... \
+		@rprr_store_esz_n0 msz=1
+ST_zprr		1110010 10 ..     ..... 010 ... ..... ..... \
+		@rprr_store_esz_n0 msz=2
+ST_zprr		1110010 11 11     ..... 010 ... ..... ..... \
+		@rprr_store msz=3 esz=3 nreg=0
+
+# SVE contiguous non-temporal store (scalar plus immediate)  (nreg == 0)
+# SVE store multiple structures (scalar plus immediate)      (nreg != 0)
+ST_zpri		1110010 .. nreg:2 1.... 111 ... ..... ..... \
+		@rpri_store_msz esz=%size_23
+
+# SVE contiguous non-temporal store (scalar plus scalar)     (nreg == 0)
+# SVE store multiple structures (scalar plus scalar)         (nreg != 0)
+ST_zprr		1110010 msz:2 nreg:2 ..... 011 ... ..... ..... \
+		@rprr_store esz=%size_23
-- 
2.14.3
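[Editorial sketch] The DO_ST2 macro in this patch interleaves two consecutive registers into memory, element by element. The following host-only sketch shows that interleaving for word elements; `st2ss_sketch` is an invented name, and the register-wraparound `(rd + 1) & 31`, byte-swizzling, and MMU-aware stores of the real helper are omitted. As in the loads, the address advances by the full structure size whether or not the element is active.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of the ST2 interleaved-store pattern: for each active word
 * element, registers d1 and d2 are written to adjacent memory slots;
 * the memory cursor steps by two words per element unconditionally.
 * Assumes oprsz_bytes is a multiple of 16, as the real code guarantees.
 */
static void st2ss_sketch(const uint32_t *d1, const uint32_t *d2,
                         const uint8_t *vg, uint32_t *mem,
                         size_t oprsz_bytes)
{
    size_t i = 0, m = 0;

    while (i < oprsz_bytes) {
        uint16_t pg;

        memcpy(&pg, vg + (i >> 3), sizeof(pg));   /* 16 predicate bits */
        do {
            if (pg & 1) {
                mem[m]     = d1[i / 4];           /* first register */
                mem[m + 1] = d2[i / 4];           /* second register */
            }
            i += sizeof(uint32_t);
            pg >>= sizeof(uint32_t);
            m += 2;                               /* structure stride */
        } while (i & 15);
    }
}
```

Word element j is governed by predicate bit 4*j, so an all-active predicate for four word elements is the byte pattern 0x11, 0x11.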


* [Qemu-devel] [PATCH v2 46/67] target/arm: Implement SVE load and broadcast quadword
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (44 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 45/67] target/arm: Implement SVE Memory Contiguous Store Group Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 13:36   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 47/67] target/arm: Implement SVE integer convert to floating-point Richard Henderson
                   ` (22 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  9 ++++++++
 2 files changed, 60 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index fda9a56fd5..7b21102b7e 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3398,6 +3398,57 @@ static void trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
     trans_LD_zpri(s, a, insn);
 }
 
+static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
+{
+    static gen_helper_gvec_mem * const fns[4] = {
+        gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_r,
+        gen_helper_sve_ld1ss_r, gen_helper_sve_ld1dd_r,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr t_pg;
+    TCGv_i32 desc;
+
+    /* Load the first quadword using the normal predicated load helpers.  */
+    desc = tcg_const_i32(simd_desc(16, 16, zt));
+    t_pg = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
+    fns[msz](cpu_env, t_pg, addr, desc);
+
+    tcg_temp_free_ptr(t_pg);
+    tcg_temp_free_i32(desc);
+
+    /* Replicate that first quadword.  */
+    if (vsz > 16) {
+        unsigned dofs = vec_full_reg_offset(s, zt);
+        tcg_gen_gvec_dup_mem(4, dofs + 16, dofs, vsz - 16, vsz - 16);
+    }
+}
+
+static void trans_LD1RQ_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
+{
+    TCGv_i64 addr;
+    int msz = dtype_msz(a->dtype);
+
+    if (a->rm == 31) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    addr = new_tmp_a64(s);
+    tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), msz);
+    tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
+    do_ldrq(s, a->rd, a->pg, addr, msz);
+}
+
+static void trans_LD1RQ_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
+{
+    TCGv_i64 addr = new_tmp_a64(s);
+
+    tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), a->imm * 16);
+    do_ldrq(s, a->rd, a->pg, addr, dtype_msz(a->dtype));
+}
+
 static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
                       int msz, int esz, int nreg)
 {
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 41b8cd8746..6c906e25e9 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -723,6 +723,15 @@ LD_zprr		1010010 .. nreg:2 ..... 110 ... ..... .....	@rprr_load_msz
 # LD2B, LD2H, LD2W, LD2D; etc.
 LD_zpri		1010010 .. nreg:2 0.... 111 ... ..... .....	@rpri_load_msz
 
+# SVE load and broadcast quadword (scalar plus scalar)
+LD1RQ_zprr	1010010 .. 00 ..... 000 ... ..... ..... \
+		@rprr_load_msz nreg=0
+
+# SVE load and broadcast quadword (scalar plus immediate)
+# LD1RQB, LD1RQH, LD1RQS, LD1RQD
+LD1RQ_zpri	1010010 .. 00 0.... 001 ... ..... ..... \
+		@rpri_load_msz nreg=0
+
 ### SVE Memory Store Group
 
 # SVE contiguous store (scalar plus immediate)
-- 
2.14.3
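[Editorial sketch] The data movement that do_ldrq arranges at translation time is: load one quadword under the governing predicate, then replicate those 16 bytes across the rest of the vector, which is what the tcg_gen_gvec_dup_mem call achieves for vsz > 16. A minimal host-side sketch of the replication step (predication omitted; `ld1rq_sketch` is an invented name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of LD1RQ's effect on the destination vector: the first
 * quadword is loaded, then copied into every subsequent 16-byte lane.
 * vsz is the vector length in bytes, a multiple of 16.
 */
static void ld1rq_sketch(uint8_t *zd, size_t vsz, const uint8_t *quad)
{
    memcpy(zd, quad, 16);                  /* first quadword */
    for (size_t off = 16; off < vsz; off += 16) {
        memcpy(zd + off, zd, 16);          /* replicate across the vector */
    }
}
```

Doing the load with the ordinary 16-byte-operand predicated helpers and replicating afterwards is what lets the patch reuse the existing ld1 helper table instead of adding LD1RQ-specific helpers.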


* [Qemu-devel] [PATCH v2 47/67] target/arm: Implement SVE integer convert to floating-point
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (45 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 46/67] target/arm: Implement SVE load and broadcast quadword Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 13:47   ` Peter Maydell
  2018-02-27 13:51   ` Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 48/67] target/arm: Implement SVE floating-point arithmetic (predicated) Richard Henderson
                   ` (21 subsequent siblings)
  68 siblings, 2 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 30 +++++++++++++++
 target/arm/sve_helper.c    | 52 ++++++++++++++++++++++++++
 target/arm/translate-sve.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 22 +++++++++++
 4 files changed, 196 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 74c2d642a3..fb7609f9ef 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -720,6 +720,36 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_scvt_dh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_scvt_ss, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_scvt_sd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_scvt_ds, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_scvt_dd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_ucvt_hh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_sh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_dh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_ss, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_sd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_ds, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ucvt_dd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(sve_ld1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index e259e910de..a1e0ceb5fb 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2789,6 +2789,58 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
     return predtest_ones(d, oprsz, esz_mask);
 }
 
+/* Fully general two-operand expander, controlled by a predicate,
+ * with the extra float_status parameter.
+ */
+#define DO_ZPZ_FP(NAME, TYPE, H, OP)                            \
+void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc) \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc);                      \
+    for (i = 0; i < opr_sz; ) {                                 \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));         \
+        do {                                                    \
+            if (pg & 1) {                                       \
+                TYPE nn = *(TYPE *)(vn + H(i));                 \
+                *(TYPE *)(vd + H(i)) = OP(nn, status);          \
+            }                                                   \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);             \
+        } while (i & 15);                                       \
+    }                                                           \
+}
+
+/* Similarly, specialized for 64-bit operands.  */
+#define DO_ZPZ_FP_D(NAME, TYPE, OP)                             \
+void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc) \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;                  \
+    TYPE *d = vd, *n = vn;                                      \
+    uint8_t *pg = vg;                                           \
+    for (i = 0; i < opr_sz; i += 1) {                           \
+        if (pg[H1(i)] & 1) {                                    \
+            d[i] = OP(n[i], status);                            \
+        }                                                       \
+    }                                                           \
+}
+
+DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16)
+DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16)
+DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32)
+DO_ZPZ_FP_D(sve_scvt_sd, uint64_t, int32_to_float64)
+DO_ZPZ_FP_D(sve_scvt_dh, uint64_t, int64_to_float16)
+DO_ZPZ_FP_D(sve_scvt_ds, uint64_t, int64_to_float32)
+DO_ZPZ_FP_D(sve_scvt_dd, uint64_t, int64_to_float64)
+
+DO_ZPZ_FP(sve_ucvt_hh, uint16_t, H1_2, uint16_to_float16)
+DO_ZPZ_FP(sve_ucvt_sh, uint32_t, H1_4, uint32_to_float16)
+DO_ZPZ_FP(sve_ucvt_ss, uint32_t, H1_4, uint32_to_float32)
+DO_ZPZ_FP_D(sve_ucvt_sd, uint64_t, uint32_to_float64)
+DO_ZPZ_FP_D(sve_ucvt_dh, uint64_t, uint64_to_float16)
+DO_ZPZ_FP_D(sve_ucvt_ds, uint64_t, uint64_to_float32)
+DO_ZPZ_FP_D(sve_ucvt_dd, uint64_t, uint64_to_float64)
+
+#undef DO_ZPZ_FP
+#undef DO_ZPZ_FP_D
+
 /*
  * Load contiguous data, protected by a governing predicate.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 7b21102b7e..05c684222e 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3161,6 +3161,98 @@ DO_FP3(FRSQRTS, rsqrts)
 
 #undef DO_FP3
 
+
+/*
+ *** SVE Floating Point Unary Operations Predicated Group
+ */
+
+static void do_zpz_ptr(DisasContext *s, int rd, int rn, int pg,
+                       bool is_fp16, gen_helper_gvec_3_ptr *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr status;
+
+    if (fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+    status = get_fpstatus_ptr(is_fp16);
+    tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
+                       vec_full_reg_offset(s, rn),
+                       pred_full_reg_offset(s, pg),
+                       status, vsz, vsz, 0, fn);
+}
+
+static void trans_SCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_hh);
+}
+
+static void trans_SCVTF_sh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_sh);
+}
+
+static void trans_SCVTF_dh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_dh);
+}
+
+static void trans_SCVTF_ss(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_scvt_ss);
+}
+
+static void trans_SCVTF_ds(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_scvt_ds);
+}
+
+static void trans_SCVTF_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_scvt_sd);
+}
+
+static void trans_SCVTF_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_scvt_dd);
+}
+
+static void trans_UCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_ucvt_hh);
+}
+
+static void trans_UCVTF_sh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_ucvt_sh);
+}
+
+static void trans_UCVTF_dh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_ucvt_dh);
+}
+
+static void trans_UCVTF_ss(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_ucvt_ss);
+}
+
+static void trans_UCVTF_ds(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_ucvt_ds);
+}
+
+static void trans_UCVTF_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_ucvt_sd);
+}
+
+static void trans_UCVTF_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_ucvt_dd);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 6c906e25e9..b571b70050 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -134,6 +134,9 @@
 @rd_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
 @rd_pg4_pn	........ esz:2 ... ... .. pg:4 . rn:4 rd:5	&rpr_esz
 
+# One register operand, with governing predicate, no vector element size
+@rd_pg_rn_e0	........ .. ... ... ... pg:3 rn:5 rd:5		&rpr_esz esz=0
+
 # Two register operands with a 6-bit signed immediate.
 @rd_rn_i6	........ ... rn:5 ..... imm:s6 rd:5		&rri
 
@@ -689,6 +692,25 @@ FTSMUL		01100101 .. 0 ..... 000 011 ..... .....		@rd_rn_rm
 FRECPS		01100101 .. 0 ..... 000 110 ..... .....		@rd_rn_rm
 FRSQRTS		01100101 .. 0 ..... 000 111 ..... .....		@rd_rn_rm
 
+### SVE FP Unary Operations Predicated Group
+
+# SVE integer convert to floating-point
+SCVTF_hh	01100101 01 010 01 0 101 ... ..... .....	@rd_pg_rn_e0
+SCVTF_sh	01100101 01 010 10 0 101 ... ..... .....	@rd_pg_rn_e0
+SCVTF_dh	01100101 01 010 11 0 101 ... ..... .....	@rd_pg_rn_e0
+SCVTF_ss	01100101 10 010 10 0 101 ... ..... .....	@rd_pg_rn_e0
+SCVTF_sd	01100101 11 010 00 0 101 ... ..... .....	@rd_pg_rn_e0
+SCVTF_ds	01100101 11 010 10 0 101 ... ..... .....	@rd_pg_rn_e0
+SCVTF_dd	01100101 11 010 11 0 101 ... ..... .....	@rd_pg_rn_e0
+
+UCVTF_hh	01100101 01 010 01 1 101 ... ..... .....	@rd_pg_rn_e0
+UCVTF_sh	01100101 01 010 10 1 101 ... ..... .....	@rd_pg_rn_e0
+UCVTF_dh	01100101 01 010 11 1 101 ... ..... .....	@rd_pg_rn_e0
+UCVTF_ss	01100101 10 010 10 1 101 ... ..... .....	@rd_pg_rn_e0
+UCVTF_sd	01100101 11 010 00 1 101 ... ..... .....	@rd_pg_rn_e0
+UCVTF_ds	01100101 11 010 10 1 101 ... ..... .....	@rd_pg_rn_e0
+UCVTF_dd	01100101 11 010 11 1 101 ... ..... .....	@rd_pg_rn_e0
+
 ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
 
 # SVE load predicate register
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread

* [Qemu-devel] [PATCH v2 48/67] target/arm: Implement SVE floating-point arithmetic (predicated)
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (46 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 47/67] target/arm: Implement SVE integer convert to floating-point Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 13:50   ` Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 49/67] target/arm: Implement SVE FP Multiply-Add Group Richard Henderson
                   ` (20 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  77 ++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 107 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c |  47 ++++++++++++++++++++
 target/arm/sve.decode      |  17 +++++++
 4 files changed, 248 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index fb7609f9ef..84d0a8978c 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -720,6 +720,83 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_6(sve_fadd_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fadd_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fadd_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fsub_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fsub_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fsub_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmul_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmul_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmul_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fdiv_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fdiv_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fdiv_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmin_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmin_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmin_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmax_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmax_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmax_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fminnum_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fminnum_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fminnum_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmaxnum_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmaxnum_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmaxnum_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fabd_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fabd_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fabd_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fscalbn_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fscalbn_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fscalbn_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmulx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmulx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmulx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index a1e0ceb5fb..d80babfae7 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2789,6 +2789,113 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
     return predtest_ones(d, oprsz, esz_mask);
 }
 
+/* Fully general three-operand expander, controlled by a predicate,
+ * with the extra float_status parameter.
+ */
+#define DO_ZPZZ_FP(NAME, TYPE, H, OP)                           \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,       \
+                  void *status, uint32_t desc)                  \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc);                      \
+    for (i = 0; i < opr_sz; ) {                                 \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));         \
+        do {                                                    \
+            if (pg & 1) {                                       \
+                TYPE nn = *(TYPE *)(vn + H(i));                 \
+                TYPE mm = *(TYPE *)(vm + H(i));                 \
+                *(TYPE *)(vd + H(i)) = OP(nn, mm, status);      \
+            }                                                   \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);             \
+        } while (i & 15);                                       \
+    }                                                           \
+}
+
+/* Similarly, specialized for 64-bit operands.  */
+#define DO_ZPZZ_FP_D(NAME, TYPE, OP)                            \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,       \
+                  void *status, uint32_t desc)                  \
+{                                                               \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;                  \
+    TYPE *d = vd, *n = vn, *m = vm;                             \
+    uint8_t *pg = vg;                                           \
+    for (i = 0; i < opr_sz; i += 1) {                           \
+        if (pg[H1(i)] & 1) {                                    \
+            d[i] = OP(n[i], m[i], status);                      \
+        }                                                       \
+    }                                                           \
+}
+
+DO_ZPZZ_FP(sve_fadd_h, uint16_t, H1_2, float16_add)
+DO_ZPZZ_FP(sve_fadd_s, uint32_t, H1_4, float32_add)
+DO_ZPZZ_FP_D(sve_fadd_d, uint64_t, float64_add)
+
+DO_ZPZZ_FP(sve_fsub_h, uint16_t, H1_2, float16_sub)
+DO_ZPZZ_FP(sve_fsub_s, uint32_t, H1_4, float32_sub)
+DO_ZPZZ_FP_D(sve_fsub_d, uint64_t, float64_sub)
+
+DO_ZPZZ_FP(sve_fmul_h, uint16_t, H1_2, float16_mul)
+DO_ZPZZ_FP(sve_fmul_s, uint32_t, H1_4, float32_mul)
+DO_ZPZZ_FP_D(sve_fmul_d, uint64_t, float64_mul)
+
+DO_ZPZZ_FP(sve_fdiv_h, uint16_t, H1_2, float16_div)
+DO_ZPZZ_FP(sve_fdiv_s, uint32_t, H1_4, float32_div)
+DO_ZPZZ_FP_D(sve_fdiv_d, uint64_t, float64_div)
+
+DO_ZPZZ_FP(sve_fmin_h, uint16_t, H1_2, float16_min)
+DO_ZPZZ_FP(sve_fmin_s, uint32_t, H1_4, float32_min)
+DO_ZPZZ_FP_D(sve_fmin_d, uint64_t, float64_min)
+
+DO_ZPZZ_FP(sve_fmax_h, uint16_t, H1_2, float16_max)
+DO_ZPZZ_FP(sve_fmax_s, uint32_t, H1_4, float32_max)
+DO_ZPZZ_FP_D(sve_fmax_d, uint64_t, float64_max)
+
+DO_ZPZZ_FP(sve_fminnum_h, uint16_t, H1_2, float16_minnum)
+DO_ZPZZ_FP(sve_fminnum_s, uint32_t, H1_4, float32_minnum)
+DO_ZPZZ_FP_D(sve_fminnum_d, uint64_t, float64_minnum)
+
+DO_ZPZZ_FP(sve_fmaxnum_h, uint16_t, H1_2, float16_maxnum)
+DO_ZPZZ_FP(sve_fmaxnum_s, uint32_t, H1_4, float32_maxnum)
+DO_ZPZZ_FP_D(sve_fmaxnum_d, uint64_t, float64_maxnum)
+
+static inline uint16_t abd_h(float16 a, float16 b, float_status *s)
+{
+    return float16_abs(float16_sub(a, b, s));
+}
+
+static inline uint32_t abd_s(float32 a, float32 b, float_status *s)
+{
+    return float32_abs(float32_sub(a, b, s));
+}
+
+static inline uint64_t abd_d(float64 a, float64 b, float_status *s)
+{
+    return float64_abs(float64_sub(a, b, s));
+}
+
+DO_ZPZZ_FP(sve_fabd_h, uint16_t, H1_2, abd_h)
+DO_ZPZZ_FP(sve_fabd_s, uint32_t, H1_4, abd_s)
+DO_ZPZZ_FP_D(sve_fabd_d, uint64_t, abd_d)
+
+static inline uint64_t scalbn_d(float64 a, int64_t b, float_status *s)
+{
+    int b_int = MIN(MAX(b, INT_MIN), INT_MAX);
+    return float64_scalbn(a, b_int, s);
+}
+
+DO_ZPZZ_FP(sve_fscalbn_h, uint16_t, H1_2, float16_scalbn)
+DO_ZPZZ_FP(sve_fscalbn_s, uint32_t, H1_4, float32_scalbn)
+DO_ZPZZ_FP_D(sve_fscalbn_d, uint64_t, scalbn_d)
+
+DO_ZPZZ_FP(sve_fmulx_h, uint16_t, H1_2, helper_advsimd_mulxh)
+DO_ZPZZ_FP(sve_fmulx_s, uint32_t, H1_4, helper_vfp_mulxs)
+DO_ZPZZ_FP_D(sve_fmulx_d, uint64_t, helper_vfp_mulxd)
+
+#undef DO_ZPZZ_FP
+#undef DO_ZPZZ_FP_D
+
 /* Fully general two-operand expander, controlled by a predicate,
  * with the extra float_status parameter.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 05c684222e..1692980d20 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3161,6 +3161,52 @@ DO_FP3(FRSQRTS, rsqrts)
 
 #undef DO_FP3
 
+/*
+ *** SVE Floating Point Arithmetic - Predicated Group
+ */
+
+static void do_zpzz_fp(DisasContext *s, arg_rprr_esz *a,
+                       gen_helper_gvec_4_ptr *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr status;
+
+    if (fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+    status = get_fpstatus_ptr(a->esz == MO_16);
+    tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       pred_full_reg_offset(s, a->pg),
+                       status, vsz, vsz, 0, fn);
+    tcg_temp_free_ptr(status);
+}
+
+#define DO_FP3(NAME, name) \
+static void trans_##NAME(DisasContext *s, arg_rprr_esz *a, uint32_t insn) \
+{                                                                   \
+    static gen_helper_gvec_4_ptr * const fns[4] = {                 \
+        NULL, gen_helper_sve_##name##_h,                            \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d        \
+    };                                                              \
+    do_zpzz_fp(s, a, fns[a->esz]);                                  \
+}
+
+DO_FP3(FADD_zpzz, fadd)
+DO_FP3(FSUB_zpzz, fsub)
+DO_FP3(FMUL_zpzz, fmul)
+DO_FP3(FMIN_zpzz, fmin)
+DO_FP3(FMAX_zpzz, fmax)
+DO_FP3(FMINNM_zpzz, fminnum)
+DO_FP3(FMAXNM_zpzz, fmaxnum)
+DO_FP3(FABD, fabd)
+DO_FP3(FSCALE, fscalbn)
+DO_FP3(FDIV, fdiv)
+DO_FP3(FMULX, fmulx)
+
+#undef DO_FP3
 
 /*
  *** SVE Floating Point Unary Operations Predicated Group
@@ -3181,6 +3227,7 @@ static void do_zpz_ptr(DisasContext *s, int rd, int rn, int pg,
                        vec_full_reg_offset(s, rn),
                        pred_full_reg_offset(s, pg),
                        status, vsz, vsz, 0, fn);
+    tcg_temp_free_ptr(status);
 }
 
 static void trans_SCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index b571b70050..1a13c603ff 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -692,6 +692,23 @@ FTSMUL		01100101 .. 0 ..... 000 011 ..... .....		@rd_rn_rm
 FRECPS		01100101 .. 0 ..... 000 110 ..... .....		@rd_rn_rm
 FRSQRTS		01100101 .. 0 ..... 000 111 ..... .....		@rd_rn_rm
 
+### SVE FP Arithmetic Predicated Group
+
+# SVE floating-point arithmetic (predicated)
+FADD_zpzz	01100101 .. 00 0000 100 ... ..... .....    @rdn_pg_rm
+FSUB_zpzz	01100101 .. 00 0001 100 ... ..... .....    @rdn_pg_rm
+FMUL_zpzz	01100101 .. 00 0010 100 ... ..... .....    @rdn_pg_rm
+FSUB_zpzz	01100101 .. 00 0011 100 ... ..... .....    @rdm_pg_rn # FSUBR
+FMAXNM_zpzz	01100101 .. 00 0100 100 ... ..... .....    @rdn_pg_rm
+FMINNM_zpzz	01100101 .. 00 0101 100 ... ..... .....    @rdn_pg_rm
+FMAX_zpzz	01100101 .. 00 0110 100 ... ..... .....    @rdn_pg_rm
+FMIN_zpzz	01100101 .. 00 0111 100 ... ..... .....    @rdn_pg_rm
+FABD		01100101 .. 00 1000 100 ... ..... .....    @rdn_pg_rm
+FSCALE		01100101 .. 00 1001 100 ... ..... .....    @rdn_pg_rm
+FMULX		01100101 .. 00 1010 100 ... ..... .....    @rdn_pg_rm
+FDIV		01100101 .. 00 1100 100 ... ..... .....    @rdm_pg_rn # FDIVR
+FDIV		01100101 .. 00 1101 100 ... ..... .....    @rdn_pg_rm
+
 ### SVE FP Unary Operations Predicated Group
 
 # SVE integer convert to floating-point
-- 
2.14.3


* [Qemu-devel] [PATCH v2 49/67] target/arm: Implement SVE FP Multiply-Add Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (47 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 48/67] target/arm: Implement SVE floating-point arithmetic (predicated) Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 13:54   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 50/67] target/arm: Implement SVE Floating Point Accumulating Reduction Group Richard Henderson
                   ` (19 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 16 ++++++++++++++
 target/arm/sve_helper.c    | 53 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 41 +++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 17 +++++++++++++++
 4 files changed, 127 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 84d0a8978c..a95f077c7f 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -827,6 +827,22 @@ DEF_HELPER_FLAGS_5(sve_ucvt_ds, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_ucvt_dd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_fmls_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fmls_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fmls_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_fnmla_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fnmla_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fnmla_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
+
 DEF_HELPER_FLAGS_4(sve_ld1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index d80babfae7..6622275b44 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2948,6 +2948,59 @@ DO_ZPZ_FP_D(sve_ucvt_dd, uint64_t, uint64_to_float64)
 #undef DO_ZPZ_FP
 #undef DO_ZPZ_FP_D
 
+/* 4-operand predicated multiply-add.  This requires 7 operands to pass
+ * "properly", so we need to encode some of the registers into DESC.
+ */
+QEMU_BUILD_BUG_ON(SIMD_DATA_SHIFT + 20 > 32);
+
+#define DO_FMLA(NAME, N, H, NEG1, NEG3)                                     \
+void HELPER(NAME)(CPUARMState *env, void *vg, uint32_t desc)                \
+{                                                                           \
+    intptr_t i = 0, opr_sz = simd_oprsz(desc);                              \
+    unsigned rd = extract32(desc, SIMD_DATA_SHIFT, 5);                      \
+    unsigned rn = extract32(desc, SIMD_DATA_SHIFT + 5, 5);                  \
+    unsigned rm = extract32(desc, SIMD_DATA_SHIFT + 10, 5);                 \
+    unsigned ra = extract32(desc, SIMD_DATA_SHIFT + 15, 5);                 \
+    void *vd = &env->vfp.zregs[rd];                                         \
+    void *vn = &env->vfp.zregs[rn];                                         \
+    void *vm = &env->vfp.zregs[rm];                                         \
+    void *va = &env->vfp.zregs[ra];                                         \
+    do {                                                                    \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                     \
+        do {                                                                \
+            if (likely(pg & 1)) {                                           \
+                float##N e1 = *(uint##N##_t *)(vn + H(i));                  \
+                float##N e2 = *(uint##N##_t *)(vm + H(i));                  \
+                float##N e3 = *(uint##N##_t *)(va + H(i));                  \
+                float##N r;                                                 \
+                if (NEG1) e1 = float##N##_chs(e1);                          \
+                if (NEG3) e3 = float##N##_chs(e3);                          \
+                r = float##N##_muladd(e1, e2, e3, 0, &env->vfp.fp_status);  \
+                *(uint##N##_t *)(vd + H(i)) = r;                            \
+            }                                                               \
+            i += sizeof(float##N), pg >>= sizeof(float##N);                 \
+        } while (i & 15);                                                   \
+    } while (i < opr_sz);                                                   \
+}
+
+DO_FMLA(sve_fmla_zpzzz_h, 16, H1_2, 0, 0)
+DO_FMLA(sve_fmla_zpzzz_s, 32, H1_4, 0, 0)
+DO_FMLA(sve_fmla_zpzzz_d, 64, , 0, 0)
+
+DO_FMLA(sve_fmls_zpzzz_h, 16, H1_2, 1, 0)
+DO_FMLA(sve_fmls_zpzzz_s, 32, H1_4, 1, 0)
+DO_FMLA(sve_fmls_zpzzz_d, 64, , 1, 0)
+
+DO_FMLA(sve_fnmla_zpzzz_h, 16, H1_2, 1, 1)
+DO_FMLA(sve_fnmla_zpzzz_s, 32, H1_4, 1, 1)
+DO_FMLA(sve_fnmla_zpzzz_d, 64, , 1, 1)
+
+DO_FMLA(sve_fnmls_zpzzz_h, 16, H1_2, 0, 1)
+DO_FMLA(sve_fnmls_zpzzz_s, 32, H1_4, 0, 1)
+DO_FMLA(sve_fnmls_zpzzz_d, 64, , 0, 1)
+
+#undef DO_FMLA
+
 /*
  * Load contiguous data, protected by a governing predicate.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 1692980d20..3124368fb5 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3208,6 +3208,47 @@ DO_FP3(FMULX, fmulx)
 
 #undef DO_FP3
 
+typedef void gen_helper_sve_fmla(TCGv_env, TCGv_ptr, TCGv_i32);
+
+static void do_fmla(DisasContext *s, arg_rprrr_esz *a, gen_helper_sve_fmla *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    unsigned desc;
+    TCGv_i32 t_desc;
+    TCGv_ptr pg = tcg_temp_new_ptr();
+
+    /* We would need 7 operands to pass these arguments "properly".
+     * So we encode all the register numbers into the descriptor.
+     */
+    desc = deposit32(a->rd, 5, 5, a->rn);
+    desc = deposit32(desc, 10, 5, a->rm);
+    desc = deposit32(desc, 15, 5, a->ra);
+    desc = simd_desc(vsz, vsz, desc);
+
+    t_desc = tcg_const_i32(desc);
+    tcg_gen_addi_ptr(pg, cpu_env, pred_full_reg_offset(s, a->pg));
+    fn(cpu_env, pg, t_desc);
+    tcg_temp_free_i32(t_desc);
+    tcg_temp_free_ptr(pg);
+}
+
+#define DO_FMLA(NAME, name) \
+static void trans_##NAME(DisasContext *s, arg_rprrr_esz *a, uint32_t insn) \
+{                                                                    \
+    static gen_helper_sve_fmla * const fns[4] = {                    \
+        NULL, gen_helper_sve_##name##_h,                             \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d         \
+    };                                                               \
+    do_fmla(s, a, fns[a->esz]);                                      \
+}
+
+DO_FMLA(FMLA_zpzzz, fmla_zpzzz)
+DO_FMLA(FMLS_zpzzz, fmls_zpzzz)
+DO_FMLA(FNMLA_zpzzz, fnmla_zpzzz)
+DO_FMLA(FNMLS_zpzzz, fnmls_zpzzz)
+
+#undef DO_FMLA
+
 /*
  *** SVE Floating Point Unary Operations Predicated Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 1a13c603ff..817833f96e 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -129,6 +129,8 @@
 		&rprrr_esz ra=%reg_movprfx
 @rdn_pg_ra_rm	........ esz:2 . rm:5  ... pg:3 ra:5 rd:5 \
 		&rprrr_esz rn=%reg_movprfx
+@rdn_pg_rm_ra	........ esz:2 . ra:5  ... pg:3 rm:5 rd:5 \
+		&rprrr_esz rn=%reg_movprfx
 
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
@@ -709,6 +711,21 @@ FMULX		01100101 .. 00 1010 100 ... ..... .....    @rdn_pg_rm
 FDIV		01100101 .. 00 1100 100 ... ..... .....    @rdm_pg_rn # FDIVR
 FDIV		01100101 .. 00 1101 100 ... ..... .....    @rdn_pg_rm
 
+### SVE FP Multiply-Add Group
+
+# SVE floating-point multiply-accumulate writing addend
+FMLA_zpzzz	01100101 .. 1 ..... 000 ... ..... .....		@rda_pg_rn_rm
+FMLS_zpzzz	01100101 .. 1 ..... 001 ... ..... .....		@rda_pg_rn_rm
+FNMLA_zpzzz	01100101 .. 1 ..... 010 ... ..... .....		@rda_pg_rn_rm
+FNMLS_zpzzz	01100101 .. 1 ..... 011 ... ..... .....		@rda_pg_rn_rm
+
+# SVE floating-point multiply-accumulate writing multiplicand
+# FMAD, FMSB, FNMAD, FNMS
+FMLA_zpzzz	01100101 .. 1 ..... 100 ... ..... .....		@rdn_pg_rm_ra
+FMLS_zpzzz	01100101 .. 1 ..... 101 ... ..... .....		@rdn_pg_rm_ra
+FNMLA_zpzzz	01100101 .. 1 ..... 110 ... ..... .....		@rdn_pg_rm_ra
+FNMLS_zpzzz	01100101 .. 1 ..... 111 ... ..... .....		@rdn_pg_rm_ra
+
 ### SVE FP Unary Operations Predicated Group
 
 # SVE integer convert to floating-point
-- 
2.14.3


* [Qemu-devel] [PATCH v2 50/67] target/arm: Implement SVE Floating Point Accumulating Reduction Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (48 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 49/67] target/arm: Implement SVE FP Multiply-Add Group Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 13:59   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 51/67] target/arm: Implement SVE load and broadcast element Richard Henderson
                   ` (18 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  7 ++++++
 target/arm/sve_helper.c    | 56 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 42 ++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  5 +++++
 4 files changed, 110 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index a95f077c7f..c4502256d5 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -720,6 +720,13 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_fadda_h, TCG_CALL_NO_RWG,
+                   i64, i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG,
+                   i64, i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fadda_d, TCG_CALL_NO_RWG,
+                   i64, i64, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_6(sve_fadd_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_6(sve_fadd_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 6622275b44..0e2b3091b0 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2789,6 +2789,62 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
     return predtest_ones(d, oprsz, esz_mask);
 }
 
+uint64_t HELPER(sve_fadda_h)(uint64_t nn, void *vm, void *vg,
+                             void *status, uint32_t desc)
+{
+    intptr_t i = 0, opr_sz = simd_oprsz(desc);
+    float16 result = nn;
+
+    do {
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+        do {
+            if (pg & 1) {
+                float16 mm = *(float16 *)(vm + H1_2(i));
+                result = float16_add(result, mm, status);
+            }
+            i += sizeof(float16), pg >>= sizeof(float16);
+        } while (i & 15);
+    } while (i < opr_sz);
+
+    return result;
+}
+
+uint64_t HELPER(sve_fadda_s)(uint64_t nn, void *vm, void *vg,
+                             void *status, uint32_t desc)
+{
+    intptr_t i = 0, opr_sz = simd_oprsz(desc);
+    float32 result = nn;
+
+    do {
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+        do {
+            if (pg & 1) {
+                float32 mm = *(float32 *)(vm + H1_2(i));
+                result = float32_add(result, mm, status);
+            }
+            i += sizeof(float32), pg >>= sizeof(float32);
+        } while (i & 15);
+    } while (i < opr_sz);
+
+    return result;
+}
+
+uint64_t HELPER(sve_fadda_d)(uint64_t nn, void *vm, void *vg,
+                             void *status, uint32_t desc)
+{
+    intptr_t i = 0, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *m = vm;
+    uint8_t *pg = vg;
+
+    for (i = 0; i < opr_sz; i++) {
+        if (pg[H1(i)] & 1) {
+            nn = float64_add(nn, m[i], status);
+        }
+    }
+
+    return nn;
+}
+
 /* Fully general three-operand expander, controlled by a predicate,
  * With the extra float_status parameter.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 3124368fb5..32f0340738 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3120,6 +3120,48 @@ DO_ZZI(UMIN, umin)
 
 #undef DO_ZZI
 
+/*
+ *** SVE Floating Point Accumulating Reduction Group
+ */
+
+static void trans_FADDA(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+    typedef void fadda_fn(TCGv_i64, TCGv_i64, TCGv_ptr,
+                          TCGv_ptr, TCGv_ptr, TCGv_i32);
+    static fadda_fn * const fns[3] = {
+        gen_helper_sve_fadda_h,
+        gen_helper_sve_fadda_s,
+        gen_helper_sve_fadda_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr t_rm, t_pg, t_fpst;
+    TCGv_i64 t_val;
+    TCGv_i32 t_desc;
+
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    t_val = load_esz(cpu_env, vec_reg_offset(s, a->rn, 0, a->esz), a->esz);
+    t_rm = tcg_temp_new_ptr();
+    t_pg = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(t_rm, cpu_env, vec_full_reg_offset(s, a->rm));
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, a->pg));
+    t_fpst = get_fpstatus_ptr(a->esz == MO_16);
+    t_desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+
+    fns[a->esz - 1](t_val, t_val, t_rm, t_pg, t_fpst, t_desc);
+
+    tcg_temp_free_i32(t_desc);
+    tcg_temp_free_ptr(t_fpst);
+    tcg_temp_free_ptr(t_pg);
+    tcg_temp_free_ptr(t_rm);
+
+    write_fp_dreg(s, a->rd, t_val);
+    tcg_temp_free_i64(t_val);
+}
+
 /*
  *** SVE Floating Point Arithmetic - Unpredicated Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 817833f96e..95a290aed0 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -684,6 +684,11 @@ UMIN_zzi	00100101 .. 101 011 110 ........ .....		@rdn_i8u
 # SVE integer multiply immediate (unpredicated)
 MUL_zzi		00100101 .. 110 000 110 ........ .....		@rdn_i8s
 
+### SVE FP Accumulating Reduction Group
+
+# SVE floating-point serial reduction (predicated)
+FADDA		01100101 .. 011 000 001 ... ..... .....		@rdn_pg_rm
+
 ### SVE Floating Point Arithmetic - Unpredicated Group
 
 # SVE floating-point arithmetic (unpredicated)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
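The FADDA helpers above perform a *serial* (ordered) reduction: the scalar start value is folded with each active element in element order, so the result depends on element order, unlike a tree reduction. A minimal scalar model of the double-precision loop in sve_fadda_d, using plain host doubles instead of QEMU's softfloat (so FPCR rounding and exception state are not modeled; the function name is illustrative, not from QEMU):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the sve_fadda_d loop: fold the scalar start value nn with every
 * 64-bit element whose controlling predicate bit (bit 0 of each predicate
 * byte) is set, strictly in element order. */
static double fadda_d_model(double nn, const double *m,
                            const uint8_t *pg, size_t nelem)
{
    for (size_t i = 0; i < nelem; i++) {
        if (pg[i] & 1) {
            nn += m[i];
        }
    }
    return nn;
}
```

Because the accumulation is ordered, the helper cannot be vectorized with gvec expanders; this is why trans_FADDA calls an out-of-line helper that returns the scalar result in an i64.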

* [Qemu-devel] [PATCH v2 51/67] target/arm: Implement SVE load and broadcast element
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (49 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 50/67] target/arm: Implement SVE Floating Point Accumulating Reduction Group Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 14:15   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 52/67] target/arm: Implement SVE store vector/predicate register Richard Henderson
                   ` (17 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  5 +++++
 target/arm/sve_helper.c    | 43 ++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 55 +++++++++++++++++++++++++++++++++++++++++++++-
 target/arm/sve.decode      |  5 +++++
 4 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index c4502256d5..6c640a92ff 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -274,6 +274,11 @@ DEF_HELPER_FLAGS_3(sve_clr_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_clr_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_clr_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_clri_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_clri_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_clri_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_clri_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(sve_asr_zpzi_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_asr_zpzi_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_asr_zpzi_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 0e2b3091b0..a7dc6f6164 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -994,6 +994,49 @@ void HELPER(sve_clr_d)(void *vd, void *vg, uint32_t desc)
     }
 }
 
+/* Store zero into every inactive element of Zd.  */
+void HELPER(sve_clri_b)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] &= expand_pred_b(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_clri_h)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] &= expand_pred_h(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_clri_s)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] &= expand_pred_s(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_clri_d)(void *vd, void *vg, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+    for (i = 0; i < opr_sz; i += 1) {
+        if (!(pg[H1(i)] & 1)) {
+            d[i] = 0;
+        }
+    }
+}
+
 /* Three-operand expander, immediate operand, controlled by a predicate.
  */
 #define DO_ZPZI(NAME, TYPE, H, OP)                              \
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 32f0340738..b000a2482e 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -584,6 +584,19 @@ static void do_clr_zp(DisasContext *s, int rd, int pg, int esz)
                        vsz, vsz, 0, fns[esz]);
 }
 
+/* Store zero into every inactive element of Zd.  */
+static void do_clr_inactive_zp(DisasContext *s, int rd, int pg, int esz)
+{
+    static gen_helper_gvec_2 * const fns[4] = {
+        gen_helper_sve_clri_b, gen_helper_sve_clri_h,
+        gen_helper_sve_clri_s, gen_helper_sve_clri_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_2_ool(vec_full_reg_offset(s, rd),
+                       pred_full_reg_offset(s, pg),
+                       vsz, vsz, 0, fns[esz]);
+}
+
 static void do_zpzi_ool(DisasContext *s, arg_rpri_esz *a,
                         gen_helper_gvec_3 *fn)
 {
@@ -3506,7 +3519,7 @@ static void trans_LDR_pri(DisasContext *s, arg_rri *a, uint32_t insn)
  *** SVE Memory - Contiguous Load Group
  */
 
-/* The memory element size of dtype.  */
+/* The memory mode of the dtype.  */
 static const TCGMemOp dtype_mop[16] = {
     MO_UB, MO_UB, MO_UB, MO_UB,
     MO_SL, MO_UW, MO_UW, MO_UW,
@@ -3671,6 +3684,46 @@ static void trans_LD1RQ_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
     do_ldrq(s, a->rd, a->pg, addr, dtype_msz(a->dtype));
 }
 
+/* Load and broadcast element.  */
+static void trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    unsigned psz = pred_full_reg_size(s);
+    unsigned esz = dtype_esz[a->dtype];
+    TCGLabel *over = gen_new_label();
+    TCGv_i64 temp;
+
+    /* If the guarding predicate has no bits set, no load occurs.  */
+    if (psz <= 8) {
+        temp = tcg_temp_new_i64();
+        tcg_gen_ld_i64(temp, cpu_env, pred_full_reg_offset(s, a->pg));
+        tcg_gen_andi_i64(temp, temp,
+                         deposit64(0, 0, psz * 8, pred_esz_masks[esz]));
+        tcg_gen_brcondi_i64(TCG_COND_EQ, temp, 0, over);
+        tcg_temp_free_i64(temp);
+    } else {
+        TCGv_i32 t32 = tcg_temp_new_i32();
+        find_last_active(s, t32, esz, a->pg);
+        tcg_gen_brcondi_i32(TCG_COND_LT, t32, 0, over);
+        tcg_temp_free_i32(t32);
+    }
+
+    /* Load the data.  */
+    temp = tcg_temp_new_i64();
+    tcg_gen_addi_i64(temp, cpu_reg_sp(s, a->rn), a->imm);
+    tcg_gen_qemu_ld_i64(temp, temp, get_mem_index(s),
+                        s->be_data | dtype_mop[a->dtype]);
+
+    /* Broadcast to *all* elements.  */
+    tcg_gen_gvec_dup_i64(esz, vec_full_reg_offset(s, a->rd),
+                         vsz, vsz, temp);
+    tcg_temp_free_i64(temp);
+
+    /* Zero the inactive elements.  */
+    gen_set_label(over);
+    do_clr_inactive_zp(s, a->rd, a->pg, esz);
+}
+
 static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
                       int msz, int esz, int nreg)
 {
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 95a290aed0..3e30985a09 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -29,6 +29,7 @@
 %imm9_16_10	16:s6 10:3
 %preg4_5	5:4
 %size_23	23:2
+%dtype_23_13	23:2 13:2
 
 # A combination of tsz:imm3 -- extract esize.
 %tszimm_esz	22:2 5:5 !function=tszimm_esz
@@ -758,6 +759,10 @@ LDR_pri		10000101 10 ...... 000 ... ..... 0 ....		@pd_rn_i9
 # SVE load vector register
 LDR_zri		10000101 10 ...... 010 ... ..... .....		@rd_rn_i9
 
+# SVE load and broadcast element
+LD1R_zpri	1000010 .. 1 imm:6 1.. pg:3 rn:5 rd:5 \
+		&rpri_load dtype=%dtype_23_13 nreg=0
+
 ### SVE Memory Contiguous Load Group
 
 # SVE contiguous load (scalar plus scalar)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
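The sve_clri_* helpers rely on the expand_pred_* functions to turn per-element predicate bits into byte masks, so that a single AND per 64-bit lane zeroes every inactive element. A sketch of the byte-granularity case (the name matches the expand_pred_b used in the patch, but this loop-based implementation is an illustrative rewrite, not QEMU's table-driven one):

```c
#include <assert.h>
#include <stdint.h>

/* Expand each of the low 8 predicate bits into a full byte of the result,
 * so `lane & expand_pred_b_model(pg_byte)` keeps active byte elements and
 * zeroes inactive ones. */
static uint64_t expand_pred_b_model(uint8_t byte)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (byte & (1u << i)) {
            r |= 0xffull << (i * 8);
        }
    }
    return r;
}
```

For the 64-bit element size there is one predicate bit per lane, which is why sve_clri_d tests `pg[H1(i)] & 1` directly instead of expanding a mask.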

* [Qemu-devel] [PATCH v2 52/67] target/arm: Implement SVE store vector/predicate register
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (50 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 51/67] target/arm: Implement SVE load and broadcast element Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 14:21   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 53/67] target/arm: Implement SVE scatter stores Richard Henderson
                   ` (16 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 101 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |   6 +++
 2 files changed, 107 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b000a2482e..9c724980a0 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3501,6 +3501,95 @@ static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
     tcg_temp_free_i64(t0);
 }
 
+/* Similarly for stores.  */
+static void do_str(DisasContext *s, uint32_t vofs, uint32_t len,
+                   int rn, int imm)
+{
+    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
+    uint32_t len_remain = len % 8;
+    uint32_t nparts = len / 8 + ctpop8(len_remain);
+    int midx = get_mem_index(s);
+    TCGv_i64 addr, t0;
+
+    addr = tcg_temp_new_i64();
+    t0 = tcg_temp_new_i64();
+
+    /* Note that unpredicated load/store of vector/predicate registers
+     * are defined as a stream of bytes, which equates to little-endian
+     * operations on larger quantities.  There is no nice way to force
+     * a little-endian store for aarch64_be-linux-user out of line.
+     *
+     * Attempt to keep code expansion to a minimum by limiting the
+     * amount of unrolling done.
+     */
+    if (nparts <= 4) {
+        int i;
+
+        for (i = 0; i < len_align; i += 8) {
+            tcg_gen_ld_i64(t0, cpu_env, vofs + i);
+            tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + i);
+            tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEQ);
+        }
+    } else {
+        TCGLabel *loop = gen_new_label();
+        TCGv_ptr i = TCGV_NAT_TO_PTR(glue(tcg_const_local_, ptr)(0));
+        TCGv_ptr src;
+
+        gen_set_label(loop);
+
+        src = tcg_temp_new_ptr();
+        tcg_gen_add_ptr(src, cpu_env, i);
+        tcg_gen_ld_i64(t0, src, vofs);
+
+        /* Minimize the number of local temps that must be re-read from
+         * the stack each iteration.  Instead, re-compute values other
+         * than the loop counter.
+         */
+        tcg_gen_addi_ptr(src, i, imm);
+#if UINTPTR_MAX == UINT32_MAX
+        tcg_gen_extu_i32_i64(addr, TCGV_PTR_TO_NAT(src));
+        tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, rn));
+#else
+        tcg_gen_add_i64(addr, TCGV_PTR_TO_NAT(src), cpu_reg_sp(s, rn));
+#endif
+        tcg_temp_free_ptr(src);
+
+        tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEQ);
+
+        tcg_gen_addi_ptr(i, i, 8);
+
+        glue(tcg_gen_brcondi_, ptr)(TCG_COND_LTU, TCGV_PTR_TO_NAT(i),
+                                   len_align, loop);
+        tcg_temp_free_ptr(i);
+    }
+
+    /* Predicate register stores can be any multiple of 2.  */
+    if (len_remain) {
+        tcg_gen_ld_i64(t0, cpu_env, vofs + len_align);
+        tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + len_align);
+
+        switch (len_remain) {
+        case 2:
+        case 4:
+        case 8:
+            tcg_gen_qemu_st_i64(t0, addr, midx, MO_LE | ctz32(len_remain));
+            break;
+
+        case 6:
+            tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUL);
+            tcg_gen_addi_i64(addr, addr, 4);
+            tcg_gen_shri_i64(t0, t0, 32);
+            tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUW);
+            break;
+
+        default:
+            g_assert_not_reached();
+        }
+    }
+    tcg_temp_free_i64(addr);
+    tcg_temp_free_i64(t0);
+}
+
 #undef ptr
 
 static void trans_LDR_zri(DisasContext *s, arg_rri *a, uint32_t insn)
@@ -3515,6 +3604,18 @@ static void trans_LDR_pri(DisasContext *s, arg_rri *a, uint32_t insn)
     do_ldr(s, pred_full_reg_offset(s, a->rd), size, a->rn, a->imm * size);
 }
 
+static void trans_STR_zri(DisasContext *s, arg_rri *a, uint32_t insn)
+{
+    int size = vec_full_reg_size(s);
+    do_str(s, vec_full_reg_offset(s, a->rd), size, a->rn, a->imm * size);
+}
+
+static void trans_STR_pri(DisasContext *s, arg_rri *a, uint32_t insn)
+{
+    int size = pred_full_reg_size(s);
+    do_str(s, pred_full_reg_offset(s, a->rd), size, a->rn, a->imm * size);
+}
+
 /*
  *** SVE Memory - Contiguous Load Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 3e30985a09..5d8e1481d7 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -800,6 +800,12 @@ LD1RQ_zpri	1010010 .. 00 0.... 001 ... ..... ..... \
 
 ### SVE Memory Store Group
 
+# SVE store predicate register
+STR_pri		1110010 11 0.     ..... 000 ... ..... 0 ....	@pd_rn_i9
+
+# SVE store vector register
+STR_zri		1110010 11 0.     ..... 010 ... ..... .....	@rd_rn_i9
+
 # SVE contiguous store (scalar plus immediate)
 # ST1B, ST1H, ST1W, ST1D; require msz <= esz
 ST_zpri		1110010 .. esz:2  0.... 111 ... ..... ..... \
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
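The len_remain == 6 case in do_str splits the tail into a 4-byte store followed by a 2-byte store of the next bytes of the *data* (note the shift must be applied to t0, the value, not to addr). A portable scalar model of that tail, with hypothetical names (the real code emits TCG ops, not byte stores):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 6-byte tail: store the low 32 bits little-endian (MO_LEUL),
 * then shift the data right by 32 and store 16 more bits (MO_LEUW) at
 * offset 4.  Bytes are written explicitly so the model is endian-neutral. */
static void store6_model(uint8_t *dst, uint64_t t0)
{
    for (int i = 0; i < 4; i++) {
        dst[i] = (t0 >> (8 * i)) & 0xff;      /* MO_LEUL part */
    }
    t0 >>= 32;                                /* expose bytes 4 and 5 */
    for (int i = 0; i < 2; i++) {
        dst[4 + i] = (t0 >> (8 * i)) & 0xff;  /* MO_LEUW part */
    }
}
```

Predicate registers are a multiple of 2 bytes, so after the 8-byte-aligned body only remainders of 2, 4, 6, or 8 can occur, matching the switch in do_str.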

* [Qemu-devel] [PATCH v2 53/67] target/arm: Implement SVE scatter stores
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (51 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 52/67] target/arm: Implement SVE store vector/predicate register Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 14:36   ` Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 54/67] target/arm: Implement SVE prefetches Richard Henderson
                   ` (15 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 41 ++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 62 ++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 39 +++++++++++++++++++++++++
 4 files changed, 213 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 6c640a92ff..b5c093f2fd 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -918,3 +918,44 @@ DEF_HELPER_FLAGS_4(sve_st1hs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_st1hd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_4(sve_st1sd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_stbs_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_sths_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_stbs_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_sths_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_stbd_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_sthd_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stsd_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_stbd_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_sthd_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stsd_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_stbd_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_sthd_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stsd_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index a7dc6f6164..07b3d285f2 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3545,3 +3545,65 @@ void HELPER(sve_st4dd_r)(CPUARMState *env, void *vg,
         addr += 4 * 8;
     }
 }
+
+/* Stores with a vector index.  */
+
+#define DO_ST1_ZPZ_S(NAME, TYPEI, FN)                                   \
+void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
+                  target_ulong base, uint32_t desc)                     \
+{                                                                       \
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;                           \
+    unsigned scale = simd_data(desc);                                   \
+    uintptr_t ra = GETPC();                                             \
+    uint32_t *d = vd; TYPEI *m = vm; uint8_t *pg = vg;                  \
+    for (i = 0; i < oprsz; i++) {                                       \
+        uint8_t pp = pg[H1(i)];                                         \
+        if (pp & 0x01) {                                                \
+            target_ulong off = (target_ulong)m[H4(i * 2)] << scale;     \
+            FN(env, base + off, d[H4(i * 2)], ra);                      \
+        }                                                               \
+        if (pp & 0x10) {                                                \
+            target_ulong off = (target_ulong)m[H4(i * 2 + 1)] << scale; \
+            FN(env, base + off, d[H4(i * 2 + 1)], ra);                  \
+        }                                                               \
+    }                                                                   \
+}
+
+#define DO_ST1_ZPZ_D(NAME, TYPEI, FN)                                   \
+void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
+                  target_ulong base, uint32_t desc)                     \
+{                                                                       \
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;                           \
+    unsigned scale = simd_data(desc);                                   \
+    uintptr_t ra = GETPC();                                             \
+    uint64_t *d = vd, *m = vm; uint8_t *pg = vg;                        \
+    for (i = 0; i < oprsz; i++) {                                       \
+        if (pg[H1(i)] & 1) {                                            \
+            target_ulong off = (target_ulong)(TYPEI)m[i] << scale;      \
+            FN(env, base + off, d[i], ra);                              \
+        }                                                               \
+    }                                                                   \
+}
+
+DO_ST1_ZPZ_S(sve_stbs_zsu, uint32_t, cpu_stb_data_ra)
+DO_ST1_ZPZ_S(sve_sths_zsu, uint32_t, cpu_stw_data_ra)
+DO_ST1_ZPZ_S(sve_stss_zsu, uint32_t, cpu_stl_data_ra)
+
+DO_ST1_ZPZ_S(sve_stbs_zss, int32_t, cpu_stb_data_ra)
+DO_ST1_ZPZ_S(sve_sths_zss, int32_t, cpu_stw_data_ra)
+DO_ST1_ZPZ_S(sve_stss_zss, int32_t, cpu_stl_data_ra)
+
+DO_ST1_ZPZ_D(sve_stbd_zsu, uint32_t, cpu_stb_data_ra)
+DO_ST1_ZPZ_D(sve_sthd_zsu, uint32_t, cpu_stw_data_ra)
+DO_ST1_ZPZ_D(sve_stsd_zsu, uint32_t, cpu_stl_data_ra)
+DO_ST1_ZPZ_D(sve_stdd_zsu, uint32_t, cpu_stq_data_ra)
+
+DO_ST1_ZPZ_D(sve_stbd_zss, int32_t, cpu_stb_data_ra)
+DO_ST1_ZPZ_D(sve_sthd_zss, int32_t, cpu_stw_data_ra)
+DO_ST1_ZPZ_D(sve_stsd_zss, int32_t, cpu_stl_data_ra)
+DO_ST1_ZPZ_D(sve_stdd_zss, int32_t, cpu_stq_data_ra)
+
+DO_ST1_ZPZ_D(sve_stbd_zd, uint64_t, cpu_stb_data_ra)
+DO_ST1_ZPZ_D(sve_sthd_zd, uint64_t, cpu_stw_data_ra)
+DO_ST1_ZPZ_D(sve_stsd_zd, uint64_t, cpu_stl_data_ra)
+DO_ST1_ZPZ_D(sve_stdd_zd, uint64_t, cpu_stq_data_ra)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 9c724980a0..ca49b94924 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -47,6 +47,8 @@ typedef void gen_helper_gvec_flags_4(TCGv_i32, TCGv_ptr, TCGv_ptr,
                                      TCGv_ptr, TCGv_ptr, TCGv_i32);
 
 typedef void gen_helper_gvec_mem(TCGv_env, TCGv_ptr, TCGv_i64, TCGv_i32);
+typedef void gen_helper_gvec_mem_scatter(TCGv_env, TCGv_ptr, TCGv_ptr,
+                                         TCGv_ptr, TCGv_i64, TCGv_i32);
 
 /*
  * Helpers for extracting complex instruction fields.
@@ -3887,3 +3889,72 @@ static void trans_ST_zpri(DisasContext *s, arg_rpri_store *a, uint32_t insn)
                      (a->imm * elements * (a->nreg + 1)) << a->msz);
     do_st_zpa(s, a->rd, a->pg, addr, a->msz, a->esz, a->nreg);
 }
+
+/*
+ *** SVE gather loads / scatter stores
+ */
+
+static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm, int scale,
+                       TCGv_i64 scalar, gen_helper_gvec_mem_scatter *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_i32 desc = tcg_const_i32(simd_desc(vsz, vsz, scale));
+    TCGv_ptr t_zm = tcg_temp_new_ptr();
+    TCGv_ptr t_pg = tcg_temp_new_ptr();
+    TCGv_ptr t_zt = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
+    tcg_gen_addi_ptr(t_zm, cpu_env, vec_full_reg_offset(s, zm));
+    tcg_gen_addi_ptr(t_zt, cpu_env, vec_full_reg_offset(s, zt));
+    fn(cpu_env, t_zt, t_pg, t_zm, scalar, desc);
+
+    tcg_temp_free_ptr(t_zt);
+    tcg_temp_free_ptr(t_zm);
+    tcg_temp_free_ptr(t_pg);
+    tcg_temp_free_i32(desc);
+}
+
+static void trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
+{
+    /* Indexed by [xs][msz].  */
+    static gen_helper_gvec_mem_scatter * const fn32[2][3] = {
+        { gen_helper_sve_stbs_zsu,
+          gen_helper_sve_sths_zsu,
+          gen_helper_sve_stss_zsu, },
+        { gen_helper_sve_stbs_zss,
+          gen_helper_sve_sths_zss,
+          gen_helper_sve_stss_zss, },
+    };
+    static gen_helper_gvec_mem_scatter * const fn64[3][4] = {
+        { gen_helper_sve_stbd_zsu,
+          gen_helper_sve_sthd_zsu,
+          gen_helper_sve_stsd_zsu,
+          gen_helper_sve_stdd_zsu, },
+        { gen_helper_sve_stbd_zss,
+          gen_helper_sve_sthd_zss,
+          gen_helper_sve_stsd_zss,
+          gen_helper_sve_stdd_zss, },
+        { gen_helper_sve_stbd_zd,
+          gen_helper_sve_sthd_zd,
+          gen_helper_sve_stsd_zd,
+          gen_helper_sve_stdd_zd, },
+    };
+    gen_helper_gvec_mem_scatter *fn;
+
+    if (a->esz < a->msz || (a->msz == 0 && a->scale)) {
+        unallocated_encoding(s);
+        return;
+    }
+    switch (a->esz) {
+    case MO_32:
+        fn = fn32[a->xs][a->msz];
+        break;
+    case MO_64:
+        fn = fn64[a->xs][a->msz];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz,
+               cpu_reg_sp(s, a->rn), fn);
+}
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 5d8e1481d7..edd9340c02 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -81,6 +81,7 @@
 &rpri_load	rd pg rn imm dtype nreg
 &rprr_store	rd pg rn rm msz esz nreg
 &rpri_store	rd pg rn imm msz esz nreg
+&rprr_scatter_store	rd pg rn rm esz msz xs scale
 
 ###########################################################################
 # Named instruction formats.  These are generally used to
@@ -199,6 +200,8 @@
 @rpri_store_msz     ....... msz:2 .. . imm:s4 ... pg:3 rn:5 rd:5    &rpri_store
 @rprr_store_esz_n0  ....... ..    esz:2  rm:5 ... pg:3 rn:5 rd:5 \
 		    &rprr_store nreg=0
+@rprr_scatter_store ....... msz:2 ..     rm:5 ... pg:3 rn:5 rd:5 \
+		    &rprr_scatter_store
 
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
@@ -832,3 +835,39 @@ ST_zpri		1110010 .. nreg:2 1.... 111 ... ..... ..... \
 # SVE store multiple structures (scalar plus scalar)         (nreg != 0)
 ST_zprr		1110010 msz:2 nreg:2 ..... 011 ... ..... ..... \
 		@rprr_store esz=%size_23
+
+# SVE 32-bit scatter store (scalar plus 32-bit scaled offsets)
+# Require msz > 0 && msz <= esz.
+ST1_zprz	1110010 .. 11 ..... 100 ... ..... ..... \
+		@rprr_scatter_store xs=0 esz=2 scale=1
+ST1_zprz	1110010 .. 11 ..... 110 ... ..... ..... \
+		@rprr_scatter_store xs=1 esz=2 scale=1
+
+# SVE 32-bit scatter store (scalar plus 32-bit unscaled offsets)
+# Require msz <= esz.
+ST1_zprz	1110010 .. 10 ..... 100 ... ..... ..... \
+		@rprr_scatter_store xs=0 esz=2 scale=0
+ST1_zprz	1110010 .. 10 ..... 110 ... ..... ..... \
+		@rprr_scatter_store xs=1 esz=2 scale=0
+
+# SVE 64-bit scatter store (scalar plus 64-bit scaled offset)
+# Require msz > 0
+ST1_zprz	1110010 .. 01 ..... 101 ... ..... ..... \
+		@rprr_scatter_store xs=2 esz=3 scale=1
+
+# SVE 64-bit scatter store (scalar plus 64-bit unscaled offset)
+ST1_zprz	1110010 .. 00 ..... 101 ... ..... ..... \
+		@rprr_scatter_store xs=2 esz=3 scale=0
+
+# SVE 64-bit scatter store (scalar plus unpacked 32-bit scaled offset)
+# Require msz > 0
+ST1_zprz	1110010 .. 01 ..... 100 ... ..... ..... \
+		@rprr_scatter_store xs=0 esz=3 scale=1
+ST1_zprz	1110010 .. 01 ..... 110 ... ..... ..... \
+		@rprr_scatter_store xs=1 esz=3 scale=1
+
+# SVE 64-bit scatter store (scalar plus unpacked 32-bit unscaled offset)
+ST1_zprz	1110010 .. 00 ..... 100 ... ..... ..... \
+		@rprr_scatter_store xs=0 esz=3 scale=0
+ST1_zprz	1110010 .. 00 ..... 110 ... ..... ..... \
+		@rprr_scatter_store xs=1 esz=3 scale=0
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
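Each scatter-store lane computes its address as base + (extend(offset) << scale), where TYPEI in the DO_ST1_ZPZ_* macros selects zero-extension (zsu) or sign-extension (zss) of a 32-bit offset, and scale is 0 for unscaled forms or the element-size shift for scaled ones. A minimal model of the sign-extended (zss) case, with an illustrative name:

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane address computation for a 64-bit scatter store with a
 * sign-extended 32-bit offset: sign-extend, scale, then add the base.
 * The zsu variant would take a uint32_t offset (zero-extension) instead. */
static uint64_t scatter_addr_zss(uint64_t base, int32_t off, unsigned scale)
{
    return base + ((uint64_t)(int64_t)off << scale);
}
```

This is also why trans_ST1_zprz passes `a->scale * a->msz` to do_mem_zpz: a scaled form shifts the offset by the memory element size, so a doubleword store (msz == 3) scales by 8.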

* [Qemu-devel] [PATCH v2 54/67] target/arm: Implement SVE prefetches
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (52 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 53/67] target/arm: Implement SVE scatter stores Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 14:43   ` Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 55/67] target/arm: Implement SVE gather loads Richard Henderson
                   ` (14 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c |  9 +++++++++
 target/arm/sve.decode      | 23 +++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index ca49b94924..63c7a0e8d8 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3958,3 +3958,12 @@ static void trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
     do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz,
                cpu_reg_sp(s, a->rn), fn);
 }
+
+/*
+ * Prefetches
+ */
+
+static void trans_PRF(DisasContext *s, arg_PRF *a, uint32_t insn)
+{
+    /* Prefetch is a nop within QEMU.  */
+}
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index edd9340c02..f0144aa2d0 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -801,6 +801,29 @@ LD1RQ_zprr	1010010 .. 00 ..... 000 ... ..... ..... \
 LD1RQ_zpri	1010010 .. 00 0.... 001 ... ..... ..... \
 		@rpri_load_msz nreg=0
 
+# SVE 32-bit gather prefetch (scalar plus 32-bit scaled offsets)
+PRF		1000010 00 -1 ----- 0-- --- ----- 0 ----
+
+# SVE 32-bit gather prefetch (vector plus immediate)
+PRF		1000010 -- 00 ----- 111 --- ----- 0 ----
+
+# SVE contiguous prefetch (scalar plus immediate)
+PRF		1000010 11 1- ----- 0-- --- ----- 0 ----
+
+# SVE contiguous prefetch (scalar plus scalar)
+PRF		1000010 -- 00 ----- 110 --- ----- 0 ----
+
+### SVE Memory 64-bit Gather Group
+
+# SVE 64-bit gather prefetch (scalar plus 64-bit scaled offsets)
+PRF		1100010 00 11 ----- 1-- --- ----- 0 ----
+
+# SVE 64-bit gather prefetch (scalar plus unpacked 32-bit scaled offsets)
+PRF		1100010 00 -1 ----- 0-- --- ----- 0 ----
+
+# SVE 64-bit gather prefetch (vector plus immediate)
+PRF		1100010 -- 00 ----- 111 --- ----- 0 ----
+
 ### SVE Memory Store Group
 
 # SVE store predicate register
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread
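Each sve.decode pattern line reduces to a (mask, value) pair: characters 0 and 1 become fixed bits in the mask, while '-' bits are don't-cares that fall through to the operand fields. A sketch of the test decodetree effectively generates for the first PRF pattern above, "1000010 00 -1 ----- 0-- --- ----- 0 ----"; the constants are derived by hand from that line and the function name is illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hand-derived from "1000010 00 -1 ----- 0-- --- ----- 0 ----":
 * fixed bits are 31-25, 24-23, 21, 15, and 4; everything else is
 * don't-care.  Treat these values as illustrative, not generated. */
#define PRF_GATHER32_MASK   0xffa08010u
#define PRF_GATHER32_VALUE  0x84200000u

static bool is_prf_gather32(uint32_t insn)
{
    return (insn & PRF_GATHER32_MASK) == PRF_GATHER32_VALUE;
}
```

Since all the PRF forms are architectural hints, every matching pattern funnels into the single trans_PRF, which deliberately emits nothing.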

* [Qemu-devel] [PATCH v2 55/67] target/arm: Implement SVE gather loads
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (53 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 54/67] target/arm: Implement SVE prefetches Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 14:53   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 56/67] target/arm: Implement SVE scatter store vector immediate Richard Henderson
                   ` (13 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 67 ++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 75 +++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 53 +++++++++++++++++++++++++
 4 files changed, 292 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index b5c093f2fd..3cb7ab9ef2 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -919,6 +919,73 @@ DEF_HELPER_FLAGS_4(sve_st1hd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_4(sve_st1sd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
+DEF_HELPER_FLAGS_6(sve_ldbsu_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhsu_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldssu_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldbss_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhss_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_ldbsu_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhsu_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldssu_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldbss_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhss_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_ldbdu_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhdu_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsdu_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldddu_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldbds_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhds_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_ldbdu_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhdu_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsdu_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldddu_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldbds_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhds_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_6(sve_ldbdu_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhdu_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsdu_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldddu_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldbds_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhds_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+
 DEF_HELPER_FLAGS_6(sve_stbs_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_sths_zsu, TCG_CALL_NO_WG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 07b3d285f2..4edd3d4367 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3546,6 +3546,81 @@ void HELPER(sve_st4dd_r)(CPUARMState *env, void *vg,
     }
 }
 
+/* Loads with a vector index.  */
+
+#define DO_LD1_ZPZ_S(NAME, TYPEI, TYPEM, FN)                            \
+void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
+                  target_ulong base, uint32_t desc)                     \
+{                                                                       \
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;                           \
+    unsigned scale = simd_data(desc);                                   \
+    uintptr_t ra = GETPC();                                             \
+    uint32_t *d = vd; TYPEI *m = vm; uint8_t *pg = vg;                  \
+    for (i = 0; i < oprsz; i++) {                                       \
+        uint8_t pp = pg[H1(i)];                                         \
+        if (pp & 0x01) {                                                \
+            target_ulong off = (target_ulong)m[H4(i * 2)] << scale;     \
+            d[H4(i * 2)] = (TYPEM)FN(env, base + off, ra);              \
+        }                                                               \
+        if (pp & 0x10) {                                                \
+            target_ulong off = (target_ulong)m[H4(i * 2 + 1)] << scale; \
+            d[H4(i * 2 + 1)] = (TYPEM)FN(env, base + off, ra);          \
+        }                                                               \
+    }                                                                   \
+}
+
+#define DO_LD1_ZPZ_D(NAME, TYPEI, TYPEM, FN)                            \
+void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
+                  target_ulong base, uint32_t desc)                     \
+{                                                                       \
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;                           \
+    unsigned scale = simd_data(desc);                                   \
+    uintptr_t ra = GETPC();                                             \
+    uint64_t *d = vd, *m = vm; uint8_t *pg = vg;                        \
+    for (i = 0; i < oprsz; i++) {                                       \
+        if (pg[H1(i)] & 1) {                                            \
+            target_ulong off = (target_ulong)(TYPEI)m[i] << scale;      \
+            d[i] = (TYPEM)FN(env, base + off, ra);                      \
+        }                                                               \
+    }                                                                   \
+}
+
+DO_LD1_ZPZ_S(sve_ldbsu_zsu, uint32_t, uint8_t,  cpu_ldub_data_ra)
+DO_LD1_ZPZ_S(sve_ldhsu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
+DO_LD1_ZPZ_S(sve_ldssu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
+DO_LD1_ZPZ_S(sve_ldbss_zsu, uint32_t, int8_t,   cpu_ldub_data_ra)
+DO_LD1_ZPZ_S(sve_ldhss_zsu, uint32_t, int16_t,  cpu_lduw_data_ra)
+
+DO_LD1_ZPZ_S(sve_ldbsu_zss, int32_t, uint8_t,  cpu_ldub_data_ra)
+DO_LD1_ZPZ_S(sve_ldhsu_zss, int32_t, uint16_t, cpu_lduw_data_ra)
+DO_LD1_ZPZ_S(sve_ldssu_zss, int32_t, uint32_t, cpu_ldl_data_ra)
+DO_LD1_ZPZ_S(sve_ldbss_zss, int32_t, int8_t,   cpu_ldub_data_ra)
+DO_LD1_ZPZ_S(sve_ldhss_zss, int32_t, int16_t,  cpu_lduw_data_ra)
+
+DO_LD1_ZPZ_D(sve_ldbdu_zsu, uint32_t, uint8_t,  cpu_ldub_data_ra)
+DO_LD1_ZPZ_D(sve_ldhdu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
+DO_LD1_ZPZ_D(sve_ldsdu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
+DO_LD1_ZPZ_D(sve_ldddu_zsu, uint32_t, uint64_t, cpu_ldq_data_ra)
+DO_LD1_ZPZ_D(sve_ldbds_zsu, uint32_t, int8_t,   cpu_ldub_data_ra)
+DO_LD1_ZPZ_D(sve_ldhds_zsu, uint32_t, int16_t,  cpu_lduw_data_ra)
+DO_LD1_ZPZ_D(sve_ldsds_zsu, uint32_t, int32_t,  cpu_ldl_data_ra)
+
+DO_LD1_ZPZ_D(sve_ldbdu_zss, int32_t, uint8_t,  cpu_ldub_data_ra)
+DO_LD1_ZPZ_D(sve_ldhdu_zss, int32_t, uint16_t, cpu_lduw_data_ra)
+DO_LD1_ZPZ_D(sve_ldsdu_zss, int32_t, uint32_t, cpu_ldl_data_ra)
+DO_LD1_ZPZ_D(sve_ldddu_zss, int32_t, uint64_t, cpu_ldq_data_ra)
+DO_LD1_ZPZ_D(sve_ldbds_zss, int32_t, int8_t,   cpu_ldub_data_ra)
+DO_LD1_ZPZ_D(sve_ldhds_zss, int32_t, int16_t,  cpu_lduw_data_ra)
+DO_LD1_ZPZ_D(sve_ldsds_zss, int32_t, int32_t,  cpu_ldl_data_ra)
+
+DO_LD1_ZPZ_D(sve_ldbdu_zd, uint64_t, uint8_t,  cpu_ldub_data_ra)
+DO_LD1_ZPZ_D(sve_ldhdu_zd, uint64_t, uint16_t, cpu_lduw_data_ra)
+DO_LD1_ZPZ_D(sve_ldsdu_zd, uint64_t, uint32_t, cpu_ldl_data_ra)
+DO_LD1_ZPZ_D(sve_ldddu_zd, uint64_t, uint64_t, cpu_ldq_data_ra)
+DO_LD1_ZPZ_D(sve_ldbds_zd, uint64_t, int8_t,   cpu_ldub_data_ra)
+DO_LD1_ZPZ_D(sve_ldhds_zd, uint64_t, int16_t,  cpu_lduw_data_ra)
+DO_LD1_ZPZ_D(sve_ldsds_zd, uint64_t, int32_t,  cpu_ldl_data_ra)
+
 /* Stores with a vector index.  */
 
 #define DO_ST1_ZPZ_S(NAME, TYPEI, FN)                                   \
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 63c7a0e8d8..6484ecd257 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3914,6 +3914,103 @@ static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm, int scale,
     tcg_temp_free_i32(desc);
 }
 
+/* Indexed by [xs][u][msz].  */
+static gen_helper_gvec_mem_scatter * const gather_load_fn32[2][2][3] = {
+    { { gen_helper_sve_ldbss_zsu,
+        gen_helper_sve_ldhss_zsu,
+        NULL, },
+      { gen_helper_sve_ldbsu_zsu,
+        gen_helper_sve_ldhsu_zsu,
+        gen_helper_sve_ldssu_zsu, } },
+    { { gen_helper_sve_ldbss_zss,
+        gen_helper_sve_ldhss_zss,
+        NULL, },
+      { gen_helper_sve_ldbsu_zss,
+        gen_helper_sve_ldhsu_zss,
+        gen_helper_sve_ldssu_zss, } },
+};
+
+static gen_helper_gvec_mem_scatter * const gather_load_fn64[3][2][4] = {
+    { { gen_helper_sve_ldbds_zsu,
+        gen_helper_sve_ldhds_zsu,
+        gen_helper_sve_ldsds_zsu,
+        NULL, },
+      { gen_helper_sve_ldbdu_zsu,
+        gen_helper_sve_ldhdu_zsu,
+        gen_helper_sve_ldsdu_zsu,
+        gen_helper_sve_ldddu_zsu, } },
+    { { gen_helper_sve_ldbds_zss,
+        gen_helper_sve_ldhds_zss,
+        gen_helper_sve_ldsds_zss,
+        NULL, },
+      { gen_helper_sve_ldbdu_zss,
+        gen_helper_sve_ldhdu_zss,
+        gen_helper_sve_ldsdu_zss,
+        gen_helper_sve_ldddu_zss, } },
+    { { gen_helper_sve_ldbds_zd,
+        gen_helper_sve_ldhds_zd,
+        gen_helper_sve_ldsds_zd,
+        NULL, },
+      { gen_helper_sve_ldbdu_zd,
+        gen_helper_sve_ldhdu_zd,
+        gen_helper_sve_ldsdu_zd,
+        gen_helper_sve_ldddu_zd, } },
+};
+
+static void trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn)
+{
+    gen_helper_gvec_mem_scatter *fn = NULL;
+
+    if (a->esz < a->msz
+        || (a->msz == 0 && a->scale)
+        || (a->esz == a->msz && !a->u)) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    /* TODO: handle LDFF1.  */
+    switch (a->esz) {
+    case MO_32:
+        fn = gather_load_fn32[a->xs][a->u][a->msz];
+        break;
+    case MO_64:
+        fn = gather_load_fn64[a->xs][a->u][a->msz];
+        break;
+    }
+    assert(fn != NULL);
+
+    do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz,
+               cpu_reg_sp(s, a->rn), fn);
+}
+
+static void trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn)
+{
+    gen_helper_gvec_mem_scatter *fn = NULL;
+    TCGv_i64 imm;
+
+    if (a->esz < a->msz || (a->esz == a->msz && !a->u)) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    /* TODO: handle LDFF1.  */
+    switch (a->esz) {
+    case MO_32:
+        fn = gather_load_fn32[0][a->u][a->msz];
+        break;
+    case MO_64:
+        fn = gather_load_fn64[2][a->u][a->msz];
+        break;
+    }
+    assert(fn != NULL);
+
+    /* Treat LD1_zpiz (zn[x] + imm) the same way as LD1_zprz (rn + zm[x])
+       by loading the immediate into the scalar parameter.  */
+    imm = tcg_const_i64(a->imm << a->msz);
+    do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, fn);
+    tcg_temp_free_i64(imm);
+}
+
 static void trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
 {
     /* Indexed by [xs][msz].  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index f0144aa2d0..f85d82e009 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -81,6 +81,8 @@
 &rpri_load	rd pg rn imm dtype nreg
 &rprr_store	rd pg rn rm msz esz nreg
 &rpri_store	rd pg rn imm msz esz nreg
+&rprr_gather_load	rd pg rn rm esz msz u ff xs scale
+&rpri_gather_load	rd pg rn imm esz msz u ff
 &rprr_scatter_store	rd pg rn rm esz msz xs scale
 
 ###########################################################################
@@ -195,6 +197,18 @@
 @rpri_load_msz	....... .... . imm:s4 ... pg:3 rn:5 rd:5 \
 		&rpri_load dtype=%msz_dtype
 
+# Gather Loads.
+@rprr_g_load_u	      ....... .. .    . rm:5 . u:1 ff:1 pg:3 rn:5 rd:5 \
+		      &rprr_gather_load xs=2
+@rprr_g_load_xs_u     ....... .. xs:1 . rm:5 . u:1 ff:1 pg:3 rn:5 rd:5 \
+		      &rprr_gather_load
+@rprr_g_load_xs_u_sc  ....... .. xs:1 scale:1 rm:5 . u:1 ff:1 pg:3 rn:5 rd:5 \
+		      &rprr_gather_load
+@rprr_g_load_u_sc     ....... .. .    scale:1 rm:5 . u:1 ff:1 pg:3 rn:5 rd:5 \
+		      &rprr_gather_load xs=2
+@rpri_g_load	      ....... msz:2 .. imm:5 . u:1 ff:1 pg:3 rn:5 rd:5 \
+		      &rpri_gather_load
+
 # Stores; user must fill in ESZ, MSZ, NREG as needed.
 @rprr_store	    ....... ..    ..     rm:5 ... pg:3 rn:5 rd:5    &rprr_store
 @rpri_store_msz     ....... msz:2 .. . imm:s4 ... pg:3 rn:5 rd:5    &rpri_store
@@ -766,6 +780,19 @@ LDR_zri		10000101 10 ...... 010 ... ..... .....		@rd_rn_i9
 LD1R_zpri	1000010 .. 1 imm:6 1.. pg:3 rn:5 rd:5 \
 		&rpri_load dtype=%dtype_23_13 nreg=0
 
+# SVE 32-bit gather load (scalar plus 32-bit unscaled offsets)
+# SVE 32-bit gather load (scalar plus 32-bit scaled offsets)
+LD1_zprz	1000010 00 .0 ..... 0.. ... ..... ..... \
+		@rprr_g_load_xs_u esz=2 msz=0 scale=0
+LD1_zprz	1000010 01 .. ..... 0.. ... ..... ..... \
+		@rprr_g_load_xs_u_sc esz=2 msz=1
+LD1_zprz	1000010 10 .. ..... 0.. ... ..... ..... \
+		@rprr_g_load_xs_u_sc esz=2 msz=2
+
+# SVE 32-bit gather load (vector plus immediate)
+LD1_zpiz	1000010 .. 01 ..... 1.. ... ..... ..... \
+		@rpri_g_load esz=2
+
 ### SVE Memory Contiguous Load Group
 
 # SVE contiguous load (scalar plus scalar)
@@ -815,6 +842,32 @@ PRF		1000010 -- 00 ----- 110 --- ----- 0 ----
 
 ### SVE Memory 64-bit Gather Group
 
+# SVE 64-bit gather load (scalar plus 32-bit unpacked unscaled offsets)
+# SVE 64-bit gather load (scalar plus 32-bit unpacked scaled offsets)
+LD1_zprz	1100010 00 .0 ..... 0.. ... ..... ..... \
+		@rprr_g_load_xs_u esz=3 msz=0 scale=0
+LD1_zprz	1100010 01 .. ..... 0.. ... ..... ..... \
+		@rprr_g_load_xs_u_sc esz=3 msz=1
+LD1_zprz	1100010 10 .. ..... 0.. ... ..... ..... \
+		@rprr_g_load_xs_u_sc esz=3 msz=2
+LD1_zprz	1100010 11 .. ..... 0.. ... ..... ..... \
+		@rprr_g_load_xs_u_sc esz=3 msz=3
+
+# SVE 64-bit gather load (scalar plus 64-bit unscaled offsets)
+# SVE 64-bit gather load (scalar plus 64-bit scaled offsets)
+LD1_zprz	1100010 00 10 ..... 1.. ... ..... ..... \
+		@rprr_g_load_u esz=3 msz=0 scale=0
+LD1_zprz	1100010 01 1. ..... 1.. ... ..... ..... \
+		@rprr_g_load_u_sc esz=3 msz=1
+LD1_zprz	1100010 10 1. ..... 1.. ... ..... ..... \
+		@rprr_g_load_u_sc esz=3 msz=2
+LD1_zprz	1100010 11 1. ..... 1.. ... ..... ..... \
+		@rprr_g_load_u_sc esz=3 msz=3
+
+# SVE 64-bit gather load (vector plus immediate)
+LD1_zpiz	1100010 .. 01 ..... 1.. ... ..... ..... \
+		@rpri_g_load esz=3
+
 # SVE 64-bit gather prefetch (scalar plus 64-bit scaled offsets)
 PRF		1100010 00 11 ----- 1-- --- ----- 0 ----
 
-- 
2.14.3
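As an aside on what a DO_LD1_ZPZ_D expansion above does: for each 64-bit lane whose governing predicate bit is set, the lane of Zm is reinterpreted per TYPEI (zero- or sign-extended for the _zsu/_zss variants), shifted left by the scale, added to the scalar base, and loaded. A rough Python model (hypothetical names, a dict standing in for guest memory; inactive lanes are simply skipped, as in the helper):

```python
def sve_gather_load_d(mem, base, offsets, pred, scale, reinterpret=lambda x: x):
    """Model one DO_LD1_ZPZ_D expansion: a predicated 64-bit gather load.

    mem         -- address -> value dict standing in for guest memory
    offsets     -- raw 64-bit lanes of the index vector Zm
    pred        -- one bool per lane (bit 0 of each predicate byte)
    reinterpret -- the TYPEI cast, e.g. sign-extend the low 32 bits for _zss
    """
    out = []
    for off, active in zip(offsets, pred):
        if active:
            addr = base + (reinterpret(off) << scale)
            out.append(mem.get(addr, 0))
        else:
            out.append(None)  # inactive lane: the helper does not touch it
    return out
```

For the _zss variants, `reinterpret` would sign-extend the low 32 bits of each lane, matching the `(target_ulong)(TYPEI)m[i]` cast in the macro.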


* [Qemu-devel] [PATCH v2 56/67] target/arm: Implement SVE scatter store vector immediate
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (54 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 55/67] target/arm: Implement SVE gather loads Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:02   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 57/67] target/arm: Implement SVE floating-point compare vectors Richard Henderson
                   ` (12 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 79 +++++++++++++++++++++++++++++++---------------
 target/arm/sve.decode      | 11 +++++++
 2 files changed, 65 insertions(+), 25 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 6484ecd257..0241e8e707 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4011,31 +4011,33 @@ static void trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn)
     tcg_temp_free_i64(imm);
 }
 
+/* Indexed by [xs][msz].  */
+static gen_helper_gvec_mem_scatter * const scatter_store_fn32[2][3] = {
+    { gen_helper_sve_stbs_zsu,
+      gen_helper_sve_sths_zsu,
+      gen_helper_sve_stss_zsu, },
+    { gen_helper_sve_stbs_zss,
+      gen_helper_sve_sths_zss,
+      gen_helper_sve_stss_zss, },
+};
+
+static gen_helper_gvec_mem_scatter * const scatter_store_fn64[3][4] = {
+    { gen_helper_sve_stbd_zsu,
+      gen_helper_sve_sthd_zsu,
+      gen_helper_sve_stsd_zsu,
+      gen_helper_sve_stdd_zsu, },
+    { gen_helper_sve_stbd_zss,
+      gen_helper_sve_sthd_zss,
+      gen_helper_sve_stsd_zss,
+      gen_helper_sve_stdd_zss, },
+    { gen_helper_sve_stbd_zd,
+      gen_helper_sve_sthd_zd,
+      gen_helper_sve_stsd_zd,
+      gen_helper_sve_stdd_zd, },
+};
+
 static void trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
 {
-    /* Indexed by [xs][msz].  */
-    static gen_helper_gvec_mem_scatter * const fn32[2][3] = {
-        { gen_helper_sve_stbs_zsu,
-          gen_helper_sve_sths_zsu,
-          gen_helper_sve_stss_zsu, },
-        { gen_helper_sve_stbs_zss,
-          gen_helper_sve_sths_zss,
-          gen_helper_sve_stss_zss, },
-    };
-    static gen_helper_gvec_mem_scatter * const fn64[3][4] = {
-        { gen_helper_sve_stbd_zsu,
-          gen_helper_sve_sthd_zsu,
-          gen_helper_sve_stsd_zsu,
-          gen_helper_sve_stdd_zsu, },
-        { gen_helper_sve_stbd_zss,
-          gen_helper_sve_sthd_zss,
-          gen_helper_sve_stsd_zss,
-          gen_helper_sve_stdd_zss, },
-        { gen_helper_sve_stbd_zd,
-          gen_helper_sve_sthd_zd,
-          gen_helper_sve_stsd_zd,
-          gen_helper_sve_stdd_zd, },
-    };
     gen_helper_gvec_mem_scatter *fn;
 
     if (a->esz < a->msz || (a->msz == 0 && a->scale)) {
@@ -4044,10 +4046,10 @@ static void trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
     }
     switch (a->esz) {
     case MO_32:
-        fn = fn32[a->xs][a->msz];
+        fn = scatter_store_fn32[a->xs][a->msz];
         break;
     case MO_64:
-        fn = fn64[a->xs][a->msz];
+        fn = scatter_store_fn64[a->xs][a->msz];
         break;
     default:
         g_assert_not_reached();
@@ -4056,6 +4058,33 @@ static void trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
                cpu_reg_sp(s, a->rn), fn);
 }
 
+static void trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz *a, uint32_t insn)
+{
+    gen_helper_gvec_mem_scatter *fn = NULL;
+    TCGv_i64 imm;
+
+    if (a->esz < a->msz) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (a->esz) {
+    case MO_32:
+        fn = scatter_store_fn32[0][a->msz];
+        break;
+    case MO_64:
+        fn = scatter_store_fn64[2][a->msz];
+        break;
+    }
+    assert(fn != NULL);
+
+    /* Treat ST1_zpiz (zn[x] + imm) the same way as ST1_zprz (rn + zm[x])
+       by loading the immediate into the scalar parameter.  */
+    imm = tcg_const_i64(a->imm << a->msz);
+    do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, fn);
+    tcg_temp_free_i64(imm);
+}
+
 /*
  * Prefetches
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index f85d82e009..6ccb4289fc 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -84,6 +84,7 @@
 &rprr_gather_load	rd pg rn rm esz msz u ff xs scale
 &rpri_gather_load	rd pg rn imm esz msz u ff
 &rprr_scatter_store	rd pg rn rm esz msz xs scale
+&rpri_scatter_store	rd pg rn imm esz msz
 
 ###########################################################################
 # Named instruction formats.  These are generally used to
@@ -216,6 +217,8 @@
 		    &rprr_store nreg=0
 @rprr_scatter_store ....... msz:2 ..     rm:5 ... pg:3 rn:5 rd:5 \
 		    &rprr_scatter_store
+@rpri_scatter_store ....... msz:2 ..    imm:5 ... pg:3 rn:5 rd:5 \
+		    &rpri_scatter_store
 
 ###########################################################################
 # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
@@ -935,6 +938,14 @@ ST1_zprz	1110010 .. 01 ..... 101 ... ..... ..... \
 ST1_zprz	1110010 .. 00 ..... 101 ... ..... ..... \
 		@rprr_scatter_store xs=2 esz=3 scale=0
 
+# SVE 64-bit scatter store (vector plus immediate)
+ST1_zpiz	1110010 .. 10 ..... 101 ... ..... ..... \
+		@rpri_scatter_store esz=3
+
+# SVE 32-bit scatter store (vector plus immediate)
+ST1_zpiz	1110010 .. 11 ..... 101 ... ..... ..... \
+		@rpri_scatter_store esz=2
+
 # SVE 64-bit scatter store (scalar plus unpacked 32-bit scaled offset)
 # Require msz > 0
 ST1_zprz	1110010 .. 01 ..... 100 ... ..... ..... \
-- 
2.14.3
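The "treat ST1_zpiz (zn[x] + imm) the same way as ST1_zprz (rn + zm[x])" comment in this patch is plain address-arithmetic commutativity: the per-lane address is a sum, so the immediate can play the role of the scalar base while the address vector plays the role of the offsets. A hedged sketch of the shared scatter-store shape (hypothetical names, dict as memory):

```python
def sve_scatter_store(mem, base, offsets, values, pred, scale):
    """Predicated scatter store: mem[base + (offset << scale)] = value."""
    for off, val, active in zip(offsets, values, pred):
        if active:
            mem[base + (off << scale)] = val

# ST1_zpiz reuses the same routine by passing the (already msz-shifted)
# immediate as the scalar base, the address vector Zn as the offsets,
# and scale forced to 0 -- which is what trans_ST1_zpiz arranges.
```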


* [Qemu-devel] [PATCH v2 57/67] target/arm: Implement SVE floating-point compare vectors
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (55 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 56/67] target/arm: Implement SVE scatter store vector immediate Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:04   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 58/67] target/arm: Implement SVE floating-point arithmetic with immediate Richard Henderson
                   ` (11 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 49 +++++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 64 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 41 +++++++++++++++++++++++++++++
 target/arm/sve.decode      | 11 ++++++++
 4 files changed, 165 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 3cb7ab9ef2..30373e3fc7 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -839,6 +839,55 @@ DEF_HELPER_FLAGS_5(sve_ucvt_ds, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_ucvt_dd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_6(sve_fcmge_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fcmge_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fcmge_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fcmgt_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fcmgt_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fcmgt_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fcmeq_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fcmeq_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fcmeq_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fcmne_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fcmne_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fcmne_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fcmuo_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fcmuo_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fcmuo_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_facge_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_facge_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_facge_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_facgt_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_facgt_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_facgt_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_fmla_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4edd3d4367..ace613684d 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3100,6 +3100,70 @@ DO_FMLA(sve_fnmls_zpzzz_d, 64, , 1, 1)
 
 #undef DO_FMLA
 
+/* Two operand floating-point comparison controlled by a predicate.
+ * Unlike the integer version, we are not allowed to optimistically
+ * compare operands, since the comparison may have side effects wrt
+ * the FPSR.
+ */
+#define DO_FPCMP_PPZZ(NAME, TYPE, H, OP)                                \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,               \
+                  void *status, uint32_t desc)                          \
+{                                                                       \
+    intptr_t opr_sz = simd_oprsz(desc);                                 \
+    intptr_t i = opr_sz, j = ((opr_sz - 1) & -64) >> 3;                 \
+    do {                                                                \
+        uint64_t out = 0;                                               \
+        uint64_t pg = *(uint64_t *)(vg + j);                            \
+        do {                                                            \
+            i -= sizeof(TYPE), out <<= sizeof(TYPE);                    \
+            if ((pg >> (i & 63)) & 1) {                                 \
+                TYPE nn = *(TYPE *)(vn + H(i));                         \
+                TYPE mm = *(TYPE *)(vm + H(i));                         \
+                out |= OP(TYPE, nn, mm, status);                        \
+            }                                                           \
+        } while (i & 63);                                               \
+        *(uint64_t *)(vd + j) = out;                                    \
+        j -= 8;                                                         \
+    } while (i > 0);                                                    \
+}
+
+#define DO_FPCMP_PPZZ_H(NAME, OP) \
+    DO_FPCMP_PPZZ(NAME##_h, float16, H1_2, OP)
+#define DO_FPCMP_PPZZ_S(NAME, OP) \
+    DO_FPCMP_PPZZ(NAME##_s, float32, H1_4, OP)
+#define DO_FPCMP_PPZZ_D(NAME, OP) \
+    DO_FPCMP_PPZZ(NAME##_d, float64,     , OP)
+
+#define DO_FPCMP_PPZZ_ALL(NAME, OP) \
+    DO_FPCMP_PPZZ_H(NAME, OP)   \
+    DO_FPCMP_PPZZ_S(NAME, OP)   \
+    DO_FPCMP_PPZZ_D(NAME, OP)
+
+#define DO_FCMGE(TYPE, X, Y, ST)  TYPE##_compare(Y, X, ST) <= 0
+#define DO_FCMGT(TYPE, X, Y, ST)  TYPE##_compare(Y, X, ST) < 0
+#define DO_FCMEQ(TYPE, X, Y, ST)  TYPE##_compare_quiet(X, Y, ST) == 0
+#define DO_FCMNE(TYPE, X, Y, ST)  TYPE##_compare_quiet(X, Y, ST) != 0
+#define DO_FCMUO(TYPE, X, Y, ST)  \
+    TYPE##_compare_quiet(X, Y, ST) == float_relation_unordered
+#define DO_FACGE(TYPE, X, Y, ST)  \
+    TYPE##_compare(TYPE##_abs(Y), TYPE##_abs(X), ST) <= 0
+#define DO_FACGT(TYPE, X, Y, ST)  \
+    TYPE##_compare(TYPE##_abs(Y), TYPE##_abs(X), ST) < 0
+
+DO_FPCMP_PPZZ_ALL(sve_fcmge, DO_FCMGE)
+DO_FPCMP_PPZZ_ALL(sve_fcmgt, DO_FCMGT)
+DO_FPCMP_PPZZ_ALL(sve_fcmeq, DO_FCMEQ)
+DO_FPCMP_PPZZ_ALL(sve_fcmne, DO_FCMNE)
+DO_FPCMP_PPZZ_ALL(sve_fcmuo, DO_FCMUO)
+DO_FPCMP_PPZZ_ALL(sve_facge, DO_FACGE)
+DO_FPCMP_PPZZ_ALL(sve_facgt, DO_FACGT)
+
+#undef DO_FPCMP_PPZZ_ALL
+#undef DO_FPCMP_PPZZ_D
+#undef DO_FPCMP_PPZZ_S
+#undef DO_FPCMP_PPZZ_H
+#undef DO_FPCMP_PPZZ
+
 /*
  * Load contiguous data, protected by a governing predicate.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 0241e8e707..8fcb9dd2be 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3265,6 +3265,47 @@ DO_FP3(FMULX, fmulx)
 
 #undef DO_FP3
 
+static void do_fp_cmp(DisasContext *s, arg_rprr_esz *a,
+                      gen_helper_gvec_4_ptr *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr status;
+
+    if (fn == NULL) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    status = get_fpstatus_ptr(a->esz == MO_16);
+    tcg_gen_gvec_4_ptr(pred_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       pred_full_reg_offset(s, a->pg),
+                       status, vsz, vsz, 0, fn);
+    tcg_temp_free_ptr(status);
+}
+
+#define DO_FPCMP(NAME, name) \
+static void trans_##NAME##_ppzz(DisasContext *s, arg_rprr_esz *a,     \
+                                uint32_t insn)                        \
+{                                                                     \
+    static gen_helper_gvec_4_ptr * const fns[4] = {                   \
+        NULL, gen_helper_sve_##name##_h,                              \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d          \
+    };                                                                \
+    do_fp_cmp(s, a, fns[a->esz]);                                     \
+}
+
+DO_FPCMP(FCMGE, fcmge)
+DO_FPCMP(FCMGT, fcmgt)
+DO_FPCMP(FCMEQ, fcmeq)
+DO_FPCMP(FCMNE, fcmne)
+DO_FPCMP(FCMUO, fcmuo)
+DO_FPCMP(FACGE, facge)
+DO_FPCMP(FACGT, facgt)
+
+#undef DO_FPCMP
+
 typedef void gen_helper_sve_fmla(TCGv_env, TCGv_ptr, TCGv_i32);
 
 static void do_fmla(DisasContext *s, arg_rprrr_esz *a, gen_helper_sve_fmla *fn)
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 6ccb4289fc..f82cef2d7e 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -321,6 +321,17 @@ UXTH		00000100 .. 010 011 101 ... ..... .....		@rd_pg_rn
 SXTW		00000100 .. 010 100 101 ... ..... .....		@rd_pg_rn
 UXTW		00000100 .. 010 101 101 ... ..... .....		@rd_pg_rn
 
+### SVE Floating Point Compare - Vectors Group
+
+# SVE floating-point compare vectors
+FCMGE_ppzz	01100101 .. 0 ..... 010 ... ..... 0 ....	@pd_pg_rn_rm
+FCMGT_ppzz	01100101 .. 0 ..... 010 ... ..... 1 ....	@pd_pg_rn_rm
+FCMEQ_ppzz	01100101 .. 0 ..... 011 ... ..... 0 ....	@pd_pg_rn_rm
+FCMNE_ppzz	01100101 .. 0 ..... 011 ... ..... 1 ....	@pd_pg_rn_rm
+FCMUO_ppzz	01100101 .. 0 ..... 110 ... ..... 0 ....	@pd_pg_rn_rm
+FACGE_ppzz	01100101 .. 0 ..... 110 ... ..... 1 ....	@pd_pg_rn_rm
+FACGT_ppzz	01100101 .. 0 ..... 111 ... ..... 1 ....	@pd_pg_rn_rm
+
 ### SVE Integer Multiply-Add Group
 
 # SVE integer multiply-add writing addend (predicated)
-- 
2.14.3
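For readers unfamiliar with SVE predicate layout, the compare instructions wired up above (FCMGE_ppzz etc.) write their per-element results into a predicate register. The sketch below is a hypothetical integer analogue, not the patch's code: the real helpers compare float types via softfloat and are not shown in this hunk, and the function name here is invented for illustration. SVE predicates carry one bit per vector byte, so a 4-byte element is governed by the bit at the byte offset of its first byte; this sketch updates only those governing bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical integer analogue of a predicated vector compare such as
 * FCMGE_ppzz.  pd receives one result bit per element, at the bit index
 * of the element's first byte; inactive elements produce 0.
 */
static void cmpge_u32_pred(uint8_t *pd, const uint32_t *zn,
                           const uint32_t *zm, const uint8_t *pg,
                           size_t nelem)
{
    for (size_t i = 0; i < nelem; i++) {
        size_t bit = i * sizeof(uint32_t);   /* governing predicate bit */
        int active = (pg[bit / 8] >> (bit % 8)) & 1;
        int res = active && (zn[i] >= zm[i]);
        pd[bit / 8] = (uint8_t)((pd[bit / 8] & ~(1u << (bit % 8)))
                                | ((unsigned)res << (bit % 8)));
    }
}
```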


* [Qemu-devel] [PATCH v2 58/67] target/arm: Implement SVE floating-point arithmetic with immediate
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 57/67] target/arm: Implement SVE floating-point compare vectors Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:11   ` Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 59/67] target/arm: Implement SVE Floating Point Multiply Indexed Group Richard Henderson
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 56 +++++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 68 ++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 14 +++++++++
 4 files changed, 211 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 30373e3fc7..7ada12687b 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -809,6 +809,62 @@ DEF_HELPER_FLAGS_6(sve_fmulx_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fmulx_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_6(sve_fadds_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fadds_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fadds_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fsubs_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fsubs_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fsubs_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmuls_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmuls_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmuls_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fsubrs_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fsubrs_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fsubrs_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmaxnms_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmaxnms_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmaxnms_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fminnms_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fminnms_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fminnms_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmaxs_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmaxs_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmaxs_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+
+DEF_HELPER_FLAGS_6(sve_fmins_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmins_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+DEF_HELPER_FLAGS_6(sve_fmins_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, i64, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index ace613684d..9378c8f0b2 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2995,6 +2995,74 @@ DO_ZPZZ_FP_D(sve_fmulx_d, uint64_t, helper_vfp_mulxd)
 #undef DO_ZPZZ_FP
 #undef DO_ZPZZ_FP_D
 
+/* Three-operand expander with one scalar operand, controlled by
+ * a predicate and taking an extra float_status parameter.
+ */
+#define DO_ZPZS_FP(NAME, TYPE, H, OP) \
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint64_t scalar,  \
+                  void *status, uint32_t desc)                    \
+{                                                                 \
+    intptr_t i, opr_sz = simd_oprsz(desc);                        \
+    TYPE mm = scalar;                                             \
+    for (i = 0; i < opr_sz; ) {                                   \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));           \
+        do {                                                      \
+            if (pg & 1) {                                         \
+                TYPE nn = *(TYPE *)(vn + H(i));                   \
+                *(TYPE *)(vd + H(i)) = OP(nn, mm, status);        \
+            }                                                     \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);               \
+        } while (i & 15);                                         \
+    }                                                             \
+}
+
+DO_ZPZS_FP(sve_fadds_h, float16, H1_2, float16_add)
+DO_ZPZS_FP(sve_fadds_s, float32, H1_4, float32_add)
+DO_ZPZS_FP(sve_fadds_d, float64,     , float64_add)
+
+DO_ZPZS_FP(sve_fsubs_h, float16, H1_2, float16_sub)
+DO_ZPZS_FP(sve_fsubs_s, float32, H1_4, float32_sub)
+DO_ZPZS_FP(sve_fsubs_d, float64,     , float64_sub)
+
+DO_ZPZS_FP(sve_fmuls_h, float16, H1_2, float16_mul)
+DO_ZPZS_FP(sve_fmuls_s, float32, H1_4, float32_mul)
+DO_ZPZS_FP(sve_fmuls_d, float64,     , float64_mul)
+
+static inline float16 subr_h(float16 a, float16 b, float_status *s)
+{
+    return float16_sub(b, a, s);
+}
+
+static inline float32 subr_s(float32 a, float32 b, float_status *s)
+{
+    return float32_sub(b, a, s);
+}
+
+static inline float64 subr_d(float64 a, float64 b, float_status *s)
+{
+    return float64_sub(b, a, s);
+}
+
+DO_ZPZS_FP(sve_fsubrs_h, float16, H1_2, subr_h)
+DO_ZPZS_FP(sve_fsubrs_s, float32, H1_4, subr_s)
+DO_ZPZS_FP(sve_fsubrs_d, float64,     , subr_d)
+
+DO_ZPZS_FP(sve_fmaxnms_h, float16, H1_2, float16_maxnum)
+DO_ZPZS_FP(sve_fmaxnms_s, float32, H1_4, float32_maxnum)
+DO_ZPZS_FP(sve_fmaxnms_d, float64,     , float64_maxnum)
+
+DO_ZPZS_FP(sve_fminnms_h, float16, H1_2, float16_minnum)
+DO_ZPZS_FP(sve_fminnms_s, float32, H1_4, float32_minnum)
+DO_ZPZS_FP(sve_fminnms_d, float64,     , float64_minnum)
+
+DO_ZPZS_FP(sve_fmaxs_h, float16, H1_2, float16_max)
+DO_ZPZS_FP(sve_fmaxs_s, float32, H1_4, float32_max)
+DO_ZPZS_FP(sve_fmaxs_d, float64,     , float64_max)
+
+DO_ZPZS_FP(sve_fmins_h, float16, H1_2, float16_min)
+DO_ZPZS_FP(sve_fmins_s, float32, H1_4, float32_min)
+DO_ZPZS_FP(sve_fmins_d, float64,     , float64_min)
+
 /* Fully general two-operand expander, controlled by a predicate,
  * with the extra float_status parameter.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 8fcb9dd2be..6ce1b01b9a 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -32,6 +32,7 @@
 #include "exec/log.h"
 #include "trace-tcg.h"
 #include "translate-a64.h"
+#include "fpu/softfloat.h"
 
 typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
 typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t,
@@ -3265,6 +3266,78 @@ DO_FP3(FMULX, fmulx)
 
 #undef DO_FP3
 
+typedef void gen_helper_sve_fp2scalar(TCGv_ptr, TCGv_ptr, TCGv_ptr,
+                                      TCGv_i64, TCGv_ptr, TCGv_i32);
+
+static void do_fp_scalar(DisasContext *s, int zd, int zn, int pg, bool is_fp16,
+                         TCGv_i64 scalar, gen_helper_sve_fp2scalar *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr t_zd, t_zn, t_pg, status;
+    TCGv_i32 desc;
+
+    t_zd = tcg_temp_new_ptr();
+    t_zn = tcg_temp_new_ptr();
+    t_pg = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(t_zd, cpu_env, vec_full_reg_offset(s, zd));
+    tcg_gen_addi_ptr(t_zn, cpu_env, vec_full_reg_offset(s, zn));
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
+
+    status = get_fpstatus_ptr(is_fp16);
+    desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+    fn(t_zd, t_zn, t_pg, scalar, status, desc);
+
+    tcg_temp_free_i32(desc);
+    tcg_temp_free_ptr(status);
+    tcg_temp_free_ptr(t_pg);
+    tcg_temp_free_ptr(t_zn);
+    tcg_temp_free_ptr(t_zd);
+}
+
+static void do_fp_imm(DisasContext *s, arg_rpri_esz *a, uint64_t imm,
+                      gen_helper_sve_fp2scalar *fn)
+{
+    TCGv_i64 temp = tcg_const_i64(imm);
+    do_fp_scalar(s, a->rd, a->rn, a->pg, a->esz == MO_16, temp, fn);
+    tcg_temp_free_i64(temp);
+}
+
+#define DO_FP_IMM(NAME, name, const0, const1) \
+static void trans_##NAME##_zpzi(DisasContext *s, arg_rpri_esz *a,         \
+                                uint32_t insn)                            \
+{                                                                         \
+    static gen_helper_sve_fp2scalar * const fns[3] = {                    \
+        gen_helper_sve_##name##_h,                                        \
+        gen_helper_sve_##name##_s,                                        \
+        gen_helper_sve_##name##_d                                         \
+    };                                                                    \
+    static uint64_t const val[3][2] = {                                   \
+        { float16_##const0, float16_##const1 },                           \
+        { float32_##const0, float32_##const1 },                           \
+        { float64_##const0, float64_##const1 },                           \
+    };                                                                    \
+    if (a->esz == 0) {                                                    \
+        unallocated_encoding(s);                                          \
+        return;                                                           \
+    }                                                                     \
+    do_fp_imm(s, a, val[a->esz - 1][a->imm], fns[a->esz - 1]);            \
+}
+
+#define float16_two  make_float16(0x4000)
+#define float32_two  make_float32(0x40000000)
+#define float64_two  make_float64(0x4000000000000000ULL)
+
+DO_FP_IMM(FADD, fadds, half, one)
+DO_FP_IMM(FSUB, fsubs, half, one)
+DO_FP_IMM(FMUL, fmuls, half, two)
+DO_FP_IMM(FSUBR, fsubrs, half, one)
+DO_FP_IMM(FMAXNM, fmaxnms, zero, one)
+DO_FP_IMM(FMINNM, fminnms, zero, one)
+DO_FP_IMM(FMAX, fmaxs, zero, one)
+DO_FP_IMM(FMIN, fmins, zero, one)
+
+#undef DO_FP_IMM
+
 static void do_fp_cmp(DisasContext *s, arg_rprr_esz *a,
                       gen_helper_gvec_4_ptr *fn)
 {
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index f82cef2d7e..258d14b729 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -161,6 +161,10 @@
 @rdn_pg4	........ esz:2 .. pg:4 ... ........ rd:5 \
 		&rpri_esz rn=%reg_movprfx
 
+# Two register operand, one one-bit floating-point immediate.
+@rdn_i1		........ esz:2 ......... pg:3 .... imm:1 rd:5 \
+		&rpri_esz rn=%reg_movprfx
+
 # Two register operand, one encoded bitmask.
 @rdn_dbm	........ .. .... dbm:13 rd:5 \
 		&rr_dbm rn=%reg_movprfx
@@ -748,6 +752,16 @@ FMULX		01100101 .. 00 1010 100 ... ..... .....    @rdn_pg_rm
 FDIV		01100101 .. 00 1100 100 ... ..... .....    @rdm_pg_rn # FDIVR
 FDIV		01100101 .. 00 1101 100 ... ..... .....    @rdn_pg_rm
 
+# SVE floating-point arithmetic with immediate (predicated)
+FADD_zpzi	01100101 .. 011 000 100 ... 0000 . .....	@rdn_i1
+FSUB_zpzi	01100101 .. 011 001 100 ... 0000 . .....	@rdn_i1
+FMUL_zpzi	01100101 .. 011 010 100 ... 0000 . .....	@rdn_i1
+FSUBR_zpzi	01100101 .. 011 011 100 ... 0000 . .....	@rdn_i1
+FMAXNM_zpzi	01100101 .. 011 100 100 ... 0000 . .....	@rdn_i1
+FMINNM_zpzi	01100101 .. 011 101 100 ... 0000 . .....	@rdn_i1
+FMAX_zpzi	01100101 .. 011 110 100 ... 0000 . .....	@rdn_i1
+FMIN_zpzi	01100101 .. 011 111 100 ... 0000 . .....	@rdn_i1
+
 ### SVE FP Multiply-Add Group
 
 # SVE floating-point multiply-accumulate writing addend
-- 
2.14.3
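The DO_ZPZS_FP expander in this patch walks the predicate 16 bits at a time, testing `pg & 1` per element and shifting by `sizeof(TYPE)`. The sketch below is a hypothetical integer, little-endian-only analogue (the real expander uses softfloat operations and the H-macro endian fixups, and this function name is invented): inactive elements are left unmodified, i.e. merging predication.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical little-endian analogue of DO_ZPZS_FP, specialized to
 * uint32_t addition.  Predicates hold one bit per vector byte, so the
 * loop loads 16 predicate bits covering 16 vector bytes, then steps i
 * by sizeof(uint32_t) while shifting pg right by the same amount.
 */
static void zpzs_add_u32(uint32_t *d, const uint32_t *n,
                         const uint8_t *vg, uint32_t scalar,
                         size_t opr_sz)   /* bytes, multiple of 16 */
{
    for (size_t i = 0; i < opr_sz; ) {
        uint16_t pg = (uint16_t)(vg[i >> 3] | (vg[(i >> 3) + 1] << 8));
        do {
            if (pg & 1) {
                d[i / 4] = n[i / 4] + scalar;   /* active: merge result */
            }
            i += sizeof(uint32_t);
            pg >>= sizeof(uint32_t);
        } while (i & 15);
    }
}
```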


* [Qemu-devel] [PATCH v2 59/67] target/arm: Implement SVE Floating Point Multiply Indexed Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 58/67] target/arm: Implement SVE floating-point arithmetic with immediate Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:18   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 60/67] target/arm: Implement SVE FP Fast Reduction Group Richard Henderson
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h        | 14 ++++++++++
 target/arm/translate-sve.c | 44 +++++++++++++++++++++++++++++++
 target/arm/vec_helper.c    | 64 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 19 ++++++++++++++
 4 files changed, 141 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index f3ce58e276..a8d824b085 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -584,6 +584,20 @@ DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_ftsmul_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_fmul_idx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fmul_idx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fmul_idx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_6(gvec_fmla_idx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 6ce1b01b9a..cf2a4d3284 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3136,6 +3136,50 @@ DO_ZZI(UMIN, umin)
 
 #undef DO_ZZI
 
+/*
+ *** SVE Floating Point Multiply-Add Indexed Group
+ */
+
+static void trans_FMLA_zzxz(DisasContext *s, arg_FMLA_zzxz *a, uint32_t insn)
+{
+    static gen_helper_gvec_4_ptr * const fns[3] = {
+        gen_helper_gvec_fmla_idx_h,
+        gen_helper_gvec_fmla_idx_s,
+        gen_helper_gvec_fmla_idx_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16);
+
+    tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       vec_full_reg_offset(s, a->ra),
+                       status, vsz, vsz, a->index * 2 + a->sub,
+                       fns[a->esz - 1]);
+    tcg_temp_free_ptr(status);
+}
+
+/*
+ *** SVE Floating Point Multiply Indexed Group
+ */
+
+static void trans_FMUL_zzx(DisasContext *s, arg_FMUL_zzx *a, uint32_t insn)
+{
+    static gen_helper_gvec_3_ptr * const fns[3] = {
+        gen_helper_gvec_fmul_idx_h,
+        gen_helper_gvec_fmul_idx_s,
+        gen_helper_gvec_fmul_idx_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16);
+
+    tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       status, vsz, vsz, a->index, fns[a->esz - 1]);
+    tcg_temp_free_ptr(status);
+}
+
 /*
  *** SVE Floating Point Accumulating Reduction Group
  */
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index ad5c29cdd5..e711a3217d 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -24,6 +24,22 @@
 #include "fpu/softfloat.h"
 
 
+/* Note that vector data is stored in host-endian 64-bit chunks,
+   so addressing units smaller than that need a host-endian fixup.  */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
+#else
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
+#endif
+
 /* Floating-point trigonometric starting value.
  * See the ARM ARM pseudocode function FPTrigSMul.
  */
@@ -92,3 +108,51 @@ DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
 
 #endif
 #undef DO_3OP
+
+/* For the indexed ops, SVE applies the index per 128-bit vector segment.
+ * For AdvSIMD, there is of course only one such vector segment.
+ */
+
+#define DO_MUL_IDX(NAME, TYPE, H) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
+{                                                                          \
+    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
+    intptr_t idx = simd_data(desc);                                        \
+    TYPE *d = vd, *n = vn, *m = vm;                                        \
+    for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
+        TYPE mm = m[H(i + idx)];                                           \
+        for (j = 0; j < segment; j++) {                                    \
+            d[i + j] = TYPE##_mul(n[i + j], mm, stat);                     \
+        }                                                                  \
+    }                                                                      \
+}
+
+DO_MUL_IDX(gvec_fmul_idx_h, float16, H2)
+DO_MUL_IDX(gvec_fmul_idx_s, float32, H4)
+DO_MUL_IDX(gvec_fmul_idx_d, float64, )
+
+#undef DO_MUL_IDX
+
+#define DO_FMLA_IDX(NAME, TYPE, H)                                         \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *va,                  \
+                  void *stat, uint32_t desc)                               \
+{                                                                          \
+    intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE);  \
+    TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1);                    \
+    intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1);                          \
+    TYPE *d = vd, *n = vn, *m = vm, *a = va;                               \
+    op1_neg <<= (8 * sizeof(TYPE) - 1);                                    \
+    for (i = 0; i < oprsz / sizeof(TYPE); i += segment) {                  \
+        TYPE mm = m[H(i + idx)];                                           \
+        for (j = 0; j < segment; j++) {                                    \
+            d[i + j] = TYPE##_muladd(n[i + j] ^ op1_neg,                   \
+                                     mm, a[i + j], 0, stat);               \
+        }                                                                  \
+    }                                                                      \
+}
+
+DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2)
+DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4)
+DO_FMLA_IDX(gvec_fmla_idx_d, float64, )
+
+#undef DO_FMLA_IDX
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 258d14b729..d16e733aa3 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -30,6 +30,7 @@
 %preg4_5	5:4
 %size_23	23:2
 %dtype_23_13	23:2 13:2
+%index3_22_19	22:1 19:2
 
 # A combination of tsz:imm3 -- extract esize.
 %tszimm_esz	22:2 5:5 !function=tszimm_esz
@@ -720,6 +721,24 @@ UMIN_zzi	00100101 .. 101 011 110 ........ .....		@rdn_i8u
 # SVE integer multiply immediate (unpredicated)
 MUL_zzi		00100101 .. 110 000 110 ........ .....		@rdn_i8s
 
+### SVE FP Multiply-Add Indexed Group
+
+# SVE floating-point multiply-add (indexed)
+FMLA_zzxz	01100100 0.1 .. rm:3 00000 sub:1 rn:5 rd:5 \
+		ra=%reg_movprfx index=%index3_22_19 esz=1
+FMLA_zzxz	01100100 101 index:2 rm:3 00000 sub:1 rn:5 rd:5 \
+		ra=%reg_movprfx esz=2
+FMLA_zzxz	01100100 111 index:1 rm:4 00000 sub:1 rn:5 rd:5 \
+		ra=%reg_movprfx esz=3
+
+### SVE FP Multiply Indexed Group
+
+# SVE floating-point multiply (indexed)
+FMUL_zzx	01100100 0.1 .. rm:3 001000 rn:5 rd:5 \
+		index=%index3_22_19 esz=1
+FMUL_zzx	01100100 101 index:2 rm:3 001000 rn:5 rd:5	esz=2
+FMUL_zzx	01100100 111 index:1 rm:4 001000 rn:5 rd:5	esz=3
+
 ### SVE FP Accumulating Reduction Group
 
 # SVE floating-point serial reduction (predicated)
-- 
2.14.3


* [Qemu-devel] [PATCH v2 60/67] target/arm: Implement SVE FP Fast Reduction Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 59/67] target/arm: Implement SVE Floating Point Multiply Indexed Group Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:24   ` Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 61/67] target/arm: Implement SVE Floating Point Unary Operations - Unpredicated Group Richard Henderson
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 35 ++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 61 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 55 +++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  8 ++++++
 4 files changed, 159 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 7ada12687b..c07b2245ba 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -725,6 +725,41 @@ DEF_HELPER_FLAGS_5(gvec_rsqrts_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_rsqrts_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_faddv_h, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_faddv_s, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_faddv_d, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_fmaxnmv_h, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fmaxnmv_s, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fmaxnmv_d, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_fminnmv_h, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fminnmv_s, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fminnmv_d, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_fmaxv_h, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fmaxv_s, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fmaxv_d, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_fminv_h, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fminv_s, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fminv_d, TCG_CALL_NO_RWG,
+                   i64, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_fadda_h, TCG_CALL_NO_RWG,
                    i64, i64, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 9378c8f0b2..29deefcd86 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2832,6 +2832,67 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
     return predtest_ones(d, oprsz, esz_mask);
 }
 
+/* Recursive reduction with a binary function;
+ * cf. the ARM ARM pseudocode function ReducePredicated.
+ *
+ * While it would be possible to write this without the DATA temporary,
+ * it is much simpler to process the predicate register this way.
+ * The recursion is bounded to depth 7 (128 fp16 elements), so there's
+ * little to gain with a more complex non-recursive form.
+ */
+#define DO_REDUCE(NAME, TYPE, H, FUNC, IDENT)                         \
+static TYPE NAME##_reduce(TYPE *data, float_status *status, uintptr_t n) \
+{                                                                     \
+    if (n == 1) {                                                     \
+        return *data;                                                 \
+    } else {                                                          \
+        uintptr_t half = n / 2;                                       \
+        TYPE lo = NAME##_reduce(data, status, half);                  \
+        TYPE hi = NAME##_reduce(data + half, status, half);           \
+        return TYPE##_##FUNC(lo, hi, status);                         \
+    }                                                                 \
+}                                                                     \
+uint64_t HELPER(NAME)(void *vn, void *vg, void *vs, uint32_t desc)    \
+{                                                                     \
+    uintptr_t i, oprsz = simd_oprsz(desc), maxsz = simd_maxsz(desc);  \
+    TYPE data[sizeof(ARMVectorReg) / sizeof(TYPE)];                   \
+    for (i = 0; i < oprsz; ) {                                        \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));               \
+        do {                                                          \
+            TYPE nn = *(TYPE *)(vn + H(i));                           \
+            *(TYPE *)((void *)data + i) = (pg & 1 ? nn : IDENT);      \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);                   \
+        } while (i & 15);                                             \
+    }                                                                 \
+    for (; i < maxsz; i += sizeof(TYPE)) {                            \
+        *(TYPE *)((void *)data + i) = IDENT;                          \
+    }                                                                 \
+    return NAME##_reduce(data, vs, maxsz / sizeof(TYPE));             \
+}
+
+DO_REDUCE(sve_faddv_h, float16, H1_2, add, float16_zero)
+DO_REDUCE(sve_faddv_s, float32, H1_4, add, float32_zero)
+DO_REDUCE(sve_faddv_d, float64,     , add, float64_zero)
+
+/* Identity is floatN_default_nan, without the function call.  */
+DO_REDUCE(sve_fminnmv_h, float16, H1_2, minnum, 0x7E00)
+DO_REDUCE(sve_fminnmv_s, float32, H1_4, minnum, 0x7FC00000)
+DO_REDUCE(sve_fminnmv_d, float64,     , minnum, 0x7FF8000000000000ULL)
+
+DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, maxnum, 0x7E00)
+DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, maxnum, 0x7FC00000)
+DO_REDUCE(sve_fmaxnmv_d, float64,     , maxnum, 0x7FF8000000000000ULL)
+
+DO_REDUCE(sve_fminv_h, float16, H1_2, min, float16_infinity)
+DO_REDUCE(sve_fminv_s, float32, H1_4, min, float32_infinity)
+DO_REDUCE(sve_fminv_d, float64,     , min, float64_infinity)
+
+DO_REDUCE(sve_fmaxv_h, float16, H1_2, max, float16_chs(float16_infinity))
+DO_REDUCE(sve_fmaxv_s, float32, H1_4, max, float32_chs(float32_infinity))
+DO_REDUCE(sve_fmaxv_d, float64,     , max, float64_chs(float64_infinity))
+
+#undef DO_REDUCE
+
 uint64_t HELPER(sve_fadda_h)(uint64_t nn, void *vm, void *vg,
                              void *status, uint32_t desc)
 {
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index cf2a4d3284..a77ddf0f4b 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3180,6 +3180,61 @@ static void trans_FMUL_zzx(DisasContext *s, arg_FMUL_zzx *a, uint32_t insn)
     tcg_temp_free_ptr(status);
 }
 
+/*
+ *** SVE Floating Point Fast Reduction Group
+ */
+
+typedef void gen_helper_fp_reduce(TCGv_i64, TCGv_ptr, TCGv_ptr,
+                                  TCGv_ptr, TCGv_i32);
+
+static void do_reduce(DisasContext *s, arg_rpr_esz *a,
+                      gen_helper_fp_reduce *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    unsigned p2vsz = pow2ceil(vsz);
+    TCGv_i32 t_desc = tcg_const_i32(simd_desc(vsz, p2vsz, 0));
+    TCGv_ptr t_zn, t_pg, status;
+    TCGv_i64 temp;
+
+    temp = tcg_temp_new_i64();
+    t_zn = tcg_temp_new_ptr();
+    t_pg = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_zn, cpu_env, vec_full_reg_offset(s, a->rn));
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, a->pg));
+    status = get_fpstatus_ptr(a->esz == MO_16);
+
+    fn(temp, t_zn, t_pg, status, t_desc);
+    tcg_temp_free_ptr(t_zn);
+    tcg_temp_free_ptr(t_pg);
+    tcg_temp_free_ptr(status);
+    tcg_temp_free_i32(t_desc);
+
+    write_fp_dreg(s, a->rd, temp);
+    tcg_temp_free_i64(temp);
+}
+
+#define DO_VPZ(NAME, name) \
+static void trans_##NAME(DisasContext *s, arg_rpr_esz *a, uint32_t insn) \
+{                                                                        \
+    static gen_helper_fp_reduce * const fns[3] = {                       \
+        gen_helper_sve_##name##_h,                                       \
+        gen_helper_sve_##name##_s,                                       \
+        gen_helper_sve_##name##_d,                                       \
+    };                                                                   \
+    if (a->esz == 0) {                                                   \
+        unallocated_encoding(s);                                         \
+        return;                                                          \
+    }                                                                    \
+    do_reduce(s, a, fns[a->esz - 1]);                                    \
+}
+
+DO_VPZ(FADDV, faddv)
+DO_VPZ(FMINNMV, fminnmv)
+DO_VPZ(FMAXNMV, fmaxnmv)
+DO_VPZ(FMINV, fminv)
+DO_VPZ(FMAXV, fmaxv)
+
 /*
  *** SVE Floating Point Accumulating Reduction Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index d16e733aa3..feb8c65e89 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -739,6 +739,14 @@ FMUL_zzx	01100100 0.1 .. rm:3 001000 rn:5 rd:5 \
 FMUL_zzx	01100100 101 index:2 rm:3 001000 rn:5 rd:5	esz=2
 FMUL_zzx	01100100 111 index:1 rm:4 001000 rn:5 rd:5	esz=3
 
+### SVE FP Fast Reduction Group
+
+FADDV		01100101 .. 000 000 001 ... ..... .....		@rd_pg_rn
+FMAXNMV		01100101 .. 000 100 001 ... ..... .....		@rd_pg_rn
+FMINNMV		01100101 .. 000 101 001 ... ..... .....		@rd_pg_rn
+FMAXV		01100101 .. 000 110 001 ... ..... .....		@rd_pg_rn
+FMINV		01100101 .. 000 111 001 ... ..... .....		@rd_pg_rn
+
 ### SVE FP Accumulating Reduction Group
 
 # SVE floating-point serial reduction (predicated)
-- 
2.14.3
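The DO_REDUCE expander pads the scratch array up to maxsz elements with the operation's identity (the translator passes p2vsz = pow2ceil(vsz) as maxsz), so every recursive split is exact. The sketch below is a hypothetical integer analogue of the reduction tree only, not the patch's code: the real reductions run over float types under softfloat, and the function name is invented. Identity padding means a partial vector reduces to the same value as its live elements alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical integer analogue of the NAME##_reduce tree: pairwise
 * reduction over a power-of-two element count.  The caller is expected
 * to have padded the tail with the identity (0 for addition).
 */
static uint32_t reduce_add(const uint32_t *data, size_t n)
{
    if (n == 1) {
        return *data;
    }
    size_t half = n / 2;
    return reduce_add(data, half) + reduce_add(data + half, half);
}
```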


* [Qemu-devel] [PATCH v2 61/67] target/arm: Implement SVE Floating Point Unary Operations - Unpredicated Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (59 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 60/67] target/arm: Implement SVE FP Fast Reduction Group Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:28   ` Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 62/67] target/arm: Implement SVE FP Compare with Zero Group Richard Henderson
                   ` (7 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h        |  8 ++++++++
 target/arm/translate-sve.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 target/arm/vec_helper.c    | 20 ++++++++++++++++++++
 target/arm/sve.decode      |  5 +++++
 4 files changed, 76 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index a8d824b085..4bfefe42b2 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -565,6 +565,14 @@ DEF_HELPER_2(dc_zva, void, env, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
+DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_frsqrte_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_frsqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(gvec_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index a77ddf0f4b..463ff7b690 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3235,6 +3235,49 @@ DO_VPZ(FMAXNMV, fmaxnmv)
 DO_VPZ(FMINV, fminv)
 DO_VPZ(FMAXV, fmaxv)
 
+/*
+ *** SVE Floating Point Unary Operations - Unpredicated Group
+ */
+
+static void do_zz_fp(DisasContext *s, arg_rr_esz *a, gen_helper_gvec_2_ptr *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16);
+
+    tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       status, vsz, vsz, 0, fn);
+    tcg_temp_free_ptr(status);
+}
+
+static void trans_FRECPE(DisasContext *s, arg_rr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_2_ptr * const fns[3] = {
+        gen_helper_gvec_frecpe_h,
+        gen_helper_gvec_frecpe_s,
+        gen_helper_gvec_frecpe_d,
+    };
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+    } else {
+        do_zz_fp(s, a, fns[a->esz - 1]);
+    }
+}
+
+static void trans_FRSQRTE(DisasContext *s, arg_rr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_2_ptr * const fns[3] = {
+        gen_helper_gvec_frsqrte_h,
+        gen_helper_gvec_frsqrte_s,
+        gen_helper_gvec_frsqrte_d,
+    };
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+    } else {
+        do_zz_fp(s, a, fns[a->esz - 1]);
+    }
+}
+
 /*
  *** SVE Floating Point Accumulating Reduction Group
  */
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index e711a3217d..60dc07cf87 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -40,6 +40,26 @@
 #define H4(x)   (x)
 #endif
 
+#define DO_2OP(NAME, FUNC, TYPE) \
+void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc)  \
+{                                                                 \
+    intptr_t i, oprsz = simd_oprsz(desc);                         \
+    TYPE *d = vd, *n = vn;                                        \
+    for (i = 0; i < oprsz / sizeof(TYPE); i++) {                  \
+        d[i] = FUNC(n[i], stat);                                  \
+    }                                                             \
+}
+
+DO_2OP(gvec_frecpe_h, helper_recpe_f16, float16)
+DO_2OP(gvec_frecpe_s, helper_recpe_f32, float32)
+DO_2OP(gvec_frecpe_d, helper_recpe_f64, float64)
+
+DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
+DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
+DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
+
+#undef DO_2OP
+
 /* Floating-point trigonometric starting value.
  * See the ARM ARM pseudocode function FPTrigSMul.
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index feb8c65e89..112e85174c 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -747,6 +747,11 @@ FMINNMV		01100101 .. 000 101 001 ... ..... .....		@rd_pg_rn
 FMAXV		01100101 .. 000 110 001 ... ..... .....		@rd_pg_rn
 FMINV		01100101 .. 000 111 001 ... ..... .....		@rd_pg_rn
 
+## SVE Floating Point Unary Operations - Unpredicated Group
+
+FRECPE		01100101 .. 001 110 001110 ..... .....		@rd_rn
+FRSQRTE		01100101 .. 001 111 001110 ..... .....		@rd_rn
+
 ### SVE FP Accumulating Reduction Group
 
 # SVE floating-point serial reduction (predicated)
-- 
2.14.3
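The DO_2OP macro in vec_helper.c expands to a flat elementwise loop: it divides the operation size in bytes by the element type's size and applies a scalar helper to each lane. The sketch below shows that loop shape for 32-bit elements with a stand-in unary operation (reciprocal) in place of the softfloat helper_recpe_f32, since softfloat's float32 type and status argument are not available outside QEMU; the names are hypothetical.

```c
#include <stdint.h>

/* Shape of the DO_2OP expansion for a float32 unary op.  oprsz is the
 * operation size in bytes, as returned by simd_oprsz(desc) in QEMU;
 * the loop visits oprsz / sizeof(element) lanes. */
static void sketch_gvec_recip_s(void *vd, const void *vn, uint32_t oprsz)
{
    float *d = vd;
    const float *n = vn;
    for (uint32_t i = 0; i < oprsz / sizeof(float); i++) {
        d[i] = 1.0f / n[i];   /* stand-in for helper_recpe_f32(n[i], stat) */
    }
}
```

Because the destination is written unconditionally for every lane, these unpredicated helpers need no governing-predicate argument, which is why gvec_frecpe_* take four arguments rather than the five of the predicated SVE helpers.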


* [Qemu-devel] [PATCH v2 62/67] target/arm: Implement SVE FP Compare with Zero Group
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (60 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 61/67] target/arm: Implement SVE Floating Point Unary Operations - Unpredicated Group Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:31   ` Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 63/67] target/arm: Implement SVE floating-point trig multiply-add coefficient Richard Henderson
                   ` (6 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 42 ++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 41 +++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 10 ++++++++++
 4 files changed, 138 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index c07b2245ba..696c97648b 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -767,6 +767,48 @@ DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_fadda_d, TCG_CALL_NO_RWG,
                    i64, i64, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_fcmge0_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmge0_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmge0_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_fcmgt0_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmgt0_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmgt0_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_fcmlt0_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmlt0_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmlt0_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_fcmle0_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmle0_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmle0_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_fcmeq0_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmeq0_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmeq0_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_fcmne0_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmne0_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcmne0_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_6(sve_fadd_h, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_6(sve_fadd_s, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 29deefcd86..6a052ce9ad 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3270,6 +3270,8 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg,               \
 
 #define DO_FCMGE(TYPE, X, Y, ST)  TYPE##_compare(Y, X, ST) <= 0
 #define DO_FCMGT(TYPE, X, Y, ST)  TYPE##_compare(Y, X, ST) < 0
+#define DO_FCMLE(TYPE, X, Y, ST)  TYPE##_compare(X, Y, ST) <= 0
+#define DO_FCMLT(TYPE, X, Y, ST)  TYPE##_compare(X, Y, ST) < 0
 #define DO_FCMEQ(TYPE, X, Y, ST)  TYPE##_compare_quiet(X, Y, ST) == 0
 #define DO_FCMNE(TYPE, X, Y, ST)  TYPE##_compare_quiet(X, Y, ST) != 0
 #define DO_FCMUO(TYPE, X, Y, ST)  \
@@ -3293,6 +3295,49 @@ DO_FPCMP_PPZZ_ALL(sve_facgt, DO_FACGT)
 #undef DO_FPCMP_PPZZ_H
 #undef DO_FPCMP_PPZZ
 
+/* One operand floating-point comparison against zero, controlled
+ * by a predicate.
+ */
+#define DO_FPCMP_PPZ0(NAME, TYPE, H, OP)                   \
+void HELPER(NAME)(void *vd, void *vn, void *vg,            \
+                  void *status, uint32_t desc)             \
+{                                                          \
+    intptr_t opr_sz = simd_oprsz(desc);                    \
+    intptr_t i = opr_sz, j = ((opr_sz - 1) & -64) >> 3;    \
+    do {                                                   \
+        uint64_t out = 0;                                  \
+        uint64_t pg = *(uint64_t *)(vg + j);               \
+        do {                                               \
+            i -= sizeof(TYPE), out <<= sizeof(TYPE);       \
+            if ((pg >> (i & 63)) & 1) {                    \
+                TYPE nn = *(TYPE *)(vn + H(i));            \
+                out |= OP(TYPE, nn, 0, status);            \
+            }                                              \
+        } while (i & 63);                                  \
+        *(uint64_t *)(vd + j) = out;                       \
+        j -= 8;                                            \
+    } while (i > 0);                                       \
+}
+
+#define DO_FPCMP_PPZ0_H(NAME, OP) \
+    DO_FPCMP_PPZ0(NAME##_h, float16, H1_2, OP)
+#define DO_FPCMP_PPZ0_S(NAME, OP) \
+    DO_FPCMP_PPZ0(NAME##_s, float32, H1_4, OP)
+#define DO_FPCMP_PPZ0_D(NAME, OP) \
+    DO_FPCMP_PPZ0(NAME##_d, float64,     , OP)
+
+#define DO_FPCMP_PPZ0_ALL(NAME, OP) \
+    DO_FPCMP_PPZ0_H(NAME, OP)   \
+    DO_FPCMP_PPZ0_S(NAME, OP)   \
+    DO_FPCMP_PPZ0_D(NAME, OP)
+
+DO_FPCMP_PPZ0_ALL(sve_fcmge0, DO_FCMGE)
+DO_FPCMP_PPZ0_ALL(sve_fcmgt0, DO_FCMGT)
+DO_FPCMP_PPZ0_ALL(sve_fcmle0, DO_FCMLE)
+DO_FPCMP_PPZ0_ALL(sve_fcmlt0, DO_FCMLT)
+DO_FPCMP_PPZ0_ALL(sve_fcmeq0, DO_FCMEQ)
+DO_FPCMP_PPZ0_ALL(sve_fcmne0, DO_FCMNE)
+
 /*
  * Load contiguous data, protected by a governing predicate.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 463ff7b690..02655bff03 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3278,6 +3278,47 @@ static void trans_FRSQRTE(DisasContext *s, arg_rr_esz *a, uint32_t insn)
     }
 }
 
+/*
+ *** SVE Floating Point Compare with Zero Group
+ */
+
+static void do_ppz_fp(DisasContext *s, arg_rpr_esz *a,
+                      gen_helper_gvec_3_ptr *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr status = get_fpstatus_ptr(a->esz == MO_16);
+
+    tcg_gen_gvec_3_ptr(pred_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       pred_full_reg_offset(s, a->pg),
+                       status, vsz, vsz, 0, fn);
+    tcg_temp_free_ptr(status);
+}
+
+#define DO_PPZ(NAME, name) \
+static void trans_##NAME(DisasContext *s, arg_rpr_esz *a, uint32_t insn) \
+{                                                                 \
+    static gen_helper_gvec_3_ptr * const fns[3] = {               \
+        gen_helper_sve_##name##_h,                                \
+        gen_helper_sve_##name##_s,                                \
+        gen_helper_sve_##name##_d,                                \
+    };                                                            \
+    if (a->esz == 0) {                                            \
+        unallocated_encoding(s);                                  \
+        return;                                                   \
+    }                                                             \
+    do_ppz_fp(s, a, fns[a->esz - 1]);                             \
+}
+
+DO_PPZ(FCMGE_ppz0, fcmge0)
+DO_PPZ(FCMGT_ppz0, fcmgt0)
+DO_PPZ(FCMLE_ppz0, fcmle0)
+DO_PPZ(FCMLT_ppz0, fcmlt0)
+DO_PPZ(FCMEQ_ppz0, fcmeq0)
+DO_PPZ(FCMNE_ppz0, fcmne0)
+
+#undef DO_PPZ
+
 /*
  *** SVE Floating Point Accumulating Reduction Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 112e85174c..f4505ad0bf 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -141,6 +141,7 @@
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 rd:5	&rpr_esz
 @rd_pg4_pn	........ esz:2 ... ... .. pg:4 . rn:4 rd:5	&rpr_esz
+@pd_pg_rn	........ esz:2 ... ... ... pg:3 rn:5 . rd:4	&rpr_esz
 
 # One register operand, with governing predicate, no vector element size
 @rd_pg_rn_e0	........ .. ... ... ... pg:3 rn:5 rd:5		&rpr_esz esz=0
@@ -752,6 +753,15 @@ FMINV		01100101 .. 000 111 001 ... ..... .....		@rd_pg_rn
 FRECPE		01100101 .. 001 110 001110 ..... .....		@rd_rn
 FRSQRTE		01100101 .. 001 111 001110 ..... .....		@rd_rn
 
+### SVE FP Compare with Zero Group
+
+FCMGE_ppz0	01100101 .. 0100 00 001 ... ..... 0 ....	@pd_pg_rn
+FCMGT_ppz0	01100101 .. 0100 00 001 ... ..... 1 ....	@pd_pg_rn
+FCMLT_ppz0	01100101 .. 0100 01 001 ... ..... 0 ....	@pd_pg_rn
+FCMLE_ppz0	01100101 .. 0100 01 001 ... ..... 1 ....	@pd_pg_rn
+FCMEQ_ppz0	01100101 .. 0100 10 001 ... ..... 0 ....	@pd_pg_rn
+FCMNE_ppz0	01100101 .. 0100 11 001 ... ..... 0 ....	@pd_pg_rn
+
 ### SVE FP Accumulating Reduction Group
 
 # SVE floating-point serial reduction (predicated)
-- 
2.14.3
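The DO_FPCMP_PPZ0 expansion walks the vector backwards 64 predicate bits (64 vector bytes) at a time, shifting one result flag per element into a uint64_t chunk of the destination predicate. For 64-bit elements that places element i's result at predicate bit i*8, matching SVE's one-bit-per-byte predicate encoding. The forward sketch below produces the same layout for a single chunk; it is illustrative, with invented names, and assumes at most 8 elements.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified model of a predicated compare-with-zero (FCMGT #0.0) over
 * 64-bit elements: for each active lane, set predicate bit i*8 of the
 * result if zn[i] > 0.  Inactive lanes yield 0, as in the helper. */
static uint64_t sketch_fcmgt0_d(const double *zn, uint64_t pg, size_t nelem)
{
    uint64_t out = 0;
    for (size_t i = 0; i < nelem; i++) {       /* nelem <= 8 assumed */
        if (((pg >> (i * 8)) & 1) && zn[i] > 0.0) {
            out |= 1ull << (i * 8);
        }
    }
    return out;
}
```

Note the real helpers route the ordered comparisons (GE/GT/LE/LT) through the signalling TYPE##_compare and the equality forms through TYPE##_compare_quiet, mirroring the IEEE distinction between signalling and quiet predicates.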


* [Qemu-devel] [PATCH v2 63/67] target/arm: Implement SVE floating-point trig multiply-add coefficient
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (61 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 62/67] target/arm: Implement SVE FP Compare with Zero Group Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:34   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 64/67] target/arm: Implement SVE floating-point convert precision Richard Henderson
                   ` (5 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  4 +++
 target/arm/sve_helper.c    | 70 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 26 +++++++++++++++++
 target/arm/sve.decode      |  3 ++
 4 files changed, 103 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 696c97648b..ce5fe24dc2 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1037,6 +1037,10 @@ DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_h, TCG_CALL_NO_RWG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_s, TCG_CALL_NO_RWG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_fnmls_zpzzz_d, TCG_CALL_NO_RWG, void, env, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_ftmad_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ftmad_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_ftmad_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(sve_ld1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 6a052ce9ad..53e3516f47 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3338,6 +3338,76 @@ DO_FPCMP_PPZ0_ALL(sve_fcmlt0, DO_FCMLT)
 DO_FPCMP_PPZ0_ALL(sve_fcmeq0, DO_FCMEQ)
 DO_FPCMP_PPZ0_ALL(sve_fcmne0, DO_FCMNE)
 
+/* FP Trig Multiply-Add. */
+
+void HELPER(sve_ftmad_h)(void *vd, void *vn, void *vm, void *vs, uint32_t desc)
+{
+    static const float16 coeff[16] = {
+        0x3c00, 0xb155, 0x2030, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
+        0x3c00, 0xb800, 0x293a, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float16);
+    intptr_t x = simd_data(desc);
+    float16 *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i++) {
+        float16 mm = m[i];
+        intptr_t xx = x;
+        if (float16_is_neg(mm)) {
+            mm = float16_abs(mm);
+            xx += 8;
+        }
+        d[i] = float16_muladd(n[i], mm, coeff[xx], 0, vs);
+    }
+}
+
+void HELPER(sve_ftmad_s)(void *vd, void *vn, void *vm, void *vs, uint32_t desc)
+{
+    static const float32 coeff[16] = {
+        0x3f800000, 0xbe2aaaab, 0x3c088886, 0xb95008b9,
+        0x36369d6d, 0x00000000, 0x00000000, 0x00000000,
+        0x3f800000, 0xbf000000, 0x3d2aaaa6, 0xbab60705,
+        0x37cd37cc, 0x00000000, 0x00000000, 0x00000000,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float32);
+    intptr_t x = simd_data(desc);
+    float32 *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i++) {
+        float32 mm = m[i];
+        intptr_t xx = x;
+        if (float32_is_neg(mm)) {
+            mm = float32_abs(mm);
+            xx += 8;
+        }
+        d[i] = float32_muladd(n[i], mm, coeff[xx], 0, vs);
+    }
+}
+
+void HELPER(sve_ftmad_d)(void *vd, void *vn, void *vm, void *vs, uint32_t desc)
+{
+    static const float64 coeff[16] = {
+        0x3ff0000000000000ull, 0xbfc5555555555543ull,
+        0x3f8111111110f30cull, 0xbf2a01a019b92fc6ull,
+        0x3ec71de351f3d22bull, 0xbe5ae5e2b60f7b91ull,
+        0x3de5d8408868552full, 0x0000000000000000ull,
+        0x3ff0000000000000ull, 0xbfe0000000000000ull,
+        0x3fa5555555555536ull, 0xbf56c16c16c13a0bull,
+        0x3efa01a019b1e8d8ull, 0xbe927e4f7282f468ull,
+        0x3e21ee96d2641b13ull, 0xbda8f76380fbb401ull,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(float64);
+    intptr_t x = simd_data(desc);
+    float64 *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i++) {
+        float64 mm = m[i];
+        intptr_t xx = x;
+        if (float64_is_neg(mm)) {
+            mm = float64_abs(mm);
+            xx += 8;
+        }
+        d[i] = float64_muladd(n[i], mm, coeff[xx], 0, vs);
+    }
+}
+
 /*
  * Load contiguous data, protected by a governing predicate.
  */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 02655bff03..e185af29e3 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3319,6 +3319,32 @@ DO_PPZ(FCMNE_ppz0, fcmne0)
 
 #undef DO_PPZ
 
+/*
+ *** SVE floating-point trig multiply-add coefficient
+ */
+
+static void trans_FTMAD(DisasContext *s, arg_FTMAD *a, uint32_t insn)
+{
+    static gen_helper_gvec_3_ptr * const fns[3] = {
+        gen_helper_sve_ftmad_h,
+        gen_helper_sve_ftmad_s,
+        gen_helper_sve_ftmad_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr status;
+
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+    status = get_fpstatus_ptr(a->esz == MO_16);
+    tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       status, vsz, vsz, a->imm, fns[a->esz - 1]);
+    tcg_temp_free_ptr(status);
+}
+
 /*
  *** SVE Floating Point Accumulating Reduction Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index f4505ad0bf..ca54895900 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -804,6 +804,9 @@ FMINNM_zpzi	01100101 .. 011 101 100 ... 0000 . .....	@rdn_i1
 FMAX_zpzi	01100101 .. 011 110 100 ... 0000 . .....	@rdn_i1
 FMIN_zpzi	01100101 .. 011 111 100 ... 0000 . .....	@rdn_i1
 
+# SVE floating-point trig multiply-add coefficient
+FTMAD		01100101 esz:2 010 imm:3 100000 rm:5 rd:5	rn=%reg_movprfx
+
 ### SVE FP Multiply-Add Group
 
 # SVE floating-point multiply-accumulate writing addend
-- 
2.14.3
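The coefficient tables in sve_ftmad_* hold the raw IEEE bit patterns of what appear to be the truncated Taylor coefficients for the sine series (first eight entries: 1, -1/3!, 1/5!, ...) and the cosine series (last eight: 1, -1/2!, 1/4!, ...), selected by the immediate plus 8 when the multiplicand is negative. The decode helper below makes that readable by reinterpreting an entry's bits as a double; the low bits of some entries deviate slightly from the exact reciprocal factorials, presumably matching the architected constants rather than a fresh rounding.

```c
#include <stdint.h>
#include <string.h>
#include <math.h>

/* Reinterpret a raw IEEE binary64 bit pattern as a double, the inverse
 * of how the sve_ftmad_d coefficient table is written down. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof(d));   /* bit-exact reinterpretation */
    return d;
}
```

For example, table entry 0x3ff0000000000000 decodes to exactly 1.0, 0xbfe0000000000000 to exactly -0.5 (the -1/2! cosine term), and 0xbfc5555555555543 to a value within a few ULPs of -1/6 (the -1/3! sine term).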


* [Qemu-devel] [PATCH v2 64/67] target/arm: Implement SVE floating-point convert precision
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (62 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 63/67] target/arm: Implement SVE floating-point trig multiply-add coefficient Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:35   ` Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 65/67] target/arm: Implement SVE floating-point convert to integer Richard Henderson
                   ` (4 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 13 +++++++++++++
 target/arm/sve_helper.c    | 27 +++++++++++++++++++++++++++
 target/arm/translate-sve.c | 30 ++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  8 ++++++++
 4 files changed, 78 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index ce5fe24dc2..bac4bfdc60 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -942,6 +942,19 @@ DEF_HELPER_FLAGS_6(sve_fmins_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(sve_fmins_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, i64, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_fcvt_sh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvt_dh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvt_hs, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvt_ds, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvt_hd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvt_sd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 53e3516f47..9db01ac2f2 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3157,6 +3157,33 @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc) \
     }                                                           \
 }
 
+static inline float32 float16_to_float32_ieee(float16 f, float_status *s)
+{
+    return float16_to_float32(f, true, s);
+}
+
+static inline float64 float16_to_float64_ieee(float16 f, float_status *s)
+{
+    return float16_to_float64(f, true, s);
+}
+
+static inline float16 float32_to_float16_ieee(float32 f, float_status *s)
+{
+    return float32_to_float16(f, true, s);
+}
+
+static inline float16 float64_to_float16_ieee(float64 f, float_status *s)
+{
+    return float64_to_float16(f, true, s);
+}
+
+DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, float32_to_float16_ieee)
+DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, float16_to_float32_ieee)
+DO_ZPZ_FP_D(sve_fcvt_dh, uint64_t, float64_to_float16_ieee)
+DO_ZPZ_FP_D(sve_fcvt_hd, uint64_t, float16_to_float64_ieee)
+DO_ZPZ_FP_D(sve_fcvt_ds, uint64_t, float64_to_float32)
+DO_ZPZ_FP_D(sve_fcvt_sd, uint64_t, float32_to_float64)
+
 DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16)
 DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16)
 DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index e185af29e3..361d545965 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3651,6 +3651,36 @@ static void do_zpz_ptr(DisasContext *s, int rd, int rn, int pg,
     tcg_temp_free_ptr(status);
 }
 
+static void trans_FCVT_sh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvt_sh);
+}
+
+static void trans_FCVT_hs(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_hs);
+}
+
+static void trans_FCVT_dh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvt_dh);
+}
+
+static void trans_FCVT_hd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_hd);
+}
+
+static void trans_FCVT_ds(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_ds);
+}
+
+static void trans_FCVT_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_sd);
+}
+
 static void trans_SCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
 {
     do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_hh);
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index ca54895900..d44cf17fc8 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -824,6 +824,14 @@ FNMLS_zpzzz	01100101 .. 1 ..... 111 ... ..... .....		@rdn_pg_rm_ra
 
 ### SVE FP Unary Operations Predicated Group
 
+# SVE floating-point convert precision
+FCVT_sh		01100101 10 0010 00 101 ... ..... .....		@rd_pg_rn_e0
+FCVT_hs		01100101 10 0010 01 101 ... ..... .....		@rd_pg_rn_e0
+FCVT_dh		01100101 11 0010 00 101 ... ..... .....		@rd_pg_rn_e0
+FCVT_hd		01100101 11 0010 01 101 ... ..... .....		@rd_pg_rn_e0
+FCVT_ds		01100101 11 0010 10 101 ... ..... .....		@rd_pg_rn_e0
+FCVT_sd		01100101 11 0010 11 101 ... ..... .....		@rd_pg_rn_e0
+
 # SVE integer convert to floating-point
 SCVTF_hh	01100101 01 010 01 0 101 ... ..... .....	@rd_pg_rn_e0
 SCVTF_sh	01100101 01 010 10 0 101 ... ..... .....	@rd_pg_rn_e0
-- 
2.14.3
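The *_ieee wrappers above exist only to pin the ahp (ARM alternative half-precision) flag to true-IEEE behaviour when calling softfloat's conversion routines. As a standalone illustration of what IEEE binary16-to-binary32 widening involves, here is a bit-level sketch handling normals, subnormals, zeros, infinities and NaNs; QEMU of course uses softfloat's float16_to_float32 rather than anything like this, and the function name is invented.

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE binary16 value (given as its raw 16-bit pattern) to
 * binary32.  Widening is exact, so no rounding is needed; only the
 * exponent bias changes (15 -> 127) and subnormals are renormalized. */
static float sketch_half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1f;
    uint32_t frac = h & 0x3ff;
    uint32_t bits;

    if (exp == 0x1f) {                  /* infinity or NaN */
        bits = sign | 0x7f800000 | (frac << 13);
    } else if (exp == 0) {
        if (frac == 0) {                /* signed zero */
            bits = sign;
        } else {                        /* subnormal: renormalize */
            exp = 127 - 15 + 1;
            while (!(frac & 0x400)) {   /* shift until implicit bit set */
                frac <<= 1;
                exp--;
            }
            frac &= 0x3ff;
            bits = sign | (exp << 23) | (frac << 13);
        }
    } else {                            /* normal number: rebias */
        bits = sign | ((exp - 15 + 127) << 23) | (frac << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}
```

The narrowing directions (FCVT_sh, FCVT_dh) are the interesting ones for ahp, since alternative half-precision replaces infinities and NaNs with a larger finite range; the wrappers pass true to get the standard IEEE treatment instead.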


* [Qemu-devel] [PATCH v2 65/67] target/arm: Implement SVE floating-point convert to integer
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (63 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 64/67] target/arm: Implement SVE floating-point convert precision Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:36   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 66/67] target/arm: Implement SVE floating-point round to integral value Richard Henderson
                   ` (3 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 30 ++++++++++++++++++++
 target/arm/sve_helper.c    | 16 +++++++++++
 target/arm/translate-sve.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 16 +++++++++++
 4 files changed, 132 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index bac4bfdc60..0f5fea9045 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -955,6 +955,36 @@ DEF_HELPER_FLAGS_5(sve_fcvt_hd, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_fcvt_sd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_fcvtzs_hh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_hs, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_ss, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_ds, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_hd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_sd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzs_dd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_fcvtzu_hh, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_hs, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_ss, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_ds, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_hd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_sd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fcvtzu_dd, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 9db01ac2f2..09f5c77254 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3184,6 +3184,22 @@ DO_ZPZ_FP_D(sve_fcvt_hd, uint64_t, float16_to_float64_ieee)
 DO_ZPZ_FP_D(sve_fcvt_ds, uint64_t, float64_to_float32)
 DO_ZPZ_FP_D(sve_fcvt_sd, uint64_t, float32_to_float64)
 
+DO_ZPZ_FP(sve_fcvtzs_hh, uint16_t, H1_2, float16_to_int16_round_to_zero)
+DO_ZPZ_FP(sve_fcvtzs_hs, uint32_t, H1_4, float16_to_int32_round_to_zero)
+DO_ZPZ_FP(sve_fcvtzs_ss, uint32_t, H1_4, float32_to_int32_round_to_zero)
+DO_ZPZ_FP_D(sve_fcvtzs_hd, uint64_t, float16_to_int64_round_to_zero)
+DO_ZPZ_FP_D(sve_fcvtzs_sd, uint64_t, float32_to_int64_round_to_zero)
+DO_ZPZ_FP_D(sve_fcvtzs_ds, uint64_t, float64_to_int32_round_to_zero)
+DO_ZPZ_FP_D(sve_fcvtzs_dd, uint64_t, float64_to_int64_round_to_zero)
+
+DO_ZPZ_FP(sve_fcvtzu_hh, uint16_t, H1_2, float16_to_uint16_round_to_zero)
+DO_ZPZ_FP(sve_fcvtzu_hs, uint32_t, H1_4, float16_to_uint32_round_to_zero)
+DO_ZPZ_FP(sve_fcvtzu_ss, uint32_t, H1_4, float32_to_uint32_round_to_zero)
+DO_ZPZ_FP_D(sve_fcvtzu_hd, uint64_t, float16_to_uint64_round_to_zero)
+DO_ZPZ_FP_D(sve_fcvtzu_sd, uint64_t, float32_to_uint64_round_to_zero)
+DO_ZPZ_FP_D(sve_fcvtzu_ds, uint64_t, float64_to_uint32_round_to_zero)
+DO_ZPZ_FP_D(sve_fcvtzu_dd, uint64_t, float64_to_uint64_round_to_zero)
+
 DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16)
 DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16)
 DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 361d545965..bc865dfd15 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3681,6 +3681,76 @@ static void trans_FCVT_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
     do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_sd);
 }
 
+static void trans_FCVTZS_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzs_hh);
+}
+
+static void trans_FCVTZU_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzu_hh);
+}
+
+static void trans_FCVTZS_hs(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzs_hs);
+}
+
+static void trans_FCVTZU_hs(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzu_hs);
+}
+
+static void trans_FCVTZS_hd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzs_hd);
+}
+
+static void trans_FCVTZU_hd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_fcvtzu_hd);
+}
+
+static void trans_FCVTZS_ss(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzs_ss);
+}
+
+static void trans_FCVTZU_ss(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzu_ss);
+}
+
+static void trans_FCVTZS_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzs_sd);
+}
+
+static void trans_FCVTZU_sd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzu_sd);
+}
+
+static void trans_FCVTZS_ds(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzs_ds);
+}
+
+static void trans_FCVTZU_ds(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzu_ds);
+}
+
+static void trans_FCVTZS_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzs_dd);
+}
+
+static void trans_FCVTZU_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzu_dd);
+}
+
 static void trans_SCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
 {
     do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_hh);
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index d44cf17fc8..92dda3a241 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -832,6 +832,22 @@ FCVT_hd		01100101 11 0010 01 101 ... ..... .....		@rd_pg_rn_e0
 FCVT_ds		01100101 11 0010 10 101 ... ..... .....		@rd_pg_rn_e0
 FCVT_sd		01100101 11 0010 11 101 ... ..... .....		@rd_pg_rn_e0
 
+# SVE floating-point convert to integer
+FCVTZS_hh	01100101 01 011 01 0 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZU_hh	01100101 01 011 01 1 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZS_hs	01100101 01 011 10 0 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZU_hs	01100101 01 011 10 1 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZS_hd	01100101 01 011 11 0 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZU_hd	01100101 01 011 11 1 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZS_ss	01100101 10 011 10 0 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZU_ss	01100101 10 011 10 1 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZS_ds	01100101 11 011 00 0 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZU_ds	01100101 11 011 00 1 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZS_sd	01100101 11 011 10 0 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZU_sd	01100101 11 011 10 1 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZS_dd	01100101 11 011 11 0 101 ... ..... .....	@rd_pg_rn_e0
+FCVTZU_dd	01100101 11 011 11 1 101 ... ..... .....	@rd_pg_rn_e0
+
 # SVE integer convert to floating-point
 SCVTF_hh	01100101 01 010 01 0 101 ... ..... .....	@rd_pg_rn_e0
 SCVTF_sh	01100101 01 010 10 0 101 ... ..... .....	@rd_pg_rn_e0
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 167+ messages in thread

* [Qemu-devel] [PATCH v2 66/67] target/arm: Implement SVE floating-point round to integral value
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (64 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 65/67] target/arm: Implement SVE floating-point convert to integer Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:39   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 67/67] target/arm: Implement SVE floating-point unary operations Richard Henderson
                   ` (2 subsequent siblings)
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 14 ++++++++
 target/arm/sve_helper.c    |  8 +++++
 target/arm/translate-sve.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  9 ++++++
 4 files changed, 111 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 0f5fea9045..749bab0b38 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -985,6 +985,20 @@ DEF_HELPER_FLAGS_5(sve_fcvtzu_sd, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_fcvtzu_dd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_frint_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_frint_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_frint_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_frintx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_frintx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_frintx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 09f5c77254..7950710be7 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3200,6 +3200,14 @@ DO_ZPZ_FP_D(sve_fcvtzu_sd, uint64_t, float32_to_uint64_round_to_zero)
 DO_ZPZ_FP_D(sve_fcvtzu_ds, uint64_t, float64_to_uint32_round_to_zero)
 DO_ZPZ_FP_D(sve_fcvtzu_dd, uint64_t, float64_to_uint64_round_to_zero)
 
+DO_ZPZ_FP(sve_frint_h, uint16_t, H1_2, helper_advsimd_rinth)
+DO_ZPZ_FP(sve_frint_s, uint32_t, H1_4, helper_rints)
+DO_ZPZ_FP_D(sve_frint_d, uint64_t, helper_rintd)
+
+DO_ZPZ_FP(sve_frintx_h, uint16_t, H1_2, float16_round_to_int)
+DO_ZPZ_FP(sve_frintx_s, uint32_t, H1_4, float32_round_to_int)
+DO_ZPZ_FP_D(sve_frintx_d, uint64_t, float64_round_to_int)
+
 DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16)
 DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16)
 DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index bc865dfd15..5f1c4984b8 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3751,6 +3751,86 @@ static void trans_FCVTZU_dd(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
     do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvtzu_dd);
 }
 
+static gen_helper_gvec_3_ptr * const frint_fns[3] = {
+    gen_helper_sve_frint_h,
+    gen_helper_sve_frint_s,
+    gen_helper_sve_frint_d
+};
+
+static void trans_FRINTI(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+    } else {
+        do_zpz_ptr(s, a->rd, a->rn, a->pg, a->esz == MO_16,
+                   frint_fns[a->esz - 1]);
+    }
+}
+
+static void trans_FRINTX(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3_ptr * const fns[3] = {
+        gen_helper_sve_frintx_h,
+        gen_helper_sve_frintx_s,
+        gen_helper_sve_frintx_d
+    };
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+    } else {
+        do_zpz_ptr(s, a->rd, a->rn, a->pg, a->esz == MO_16, fns[a->esz - 1]);
+    }
+}
+
+static void do_frint_mode(DisasContext *s, arg_rpr_esz *a, int mode)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_i32 tmode;
+    TCGv_ptr status;
+
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    tmode = tcg_const_i32(mode);
+    status = get_fpstatus_ptr(a->esz == MO_16);
+    gen_helper_set_rmode(tmode, tmode, status);
+
+    tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       pred_full_reg_offset(s, a->pg),
+                       status, vsz, vsz, 0, frint_fns[a->esz - 1]);
+
+    gen_helper_set_rmode(tmode, tmode, status);
+    tcg_temp_free_i32(tmode);
+    tcg_temp_free_ptr(status);
+}
+
+static void trans_FRINTN(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_frint_mode(s, a, float_round_nearest_even);
+}
+
+static void trans_FRINTP(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_frint_mode(s, a, float_round_up);
+}
+
+static void trans_FRINTM(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_frint_mode(s, a, float_round_down);
+}
+
+static void trans_FRINTZ(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_frint_mode(s, a, float_round_to_zero);
+}
+
+static void trans_FRINTA(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    do_frint_mode(s, a, float_round_ties_away);
+}
+
 static void trans_SCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
 {
     do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_hh);
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 92dda3a241..e06c0c5279 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -848,6 +848,15 @@ FCVTZU_sd	01100101 11 011 10 1 101 ... ..... .....	@rd_pg_rn_e0
 FCVTZS_dd	01100101 11 011 11 0 101 ... ..... .....	@rd_pg_rn_e0
 FCVTZU_dd	01100101 11 011 11 1 101 ... ..... .....	@rd_pg_rn_e0
 
+# SVE floating-point round to integral value
+FRINTN		01100101 .. 000 000 101 ... ..... .....		@rd_pg_rn
+FRINTP		01100101 .. 000 001 101 ... ..... .....		@rd_pg_rn
+FRINTM		01100101 .. 000 010 101 ... ..... .....		@rd_pg_rn
+FRINTZ		01100101 .. 000 011 101 ... ..... .....		@rd_pg_rn
+FRINTA		01100101 .. 000 100 101 ... ..... .....		@rd_pg_rn
+FRINTX		01100101 .. 000 110 101 ... ..... .....		@rd_pg_rn
+FRINTI		01100101 .. 000 111 101 ... ..... .....		@rd_pg_rn
+
 # SVE integer convert to floating-point
 SCVTF_hh	01100101 01 010 01 0 101 ... ..... .....	@rd_pg_rn_e0
 SCVTF_sh	01100101 01 010 10 0 101 ... ..... .....	@rd_pg_rn_e0
-- 
2.14.3

* [Qemu-devel] [PATCH v2 67/67] target/arm: Implement SVE floating-point unary operations
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (65 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 66/67] target/arm: Implement SVE floating-point round to integral value Richard Henderson
@ 2018-02-17 18:23 ` Richard Henderson
  2018-02-27 15:40   ` Peter Maydell
  2018-02-23 17:05 ` [Qemu-devel] [Qemu-arm] [PATCH v2 00/67] target/arm: Scalable Vector Extension Alex Bennée
  2018-04-03 15:41 ` Alex Bennée
  68 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-17 18:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 14 ++++++++++++++
 target/arm/sve_helper.c    |  8 ++++++++
 target/arm/translate-sve.c | 28 ++++++++++++++++++++++++++++
 target/arm/sve.decode      |  4 ++++
 4 files changed, 54 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 749bab0b38..5cebc9121d 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -999,6 +999,20 @@ DEF_HELPER_FLAGS_5(sve_frintx_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_frintx_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_frecpx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_frecpx_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_frecpx_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_fsqrt_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fsqrt_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_fsqrt_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_scvt_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_scvt_sh, TCG_CALL_NO_RWG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 7950710be7..4f0985a29e 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3208,6 +3208,14 @@ DO_ZPZ_FP(sve_frintx_h, uint16_t, H1_2, float16_round_to_int)
 DO_ZPZ_FP(sve_frintx_s, uint32_t, H1_4, float32_round_to_int)
 DO_ZPZ_FP_D(sve_frintx_d, uint64_t, float64_round_to_int)
 
+DO_ZPZ_FP(sve_frecpx_h, uint16_t, H1_2, helper_frecpx_f16)
+DO_ZPZ_FP(sve_frecpx_s, uint32_t, H1_4, helper_frecpx_f32)
+DO_ZPZ_FP_D(sve_frecpx_d, uint64_t, helper_frecpx_f64)
+
+DO_ZPZ_FP(sve_fsqrt_h, uint16_t, H1_2, float16_sqrt)
+DO_ZPZ_FP(sve_fsqrt_s, uint32_t, H1_4, float32_sqrt)
+DO_ZPZ_FP_D(sve_fsqrt_d, uint64_t, float64_sqrt)
+
 DO_ZPZ_FP(sve_scvt_hh, uint16_t, H1_2, int16_to_float16)
 DO_ZPZ_FP(sve_scvt_sh, uint32_t, H1_4, int32_to_float16)
 DO_ZPZ_FP(sve_scvt_ss, uint32_t, H1_4, int32_to_float32)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 5f1c4984b8..f1ff033333 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3831,6 +3831,34 @@ static void trans_FRINTA(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
     do_frint_mode(s, a, float_round_ties_away);
 }
 
+static void trans_FRECPX(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3_ptr * const fns[3] = {
+        gen_helper_sve_frecpx_h,
+        gen_helper_sve_frecpx_s,
+        gen_helper_sve_frecpx_d
+    };
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+    } else {
+        do_zpz_ptr(s, a->rd, a->rn, a->pg, a->esz == MO_16, fns[a->esz - 1]);
+    }
+}
+
+static void trans_FSQRT(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3_ptr * const fns[3] = {
+        gen_helper_sve_fsqrt_h,
+        gen_helper_sve_fsqrt_s,
+        gen_helper_sve_fsqrt_d
+    };
+    if (a->esz == 0) {
+        unallocated_encoding(s);
+    } else {
+        do_zpz_ptr(s, a->rd, a->rn, a->pg, a->esz == MO_16, fns[a->esz - 1]);
+    }
+}
+
 static void trans_SCVTF_hh(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
 {
     do_zpz_ptr(s, a->rd, a->rn, a->pg, true, gen_helper_sve_scvt_hh);
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index e06c0c5279..fbd9cf1384 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -857,6 +857,10 @@ FRINTA		01100101 .. 000 100 101 ... ..... .....		@rd_pg_rn
 FRINTX		01100101 .. 000 110 101 ... ..... .....		@rd_pg_rn
 FRINTI		01100101 .. 000 111 101 ... ..... .....		@rd_pg_rn
 
+# SVE floating-point unary operations
+FRECPX		01100101 .. 001 100 101 ... ..... .....		@rd_pg_rn
+FSQRT		01100101 .. 001 101 101 ... ..... .....		@rd_pg_rn
+
 # SVE integer convert to floating-point
 SCVTF_hh	01100101 01 010 01 0 101 ... ..... .....	@rd_pg_rn_e0
 SCVTF_sh	01100101 01 010 10 0 101 ... ..... .....	@rd_pg_rn_e0
-- 
2.14.3

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 01/67] target/arm: Enable SVE for aarch64-linux-user
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 01/67] target/arm: Enable SVE for aarch64-linux-user Richard Henderson
@ 2018-02-22 17:28   ` Peter Maydell
  2018-02-22 19:27     ` Richard Henderson
  2018-02-23 17:00   ` Alex Bennée
  1 sibling, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-22 17:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Enable ARM_FEATURE_SVE for the generic "any" cpu.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu.c   | 7 +++++++
>  target/arm/cpu64.c | 1 +
>  2 files changed, 8 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

though this should probably go at the end of the patch series rather
than the beginning?

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 02/67] target/arm: Introduce translate-a64.h
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 02/67] target/arm: Introduce translate-a64.h Richard Henderson
@ 2018-02-22 17:30   ` Peter Maydell
  2018-04-03  9:01   ` Alex Bennée
  1 sibling, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-22 17:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Move some stuff that will be common to both translate-a64.c
> and translate-sve.c.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 03/67] target/arm: Add SVE decode skeleton
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 03/67] target/arm: Add SVE decode skeleton Richard Henderson
@ 2018-02-22 18:00   ` Peter Maydell
  2018-02-23 11:40   ` Peter Maydell
  1 sibling, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-22 18:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Including only 4, as-yet unimplemented, instruction patterns
> so that the whole thing compiles.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-a64.c | 11 +++++++-
>  target/arm/translate-sve.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++
>  .gitignore                 |  1 +
>  target/arm/Makefile.objs   | 10 ++++++++
>  target/arm/sve.decode      | 45 +++++++++++++++++++++++++++++++++
>  5 files changed, 129 insertions(+), 1 deletion(-)
>  create mode 100644 target/arm/translate-sve.c
>  create mode 100644 target/arm/sve.decode

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 04/67] target/arm: Implement SVE Bitwise Logical - Unpredicated Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 04/67] target/arm: Implement SVE Bitwise Logical - Unpredicated Group Richard Henderson
@ 2018-02-22 18:04   ` Peter Maydell
  2018-02-22 19:28     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-22 18:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> These were the instructions that were stubbed out when
> introducing the decode skeleton.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 50 +++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 43 insertions(+), 7 deletions(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 2c9e4733cb..50cf2a1fdd 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -32,6 +32,10 @@
>  #include "trace-tcg.h"
>  #include "translate-a64.h"
>
> +typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
> +typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
> +                        uint32_t, uint32_t, uint32_t);

I see we already have these in translate-a64.c -- put them in
translate-a64.h ?

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate Richard Henderson
@ 2018-02-22 18:20   ` Peter Maydell
  2018-02-22 19:31     ` Richard Henderson
  2018-04-03  9:26   ` Alex Bennée
  1 sibling, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-22 18:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 132 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  22 +++++++-
>  2 files changed, 153 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 50cf2a1fdd..c0cccfda6f 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -46,6 +46,19 @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
>   * Implement all of the translator functions referenced by the decoder.
>   */
>
> +/* Return the offset info CPUARMState of the predicate vector register Pn.
> + * Note for this purpose, FFR is P16.  */

Missing newline before */.

> +/*
> + *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
> + */
> +
> +/* Subroutine loading a vector register at VOFS of LEN bytes.
> + * The load should begin at the address Rn + IMM.
> + */
> +
> +#if UINTPTR_MAX == UINT32_MAX
> +# define ptr i32
> +#else
> +# define ptr i64
> +#endif

I think that rather than doing this we should improve the
provision that tcg/tcg-op.h has for *_ptr() wrappers, so
from the target's point of view it has a proper tcg_const_local_ptr()
and tcg_gen_brcondi_ptr(), same as for _i32, _i64 and _tl.

> +
> +static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
> +                   int rn, int imm)
> +{
> +    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
> +    uint32_t len_remain = len % 8;
> +    uint32_t nparts = len / 8 + ctpop8(len_remain);
> +    int midx = get_mem_index(s);
> +    TCGv_i64 addr, t0, t1;
> +
> +    addr = tcg_temp_new_i64();
> +    t0 = tcg_temp_new_i64();
> +
> +    /* Note that unpredicated load/store of vector/predicate registers
> +     * are defined as a stream of bytes, which equates to little-endian
> +     * operations on larger quantities.  There is no nice way to force
> +     * a little-endian load for aarch64_be-linux-user out of line.
> +     *
> +     * Attempt to keep code expansion to a minimum by limiting the
> +     * amount of unrolling done.
> +     */
> +    if (nparts <= 4) {
> +        int i;
> +
> +        for (i = 0; i < len_align; i += 8) {
> +            tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + i);
> +            tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEQ);
> +            tcg_gen_st_i64(t0, cpu_env, vofs + i);
> +        }
> +    } else {
> +        TCGLabel *loop = gen_new_label();
> +        TCGv_ptr i = TCGV_NAT_TO_PTR(glue(tcg_const_local_, ptr)(0));
> +        TCGv_ptr dest;
> +
> +        gen_set_label(loop);
> +
> +        /* Minimize the number of local temps that must be re-read from
> +         * the stack each iteration.  Instead, re-compute values other
> +         * than the loop counter.
> +         */
> +        dest = tcg_temp_new_ptr();
> +        tcg_gen_addi_ptr(dest, i, imm);
> +#if UINTPTR_MAX == UINT32_MAX
> +        tcg_gen_extu_i32_i64(addr, TCGV_PTR_TO_NAT(dest));
> +        tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, rn));
> +#else
> +        tcg_gen_add_i64(addr, TCGV_PTR_TO_NAT(dest), cpu_reg_sp(s, rn));
> +#endif

Can we avoid the ifdef by creating a tcg_gen_ext_ptr_i64() (similar
to but opposite in effect to the existing tcg_gen_ext_i32_ptr()) ?

> +
> +        tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEQ);
> +
> +        tcg_gen_add_ptr(dest, cpu_env, i);
> +        tcg_gen_addi_ptr(i, i, 8);
> +        tcg_gen_st_i64(t0, dest, vofs);
> +        tcg_temp_free_ptr(dest);
> +
> +        glue(tcg_gen_brcondi_, ptr)(TCG_COND_LTU, TCGV_PTR_TO_NAT(i),
> +                                    len_align, loop);
> +        tcg_temp_free_ptr(i);
> +    }
> +
> +    /* Predicate register loads can be any multiple of 2.
> +     * Note that we still store the entire 64-bit unit into cpu_env.
> +     */
> +    if (len_remain) {
> +        tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + len_align);
> +
> +        switch (len_remain) {
> +        case 2:
> +        case 4:
> +        case 8:
> +            tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LE | ctz32(len_remain));
> +            break;
> +
> +        case 6:
> +            t1 = tcg_temp_new_i64();
> +            tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEUL);
> +            tcg_gen_addi_i64(addr, addr, 4);
> +            tcg_gen_qemu_ld_i64(t1, addr, midx, MO_LEUW);
> +            tcg_gen_deposit_i64(t0, t0, t1, 32, 32);
> +            tcg_temp_free_i64(t1);
> +            break;
> +
> +        default:
> +            g_assert_not_reached();
> +        }
> +        tcg_gen_st_i64(t0, cpu_env, vofs + len_align);
> +    }
> +    tcg_temp_free_i64(addr);
> +    tcg_temp_free_i64(t0);
> +}
> +
> +#undef ptr

>
> +&rri           rd rn imm
>  &rrr_esz       rd rn rm esz
>
>  ###########################################################################
> @@ -31,7 +37,13 @@
>  # reduce the amount of duplication between instruction patterns.
>
>  # Three operand with unused vector element size
> -@rd_rn_rm_e0   ........ ... rm:5  ... ...  rn:5 rd:5           &rrr_esz esz=0
> +@rd_rn_rm_e0   ........ ... rm:5 ... ... rn:5 rd:5             &rrr_esz esz=0

This change looks like it should be squashed into a previous patch?

> +
> +# Basic Load/Store with 9-bit immediate offset
> +@pd_rn_i9      ........ ........ ...... rn:5 . rd:4    \
> +               &rri imm=%imm9_16_10
> +@rd_rn_i9      ........ ........ ...... rn:5 rd:5      \
> +               &rri imm=%imm9_16_10
>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 06/67] target/arm: Implement SVE predicate test
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 06/67] target/arm: Implement SVE predicate test Richard Henderson
@ 2018-02-22 18:38   ` Peter Maydell
  2018-04-03  9:16   ` Alex Bennée
  1 sibling, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-22 18:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 21 +++++++++++++
>  target/arm/helper.h        |  1 +
>  target/arm/sve_helper.c    | 77 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 62 +++++++++++++++++++++++++++++++++++++
>  target/arm/Makefile.objs   |  2 +-
>  target/arm/sve.decode      |  5 +++
>  6 files changed, 167 insertions(+), 1 deletion(-)
>  create mode 100644 target/arm/helper-sve.h
>  create mode 100644 target/arm/sve_helper.c
>
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> new file mode 100644
> index 0000000000..b6e91539ae
> --- /dev/null
> +++ b/target/arm/helper-sve.h
> @@ -0,0 +1,21 @@
> +/*
> + *  AArch64 SVE specific helper definitions
> + *
> + *  Copyright (c) 2018 Linaro, Ltd
> + *

> +/*
> + *  ARM SVE Operations
> + *
> + *  Copyright (c) 2018 Linaro

I think we prefer "Linaro Limited"  (cf https://wiki.linaro.org/Copyright)


> diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
> index 9934cf1d4d..452ac6f453 100644
> --- a/target/arm/Makefile.objs
> +++ b/target/arm/Makefile.objs
> @@ -19,4 +19,4 @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
>           "GEN", $(TARGET_DIR)$@)
>
>  target/arm/translate-sve.o: target/arm/decode-sve.inc.c
> -obj-$(TARGET_AARCH64) += translate-sve.o
> +obj-$(TARGET_AARCH64) += translate-sve.o sve_helper.o
> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index 0c6a7ba34d..7efaa8fe8e 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -56,6 +56,11 @@ ORR_zzz              00000100 01 1 ..... 001 100 ..... .....         @rd_rn_rm_e0
>  EOR_zzz                00000100 10 1 ..... 001 100 ..... .....         @rd_rn_rm_e0
>  BIC_zzz                00000100 11 1 ..... 001 100 ..... .....         @rd_rn_rm_e0
>
> +### SVE Predicate Misc Group
> +
> +# SVE predicate test
> +PTEST          00100101 01010000 11 pg:4 0 rn:4 00000

Shouldn't this be "0 1 01000011" instead of "01010000 11"
(just a spacing change)? Bits 22 and 23 are op and S, so spacing
it like that makes it easier to compare against the encoding diagram.


Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 07/67] target/arm: Implement SVE Predicate Logical Operations Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 07/67] target/arm: Implement SVE Predicate Logical Operations Group Richard Henderson
@ 2018-02-22 18:55   ` Peter Maydell
  2018-02-22 19:37     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-22 18:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

> -void trans_PTEST(DisasContext *s, arg_PTEST *a, uint32_t insn)
> +static void trans_PTEST(DisasContext *s, arg_PTEST *a, uint32_t insn)
>  {

Should this be in a previous patch?

>      int nofs = pred_full_reg_offset(s, a->rn);
>      int gofs = pred_full_reg_offset(s, a->pg);
> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index 7efaa8fe8e..d92886127a 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -31,6 +31,7 @@
>
>  &rri           rd rn imm
>  &rrr_esz       rd rn rm esz
> +&rprr_s                rd pg rn rm s
>
>  ###########################################################################
>  # Named instruction formats.  These are generally used to
> @@ -39,6 +40,9 @@
>  # Three operand with unused vector element size
>  @rd_rn_rm_e0   ........ ... rm:5 ... ... rn:5 rd:5             &rrr_esz esz=0
>
> +# Three prediate operand, with governing predicate, flag setting

Three what?

> +@pd_pg_pn_pm_s ........ . s:1 .. rm:4 .. pg:4 . rn:4 . rd:4    &rprr_s
> +
>  # Basic Load/Store with 9-bit immediate offset
>  @pd_rn_i9      ........ ........ ...... rn:5 . rd:4    \
>                 &rri imm=%imm9_16_10
> @@ -56,6 +60,18 @@ ORR_zzz              00000100 01 1 ..... 001 100 ..... .....         @rd_rn_rm_e0
>  EOR_zzz                00000100 10 1 ..... 001 100 ..... .....         @rd_rn_rm_e0
>  BIC_zzz                00000100 11 1 ..... 001 100 ..... .....         @rd_rn_rm_e0
>
> +### SVE Predicate Logical Operations Group
> +
> +# SVE predicate logical operations
> +AND_pppp       00100101 0. 00 .... 01 .... 0 .... 0 ....       @pd_pg_pn_pm_s
> +BIC_pppp       00100101 0. 00 .... 01 .... 0 .... 1 ....       @pd_pg_pn_pm_s
> +EOR_pppp       00100101 0. 00 .... 01 .... 1 .... 0 ....       @pd_pg_pn_pm_s
> +SEL_pppp       00100101 0. 00 .... 01 .... 1 .... 1 ....       @pd_pg_pn_pm_s
> +ORR_pppp       00100101 1. 00 .... 01 .... 0 .... 0 ....       @pd_pg_pn_pm_s
> +ORN_pppp       00100101 1. 00 .... 01 .... 0 .... 1 ....       @pd_pg_pn_pm_s
> +NOR_pppp       00100101 1. 00 .... 01 .... 1 .... 0 ....       @pd_pg_pn_pm_s
> +NAND_pppp      00100101 1. 00 .... 01 .... 1 .... 1 ....       @pd_pg_pn_pm_s
> +
>  ### SVE Predicate Misc Group

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 01/67] target/arm: Enable SVE for aarch64-linux-user
  2018-02-22 17:28   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-02-22 19:27     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-22 19:27 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/22/2018 09:28 AM, Peter Maydell wrote:
> On 17 February 2018 at 18:22, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Enable ARM_FEATURE_SVE for the generic "any" cpu.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/cpu.c   | 7 +++++++
>>  target/arm/cpu64.c | 1 +
>>  2 files changed, 8 insertions(+)
> 
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> 
> though this should probably go at the end of the patchseries rather
> than the beginning?

Yes, but of course I need it at the beginning for testing.
I'll sort it to the end for the final version.


r~


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 04/67] target/arm: Implement SVE Bitwise Logical - Unpredicated Group
  2018-02-22 18:04   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-02-22 19:28     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-22 19:28 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/22/2018 10:04 AM, Peter Maydell wrote:
> On 17 February 2018 at 18:22, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> These were the instructions that were stubbed out when
>> introducing the decode skeleton.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/translate-sve.c | 50 +++++++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 43 insertions(+), 7 deletions(-)
>>
>> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
>> index 2c9e4733cb..50cf2a1fdd 100644
>> --- a/target/arm/translate-sve.c
>> +++ b/target/arm/translate-sve.c
>> @@ -32,6 +32,10 @@
>>  #include "trace-tcg.h"
>>  #include "translate-a64.h"
>>
>> +typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
>> +typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
>> +                        uint32_t, uint32_t, uint32_t);
> 
> I see we already have these in translate-a64.c -- put them in
> translate-a64.h ?

Good plan, thanks.


r~


* Re: [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate
  2018-02-22 18:20   ` Peter Maydell
@ 2018-02-22 19:31     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-22 19:31 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/22/2018 10:20 AM, Peter Maydell wrote:
> On 17 February 2018 at 18:22, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/translate-sve.c | 132 +++++++++++++++++++++++++++++++++++++++++++++
>>  target/arm/sve.decode      |  22 +++++++-
>>  2 files changed, 153 insertions(+), 1 deletion(-)
>>
>> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
>> index 50cf2a1fdd..c0cccfda6f 100644
>> --- a/target/arm/translate-sve.c
>> +++ b/target/arm/translate-sve.c
>> @@ -46,6 +46,19 @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
>>   * Implement all of the translator functions referenced by the decoder.
>>   */
>>
>> +/* Return the offset info CPUARMState of the predicate vector register Pn.
>> + * Note for this purpose, FFR is P16.  */
> 
> Missing newline before */.
> 
>> +/*
>> + *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
>> + */
>> +
>> +/* Subroutine loading a vector register at VOFS of LEN bytes.
>> + * The load should begin at the address Rn + IMM.
>> + */
>> +
>> +#if UINTPTR_MAX == UINT32_MAX
>> +# define ptr i32
>> +#else
>> +# define ptr i64
>> +#endif
> 
> I think that rather than doing this we should improve the
> provision that tcg/tcg-op.h has for *_ptr() wrappers, so
> from the target's point of view it has a proper tcg_const_local_ptr()
> and tcg_gen_brcondi_ptr(), same as for _i32, _i64 and _tl.
> 
>> +
>> +static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
>> +                   int rn, int imm)
>> +{
>> +    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
>> +    uint32_t len_remain = len % 8;
>> +    uint32_t nparts = len / 8 + ctpop8(len_remain);
>> +    int midx = get_mem_index(s);
>> +    TCGv_i64 addr, t0, t1;
>> +
>> +    addr = tcg_temp_new_i64();
>> +    t0 = tcg_temp_new_i64();
>> +
>> +    /* Note that unpredicated load/store of vector/predicate registers
>> +     * are defined as a stream of bytes, which equates to little-endian
>> +     * operations on larger quantities.  There is no nice way to force
>> +     * a little-endian load for aarch64_be-linux-user out of line.
>> +     *
>> +     * Attempt to keep code expansion to a minimum by limiting the
>> +     * amount of unrolling done.
>> +     */
>> +    if (nparts <= 4) {
>> +        int i;
>> +
>> +        for (i = 0; i < len_align; i += 8) {
>> +            tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + i);
>> +            tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEQ);
>> +            tcg_gen_st_i64(t0, cpu_env, vofs + i);
>> +        }
>> +    } else {
>> +        TCGLabel *loop = gen_new_label();
>> +        TCGv_ptr i = TCGV_NAT_TO_PTR(glue(tcg_const_local_, ptr)(0));
>> +        TCGv_ptr dest;
>> +
>> +        gen_set_label(loop);
>> +
>> +        /* Minimize the number of local temps that must be re-read from
>> +         * the stack each iteration.  Instead, re-compute values other
>> +         * than the loop counter.
>> +         */
>> +        dest = tcg_temp_new_ptr();
>> +        tcg_gen_addi_ptr(dest, i, imm);
>> +#if UINTPTR_MAX == UINT32_MAX
>> +        tcg_gen_extu_i32_i64(addr, TCGV_PTR_TO_NAT(dest));
>> +        tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, rn));
>> +#else
>> +        tcg_gen_add_i64(addr, TCGV_PTR_TO_NAT(dest), cpu_reg_sp(s, rn));
>> +#endif
> 
> Can we avoid the ifdef by creating a tcg_gen_ext_ptr_i64() (similar
> to but opposite in effect to the existing tcg_gen_ext_i32_ptr()) ?

Ok, I will improve tcg.h as necessary for better support of TCGv_ptr.

>> -@rd_rn_rm_e0   ........ ... rm:5  ... ...  rn:5 rd:5           &rrr_esz esz=0
>> +@rd_rn_rm_e0   ........ ... rm:5 ... ... rn:5 rd:5             &rrr_esz esz=0
> 
> This change looks like it should be squashed into a previous patch?

Ho hum, I thought I got all of these.  I'll take another look through.


r~


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 07/67] target/arm: Implement SVE Predicate Logical Operations Group
  2018-02-22 18:55   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-02-22 19:37     ` Richard Henderson
  2018-02-23  9:56       ` Peter Maydell
  0 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-22 19:37 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/22/2018 10:55 AM, Peter Maydell wrote:
>> +# Three prediate operand, with governing predicate, flag setting
> 
> Three what?

Feh, typo for predicate.  But more verbosely,

Three operands that are predicates, plus another predicate operand which
"governs" the operation (I believe that's the language from the ARM), plus an
"s" bit that controls whether the flags are set.


r~


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 07/67] target/arm: Implement SVE Predicate Logical Operations Group
  2018-02-22 19:37     ` Richard Henderson
@ 2018-02-23  9:56       ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23  9:56 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 22 February 2018 at 19:37, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 02/22/2018 10:55 AM, Peter Maydell wrote:
>>> +# Three prediate operand, with governing predicate, flag setting
>>
>> Three what?
>
> Feh, typo for predicate.

Oh, right -- I'd thought it might be some mashup/typo of something-immediate.

-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 08/67] target/arm: Implement SVE Predicate Misc Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 08/67] target/arm: Implement SVE Predicate Misc Group Richard Henderson
@ 2018-02-23 11:22   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 11:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu.h           |   3 +
>  target/arm/helper-sve.h    |   3 +
>  target/arm/sve_helper.c    |  86 +++++++++++++++++++++++-
>  target/arm/translate-sve.c | 163 ++++++++++++++++++++++++++++++++++++++++++++-
>  target/arm/sve.decode      |  41 ++++++++++++
>  5 files changed, 293 insertions(+), 3 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 8befe43a01..27f395183b 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -2915,4 +2915,7 @@ static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno)
>      return &env->vfp.zregs[regno].d[0];
>  }
>
> +/* Shared between translate-sve.c and sve_helper.c.  */
> +extern const uint64_t pred_esz_masks[4];
> +
>  #endif
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index 57adc4d912..0c04afff8c 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -20,6 +20,9 @@
>  DEF_HELPER_FLAGS_2(sve_predtest1, TCG_CALL_NO_WG, i32, i64, i64)
>  DEF_HELPER_FLAGS_3(sve_predtest, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
>
> +DEF_HELPER_FLAGS_3(sve_pfirst, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_3(sve_pnext, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
> +
>  DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index b63e7cc90e..cee7d9bcf6 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -39,7 +39,7 @@
>
>  static uint32_t iter_predtest_fwd(uint64_t d, uint64_t g, uint32_t flags)
>  {
> -    if (g) {
> +    if (likely(g)) {
>          /* Compute N from first D & G.
>             Use bit 2 to signal first G bit seen.  */
>          if (!(flags & 4)) {

Belongs in different patch ?

> @@ -114,3 +114,87 @@ LOGICAL_PPPP(sve_nand_pppp, DO_NAND)
>  #undef DO_NAND
>  #undef DO_SEL
>  #undef LOGICAL_PPPP
> +
> +/* Similar to the ARM LastActiveElement pseudocode function, except the
> +   result is multiplied by the element size.  This includes the not found
> +   indication; e.g. not found for esz=3 is -8.  */

Can we stick to the usual format for multiline comments, please?
(various examples here and elsewhere in the patchset). I know that
over the whole codebase we're a bit variable, but I think this is
the most common arrangement and it's definitely the one we use
in target/arm with perhaps the odd ancient comment as an exception.

/* line 1
 * line 2
 */


> +static void trans_PTRUE(DisasContext *s, arg_PTRUE *a, uint32_t insn)
> +{
> +    unsigned fullsz = vec_full_reg_size(s);
> +    unsigned ofs = pred_full_reg_offset(s, a->rd);
> +    unsigned numelem, setsz, i;
> +    uint64_t word, lastword;
> +    TCGv_i64 t;

A comment somewhere here about the way this code handles
the instructions that aren't PTRUE would be helpful I think
(specifically that a->pat is 32 for PFALSE and a->rd is
16 for SETFFR).

> +
> +    numelem = decode_pred_count(fullsz, a->pat, a->esz);
> +
> +    /* Determine what we must store into each bit, and how many.  */
> +    if (numelem == 0) {
> +        lastword = word = 0;
> +        setsz = fullsz;
> +    } else {
> +        setsz = numelem << a->esz;
> +        lastword = word = pred_esz_masks[a->esz];
> +        if (setsz % 64) {
> +            lastword &= ~(-1ull << (setsz % 64));
> +        }
> +    }
> +

>  ###########################################################################
>  # Named instruction formats.  These are generally used to
>  # reduce the amount of duplication between instruction patterns.
>
> +# Two operand with unused vector element size
> +@pd_pn_e0      ........ ........ ....... rn:4 . rd:4           &rr_esz esz=0
> +
> +# Two operand
> +@pd_pn         ........ esz:2 .. .... ....... rn:4 . rd:4      &rr_esz
> +
>  # Three operand with unused vector element size
>  @rd_rn_rm_e0   ........ ... rm:5 ... ... rn:5 rd:5             &rrr_esz esz=0
>
> @@ -77,6 +87,37 @@ NAND_pppp    00100101 1. 00 .... 01 .... 1 .... 1 ....       @pd_pg_pn_pm_s
>  # SVE predicate test
>  PTEST          00100101 01010000 11 pg:4 0 rn:4 00000
>
> +# SVE predicate initialize
> +PTRUE          00100101 esz:2 01100 s:1 111000 pat:5 0 rd:4    &ptrue
> +
> +# SVE initialize FFR (SETFFR)
> +PTRUE          00100101 0010 1100 1001 0000 0000 0000 \
> +               &ptrue rd=16 esz=0 pat=31 s=0

I found this very confusing at first, because the leftmost column
looks like it's the instruction name, and thus a copy-and-paste
error. I think it would be easier to read if we gave it a name
that indicates that it's dealing with a group of instructions
rather than only PTRUE.

> +
> +# SVE zero predicate register (PFALSE)
> +# Note that pat=32 is outside of the natural 0..31, and will
> +# always hit the default #uimm5 case of decode_pred_count.
> +PTRUE          00100101 0001 1000 1110 0100 0000 rd:4 \
> +               &ptrue esz=0 pat=32 s=0
> +
> +# SVE predicate read from FFR (predicated) (RDFFR)
> +ORR_pppp       00100101 0 s:1 0110001111000 pg:4 0 rd:4 \
> +               &rprr_s rn=16 rm=16
> +
> +# SVE predicate read from FFR (unpredicated) (RDFFR)
> +ORR_pppp       00100101 0001 1001 1111 0000 0000 rd:4 \
> +               &rprr_s rn=16 rm=16 pg=16 s=0
> +
> +# SVE FFR write from predicate (WRFFR)
> +ORR_pppp       00100101 0010 1000 1001 000 rn:4 00000 \
> +               &rprr_s rd=16 rm=%preg4_5 pg=%preg4_5 s=0

> +
> +# SVE predicate first active
> +PFIRST         00100101 01 011 000 11000 00 .... 0 ....        @pd_pn_e0
> +
> +# SVE predicate next active
> +PNEXT          00100101 .. 011 001 11000 10 .... 0 ....        @pd_pn
> +
>  ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
>
>  # SVE load predicate register
> --
> 2.14.3

Otherwise

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 09/67] target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 09/67] target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group Richard Henderson
@ 2018-02-23 11:35   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 11:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 145 +++++++++++++++++++++++++++++++++
>  target/arm/sve_helper.c    | 196 ++++++++++++++++++++++++++++++++++++++++++++-
>  target/arm/translate-sve.c |  65 +++++++++++++++
>  target/arm/sve.decode      |  42 ++++++++++
>  4 files changed, 447 insertions(+), 1 deletion(-)
>

> @@ -105,7 +121,7 @@ LOGICAL_PPPP(sve_orn_pppp, DO_ORN)
>  LOGICAL_PPPP(sve_nor_pppp, DO_NOR)
>  LOGICAL_PPPP(sve_nand_pppp, DO_NAND)
>
> -#undef DO_ADD
> +#undef DO_AND

Should this be in a previous patch?

>  #undef DO_BIC
>  #undef DO_EOR
>  #undef DO_ORR

> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index a9b6ae046d..116002792a 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -211,6 +211,71 @@ static void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
>      do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
>  }
>
> +/*
> + *** SVE Integer Arithmetic - Binary Predicated Group
> + */
> +
> +static void do_zpzz_ool(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_4 *fn)
> +{
> +    unsigned vsz = vec_full_reg_size(s);
> +    if (fn == NULL) {
> +        unallocated_encoding(s);
> +        return;
> +    }

I think you do not want to be catching unallocated encodings
this late in the decode process. We have to identify all
the unallocated encodings before we do the "are SVE and
FP instructions supposed to trap" tests, because those don't
apply to unallocated encodings.

> +    tcg_gen_gvec_4_ool(vec_full_reg_offset(s, a->rd),
> +                       vec_full_reg_offset(s, a->rn),
> +                       vec_full_reg_offset(s, a->rm),
> +                       pred_full_reg_offset(s, a->pg),
> +                       vsz, vsz, 0, fn);
> +}

Rest of patch looks OK.

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 03/67] target/arm: Add SVE decode skeleton
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 03/67] target/arm: Add SVE decode skeleton Richard Henderson
  2018-02-22 18:00   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-02-23 11:40   ` Peter Maydell
  2018-02-23 11:43     ` Peter Maydell
  1 sibling, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 11:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Including only 4, as-yet unimplemented, instruction patterns
> so that the whole thing compiles.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-a64.c | 11 +++++++-
>  target/arm/translate-sve.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++
>  .gitignore                 |  1 +
>  target/arm/Makefile.objs   | 10 ++++++++
>  target/arm/sve.decode      | 45 +++++++++++++++++++++++++++++++++
>  5 files changed, 129 insertions(+), 1 deletion(-)
>  create mode 100644 target/arm/translate-sve.c
>  create mode 100644 target/arm/sve.decode
>
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index e0e7ebf68c..a50fef98af 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -12772,9 +12772,18 @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s)
>      s->fp_access_checked = false;
>
>      switch (extract32(insn, 25, 4)) {
> -    case 0x0: case 0x1: case 0x2: case 0x3: /* UNALLOCATED */
> +    case 0x0: case 0x1: case 0x3: /* UNALLOCATED */
>          unallocated_encoding(s);
>          break;
> +    case 0x2:
> +        if (!arm_dc_feature(s, ARM_FEATURE_SVE)) {
> +            unallocated_encoding(s);
> +        } else if (!sve_access_check(s) || !fp_access_check(s)) {
> +            /* exception raised */
> +        } else if (!disas_sve(s, insn)) {
> +            unallocated_encoding(s);
> +        }
> +        break;

I realized while working through the rest of the series that this is
too early to do the sve_access_check() and fp_access_check(). Those
only apply to instructions which actually exist, so we mustn't
do the checks until after we've dealt with all the unallocated_encoding()
cases. I think you need to push them down inside disas_sve() somehow.
See also my comments on patch 8.

(We get this wrong for current AArch32 VFP and Neon, but correct
for all of AArch64.)

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 03/67] target/arm: Add SVE decode skeleton
  2018-02-23 11:40   ` Peter Maydell
@ 2018-02-23 11:43     ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 11:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 23 February 2018 at 11:40, Peter Maydell <peter.maydell@linaro.org> wrote:
> I realized while working through the rest of the series that this is
> too early to do the sve_access_check() and fp_access_check(). Those
> only apply to instructions which actually exist, so we mustn't
> do the checks until after we've dealt with all the unallocated_encoding()
> cases. I think you need to push them down inside disas_sve() somehow.
> See also my comments on patch 8.

...I meant "patch 9".

-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 10/67] target/arm: Implement SVE Integer Reduction Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 10/67] target/arm: Implement SVE Integer Reduction Group Richard Henderson
@ 2018-02-23 11:50   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 11:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Excepting MOVPRFX, which isn't a reduction.  Presumably it is
> placed within the group because of its encoding.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


> @@ -306,8 +399,6 @@ DO_ZPZZ_D(sve_udiv_zpzz_d, uint64_t, DO_DIV)
>  #undef DO_ABD
>  #undef DO_MUL
>  #undef DO_DIV
> -#undef DO_ZPZZ
> -#undef DO_ZPZZ_D
>
>  /* Similar to the ARM LastActiveElement pseudocode function, except the
>     result is multiplied by the element size.  This includes the not found

Hunk in wrong patch or incorrect?

> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 116002792a..49251a53c1 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -276,6 +276,71 @@ void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
>
>  #undef DO_ZPZZ
>
> +/*
> + *** SVE Integer Reduction Group
> + */
> +
> +typedef void gen_helper_gvec_reduc(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_i32);
> +static void do_vpz_ool(DisasContext *s, arg_rpr_esz *a,
> +                       gen_helper_gvec_reduc *fn)
> +{
> +    unsigned vsz = vec_full_reg_size(s);
> +    TCGv_ptr t_zn, t_pg;
> +    TCGv_i32 desc;
> +    TCGv_i64 temp;
> +
> +    if (fn == 0) {
> +        unallocated_encoding(s);
> +        return;
> +    }

Same remarks as for patch 9 about this being too late to
catch unallocated_encodings (or alternatively needing to
do the sve/fp check after this). I won't bother to mention
this issue for later patches, but you can assume it as a
general caveat to my reviewed-by tags.

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 11/67] target/arm: Implement SVE bitwise shift by immediate (predicated)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 11/67] target/arm: Implement SVE bitwise shift by immediate (predicated) Richard Henderson
@ 2018-02-23 12:03   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 12:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  25 +++++
>  target/arm/sve_helper.c    | 265 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 128 ++++++++++++++++++++++
>  target/arm/sve.decode      |  29 ++++-
>  4 files changed, 445 insertions(+), 2 deletions(-)

>
> +/*
> + * Helpers for extracting complex instruction fields.
> + */
> +
> +/* See e.g. ASL (immediate, predicated).

Typo for "ASR", I guess ?

> + * Returns -1 for unallocated encoding; diagnose later.
> + */
> +static int tszimm_esz(int x)
> +{
> +    x >>= 3;  /* discard imm3 */
> +    return 31 - clz32(x);
> +}
> +
> +static int tszimm_shr(int x)
> +{
> +    return (16 << tszimm_esz(x)) - x;
> +}
> +
> +/* See e.g. LSL (immediate, predicated).  */
> +static int tszimm_shl(int x)
> +{
> +    return x - (8 << tszimm_esz(x));
> +}

> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -22,12 +22,20 @@
>  ###########################################################################
>  # Named fields.  These are primarily for disjoint fields.
>
> +%imm6_22_5     22:1 5:5
>  %imm9_16_10    16:s6 10:3
>  %preg4_5       5:4
>
> +# A combination of tsz:imm3 -- extract esize.
> +%tszimm_esz    22:2 5:5 !function=tszimm_esz
> +# A combination of tsz:imm3 -- extract (2 * esize) - (tsz:imm3)
> +%tszimm_shr    22:2 5:5 !function=tszimm_shr
> +# A combination of tsz:imm3 -- extract (tsz:imm3) - esize
> +%tszimm_shl    22:2 5:5 !function=tszimm_shl
> +
>  # Either a copy of rd (at bit 0), or a different source
>  # as propagated via the MOVPRFX instruction.
> -%reg_movprfx           0:5
> +%reg_movprfx   0:5

Squash into relevant previous patch.

>  ###########################################################################
>  # Named attribute sets.  These are used to make nice(er) names
> @@ -40,7 +48,7 @@
>  &rpr_esz       rd pg rn esz
>  &rprr_s                rd pg rn rm s
>  &rprr_esz      rd pg rn rm esz
> -
> +&rpri_esz      rd pg rn imm esz

Should either not delete the blank line, or don't add it in the first place.

>  &ptrue         rd esz pat s
>
>  ###########################################################################
> @@ -68,6 +76,11 @@
>  # One register operand, with governing predicate, vector element size
>  @rd_pg_rn      ........ esz:2 ... ... ... pg:3 rn:5 rd:5       &rpr_esz
>
> +# Two register operand, one immediate operand, with predicate,
> +# element size encoded as TSZHL.  User must fill in imm.
> +@rdn_pg_tszimm ........ .. ... ... ... pg:3 ..... rd:5 \
> +               &rpri_esz rn=%reg_movprfx esz=%tszimm_esz
> +
>  # Basic Load/Store with 9-bit immediate offset
>  @pd_rn_i9      ........ ........ ...... rn:5 . rd:4    \
>                 &rri imm=%imm9_16_10
> @@ -126,6 +139,18 @@ UMAXV              00000100 .. 001 001 001 ... ..... .....         @rd_pg_rn
>  SMINV          00000100 .. 001 010 001 ... ..... .....         @rd_pg_rn
>  UMINV          00000100 .. 001 011 001 ... ..... .....         @rd_pg_rn
>
> +### SVE Shift by Immediate - Predicated Group
> +
> +# SVE bitwise shift by immediate (predicated)
> +ASR_zpzi       00000100 .. 000 000 100 ... .. ... ..... \
> +               @rdn_pg_tszimm imm=%tszimm_shr
> +LSR_zpzi       00000100 .. 000 001 100 ... .. ... ..... \
> +               @rdn_pg_tszimm imm=%tszimm_shr
> +LSL_zpzi       00000100 .. 000 011 100 ... .. ... ..... \
> +               @rdn_pg_tszimm imm=%tszimm_shl
> +ASRD           00000100 .. 000 100 100 ... .. ... ..... \
> +               @rdn_pg_tszimm imm=%tszimm_shr
> +
>  ### SVE Logical - Unpredicated Group
>
>  # SVE bitwise logical operations (unpredicated)
> --
> 2.14.3

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 12/67] target/arm: Implement SVE bitwise shift by vector (predicated)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 12/67] target/arm: Implement SVE bitwise shift by vector (predicated) Richard Henderson
@ 2018-02-23 12:50   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 12:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 13/67] target/arm: Implement SVE bitwise shift by wide elements (predicated)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 13/67] target/arm: Implement SVE bitwise shift by wide elements (predicated) Richard Henderson
@ 2018-02-23 12:57   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 12:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 21 +++++++++++++++++++++
>  target/arm/sve_helper.c    | 35 +++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 25 +++++++++++++++++++++++++
>  target/arm/sve.decode      |  6 ++++++
>  4 files changed, 87 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 14/67] target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 14/67] target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group Richard Henderson
@ 2018-02-23 13:08   ` Peter Maydell
  2018-02-23 17:25     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 13:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  60 +++++++++++++++++++++
>  target/arm/sve_helper.c    | 127 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 111 +++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  23 ++++++++
>  4 files changed, 321 insertions(+)

> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index 177f338fed..b875501475 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -165,6 +165,29 @@ ASR_zpzw   00000100 .. 011 000 100 ... ..... .....         @rdn_pg_rm
>  LSR_zpzw       00000100 .. 011 001 100 ... ..... .....         @rdn_pg_rm
>  LSL_zpzw       00000100 .. 011 011 100 ... ..... .....         @rdn_pg_rm
>
> +### SVE Integer Arithmetic - Unary Predicated Group
> +
> +# SVE unary bit operations (predicated)
> +# Note esz != 0 for FABS and FNEG.
> +CLS            00000100 .. 011 000 101 ... ..... .....         @rd_pg_rn
> +CLZ            00000100 .. 011 001 101 ... ..... .....         @rd_pg_rn
> +CNT_zpz                00000100 .. 011 010 101 ... ..... .....         @rd_pg_rn
> +CNOT           00000100 .. 011 011 101 ... ..... .....         @rd_pg_rn
> +NOT_zpz                00000100 .. 011 110 101 ... ..... .....         @rd_pg_rn
> +FABS           00000100 .. 011 100 101 ... ..... .....         @rd_pg_rn
> +FNEG           00000100 .. 011 101 101 ... ..... .....         @rd_pg_rn

Indentation seems to be a bit skewed for the _zpz lines.

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 15/67] target/arm: Implement SVE Integer Multiply-Add Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 15/67] target/arm: Implement SVE Integer Multiply-Add Group Richard Henderson
@ 2018-02-23 13:12   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 13:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 18 ++++++++++++++
>  target/arm/sve_helper.c    | 58 +++++++++++++++++++++++++++++++++++++++++++++-
>  target/arm/translate-sve.c | 31 +++++++++++++++++++++++++
>  target/arm/sve.decode      | 17 ++++++++++++++
>  4 files changed, 123 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index 11644125d1..b31d497f31 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -345,6 +345,24 @@ DEF_HELPER_FLAGS_4(sve_neg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(sve_neg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(sve_neg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>
> +DEF_HELPER_FLAGS_6(sve_mla_b, TCG_CALL_NO_RWG,
> +                   void, ptr, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_6(sve_mla_h, TCG_CALL_NO_RWG,
> +                   void, ptr, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_6(sve_mla_s, TCG_CALL_NO_RWG,
> +                   void, ptr, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_6(sve_mla_d, TCG_CALL_NO_RWG,
> +                   void, ptr, ptr, ptr, ptr, ptr, i32)
> +
> +DEF_HELPER_FLAGS_6(sve_mls_b, TCG_CALL_NO_RWG,
> +                   void, ptr, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_6(sve_mls_h, TCG_CALL_NO_RWG,
> +                   void, ptr, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_6(sve_mls_s, TCG_CALL_NO_RWG,
> +                   void, ptr, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_6(sve_mls_d, TCG_CALL_NO_RWG,
> +                   void, ptr, ptr, ptr, ptr, ptr, i32)
> +
>  DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index e11823a727..4b08a38ce8 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -932,6 +932,62 @@ DO_ZPZI_D(sve_asrd_d, int64_t, DO_ASRD)
>  #undef DO_SHR
>  #undef DO_SHL
>  #undef DO_ASRD
> -
>  #undef DO_ZPZI
>  #undef DO_ZPZI_D

The deletion of this blank line should be dropped or squashed into the earlier patch.

Otherwise

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 16/67] target/arm: Implement SVE Integer Arithmetic - Unpredicated Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 16/67] target/arm: Implement SVE Integer Arithmetic - Unpredicated Group Richard Henderson
@ 2018-02-23 13:16   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 13:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 41 ++++++++++++++++++++++++++++++++++++++---
>  target/arm/sve.decode      | 13 +++++++++++++
>  2 files changed, 51 insertions(+), 3 deletions(-)

> @@ -254,7 +288,8 @@ static void do_zpzz_ool(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_4 *fn)
>  }
>
>  #define DO_ZPZZ(NAME, name) \
> -void trans_##NAME##_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn) \
> +static void trans_##NAME##_zpzz(DisasContext *s, arg_rprr_esz *a,         \
> +                                uint32_t insn)                            \
>  {                                                                         \
>      static gen_helper_gvec_4 * const fns[4] = {                           \
>          gen_helper_sve_##name##_zpzz_b, gen_helper_sve_##name##_zpzz_h,   \
> @@ -286,7 +321,7 @@ DO_ZPZZ(ASR, asr)
>  DO_ZPZZ(LSR, lsr)
>  DO_ZPZZ(LSL, lsl)
>
> -void trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
> +static void trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
>  {
>      static gen_helper_gvec_4 * const fns[4] = {
>          NULL, NULL, gen_helper_sve_sdiv_zpzz_s, gen_helper_sve_sdiv_zpzz_d
> @@ -294,7 +329,7 @@ void trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
>      do_zpzz_ool(s, a, fns[a->esz]);
>  }
>
> -void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
> +static void trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
>  {
>      static gen_helper_gvec_4 * const fns[4] = {
>          NULL, NULL, gen_helper_sve_udiv_zpzz_s, gen_helper_sve_udiv_zpzz_d

Should these changes to 'static' have been in a different patch, or was
that to avoid compiler warnings when the functions were introduced but not
used until this patch?

> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index 68a1823b72..b40d7dc9a2 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -68,6 +68,9 @@
> # Three predicate operand, with governing predicate, flag setting
>  @pd_pg_pn_pm_s ........ . s:1 .. rm:4 .. pg:4 . rn:4 . rd:4    &rprr_s
>
> +# Three operand, vector element size
> +@rd_rn_rm      ........ esz:2 . rm:5  ... ...  rn:5 rd:5       &rrr_esz
> +
>  # Two register operand, with governing predicate, vector element size
>  @rdn_pg_rm     ........ esz:2 ... ... ... pg:3 rm:5 rd:5 \
>                 &rprr_esz rn=%reg_movprfx
> @@ -205,6 +208,16 @@ MLS                00000100 .. 0 ..... 011 ... ..... .....   @rda_pg_rn_rm
>  MLA            00000100 .. 0 ..... 110 ... ..... .....   @rdn_pg_ra_rm # MAD
>  MLS            00000100 .. 0 ..... 111 ... ..... .....   @rdn_pg_ra_rm # MSB
>
> +### SVE Integer Arithmetic - Unpredicated Group
> +
> +# SVE integer add/subtract vectors (unpredicated)
> +ADD_zzz                00000100 .. 1 ..... 000 000 ..... .....         @rd_rn_rm
> +SUB_zzz                00000100 .. 1 ..... 000 001 ..... .....         @rd_rn_rm
> +SQADD_zzz      00000100 .. 1 ..... 000 100 ..... .....         @rd_rn_rm
> +UQADD_zzz      00000100 .. 1 ..... 000 101 ..... .....         @rd_rn_rm
> +SQSUB_zzz      00000100 .. 1 ..... 000 110 ..... .....         @rd_rn_rm
> +UQSUB_zzz      00000100 .. 1 ..... 000 111 ..... .....         @rd_rn_rm

Misaligned lines for ADD and SUB.

Otherwise

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 17/67] target/arm: Implement SVE Index Generation Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 17/67] target/arm: Implement SVE Index Generation Group Richard Henderson
@ 2018-02-23 13:22   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 13:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  5 ++++
>  target/arm/sve_helper.c    | 40 +++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 14 ++++++++++
>  4 files changed, 126 insertions(+)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 18/67] target/arm: Implement SVE Stack Allocation Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 18/67] target/arm: Implement SVE Stack Allocation Group Richard Henderson
@ 2018-02-23 13:25   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 13:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 24 ++++++++++++++++++++++++
>  target/arm/sve.decode      | 12 ++++++++++++
>  2 files changed, 36 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 19/67] target/arm: Implement SVE Bitwise Shift - Unpredicated Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 19/67] target/arm: Implement SVE Bitwise Shift - Unpredicated Group Richard Henderson
@ 2018-02-23 13:28   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 13:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 12 +++++++
>  target/arm/sve_helper.c    | 30 +++++++++++++++++
>  target/arm/translate-sve.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 26 +++++++++++++++
>  4 files changed, 149 insertions(+)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 20/67] target/arm: Implement SVE Compute Vector Address Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 20/67] target/arm: Implement SVE Compute Vector Address Group Richard Henderson
@ 2018-02-23 13:34   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 13:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  5 +++++
>  target/arm/sve_helper.c    | 40 ++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 33 +++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 12 ++++++++++++
>  4 files changed, 90 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 21/67] target/arm: Implement SVE floating-point exponential accelerator
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 21/67] target/arm: Implement SVE floating-point exponential accelerator Richard Henderson
@ 2018-02-23 13:48   ` Peter Maydell
  2018-02-23 17:29     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 13:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  4 +++
>  target/arm/sve_helper.c    | 81 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 22 +++++++++++++
>  target/arm/sve.decode      |  7 ++++
>  4 files changed, 114 insertions(+)
>
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index 5280d375f9..e2925ff8ec 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -385,6 +385,10 @@ DEF_HELPER_FLAGS_4(sve_adr_p64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(sve_adr_s32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(sve_adr_u32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>
> +DEF_HELPER_FLAGS_3(sve_fexpa_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_3(sve_fexpa_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_3(sve_fexpa_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
> +
>  DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index a290a58c02..4d42653eef 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1101,3 +1101,84 @@ void HELPER(sve_adr_u32)(void *vd, void *vn, void *vm, uint32_t desc)
>          d[i] = n[i] + ((uint64_t)(uint32_t)m[i] << sh);
>      }
>  }
> +
> +void HELPER(sve_fexpa_h)(void *vd, void *vn, uint32_t desc)
> +{
> +    static const uint16_t coeff[] = {
> +        0x0000, 0x0016, 0x002d, 0x0045, 0x005d, 0x0075, 0x008e, 0x00a8,
> +        0x00c2, 0x00dc, 0x00f8, 0x0114, 0x0130, 0x014d, 0x016b, 0x0189,
> +        0x01a8, 0x01c8, 0x01e8, 0x0209, 0x022b, 0x024e, 0x0271, 0x0295,
> +        0x02ba, 0x02e0, 0x0306, 0x032e, 0x0356, 0x037f, 0x03a9, 0x03d4,
> +    };

Worth a comment that these data tables are from the specification
pseudocode, I think.

> +void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
> +{
> +    static const uint64_t coeff[] = {
> +        0x0000000000000, 0x02C9A3E778061, 0x059B0D3158574, 0x0874518759BC8,
> +        0x0B5586CF9890F, 0x0E3EC32D3D1A2, 0x11301D0125B51, 0x1429AAEA92DE0,
> +        0x172B83C7D517B, 0x1A35BEB6FCB75, 0x1D4873168B9AA, 0x2063B88628CD6,
> +        0x2387A6E756238, 0x26B4565E27CDD, 0x29E9DF51FDEE1, 0x2D285A6E4030B,
> +        0x306FE0A31B715, 0x33C08B26416FF, 0x371A7373AA9CB, 0x3A7DB34E59FF7,
> +        0x3DEA64C123422, 0x4160A21F72E2A, 0x44E086061892D, 0x486A2B5C13CD0,
> +        0x4BFDAD5362A27, 0x4F9B2769D2CA7, 0x5342B569D4F82, 0x56F4736B527DA,
> +        0x5AB07DD485429, 0x5E76F15AD2148, 0x6247EB03A5585, 0x6623882552225,
> +        0x6A09E667F3BCD, 0x6DFB23C651A2F, 0x71F75E8EC5F74, 0x75FEB564267C9,
> +        0x7A11473EB0187, 0x7E2F336CF4E62, 0x82589994CCE13, 0x868D99B4492ED,
> +        0x8ACE5422AA0DB, 0x8F1AE99157736, 0x93737B0CDC5E5, 0x97D829FDE4E50,
> +        0x9C49182A3F090, 0xA0C667B5DE565, 0xA5503B23E255D, 0xA9E6B5579FDBF,
> +        0xAE89F995AD3AD, 0xB33A2B84F15FB, 0xB7F76F2FB5E47, 0xBCC1E904BC1D2,
> +        0xC199BDD85529C, 0xC67F12E57D14B, 0xCB720DCEF9069, 0xD072D4A07897C,
> +        0xD5818DCFBA487, 0xDA9E603DB3285, 0xDFC97337B9B5F, 0xE502EE78B3FF6,
> +        0xEA4AFA2A490DA, 0xEFA1BEE615A27, 0xF50765B6E4540, 0xFA7C1819E90D8,

This confused me at first because it looks like these are 64-bit numbers,
but they are only 52 bits. Maybe add a comment? (Or add the leading '000'?)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 22/67] target/arm: Implement SVE floating-point trig select coefficient
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 22/67] target/arm: Implement SVE floating-point trig select coefficient Richard Henderson
@ 2018-02-23 13:54   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 13:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  4 ++++
>  target/arm/sve_helper.c    | 43 +++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 19 +++++++++++++++++++
>  target/arm/sve.decode      |  4 ++++
>  4 files changed, 70 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 23/67] target/arm: Implement SVE Element Count Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 23/67] target/arm: Implement SVE Element Count Group Richard Henderson
@ 2018-02-23 14:06   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 14:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  11 ++
>  target/arm/sve_helper.c    | 136 ++++++++++++++++++++++
>  target/arm/translate-sve.c | 274 ++++++++++++++++++++++++++++++++++++++++++++-
>  target/arm/sve.decode      |  30 ++++-
>  4 files changed, 448 insertions(+), 3 deletions(-)
>

> @@ -127,7 +132,9 @@ static void do_vector3_z(DisasContext *s, GVecGen3Fn *gvec_fn,
>  /* Invoke a vector move on two Zregs.  */
>  static void do_mov_z(DisasContext *s, int rd, int rn)
>  {
> -    do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
> +    if (rd != rn) {
> +        do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
> +    }
>  }
>
>  /* Initialize a Zreg with replications of a 64-bit immediate.  */
> @@ -168,7 +175,9 @@ static void do_vecop4_p(DisasContext *s, const GVecGen4 *gvec_op,
>  /* Invoke a vector move on two Pregs.  */
>  static void do_mov_p(DisasContext *s, int rd, int rn)
>  {
> -    do_vector2_p(s, tcg_gen_gvec_mov, 0, rd, rn);
> +    if (rd != rn) {
> +        do_vector2_p(s, tcg_gen_gvec_mov, 0, rd, rn);
> +    }
>  }

These should probably be squashed into an earlier patch.


Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 24/67] target/arm: Implement SVE Bitwise Immediate Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 24/67] target/arm: Implement SVE Bitwise Immediate Group Richard Henderson
@ 2018-02-23 14:10   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 14:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 17 ++++++++++++++++
>  2 files changed, 67 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 25/67] target/arm: Implement SVE Integer Wide Immediate - Predicated Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 25/67] target/arm: Implement SVE Integer Wide Immediate - Predicated Group Richard Henderson
@ 2018-02-23 14:18   ` Peter Maydell
  2018-02-23 17:31     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 14:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

> +/* Two operand predicated copy immediate with merge.  All valid immediates
> + * can fit within 17 signed bits in the simd_data field.
> + */
> +void HELPER(sve_cpy_m_b)(void *vd, void *vn, void *vg,
> +                         uint64_t mm, uint32_t desc)
> +{
> +    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
> +    uint64_t *d = vd, *n = vn;
> +    uint8_t *pg = vg;
> +
> +    mm = (mm & 0xff) * (-1ull / 0xff);

What is this expression doing? I guess from context that it's
replicating the low 8 bits of mm across the 64-bit value,
but this is too obscure to do without a comment or wrapping
it in a helper function with a useful name, I think.


Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 26/67] target/arm: Implement SVE Permute - Extract Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 26/67] target/arm: Implement SVE Permute - Extract Group Richard Henderson
@ 2018-02-23 14:24   ` Peter Maydell
  2018-02-23 17:46     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 14:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  2 ++
>  target/arm/sve_helper.c    | 81 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 29 +++++++++++++++++
>  target/arm/sve.decode      |  9 +++++-
>  4 files changed, 120 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index 79493ab647..94f4356ce9 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -414,6 +414,8 @@ DEF_HELPER_FLAGS_4(sve_cpy_z_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
>  DEF_HELPER_FLAGS_4(sve_cpy_z_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
>  DEF_HELPER_FLAGS_4(sve_cpy_z_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
>
> +DEF_HELPER_FLAGS_4(sve_ext, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +
>  DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 6a95d1ec48..fb3f54300b 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1469,3 +1469,84 @@ void HELPER(sve_cpy_z_d)(void *vd, void *vg, uint64_t val, uint32_t desc)
>          d[i] = (pg[H1(i)] & 1 ? val : 0);
>      }
>  }
> +
> +/* Big-endian hosts need to frob the byte indices.  If the copy
> + * happens to be 8-byte aligned, then no frobbing necessary.
> + */

Have you run risu tests with a big endian host?

>  ###########################################################################
>  # Named fields.  These are primarily for disjoint fields.
>
> -%imm4_16_p1             16:4 !function=plus1
> +%imm4_16_p1    16:4 !function=plus1

Another bit that should be squashed into an earlier patch.

>  %imm6_22_5     22:1 5:5
> +%imm8_16_10    16:5 10:3
>  %imm9_16_10    16:s6 10:3
>  %preg4_5       5:4
>
> @@ -363,6 +364,12 @@ FCPY               00000101 .. 01 .... 110 imm:8 .....             @rdn_pg4
>  CPY_m_i                00000101 .. 01 .... 01 . ........ .....   @rdn_pg4 imm=%sh8_i8s
>  CPY_z_i                00000101 .. 01 .... 00 . ........ .....   @rdn_pg4 imm=%sh8_i8s
>
> +### SVE Permute - Extract Group
> +
> +# SVE extract vector (immediate offset)
> +EXT            00000101 001 ..... 000 ... rm:5 rd:5 \
> +               &rrri rn=%reg_movprfx imm=%imm8_16_10
> +
>  ### SVE Predicate Logical Operations Group
>
>  # SVE predicate logical operations
> --

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 27/67] target/arm: Implement SVE Permute - Unpredicated Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 27/67] target/arm: Implement SVE Permute - Unpredicated Group Richard Henderson
@ 2018-02-23 14:34   ` Peter Maydell
  2018-02-23 18:58     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 14:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  23 +++++++++
>  target/arm/translate-a64.h |  14 +++---
>  target/arm/sve_helper.c    | 114 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 113 ++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  29 +++++++++++-
>  5 files changed, 285 insertions(+), 8 deletions(-)
>
> diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
> index e519aee314..328aa7fce1 100644
> --- a/target/arm/translate-a64.h
> +++ b/target/arm/translate-a64.h
> @@ -66,18 +66,18 @@ static inline void assert_fp_access_checked(DisasContext *s)
>  static inline int vec_reg_offset(DisasContext *s, int regno,
>                                   int element, TCGMemOp size)
>  {
> -    int offs = 0;
> +    int element_size = 1 << size;
> +    int offs = element * element_size;
>  #ifdef HOST_WORDS_BIGENDIAN
>      /* This is complicated slightly because vfp.zregs[n].d[0] is
>       * still the low half and vfp.zregs[n].d[1] the high half
>       * of the 128 bit vector, even on big endian systems.
> -     * Calculate the offset assuming a fully bigendian 128 bits,
> -     * then XOR to account for the order of the two 64 bit halves.
> +     * Calculate the offset assuming a fully little-endian 128 bits,
> +     * then XOR to account for the order of the 64 bit units.
>       */
> -    offs += (16 - ((element + 1) * (1 << size)));
> -    offs ^= 8;
> -#else
> -    offs += element * (1 << size);
> +    if (element_size < 8) {
> +        offs ^= 8 - element_size;
> +    }
>  #endif
>      offs += offsetof(CPUARMState, vfp.zregs[regno]);
>      assert_fp_access_checked(s);

This looks like it should have been in an earlier patch?

> @@ -85,7 +86,9 @@
>  @pd_pg_pn_pm_s ........ . s:1 .. rm:4 .. pg:4 . rn:4 . rd:4    &rprr_s
>
>  # Three operand, vector element size
> -@rd_rn_rm      ........ esz:2 . rm:5  ... ...  rn:5 rd:5       &rrr_esz
> +@rd_rn_rm      ........ esz:2 . rm:5 ... ... rn:5 rd:5         &rrr_esz

Another fragment that should be squashed.

> +@rdn_rm                ........ esz:2 ...... ...... rm:5 rd:5 \
> +               &rrr_esz rn=%reg_movprfx
>
>  # Three operand with "memory" size, aka immediate left shift
>  @rd_rn_msz_rm  ........ ... rm:5 .... imm:2 rn:5 rd:5          &rrri

Otherwise

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 28/67] target/arm: Implement SVE Permute - Predicates Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 28/67] target/arm: Implement SVE Permute - Predicates Group Richard Henderson
@ 2018-02-23 15:15   ` Peter Maydell
  2018-02-23 19:59     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 15:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |   6 +
>  target/arm/sve_helper.c    | 280 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 110 ++++++++++++++++++
>  target/arm/sve.decode      |  18 +++
>  4 files changed, 414 insertions(+)
>
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index 0c9aad575e..ff958fcebd 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -439,6 +439,12 @@ DEF_HELPER_FLAGS_3(sve_uunpk_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_3(sve_uunpk_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_3(sve_uunpk_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
>
> +DEF_HELPER_FLAGS_4(sve_zip_p, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(sve_uzp_p, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(sve_trn_p, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_3(sve_rev_p, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_3(sve_punpk_p, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
> +
>  DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 466a209c1e..c3a2706a16 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1664,3 +1664,283 @@ DO_UNPK(sve_uunpk_s, uint32_t, uint16_t, H4, H2)
>  DO_UNPK(sve_uunpk_d, uint64_t, uint32_t, , H4)
>
>  #undef DO_UNPK
> +
> +static const uint64_t expand_bit_data[5][2] = {
> +    { 0x1111111111111111ull, 0x2222222222222222ull },
> +    { 0x0303030303030303ull, 0x0c0c0c0c0c0c0c0cull },
> +    { 0x000f000f000f000full, 0x00f000f000f000f0ull },
> +    { 0x000000ff000000ffull, 0x0000ff000000ff00ull },
> +    { 0x000000000000ffffull, 0x00000000ffff0000ull }
> +};
> +
> +/* Expand units of 2**N bits to units of 2**(N+1) bits,
> +   with the higher bits zero.  */

In bitops.h we call this operation "half shuffle" (where
it is specifically working on units of 1 bit size), and
the inverse "half unshuffle". Worth mentioning that (or
using similar terminology)?

> +static uint64_t expand_bits(uint64_t x, int n)
> +{
> +    int i, sh;

Worth asserting that n is within the range we expect it to be ?
(what range is that? 0 to 4?)

> +    for (i = 4, sh = 16; i >= n; i--, sh >>= 1) {
> +        x = ((x & expand_bit_data[i][1]) << sh) | (x & expand_bit_data[i][0]);
> +    }
> +    return x;
> +}
> +
> +/* Compress units of 2**(N+1) bits to units of 2**N bits.  */
> +static uint64_t compress_bits(uint64_t x, int n)
> +{
> +    int i, sh;

Ditto assert.

> +    for (i = n, sh = 1 << n; i <= 4; i++, sh <<= 1) {
> +        x = ((x >> sh) & expand_bit_data[i][1]) | (x & expand_bit_data[i][0]);
> +    }
> +    return x;
> +}
> +
> +void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
> +{
> +    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
> +    int esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
> +    intptr_t high = extract32(pred_desc, SIMD_DATA_SHIFT + 2, 1);
> +    uint64_t *d = vd;
> +    intptr_t i;
> +
> +    if (oprsz <= 8) {
> +        uint64_t nn = *(uint64_t *)vn;
> +        uint64_t mm = *(uint64_t *)vm;
> +        int half = 4 * oprsz;
> +
> +        nn = extract64(nn, high * half, half);
> +        mm = extract64(mm, high * half, half);
> +        nn = expand_bits(nn, esz);
> +        mm = expand_bits(mm, esz);
> +        d[0] = nn + (mm << (1 << esz));

Is this actually doing an addition, or is it just an odd
way of writing a bitwise OR when neither of the two
inputs has a 1 in the same bit position?

> +    } else {
> +        ARMPredicateReg tmp_n, tmp_m;
> +
> +        /* We produce output faster than we consume input.
> +           Therefore we must be mindful of possible overlap.  */
> +        if ((vn - vd) < (uintptr_t)oprsz) {
> +            vn = memcpy(&tmp_n, vn, oprsz);
> +        }
> +        if ((vm - vd) < (uintptr_t)oprsz) {
> +            vm = memcpy(&tmp_m, vm, oprsz);
> +        }
> +        if (high) {
> +            high = oprsz >> 1;
> +        }
> +
> +        if ((high & 3) == 0) {
> +            uint32_t *n = vn, *m = vm;
> +            high >>= 2;
> +
> +            for (i = 0; i < DIV_ROUND_UP(oprsz, 8); i++) {
> +                uint64_t nn = n[H4(high + i)];
> +                uint64_t mm = m[H4(high + i)];
> +
> +                nn = expand_bits(nn, esz);
> +                mm = expand_bits(mm, esz);
> +                d[i] = nn + (mm << (1 << esz));
> +            }
> +        } else {
> +            uint8_t *n = vn, *m = vm;
> +            uint16_t *d16 = vd;
> +
> +            for (i = 0; i < oprsz / 2; i++) {
> +                uint16_t nn = n[H1(high + i)];
> +                uint16_t mm = m[H1(high + i)];
> +
> +                nn = expand_bits(nn, esz);
> +                mm = expand_bits(mm, esz);
> +                d16[H2(i)] = nn + (mm << (1 << esz));
> +            }
> +        }
> +    }
> +}
> +
> +void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
> +{
> +    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
> +    int esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
> +    int odd = extract32(pred_desc, SIMD_DATA_SHIFT + 2, 1) << esz;
> +    uint64_t *d = vd, *n = vn, *m = vm;
> +    uint64_t l, h;
> +    intptr_t i;
> +
> +    if (oprsz <= 8) {
> +        l = compress_bits(n[0] >> odd, esz);
> +        h = compress_bits(m[0] >> odd, esz);
> +        d[0] = extract64(l + (h << (4 * oprsz)), 0, 8 * oprsz);

This looks like it's using addition for logical OR again?

> +    } else {
> +        ARMPredicateReg tmp_m;
> +        intptr_t oprsz_16 = oprsz / 16;
> +
> +        if ((vm - vd) < (uintptr_t)oprsz) {
> +            m = memcpy(&tmp_m, vm, oprsz);
> +        }
> +
> +        for (i = 0; i < oprsz_16; i++) {
> +            l = n[2 * i + 0];
> +            h = n[2 * i + 1];
> +            l = compress_bits(l >> odd, esz);
> +            h = compress_bits(h >> odd, esz);
> +            d[i] = l + (h << 32);
> +        }
> +
> +        /* For VL which is not a power of 2, the results from M do not
> +           align nicely with the uint64_t for D.  Put the aligned results
> +           from M into TMP_M and then copy it into place afterward.  */

How much risu testing did you do of funny vector lengths?

> +        if (oprsz & 15) {
> +            d[i] = compress_bits(n[2 * i] >> odd, esz);
> +
> +            for (i = 0; i < oprsz_16; i++) {
> +                l = m[2 * i + 0];
> +                h = m[2 * i + 1];
> +                l = compress_bits(l >> odd, esz);
> +                h = compress_bits(h >> odd, esz);
> +                tmp_m.p[i] = l + (h << 32);
> +            }
> +            tmp_m.p[i] = compress_bits(m[2 * i] >> odd, esz);
> +
> +            swap_memmove(vd + oprsz / 2, &tmp_m, oprsz / 2);
> +        } else {
> +            for (i = 0; i < oprsz_16; i++) {
> +                l = m[2 * i + 0];
> +                h = m[2 * i + 1];
> +                l = compress_bits(l >> odd, esz);
> +                h = compress_bits(h >> odd, esz);
> +                d[oprsz_16 + i] = l + (h << 32);
> +            }
> +        }
> +    }
> +}
> +
> +static const uint64_t even_bit_esz_masks[4] = {
> +    0x5555555555555555ull,
> +    0x3333333333333333ull,
> +    0x0f0f0f0f0f0f0f0full,
> +    0x00ff00ff00ff00ffull
> +};

A comment describing the purpose of these numbers would be useful.

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 29/67] target/arm: Implement SVE Permute - Interleaving Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 29/67] target/arm: Implement SVE Permute - Interleaving Group Richard Henderson
@ 2018-02-23 15:22   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 15:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 15 ++++++++++
>  target/arm/sve_helper.c    | 72 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 10 +++++++
>  4 files changed, 166 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 30/67] target/arm: Implement SVE compress active elements
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 30/67] target/arm: Implement SVE compress active elements Richard Henderson
@ 2018-02-23 15:25   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 15:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  3 +++
>  target/arm/sve_helper.c    | 34 ++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 12 ++++++++++++
>  target/arm/sve.decode      |  6 ++++++
>  4 files changed, 55 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 31/67] target/arm: Implement SVE conditionally broadcast/extract element
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 31/67] target/arm: Implement SVE conditionally broadcast/extract element Richard Henderson
@ 2018-02-23 15:44   ` Peter Maydell
  2018-02-23 20:15     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 15:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |   2 +
>  target/arm/sve_helper.c    |  11 ++
>  target/arm/translate-sve.c | 299 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  20 +++
>  4 files changed, 332 insertions(+)
>
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index d977aea00d..a58fb4ba01 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -463,6 +463,8 @@ DEF_HELPER_FLAGS_4(sve_trn_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(sve_compact_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(sve_compact_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>
> +DEF_HELPER_FLAGS_2(sve_last_active_element, TCG_CALL_NO_RWG, s32, ptr, i32)
> +
>  DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 87a1a32232..ee289be642 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -2050,3 +2050,14 @@ void HELPER(sve_compact_d)(void *vd, void *vn, void *vg, uint32_t desc)
>          d[j] = 0;
>      }
>  }
> +
> +/* Similar to the ARM LastActiveElement pseudocode function, except the
> +   result is multiplied by the element size.  This includes the not found
> +   indication; e.g. not found for esz=3 is -8.  */
> +int32_t HELPER(sve_last_active_element)(void *vg, uint32_t pred_desc)
> +{
> +    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
> +    intptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);

pred_desc is obviously an encoding of some stuff, so the comment would
be a good place to mention what it is.


> +/* Compute CLAST for a scalar.  */
> +static void do_clast_scalar(DisasContext *s, int esz, int pg, int rm,
> +                            bool before, TCGv_i64 reg_val)
> +{
> +    TCGv_i32 last = tcg_temp_new_i32();
> +    TCGv_i64 ele, cmp, zero;
> +
> +    find_last_active(s, last, esz, pg);
> +
> +    /* Extend the original value of last prior to incrementing.  */
> +    cmp = tcg_temp_new_i64();
> +    tcg_gen_ext_i32_i64(cmp, last);
> +
> +    if (!before) {
> +        incr_last_active(s, last, esz);
> +    }
> +
> +    /* The conceit here is that while last < 0 indicates not found, after
> +       adjusting for cpu_env->vfp.zregs[rm], it is still a valid address
> +       from which we can load garbage.  We then discard the garbage with
> +       a conditional move.  */

That's a bit ugly. Can we at least do a compile time assert that the
worst case (which I guess is offset of zregs[0] minus largest-element-size)
is still positive ? That way if for some reason we reshuffle fields
in CPUARMState we'll notice if it's going to fall off the beginning
of the struct.

> +    ele = load_last_active(s, last, rm, esz);
> +    tcg_temp_free_i32(last);
> +
> +    zero = tcg_const_i64(0);
> +    tcg_gen_movcond_i64(TCG_COND_GE, reg_val, cmp, zero, ele, reg_val);
> +
> +    tcg_temp_free_i64(zero);
> +    tcg_temp_free_i64(cmp);
> +    tcg_temp_free_i64(ele);
> +}

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 32/67] target/arm: Implement SVE copy to vector (predicated)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 32/67] target/arm: Implement SVE copy to vector (predicated) Richard Henderson
@ 2018-02-23 15:45   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 15:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 13 +++++++++++++
>  target/arm/sve.decode      |  6 ++++++
>  2 files changed, 19 insertions(+)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 207a22a0bc..fc2a295ab7 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -2422,6 +2422,19 @@ static void trans_LASTB_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
>      do_last_general(s, a, true);
>  }
>
> +static void trans_CPY_m_r(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
> +{
> +    do_cpy_m(s, a->esz, a->rd, a->rd, a->pg, cpu_reg_sp(s, a->rn));
> +}
> +
> +static void trans_CPY_m_v(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
> +{
> +    int ofs = vec_reg_offset(s, a->rn, 0, a->esz);
> +    TCGv_i64 t = load_esz(cpu_env, ofs, a->esz);
> +    do_cpy_m(s, a->esz, a->rd, a->rd, a->pg, t);
> +    tcg_temp_free_i64(t);
> +}
> +
>  /*
>   *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
>   */
> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index 1370802c12..5e127de88c 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -451,6 +451,12 @@ LASTB_v            00000101 .. 10001 1 100 ... ..... .....         @rd_pg_rn
>  LASTA_r                00000101 .. 10000 0 101 ... ..... .....         @rd_pg_rn
>  LASTB_r                00000101 .. 10000 1 101 ... ..... .....         @rd_pg_rn
>
> +# SVE copy element from SIMD&FP scalar register
> +CPY_m_v                00000101 .. 100000 100 ... ..... .....          @rd_pg_rn
> +
> +# SVE copy element from general register to vector (predicated)
> +CPY_m_r                00000101 .. 101000 101 ... ..... .....          @rd_pg_rn
> +
>  ### SVE Predicate Logical Operations Group
>
>  # SVE predicate logical operations
> --
> 2.14.3

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

(if only every patch in the series was this size...)

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 33/67] target/arm: Implement SVE reverse within elements
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 33/67] target/arm: Implement SVE reverse within elements Richard Henderson
@ 2018-02-23 15:50   ` Peter Maydell
  2018-02-23 20:21     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 15:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 14 ++++++++++++++
>  target/arm/sve_helper.c    | 41 ++++++++++++++++++++++++++++++++++-------
>  target/arm/translate-sve.c | 38 ++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  7 +++++++
>  4 files changed, 93 insertions(+), 7 deletions(-)

> +/* Swap 16-bit words within a 32-bit word.  */
> +static inline uint32_t hswap32(uint32_t h)
> +{
> +    return rol32(h, 16);
> +}
> +
> +/* Swap 16-bit words within a 64-bit word.  */
> +static inline uint64_t hswap64(uint64_t h)
> +{
> +    uint64_t m = 0x0000ffff0000ffffull;
> +    h = rol64(h, 32);
> +    return ((h & m) << 16) | ((h >> 16) & m);
> +}
> +
> +/* Swap 32-bit words within a 64-bit word.  */
> +static inline uint64_t wswap64(uint64_t h)
> +{
> +    return rol64(h, 32);
> +}
> +

Were there cases in earlier patches that could have used these? I forget.
I guess they're not useful enough to be worth putting in bswap.h.

> @@ -1577,13 +1611,6 @@ void HELPER(sve_rev_b)(void *vd, void *vn, uint32_t desc)
>      }
>  }
>
> -static inline uint64_t hswap64(uint64_t h)
> -{
> -    uint64_t m = 0x0000ffff0000ffffull;
> -    h = rol64(h, 32);
> -    return ((h & m) << 16) | ((h >> 16) & m);
> -}
> -

Better to put the function in the right place to start with.

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 34/67] target/arm: Implement SVE vector splice (predicated)
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 34/67] target/arm: Implement SVE vector splice (predicated) Richard Henderson
@ 2018-02-23 15:52   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 15:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  2 ++
>  target/arm/sve_helper.c    | 37 +++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 10 ++++++++++
>  target/arm/sve.decode      |  3 +++
>  4 files changed, 52 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 35/67] target/arm: Implement SVE Select Vectors Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 35/67] target/arm: Implement SVE Select Vectors Group Richard Henderson
@ 2018-02-23 16:21   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 16:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  9 ++++++++
>  target/arm/sve_helper.c    | 55 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c |  2 ++
>  target/arm/sve.decode      |  6 +++++
>  4 files changed, 72 insertions(+)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 36/67] target/arm: Implement SVE Integer Compare - Vectors Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 36/67] target/arm: Implement SVE Integer Compare - " Richard Henderson
@ 2018-02-23 16:29   ` Peter Maydell
  2018-02-23 20:57     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 16:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 86cd792cdf..ae433861f8 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -46,14 +46,14 @@
>   *
>   * The return value has bit 31 set if N is set, bit 1 set if Z is clear,
>   * and bit 0 set if C is set.
> - *
> - * This is an iterative function, called for each Pd and Pg word
> - * moving forward.
>   */
>
>  /* For no G bits set, NZCV = C.  */
>  #define PREDTEST_INIT  1
>
> +/* This is an iterative function, called for each Pd and Pg word
> + * moving forward.
> + */

Why move this comment?

>  static uint32_t iter_predtest_fwd(uint64_t d, uint64_t g, uint32_t flags)
>  {
>      if (likely(g)) {
> @@ -73,6 +73,28 @@ static uint32_t iter_predtest_fwd(uint64_t d, uint64_t g, uint32_t flags)
>      return flags;
>  }
>
> +/* This is an iterative function, called for each Pd and Pg word
> + * moving backward.
> + */
> +static uint32_t iter_predtest_bwd(uint64_t d, uint64_t g, uint32_t flags)
> +{
> +    if (likely(g)) {
> +        /* Compute C from first (i.e last) !(D & G).
> +           Use bit 2 to signal first G bit seen.  */
> +        if (!(flags & 4)) {
> +            flags += 4 - 1; /* add bit 2, subtract C from PREDTEST_INIT */
> +            flags |= (d & pow2floor(g)) == 0;
> +        }
> +
> +        /* Accumulate Z from each D & G.  */
> +        flags |= ((d & g) != 0) << 1;
> +
> +        /* Compute N from last (i.e first) D & G.  Replace previous.  */
> +        flags = deposit32(flags, 31, 1, (d & (g & -g)) != 0);
> +    }
> +    return flags;
> +}
> +
>  /* The same for a single word predicate.  */
>  uint32_t HELPER(sve_predtest1)(uint64_t d, uint64_t g)
>  {
> @@ -2180,3 +2202,168 @@ void HELPER(sve_sel_zpzz_d)(void *vd, void *vn, void *vm,
>          d[i] = (pg[H1(i)] & 1 ? nn : mm);
>      }
>  }
> +
> +/* Two operand comparison controlled by a predicate.
> + * ??? It is very tempting to want to be able to expand this inline
> + * with x86 instructions, e.g.
> + *
> + *    vcmpeqw    zm, zn, %ymm0
> + *    vpmovmskb  %ymm0, %eax
> + *    and        $0x5555, %eax
> + *    and        pg, %eax
> + *
> + * or even aarch64, e.g.
> + *
> + *    // mask = 4000 1000 0400 0100 0040 0010 0004 0001
> + *    cmeq       v0.8h, zn, zm
> + *    and        v0.8h, v0.8h, mask
> + *    addv       h0, v0.8h
> + *    and        v0.8b, pg
> + *
> + * However, coming up with an abstraction that allows vector inputs and
> + * a scalar output, and also handles the byte-ordering of sub-uint64_t
> + * scalar outputs, is tricky.
> + */
> +#define DO_CMP_PPZZ(NAME, TYPE, OP, H, MASK)                                 \
> +uint32_t HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
> +{                                                                            \
> +    intptr_t opr_sz = simd_oprsz(desc);                                      \
> +    uint32_t flags = PREDTEST_INIT;                                          \
> +    intptr_t i = opr_sz;                                                     \
> +    do {                                                                     \
> +        uint64_t out = 0, pg;                                                \
> +        do {                                                                 \
> +            i -= sizeof(TYPE), out <<= sizeof(TYPE);                         \
> +            TYPE nn = *(TYPE *)(vn + H(i));                                  \
> +            TYPE mm = *(TYPE *)(vm + H(i));                                  \
> +            out |= nn OP mm;                                                 \
> +        } while (i & 63);                                                    \
> +        pg = *(uint64_t *)(vg + (i >> 3)) & MASK;                            \
> +        out &= pg;                                                           \
> +        *(uint64_t *)(vd + (i >> 3)) = out;                                  \
> +        flags = iter_predtest_bwd(out, pg, flags);                           \
> +    } while (i > 0);                                                         \
> +    return flags;                                                            \
> +}

Why do we iterate backwards through the vector? As far as I can
see the pseudocode iterates forwards, and I don't think it
makes a difference to the result which way we go.


Otherwise

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 37/67] target/arm: Implement SVE Integer Compare - Immediate Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 37/67] target/arm: Implement SVE Integer Compare - Immediate Group Richard Henderson
@ 2018-02-23 16:32   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 16:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 44 +++++++++++++++++++++++
>  target/arm/sve_helper.c    | 88 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 63 +++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 23 ++++++++++++
>  4 files changed, 218 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 38/67] target/arm: Implement SVE Partition Break Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 38/67] target/arm: Implement SVE Partition Break Group Richard Henderson
@ 2018-02-23 16:41   ` Peter Maydell
  2018-02-23 20:59     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 16:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  18 ++++
>  target/arm/sve_helper.c    | 247 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c |  96 ++++++++++++++++++
>  target/arm/sve.decode      |  19 ++++
>  4 files changed, 380 insertions(+)


> +        b = g & n;            /* guard true, pred true*/

missing space before */

> +/* Given a computation function, compute a merging BRK.  */
> +static void compute_brk_m(uint64_t *d, uint64_t *n, uint64_t *g,
> +                          intptr_t oprsz, bool after)

Comment says "given a computation function" but the prototype
doesn't take a function as a parameter?

> +{
> +    bool brk = false;
> +    intptr_t i;
> +
> +    for (i = 0; i < DIV_ROUND_UP(oprsz, 8); ++i) {
> +        uint64_t this_b, this_g = g[i];
> +
> +        brk = compute_brk(&this_b, n[i], this_g, brk, after);
> +        d[i] = (this_b & this_g) | (d[i] & ~this_g);
> +    }
> +}
> +

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 39/67] target/arm: Implement SVE Predicate Count Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 39/67] target/arm: Implement SVE Predicate Count Group Richard Henderson
@ 2018-02-23 16:48   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 16:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |   2 +
>  target/arm/sve_helper.c    |  14 ++++++
>  target/arm/translate-sve.c | 116 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  27 +++++++++++
>  4 files changed, 159 insertions(+)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 40/67] target/arm: Implement SVE Integer Compare - Scalars Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 40/67] target/arm: Implement SVE Integer Compare - Scalars Group Richard Henderson
@ 2018-02-23 17:00   ` Peter Maydell
  2018-02-23 21:06     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 17:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  2 +
>  target/arm/sve_helper.c    | 31 ++++++++++++++++
>  target/arm/translate-sve.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  8 ++++
>  4 files changed, 133 insertions(+)
>
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index dd4f8f754d..1863106d0f 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -678,3 +678,5 @@ DEF_HELPER_FLAGS_4(sve_brkn, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(sve_brkns, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32)
>
>  DEF_HELPER_FLAGS_3(sve_cntp, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
> +
> +DEF_HELPER_FLAGS_3(sve_while, TCG_CALL_NO_RWG, i32, ptr, i32, i32)
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index dd884bdd1c..80b78da834 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -2716,3 +2716,34 @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc)
>      }
>      return sum;
>  }
> +
> +uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)

This could really use a comment about what part of the overall
instruction it's doing.

> +{
> +    uintptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
> +    intptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
> +    uint64_t esz_mask = pred_esz_masks[esz];
> +    ARMPredicateReg *d = vd;
> +    uint32_t flags;
> +    intptr_t i;
> +
> +    /* Begin with a zero predicate register.  */
> +    flags = do_zero(d, oprsz);
> +    if (count == 0) {
> +        return flags;
> +    }
> +
> +    /* Scale from predicate element count to bits.  */
> +    count <<= esz;
> +    /* Bound to the bits in the predicate.  */
> +    count = MIN(count, oprsz * 8);
> +
> +    /* Set all of the requested bits.  */
> +    for (i = 0; i < count / 64; ++i) {
> +        d->p[i] = esz_mask;
> +    }
> +    if (count & 63) {
> +        d->p[i] = ~(-1ull << (count & 63)) & esz_mask;
> +    }
> +
> +    return predtest_ones(d, oprsz, esz_mask);
> +}
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 038800cc86..4b92a55c21 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -2847,6 +2847,98 @@ static void trans_SINCDECP_z(DisasContext *s, arg_incdec2_pred *a,
>      do_sat_addsub_vec(s, a->esz, a->rd, a->rn, val, a->u, a->d);
>  }
>
> +static void trans_WHILE(DisasContext *s, arg_WHILE *a, uint32_t insn)
> +{
> +    TCGv_i64 op0 = read_cpu_reg(s, a->rn, 1);
> +    TCGv_i64 op1 = read_cpu_reg(s, a->rm, 1);
> +    TCGv_i64 t0 = tcg_temp_new_i64();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i32 t2, t3;
> +    TCGv_ptr ptr;
> +    unsigned desc, vsz = vec_full_reg_size(s);
> +    TCGCond cond;
> +
> +    if (!a->sf) {
> +        if (a->u) {
> +            tcg_gen_ext32u_i64(op0, op0);
> +            tcg_gen_ext32u_i64(op1, op1);
> +        } else {
> +            tcg_gen_ext32s_i64(op0, op0);
> +            tcg_gen_ext32s_i64(op1, op1);
> +        }
> +    }
> +
> +    /* For the helper, compress the different conditions into a computation
> +     * of how many iterations for which the condition is true.
> +     *
> +     * This is slightly complicated by 0 <= UINT64_MAX, which is nominally
> +     * 2**64 iterations, overflowing to 0.  Of course, predicate registers
> +     * aren't that large, so any value >= predicate size is sufficient.
> +     */
> +    tcg_gen_sub_i64(t0, op1, op0);
> +
> +    /* t0 = MIN(op1 - op0, vsz).  */
> +    if (a->eq) {
> +        /* Equality means one more iteration.  */
> +        tcg_gen_movi_i64(t1, vsz - 1);
> +        tcg_gen_movcond_i64(TCG_COND_LTU, t0, t0, t1, t0, t1);
> +        tcg_gen_addi_i64(t0, t0, 1);
> +    } else {
> +        tcg_gen_movi_i64(t1, vsz);
> +        tcg_gen_movcond_i64(TCG_COND_LTU, t0, t0, t1, t0, t1);
> +    }
> +
> +    /* t0 = (condition true ? t0 : 0).  */
> +    cond = (a->u
> +            ? (a->eq ? TCG_COND_LEU : TCG_COND_LTU)
> +            : (a->eq ? TCG_COND_LE : TCG_COND_LT));
> +    tcg_gen_movi_i64(t1, 0);
> +    tcg_gen_movcond_i64(cond, t0, op0, op1, t0, t1);
> +
> +    t2 = tcg_temp_new_i32();
> +    tcg_gen_extrl_i64_i32(t2, t0);
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
> +
> +    desc = (vsz / 8) - 2;
> +    desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz);
> +    t3 = tcg_const_i32(desc);
> +
> +    ptr = tcg_temp_new_ptr();
> +    tcg_gen_addi_ptr(ptr, cpu_env, pred_full_reg_offset(s, a->rd));
> +
> +    gen_helper_sve_while(t2, ptr, t2, t3);
> +    do_pred_flags(t2);
> +
> +    tcg_temp_free_ptr(ptr);
> +    tcg_temp_free_i32(t2);
> +    tcg_temp_free_i32(t3);
> +}

I got confused by this -- it is too different from what the
pseudocode is doing. Could we have more explanatory comments, please?

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 01/67] target/arm: Enable SVE for aarch64-linux-user
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 01/67] target/arm: Enable SVE for aarch64-linux-user Richard Henderson
  2018-02-22 17:28   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-02-23 17:00   ` Alex Bennée
  2018-02-23 18:47     ` Richard Henderson
  1 sibling, 1 reply; 167+ messages in thread
From: Alex Bennée @ 2018-02-23 17:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm


Richard Henderson <richard.henderson@linaro.org> writes:

> Enable ARM_FEATURE_SVE for the generic "any" cpu.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu.c   | 7 +++++++
>  target/arm/cpu64.c | 1 +
>  2 files changed, 8 insertions(+)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 1b3ae62db6..10843994c3 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -150,6 +150,13 @@ static void arm_cpu_reset(CPUState *s)
>          env->cp15.sctlr_el[1] |= SCTLR_UCT | SCTLR_UCI | SCTLR_DZE;
>          /* and to the FP/Neon instructions */
>          env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 20, 2, 3);
> +        /* and to the SVE instructions */
> +        env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
> +        env->cp15.cptr_el[3] |= CPTR_EZ;
> +        /* with maximum vector length */
> +        env->vfp.zcr_el[1] = ARM_MAX_VQ - 1;
> +        env->vfp.zcr_el[2] = ARM_MAX_VQ - 1;
> +        env->vfp.zcr_el[3] = ARM_MAX_VQ - 1;
>  #else

I notice this is linux-user only, but what happens if you specify a
specific CPU in linux-user mode? Do we still end up running SVE-specific
initialisation?

It seems to me that we should be seeing feature guarded reset stuff in here.

>          /* Reset into the highest available EL */
>          if (arm_feature(env, ARM_FEATURE_EL3)) {
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index efc519b49b..36ef9e9d9d 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -231,6 +231,7 @@ static void aarch64_any_initfn(Object *obj)
>      set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
>      set_feature(&cpu->env, ARM_FEATURE_CRC);
>      set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
> +    set_feature(&cpu->env, ARM_FEATURE_SVE);
>      cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
>      cpu->dcz_blocksize = 7; /*  512 bytes */
>  }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 00/67] target/arm: Scalable Vector Extension
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (66 preceding siblings ...)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 67/67] target/arm: Implement SVE floating-point unary operations Richard Henderson
@ 2018-02-23 17:05 ` Alex Bennée
  2018-04-03 15:41 ` Alex Bennée
  68 siblings, 0 replies; 167+ messages in thread
From: Alex Bennée @ 2018-02-23 17:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm


Richard Henderson <richard.henderson@linaro.org> writes:

> This is 99% of the instruction set.  There are a few things missing,
> notably first-fault and non-fault loads (even these are decoded, but
> simply treated as normal loads for now).
>
> The patch set is dependent on at least 3 other branches.
> A fully composed tree is available as
>
>   git://github.com/rth7680/qemu.git tgt-arm-sve-7

Well, now it's down to just my half-precision patches, because I was able
to apply this on top of my arm-fp16-v3 branch, recently rebased against
master:

  https://github.com/stsquad/qemu/tree/review/sve-vectors-v2-rebase

>
> There are a few checkpatch errors due to macros and typedefs, but
> nothing that isn't obviously a false positive.
>
> This is able to run SVE enabled Himeno and LULESH benchmarks as
> compiled by last week's gcc-8:
>
> $ ./aarch64-linux-user/qemu-aarch64 ~/himeno-advsimd
> mimax = 129 mjmax = 65 mkmax = 65
> imax = 128 jmax = 64 kmax =64
> cpu : 67.028643 sec.
> Loop executed for 200 times
> Gosa : 1.688752e-03
> MFLOPS measured : 49.136295
> Score based on MMX Pentium 200MHz : 1.522662
>
> $ ./aarch64-linux-user/qemu-aarch64 ~/himeno-sve
> mimax = 129 mjmax = 65 mkmax = 65
> imax = 128 jmax = 64 kmax =64
> cpu : 43.481213 sec.
> Loop executed for 200 times
> Gosa : 3.830036e-06
> MFLOPS measured : 75.746259
> Score based on MMX Pentium 200MHz : 2.347266
>
> Hopefully the size of the patch set isn't too daunting...
>
>
> r~
>
>
> Richard Henderson (67):
>   target/arm: Enable SVE for aarch64-linux-user
>   target/arm: Introduce translate-a64.h
>   target/arm: Add SVE decode skeleton
>   target/arm: Implement SVE Bitwise Logical - Unpredicated Group
>   target/arm: Implement SVE load vector/predicate
>   target/arm: Implement SVE predicate test
>   target/arm: Implement SVE Predicate Logical Operations Group
>   target/arm: Implement SVE Predicate Misc Group
>   target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group
>   target/arm: Implement SVE Integer Reduction Group
>   target/arm: Implement SVE bitwise shift by immediate (predicated)
>   target/arm: Implement SVE bitwise shift by vector (predicated)
>   target/arm: Implement SVE bitwise shift by wide elements (predicated)
>   target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
>   target/arm: Implement SVE Integer Multiply-Add Group
>   target/arm: Implement SVE Integer Arithmetic - Unpredicated Group
>   target/arm: Implement SVE Index Generation Group
>   target/arm: Implement SVE Stack Allocation Group
>   target/arm: Implement SVE Bitwise Shift - Unpredicated Group
>   target/arm: Implement SVE Compute Vector Address Group
>   target/arm: Implement SVE floating-point exponential accelerator
>   target/arm: Implement SVE floating-point trig select coefficient
>   target/arm: Implement SVE Element Count Group
>   target/arm: Implement SVE Bitwise Immediate Group
>   target/arm: Implement SVE Integer Wide Immediate - Predicated Group
>   target/arm: Implement SVE Permute - Extract Group
>   target/arm: Implement SVE Permute - Unpredicated Group
>   target/arm: Implement SVE Permute - Predicates Group
>   target/arm: Implement SVE Permute - Interleaving Group
>   target/arm: Implement SVE compress active elements
>   target/arm: Implement SVE conditionally broadcast/extract element
>   target/arm: Implement SVE copy to vector (predicated)
>   target/arm: Implement SVE reverse within elements
>   target/arm: Implement SVE vector splice (predicated)
>   target/arm: Implement SVE Select Vectors Group
>   target/arm: Implement SVE Integer Compare - Vectors Group
>   target/arm: Implement SVE Integer Compare - Immediate Group
>   target/arm: Implement SVE Partition Break Group
>   target/arm: Implement SVE Predicate Count Group
>   target/arm: Implement SVE Integer Compare - Scalars Group
>   target/arm: Implement FDUP/DUP
>   target/arm: Implement SVE Integer Wide Immediate - Unpredicated Group
>   target/arm: Implement SVE Floating Point Arithmetic - Unpredicated
>     Group
>   target/arm: Implement SVE Memory Contiguous Load Group
>   target/arm: Implement SVE Memory Contiguous Store Group
>   target/arm: Implement SVE load and broadcast quadword
>   target/arm: Implement SVE integer convert to floating-point
>   target/arm: Implement SVE floating-point arithmetic (predicated)
>   target/arm: Implement SVE FP Multiply-Add Group
>   target/arm: Implement SVE Floating Point Accumulating Reduction Group
>   target/arm: Implement SVE load and broadcast element
>   target/arm: Implement SVE store vector/predicate register
>   target/arm: Implement SVE scatter stores
>   target/arm: Implement SVE prefetches
>   target/arm: Implement SVE gather loads
>   target/arm: Implement SVE scatter store vector immediate
>   target/arm: Implement SVE floating-point compare vectors
>   target/arm: Implement SVE floating-point arithmetic with immediate
>   target/arm: Implement SVE Floating Point Multiply Indexed Group
>   target/arm: Implement SVE FP Fast Reduction Group
>   target/arm: Implement SVE Floating Point Unary Operations -
>     Unpredicated Group
>   target/arm: Implement SVE FP Compare with Zero Group
>   target/arm: Implement SVE floating-point trig multiply-add coefficient
>   target/arm: Implement SVE floating-point convert precision
>   target/arm: Implement SVE floating-point convert to integer
>   target/arm: Implement SVE floating-point round to integral value
>   target/arm: Implement SVE floating-point unary operations
>
>  target/arm/cpu.h           |    7 +-
>  target/arm/helper-sve.h    | 1285 ++++++++++++
>  target/arm/helper.h        |   42 +
>  target/arm/translate-a64.h |  110 ++
>  target/arm/cpu.c           |    7 +
>  target/arm/cpu64.c         |    1 +
>  target/arm/sve_helper.c    | 4051 ++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-a64.c |  112 +-
>  target/arm/translate-sve.c | 4626 ++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/vec_helper.c    |  178 ++
>  .gitignore                 |    1 +
>  target/arm/Makefile.objs   |   12 +-
>  target/arm/sve.decode      | 1067 ++++++++++
>  13 files changed, 11408 insertions(+), 91 deletions(-)
>  create mode 100644 target/arm/helper-sve.h
>  create mode 100644 target/arm/translate-a64.h
>  create mode 100644 target/arm/sve_helper.c
>  create mode 100644 target/arm/translate-sve.c
>  create mode 100644 target/arm/vec_helper.c
>  create mode 100644 target/arm/sve.decode


--
Alex Bennée

* Re: [Qemu-devel] [PATCH v2 41/67] target/arm: Implement FDUP/DUP
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 41/67] target/arm: Implement FDUP/DUP Richard Henderson
@ 2018-02-23 17:12   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 17:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 35 +++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  8 ++++++++
>  2 files changed, 43 insertions(+)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 42/67] target/arm: Implement SVE Integer Wide Immediate - Unpredicated Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 42/67] target/arm: Implement SVE Integer Wide Immediate - Unpredicated Group Richard Henderson
@ 2018-02-23 17:18   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 17:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  25 +++++++++
>  target/arm/sve_helper.c    |  41 ++++++++++++++
>  target/arm/translate-sve.c | 135 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  26 +++++++++
>  4 files changed, 227 insertions(+)


>
> +# SVE integer add/subtract immediate (unpredicated)
> +ADD_zzi                00100101 .. 100 000 11 . ........ .....         @rdn_sh_i8u
> +SUB_zzi                00100101 .. 100 001 11 . ........ .....         @rdn_sh_i8u
> +SUBR_zzi       00100101 .. 100 011 11 . ........ .....         @rdn_sh_i8u
> +SQADD_zzi      00100101 .. 100 100 11 . ........ .....         @rdn_sh_i8u
> +UQADD_zzi      00100101 .. 100 101 11 . ........ .....         @rdn_sh_i8u
> +SQSUB_zzi      00100101 .. 100 110 11 . ........ .....         @rdn_sh_i8u
> +UQSUB_zzi      00100101 .. 100 111 11 . ........ .....         @rdn_sh_i8u
> +
> +# SVE integer min/max immediate (unpredicated)
> +SMAX_zzi       00100101 .. 101 000 110 ........ .....          @rdn_i8s
> +UMAX_zzi       00100101 .. 101 001 110 ........ .....          @rdn_i8u
> +SMIN_zzi       00100101 .. 101 010 110 ........ .....          @rdn_i8s
> +UMIN_zzi       00100101 .. 101 011 110 ........ .....          @rdn_i8u
> +
> +# SVE integer multiply immediate (unpredicated)
> +MUL_zzi                00100101 .. 110 000 110 ........ .....          @rdn_i8s

ADD, SUB, MUL out of line with the others.

> +
>  ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
>
>  # SVE load predicate register

otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 43/67] target/arm: Implement SVE Floating Point Arithmetic - Unpredicated Group
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 43/67] target/arm: Implement SVE Floating Point Arithmetic " Richard Henderson
@ 2018-02-23 17:25   ` Peter Maydell
  2018-02-23 21:15     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 17:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 14 +++++++
>  target/arm/helper.h        | 19 ++++++++++
>  target/arm/translate-sve.c | 41 ++++++++++++++++++++
>  target/arm/vec_helper.c    | 94 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/Makefile.objs   |  2 +-
>  target/arm/sve.decode      | 10 +++++
>  6 files changed, 179 insertions(+), 1 deletion(-)
>  create mode 100644 target/arm/vec_helper.c
>

> +/* Floating-point trigonometric starting value.
> + * See the ARM ARM pseudocode function FPTrigSMul.
> + */
> +static float16 float16_ftsmul(float16 op1, uint16_t op2, float_status *stat)
> +{
> +    float16 result = float16_mul(op1, op1, stat);
> +    if (!float16_is_any_nan(result)) {
> +        result = float16_set_sign(result, op2 & 1);
> +    }
> +    return result;
> +}
> +
> +static float32 float32_ftsmul(float32 op1, uint32_t op2, float_status *stat)
> +{
> +    float32 result = float32_mul(op1, op1, stat);
> +    if (!float32_is_any_nan(result)) {
> +        result = float32_set_sign(result, op2 & 1);
> +    }
> +    return result;
> +}
> +
> +static float64 float64_ftsmul(float64 op1, uint64_t op2, float_status *stat)
> +{
> +    float64 result = float64_mul(op1, op1, stat);
> +    if (!float64_is_any_nan(result)) {
> +        result = float64_set_sign(result, op2 & 1);
> +    }
> +    return result;
> +}
> +
> +#define DO_3OP(NAME, FUNC, TYPE) \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
> +{                                                                          \
> +    intptr_t i, oprsz = simd_oprsz(desc);                                  \
> +    TYPE *d = vd, *n = vn, *m = vm;                                        \
> +    for (i = 0; i < oprsz / sizeof(TYPE); i++) {                           \
> +        d[i] = FUNC(n[i], m[i], stat);                                     \
> +    }                                                                      \
> +}
> +
> +DO_3OP(gvec_fadd_h, float16_add, float16)
> +DO_3OP(gvec_fadd_s, float32_add, float32)
> +DO_3OP(gvec_fadd_d, float64_add, float64)
> +
> +DO_3OP(gvec_fsub_h, float16_sub, float16)
> +DO_3OP(gvec_fsub_s, float32_sub, float32)
> +DO_3OP(gvec_fsub_d, float64_sub, float64)
> +
> +DO_3OP(gvec_fmul_h, float16_mul, float16)
> +DO_3OP(gvec_fmul_s, float32_mul, float32)
> +DO_3OP(gvec_fmul_d, float64_mul, float64)
> +
> +DO_3OP(gvec_ftsmul_h, float16_ftsmul, float16)
> +DO_3OP(gvec_ftsmul_s, float32_ftsmul, float32)
> +DO_3OP(gvec_ftsmul_d, float64_ftsmul, float64)
> +
> +#ifdef TARGET_AARCH64

This seems a bit odd given SVE is AArch64-only anyway...

> +
> +DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
> +DO_3OP(gvec_recps_s, helper_recpsf_f32, float32)
> +DO_3OP(gvec_recps_d, helper_recpsf_f64, float64)
> +
> +DO_3OP(gvec_rsqrts_h, helper_rsqrtsf_f16, float16)
> +DO_3OP(gvec_rsqrts_s, helper_rsqrtsf_f32, float32)
> +DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
> +
> +#endif
> +#undef DO_3OP

> +### SVE Floating Point Arithmetic - Unpredicated Group
> +
> +# SVE floating-point arithmetic (unpredicated)
> +FADD_zzz       01100101 .. 0 ..... 000 000 ..... .....         @rd_rn_rm
> +FSUB_zzz       01100101 .. 0 ..... 000 001 ..... .....         @rd_rn_rm
> +FMUL_zzz       01100101 .. 0 ..... 000 010 ..... .....         @rd_rn_rm
> +FTSMUL         01100101 .. 0 ..... 000 011 ..... .....         @rd_rn_rm
> +FRECPS         01100101 .. 0 ..... 000 110 ..... .....         @rd_rn_rm
> +FRSQRTS                01100101 .. 0 ..... 000 111 ..... .....         @rd_rn_rm

Another misaligned line.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 14/67] target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
  2018-02-23 13:08   ` Peter Maydell
@ 2018-02-23 17:25     ` Richard Henderson
  2018-02-23 17:30       ` Peter Maydell
  0 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 17:25 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 05:08 AM, Peter Maydell wrote:
>> +# SVE unary bit operations (predicated)
>> +# Note esz != 0 for FABS and FNEG.
>> +CLS            00000100 .. 011 000 101 ... ..... .....         @rd_pg_rn
>> +CLZ            00000100 .. 011 001 101 ... ..... .....         @rd_pg_rn
>> +CNT_zpz                00000100 .. 011 010 101 ... ..... .....         @rd_pg_rn
>> +CNOT           00000100 .. 011 011 101 ... ..... .....         @rd_pg_rn
>> +NOT_zpz                00000100 .. 011 110 101 ... ..... .....         @rd_pg_rn
>> +FABS           00000100 .. 011 100 101 ... ..... .....         @rd_pg_rn
>> +FNEG           00000100 .. 011 101 101 ... ..... .....         @rd_pg_rn
> 
> Indentation seems to be a bit skew for the _zpz lines.

There are tabs in here.  I know they're not allowed in C, but this isn't C.


r~

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 21/67] target/arm: Implement SVE floating-point exponential accelerator
  2018-02-23 13:48   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-02-23 17:29     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 17:29 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 05:48 AM, Peter Maydell wrote:
>> +void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
>> +{
>> +    static const uint64_t coeff[] = {
>> +        0x0000000000000, 0x02C9A3E778061, 0x059B0D3158574, 0x0874518759BC8,
>> +        0x0B5586CF9890F, 0x0E3EC32D3D1A2, 0x11301D0125B51, 0x1429AAEA92DE0,
>> +        0x172B83C7D517B, 0x1A35BEB6FCB75, 0x1D4873168B9AA, 0x2063B88628CD6,
>> +        0x2387A6E756238, 0x26B4565E27CDD, 0x29E9DF51FDEE1, 0x2D285A6E4030B,
>> +        0x306FE0A31B715, 0x33C08B26416FF, 0x371A7373AA9CB, 0x3A7DB34E59FF7,
>> +        0x3DEA64C123422, 0x4160A21F72E2A, 0x44E086061892D, 0x486A2B5C13CD0,
>> +        0x4BFDAD5362A27, 0x4F9B2769D2CA7, 0x5342B569D4F82, 0x56F4736B527DA,
>> +        0x5AB07DD485429, 0x5E76F15AD2148, 0x6247EB03A5585, 0x6623882552225,
>> +        0x6A09E667F3BCD, 0x6DFB23C651A2F, 0x71F75E8EC5F74, 0x75FEB564267C9,
>> +        0x7A11473EB0187, 0x7E2F336CF4E62, 0x82589994CCE13, 0x868D99B4492ED,
>> +        0x8ACE5422AA0DB, 0x8F1AE99157736, 0x93737B0CDC5E5, 0x97D829FDE4E50,
>> +        0x9C49182A3F090, 0xA0C667B5DE565, 0xA5503B23E255D, 0xA9E6B5579FDBF,
>> +        0xAE89F995AD3AD, 0xB33A2B84F15FB, 0xB7F76F2FB5E47, 0xBCC1E904BC1D2,
>> +        0xC199BDD85529C, 0xC67F12E57D14B, 0xCB720DCEF9069, 0xD072D4A07897C,
>> +        0xD5818DCFBA487, 0xDA9E603DB3285, 0xDFC97337B9B5F, 0xE502EE78B3FF6,
>> +        0xEA4AFA2A490DA, 0xEFA1BEE615A27, 0xF50765B6E4540, 0xFA7C1819E90D8,
> 
> This confused me at first because it looks like these are 64-bit numbers
> but they are only 52 bits. Maybe comment? (or add the leading '000'?)

Interesting... I didn't even notice.  This was pure cut-and-paste from the
pseudocode.  As such, I'd add the comment but leave the numbers unmodified.


r~

* Re: [Qemu-devel] [PATCH v2 14/67] target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
  2018-02-23 17:25     ` Richard Henderson
@ 2018-02-23 17:30       ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-23 17:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 23 February 2018 at 17:25, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 02/23/2018 05:08 AM, Peter Maydell wrote:
>>> +# SVE unary bit operations (predicated)
>>> +# Note esz != 0 for FABS and FNEG.
>>> +CLS            00000100 .. 011 000 101 ... ..... .....         @rd_pg_rn
>>> +CLZ            00000100 .. 011 001 101 ... ..... .....         @rd_pg_rn
>>> +CNT_zpz                00000100 .. 011 010 101 ... ..... .....         @rd_pg_rn
>>> +CNOT           00000100 .. 011 011 101 ... ..... .....         @rd_pg_rn
>>> +NOT_zpz                00000100 .. 011 110 101 ... ..... .....         @rd_pg_rn
>>> +FABS           00000100 .. 011 100 101 ... ..... .....         @rd_pg_rn
>>> +FNEG           00000100 .. 011 101 101 ... ..... .....         @rd_pg_rn
>>
>> Indentation seems to be a bit skew for the _zpz lines.
>
> There are tabs in here.  I know they're not allowed in C, but this isn't C.

I don't think we should have tabs in these files either for the
same reasons we don't have them in C code.

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 25/67] target/arm: Implement SVE Integer Wide Immediate - Predicated Group
  2018-02-23 14:18   ` Peter Maydell
@ 2018-02-23 17:31     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 17:31 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 06:18 AM, Peter Maydell wrote:
>> +    mm = (mm & 0xff) * (-1ull / 0xff);
> 
> What is this expression doing? I guess from context that it's
> replicating the low 8 bits of mm across the 64-bit value,
> but this is too obscure to do without a comment or wrapping
> it in a helper function with a useful name, I think.

I do have a helper now -- dup_const.  I thought I'd converted all of the
uses, but clearly I missed some.


r~

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 26/67] target/arm: Implement SVE Permute - Extract Group
  2018-02-23 14:24   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-02-23 17:46     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 17:46 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 06:24 AM, Peter Maydell wrote:
> On 17 February 2018 at 18:22, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/helper-sve.h    |  2 ++
>>  target/arm/sve_helper.c    | 81 ++++++++++++++++++++++++++++++++++++++++++++++
>>  target/arm/translate-sve.c | 29 +++++++++++++++++
>>  target/arm/sve.decode      |  9 +++++-
>>  4 files changed, 120 insertions(+), 1 deletion(-)
>>
>> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
>> index 79493ab647..94f4356ce9 100644
>> --- a/target/arm/helper-sve.h
>> +++ b/target/arm/helper-sve.h
>> @@ -414,6 +414,8 @@ DEF_HELPER_FLAGS_4(sve_cpy_z_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
>>  DEF_HELPER_FLAGS_4(sve_cpy_z_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
>>  DEF_HELPER_FLAGS_4(sve_cpy_z_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
>>
>> +DEF_HELPER_FLAGS_4(sve_ext, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>> +
>>  DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>>  DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>>  DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
>> index 6a95d1ec48..fb3f54300b 100644
>> --- a/target/arm/sve_helper.c
>> +++ b/target/arm/sve_helper.c
>> @@ -1469,3 +1469,84 @@ void HELPER(sve_cpy_z_d)(void *vd, void *vg, uint64_t val, uint32_t desc)
>>          d[i] = (pg[H1(i)] & 1 ? val : 0);
>>      }
>>  }
>> +
>> +/* Big-endian hosts need to frob the byte indicies.  If the copy
>> + * happens to be 8-byte aligned, then no frobbing necessary.
>> + */
> 
> Have you run risu tests with a big endian host?

Some, early on.  It's probably time to do it again.

Running those tests was why I dropped the ZIP/UZP/TRN patches from the host
vector support patch set.  Supporting those in an endian-agnostic way is
incompatible with our "pdp-endian-like" storage of vectors for ARM -- we
would have to put the vectors in full host-endian order for that.

In the meantime, the frobbing within helpers does work.


r~

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 01/67] target/arm: Enable SVE for aarch64-linux-user
  2018-02-23 17:00   ` Alex Bennée
@ 2018-02-23 18:47     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 18:47 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, qemu-arm

On 02/23/2018 09:00 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Enable ARM_FEATURE_SVE for the generic "any" cpu.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/cpu.c   | 7 +++++++
>>  target/arm/cpu64.c | 1 +
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
>> index 1b3ae62db6..10843994c3 100644
>> --- a/target/arm/cpu.c
>> +++ b/target/arm/cpu.c
>> @@ -150,6 +150,13 @@ static void arm_cpu_reset(CPUState *s)
>>          env->cp15.sctlr_el[1] |= SCTLR_UCT | SCTLR_UCI | SCTLR_DZE;
>>          /* and to the FP/Neon instructions */
>>          env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 20, 2, 3);
>> +        /* and to the SVE instructions */
>> +        env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
>> +        env->cp15.cptr_el[3] |= CPTR_EZ;
>> +        /* with maximum vector length */
>> +        env->vfp.zcr_el[1] = ARM_MAX_VQ - 1;
>> +        env->vfp.zcr_el[2] = ARM_MAX_VQ - 1;
>> +        env->vfp.zcr_el[3] = ARM_MAX_VQ - 1;
>>  #else
> 
> I notice this is linux-user only, but what happens if you specify a
> specific CPU in linux-user mode? Do we still end up running the
> SVE-specific initialisation?
> 
> It seems to me that we should be seeing feature-guarded reset code in here.

You're right.  On the whole it (probably) wouldn't matter in the end,
because the actual insn decode would still be protected by ARM_FEATURE_SVE.
But even so, we'd see VQ=16 in the TB flags and do too much work in
clear_high_part.


r~

* Re: [Qemu-devel] [PATCH v2 27/67] target/arm: Implement SVE Permute - Unpredicated Group
  2018-02-23 14:34   ` Peter Maydell
@ 2018-02-23 18:58     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 18:58 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 06:34 AM, Peter Maydell wrote:
> On 17 February 2018 at 18:22, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/helper-sve.h    |  23 +++++++++
>>  target/arm/translate-a64.h |  14 +++---
>>  target/arm/sve_helper.c    | 114 +++++++++++++++++++++++++++++++++++++++++++++
>>  target/arm/translate-sve.c | 113 ++++++++++++++++++++++++++++++++++++++++++++
>>  target/arm/sve.decode      |  29 +++++++++++-
>>  5 files changed, 285 insertions(+), 8 deletions(-)
>>
>> diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
>> index e519aee314..328aa7fce1 100644
>> --- a/target/arm/translate-a64.h
>> +++ b/target/arm/translate-a64.h
>> @@ -66,18 +66,18 @@ static inline void assert_fp_access_checked(DisasContext *s)
>>  static inline int vec_reg_offset(DisasContext *s, int regno,
>>                                   int element, TCGMemOp size)
>>  {
>> -    int offs = 0;
>> +    int element_size = 1 << size;
>> +    int offs = element * element_size;
>>  #ifdef HOST_WORDS_BIGENDIAN
>>      /* This is complicated slightly because vfp.zregs[n].d[0] is
>>       * still the low half and vfp.zregs[n].d[1] the high half
>>       * of the 128 bit vector, even on big endian systems.
>> -     * Calculate the offset assuming a fully bigendian 128 bits,
>> -     * then XOR to account for the order of the two 64 bit halves.
>> +     * Calculate the offset assuming a fully little-endian 128 bits,
>> +     * then XOR to account for the order of the 64 bit units.
>>       */
>> -    offs += (16 - ((element + 1) * (1 << size)));
>> -    offs ^= 8;
>> -#else
>> -    offs += element * (1 << size);
>> +    if (element_size < 8) {
>> +        offs ^= 8 - element_size;
>> +    }
>>  #endif
>>      offs += offsetof(CPUARMState, vfp.zregs[regno]);
>>      assert_fp_access_checked(s);
> 
> This looks like it should have been in an earlier patch?

Hah!  For the first time, no.  But perhaps it deserves a separate patch.

What this does is allow proper computation with size > 3.  In particular, I
want to support size == 4, aka a 128-bit element.  I think it's cleaner to
extend this function than to expose its internals where they would otherwise
be needed.


r~

* Re: [Qemu-devel] [PATCH v2 28/67] target/arm: Implement SVE Permute - Predicates Group
  2018-02-23 15:15   ` Peter Maydell
@ 2018-02-23 19:59     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 19:59 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 07:15 AM, Peter Maydell wrote:
>> +static const uint64_t expand_bit_data[5][2] = {
>> +    { 0x1111111111111111ull, 0x2222222222222222ull },
>> +    { 0x0303030303030303ull, 0x0c0c0c0c0c0c0c0cull },
>> +    { 0x000f000f000f000full, 0x00f000f000f000f0ull },
>> +    { 0x000000ff000000ffull, 0x0000ff000000ff00ull },
>> +    { 0x000000000000ffffull, 0x00000000ffff0000ull }
>> +};
>> +
>> +/* Expand units of 2**N bits to units of 2**(N+1) bits,
>> +   with the higher bits zero.  */
> 
> In bitops.h we call this operation "half shuffle" (where
> it is specifically working on units of 1 bit size), and
> the inverse "half unshuffle". Worth mentioning that (or
> using similar terminology) ?

I hadn't noticed this helper.  I'll at least mention it.

FWIW, the half_un/shuffle operation is what you get with N=0, which corresponds
to a byte predicate interleave.  We need the intermediate steps for half,
single, and double predicate interleaves.

>> +static uint64_t expand_bits(uint64_t x, int n)
>> +{
>> +    int i, sh;
> 
> Worth asserting that n is within the range we expect it to be ?
> (what range is that? 0 to 4?)

N goes from 0-3; I goes from 0-4.  N will have been controlled by decode, so
I'm not sure it's worth an assert.  Even if I did add one, I wouldn't want it
here, at the center of a loop kernel.

>> +        d[0] = nn + (mm << (1 << esz));
> 
> Is this actually doing an addition, or is it just an odd
> way of writing a bitwise OR when neither of the two
> inputs have 1 in the same bit position?

It could be an OR.  Here I'm hoping that the compiler will use a shift-add
instruction, which it wouldn't necessarily be able to prove valid by itself
if I wrote it with an OR.

>> +        d[0] = extract64(l + (h << (4 * oprsz)), 0, 8 * oprsz);
> 
> This looks like it's using addition for logical OR again ?

Yes.  Although this time I admit it'll never produce an LEA.

>> +        /* For VL which is not a power of 2, the results from M do not
>> +           align nicely with the uint64_t for D.  Put the aligned results
>> +           from M into TMP_M and then copy it into place afterward.  */
> 
> How much risu testing did you do of funny vector lengths ?

As much as I can with the unlicensed Foundation Platform: all lengths from
1-4.

Unfortunately that does leave a few multi-word predicate paths untested, but
many of the routines loop identically within this length and beyond.


>> +static const uint64_t even_bit_esz_masks[4] = {
>> +    0x5555555555555555ull,
>> +    0x3333333333333333ull,
>> +    0x0f0f0f0f0f0f0f0full,
>> +    0x00ff00ff00ff00ffull
>> +};
> 
> Comment describing the purpose of these numbers would be useful.

Ack.


r~

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 31/67] target/arm: Implement SVE conditionally broadcast/extract element
  2018-02-23 15:44   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-02-23 20:15     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 20:15 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 07:44 AM, Peter Maydell wrote:
>> +/* Similar to the ARM LastActiveElement pseudocode function, except the
>> +   result is multiplied by the element size.  This includes the not found
>> +   indication; e.g. not found for esz=3 is -8.  */
>> +int32_t HELPER(sve_last_active_element)(void *vg, uint32_t pred_desc)
>> +{
>> +    intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
>> +    intptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
> 
> pred_desc is obviously an encoding of some stuff, so the comment would
> be a good place to mention what it is.

Yeah, and I've also just noticed I'm not totally consistent about it.
I probably want to re-think how some of this is done.


>> +/* Compute CLAST for a scalar.  */
>> +static void do_clast_scalar(DisasContext *s, int esz, int pg, int rm,
>> +                            bool before, TCGv_i64 reg_val)
>> +{
>> +    TCGv_i32 last = tcg_temp_new_i32();
>> +    TCGv_i64 ele, cmp, zero;
>> +
>> +    find_last_active(s, last, esz, pg);
>> +
>> +    /* Extend the original value of last prior to incrementing.  */
>> +    cmp = tcg_temp_new_i64();
>> +    tcg_gen_ext_i32_i64(cmp, last);
>> +
>> +    if (!before) {
>> +        incr_last_active(s, last, esz);
>> +    }
>> +
>> +    /* The conceit here is that while last < 0 indicates not found, after
>> +       adjusting for cpu_env->vfp.zregs[rm], it is still a valid address
>> +       from which we can load garbage.  We then discard the garbage with
>> +       a conditional move.  */
> 
> That's a bit ugly. Can we at least do a compile time assert that the
> worst case (which I guess is offset of zregs[0] minus largest-element-size)
> is still positive ? That way if for some reason we reshuffle fields
> in CPUARMState we'll notice if it's going to fall off the beginning
> of the struct.

I suppose so.  Though, as the comment above find_last_active notes, the
minimal value is -8.  I feel fairly confident that zregs[0] will never be
shuffled to the absolute start of the structure.


r~

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 33/67] target/arm: Implement SVE reverse within elements
  2018-02-23 15:50   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-02-23 20:21     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 20:21 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 07:50 AM, Peter Maydell wrote:
> On 17 February 2018 at 18:22, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/helper-sve.h    | 14 ++++++++++++++
>>  target/arm/sve_helper.c    | 41 ++++++++++++++++++++++++++++++++++-------
>>  target/arm/translate-sve.c | 38 ++++++++++++++++++++++++++++++++++++++
>>  target/arm/sve.decode      |  7 +++++++
>>  4 files changed, 93 insertions(+), 7 deletions(-)
> 
>> +/* Swap 16-bit words within a 32-bit word.  */
>> +static inline uint32_t hswap32(uint32_t h)
>> +{
>> +    return rol32(h, 16);
>> +}
>> +
>> +/* Swap 16-bit words within a 64-bit word.  */
>> +static inline uint64_t hswap64(uint64_t h)
>> +{
>> +    uint64_t m = 0x0000ffff0000ffffull;
>> +    h = rol64(h, 32);
>> +    return ((h & m) << 16) | ((h >> 16) & m);
>> +}
>> +
>> +/* Swap 32-bit words within a 64-bit word.  */
>> +static inline uint64_t wswap64(uint64_t h)
>> +{
>> +    return rol64(h, 32);
>> +}
>> +
> 
> Were there cases in earlier patches that could have used these? I forget.

No, the earlier patches dealt with bits not bytes.

> I guess they're not useful enough to be worth putting in bswap.h.

Probably not.

>> -static inline uint64_t hswap64(uint64_t h)
>> -{
>> -    uint64_t m = 0x0000ffff0000ffffull;
>> -    h = rol64(h, 32);
>> -    return ((h & m) << 16) | ((h >> 16) & m);
>> -}
>> -
> 
> Better to put the function in the right place to start with.

Oops, yes.


r~

* Re: [Qemu-devel] [PATCH v2 36/67] target/arm: Implement SVE Integer Compare - Vectors Group
  2018-02-23 16:29   ` Peter Maydell
@ 2018-02-23 20:57     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 20:57 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 08:29 AM, Peter Maydell wrote:
> On 17 February 2018 at 18:22, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
> 
>> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
>> index 86cd792cdf..ae433861f8 100644
>> --- a/target/arm/sve_helper.c
>> +++ b/target/arm/sve_helper.c
>> @@ -46,14 +46,14 @@
>>   *
>>   * The return value has bit 31 set if N is set, bit 1 set if Z is clear,
>>   * and bit 0 set if C is set.
>> - *
>> - * This is an iterative function, called for each Pd and Pg word
>> - * moving forward.
>>   */
>>
>>  /* For no G bits set, NZCV = C.  */
>>  #define PREDTEST_INIT  1
>>
>> +/* This is an iterative function, called for each Pd and Pg word
>> + * moving forward.
>> + */
> 
> Why move this comment?

Meant to fold this into the first.  But I'm moving it so that I can
separately document...

>> +/* This is an iterative function, called for each Pd and Pg word
>> + * moving backward.
>> + */
>> +static uint32_t iter_predtest_bwd(uint64_t d, uint64_t g, uint32_t flags)

... this.

>> +    do {                                                                     \
>> +        uint64_t out = 0, pg;                                                \
>> +        do {                                                                 \
>> +            i -= sizeof(TYPE), out <<= sizeof(TYPE);                         \
>> +            TYPE nn = *(TYPE *)(vn + H(i));                                  \
>> +            TYPE mm = *(TYPE *)(vm + H(i));                                  \
>> +            out |= nn OP mm;                                                 \
>> +        } while (i & 63);                                                    \
>> +        pg = *(uint64_t *)(vg + (i >> 3)) & MASK;                            \
>> +        out &= pg;                                                           \
>> +        *(uint64_t *)(vd + (i >> 3)) = out;                                  \
>> +        flags = iter_predtest_bwd(out, pg, flags);                           \
>> +    } while (i > 0);                                                         \
>> +    return flags;                                                            \
>> +}
> 
> Why do we iterate backwards through the vector? As far as I can
> see the pseudocode iterates forwards, and I don't think it
> makes a difference to the result which way we go.

You're right, it does not make a difference to the result which way we iterate.

Of the several different ways I've written loops over predicates, this is my
favorite.  It has several points in its favor:

  1) Operate on full uint64_t predicate units instead
     of uint8_t or uint16_t sub-units.  This means

     1a) No big-endian adjustment required,
     1b) Fewer memory loads.

  2) No separate loop tail; it is shared with the main loop body.

  3) Specific to predicate output: the main loop gets to run
     un-predicated, with the governing predicate applied only
     at the end: out &= pg.


r~

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 38/67] target/arm: Implement SVE Partition Break Group
  2018-02-23 16:41   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-02-23 20:59     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 20:59 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 08:41 AM, Peter Maydell wrote:
> On 17 February 2018 at 18:22, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/helper-sve.h    |  18 ++++
>>  target/arm/sve_helper.c    | 247 +++++++++++++++++++++++++++++++++++++++++++++
>>  target/arm/translate-sve.c |  96 ++++++++++++++++++
>>  target/arm/sve.decode      |  19 ++++
>>  4 files changed, 380 insertions(+)
> 
> 
>> +        b = g & n;            /* guard true, pred true*/
> 
> missing space before */
> 
>> +/* Given a computation function, compute a merging BRK.  */
>> +static void compute_brk_m(uint64_t *d, uint64_t *n, uint64_t *g,
>> +                          intptr_t oprsz, bool after)
> 
> Comment says "given a computation function" but the prototype
> doesn't take a function as parameter ?

Whoops, old comment.  FWIW, it did at one point.


r~

* Re: [Qemu-devel] [PATCH v2 40/67] target/arm: Implement SVE Integer Compare - Scalars Group
  2018-02-23 17:00   ` Peter Maydell
@ 2018-02-23 21:06     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 21:06 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 09:00 AM, Peter Maydell wrote:
>> +
>> +uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
> 
> This could really use a comment about what part of the overall
> instruction it's doing.

Ok.

>> +
>> +    /* For the helper, compress the different conditions into a computation
>> +     * of how many iterations for which the condition is true.
>> +     *
>> +     * This is slightly complicated by 0 <= UINT64_MAX, which is nominally
>> +     * 2**64 iterations, overflowing to 0.  Of course, predicate registers
>> +     * aren't that large, so any value >= predicate size is sufficient.
>> +     */
...

> I got confused by this -- it is too far different from what the
> pseudocode is doing. Could we have more explanatory comments, please?

Ok.  I guess the comment above wasn't as helpful as I imagined.  I'll come up
with something for the next round.


r~

* Re: [Qemu-devel] [PATCH v2 43/67] target/arm: Implement SVE Floating Point Arithmetic - Unpredicated Group
  2018-02-23 17:25   ` Peter Maydell
@ 2018-02-23 21:15     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-23 21:15 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/23/2018 09:25 AM, Peter Maydell wrote:
> On 17 February 2018 at 18:22, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/helper-sve.h    | 14 +++++++
>>  target/arm/helper.h        | 19 ++++++++++
>>  target/arm/translate-sve.c | 41 ++++++++++++++++++++
>>  target/arm/vec_helper.c    | 94 ++++++++++++++++++++++++++++++++++++++++++++++
>>  target/arm/Makefile.objs   |  2 +-
>>  target/arm/sve.decode      | 10 +++++
>>  6 files changed, 179 insertions(+), 1 deletion(-)
>>  create mode 100644 target/arm/vec_helper.c
>>
> 
>> +/* Floating-point trigonometric starting value.
>> + * See the ARM ARM pseudocode function FPTrigSMul.
>> + */
>> +static float16 float16_ftsmul(float16 op1, uint16_t op2, float_status *stat)
>> +{
>> +    float16 result = float16_mul(op1, op1, stat);
>> +    if (!float16_is_any_nan(result)) {
>> +        result = float16_set_sign(result, op2 & 1);
>> +    }
>> +    return result;
>> +}
>> +
>> +static float32 float32_ftsmul(float32 op1, uint32_t op2, float_status *stat)
>> +{
>> +    float32 result = float32_mul(op1, op1, stat);
>> +    if (!float32_is_any_nan(result)) {
>> +        result = float32_set_sign(result, op2 & 1);
>> +    }
>> +    return result;
>> +}
>> +
>> +static float64 float64_ftsmul(float64 op1, uint64_t op2, float_status *stat)
>> +{
>> +    float64 result = float64_mul(op1, op1, stat);
>> +    if (!float64_is_any_nan(result)) {
>> +        result = float64_set_sign(result, op2 & 1);
>> +    }
>> +    return result;
>> +}
>> +
>> +#define DO_3OP(NAME, FUNC, TYPE) \
>> +void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
>> +{                                                                          \
>> +    intptr_t i, oprsz = simd_oprsz(desc);                                  \
>> +    TYPE *d = vd, *n = vn, *m = vm;                                        \
>> +    for (i = 0; i < oprsz / sizeof(TYPE); i++) {                           \
>> +        d[i] = FUNC(n[i], m[i], stat);                                     \
>> +    }                                                                      \
>> +}
>> +
>> +DO_3OP(gvec_fadd_h, float16_add, float16)
>> +DO_3OP(gvec_fadd_s, float32_add, float32)
>> +DO_3OP(gvec_fadd_d, float64_add, float64)
>> +
>> +DO_3OP(gvec_fsub_h, float16_sub, float16)
>> +DO_3OP(gvec_fsub_s, float32_sub, float32)
>> +DO_3OP(gvec_fsub_d, float64_sub, float64)
>> +
>> +DO_3OP(gvec_fmul_h, float16_mul, float16)
>> +DO_3OP(gvec_fmul_s, float32_mul, float32)
>> +DO_3OP(gvec_fmul_d, float64_mul, float64)
>> +
>> +DO_3OP(gvec_ftsmul_h, float16_ftsmul, float16)
>> +DO_3OP(gvec_ftsmul_s, float32_ftsmul, float32)
>> +DO_3OP(gvec_ftsmul_d, float64_ftsmul, float64)
>> +
>> +#ifdef TARGET_AARCH64
> 
> This seems a bit odd given SVE is AArch64-only anyway...

Ah right.

The thing to notice here is that the helpers have been placed such that
they can be shared with AA32 and AA64 AdvSIMD.  One call to one of these
would replace the 2-8 calls that we currently generate for such an operation.

I thought it better to plan ahead for that cleanup as opposed to moving them later.

Here you see where AA64 differs from AA32 (and in particular where the scalar
operation is also conditionalized).


r~

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 44/67] target/arm: Implement SVE Memory Contiguous Load Group
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 44/67] target/arm: Implement SVE Memory Contiguous Load Group Richard Henderson
@ 2018-02-27 12:16   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 12:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Commit message should mention significant missing things
like first-fault/non-fault handling. (In general I would prefer
not to see so many patches which all have one-liner commit
messages.)

> +static void trans_LDFF1_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
> +{
> +    /* FIXME */
> +    trans_LD_zprr(s, a, insn);
> +}
> +
> +static void trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
> +{
> +    /* FIXME */
> +    trans_LD_zpri(s, a, insn);
> +}

What are these FIXMEs for? Either they should be fixed, or expanded
into longer comments describing what needs fixing. I assume it is
the missing non-fault/first-fault behaviour...

> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index 42d14994a1..d2b3869c58 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -42,9 +42,12 @@
>  %tszimm16_shl  22:2 16:5 !function=tszimm_shl
>
>  # Signed 8-bit immediate, optionally shifted left by 8.
> -%sh8_i8s               5:9 !function=expand_imm_sh8s
> +%sh8_i8s       5:9 !function=expand_imm_sh8s
>  # Unsigned 8-bit immediate, optionally shifted left by 8.
> -%sh8_i8u               5:9 !function=expand_imm_sh8u
> +%sh8_i8u       5:9 !function=expand_imm_sh8u


More changes that should be squashed into earlier patch.


otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 45/67] target/arm: Implement SVE Memory Contiguous Store Group
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 45/67] target/arm: Implement SVE Memory Contiguous Store Group Richard Henderson
@ 2018-02-27 13:22   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 13:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  29 +++++++
>  target/arm/sve_helper.c    | 211 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c |  68 ++++++++++++++-
>  target/arm/sve.decode      |  38 ++++++++
>  4 files changed, 343 insertions(+), 3 deletions(-)

> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index aa8bfd2ae7..fda9a56fd5 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -3320,7 +3320,6 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
>
>      tcg_temp_free_ptr(t_pg);
>      tcg_temp_free_i32(desc);
> -    tcg_temp_free_i64(addr);
>  }
>
>  static void do_ld_zpa(DisasContext *s, int zt, int pg,
> @@ -3368,7 +3367,7 @@ static void trans_LD_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
>          return;
>      }
>
> -    addr = tcg_temp_new_i64();
> +    addr = new_tmp_a64(s);
>      tcg_gen_muli_i64(addr, cpu_reg(s, a->rm),
>                       (a->nreg + 1) << dtype_msz(a->dtype));
>      tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
> @@ -3379,7 +3378,7 @@ static void trans_LD_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
>  {
>      unsigned vsz = vec_full_reg_size(s);
>      unsigned elements = vsz >> dtype_esz[a->dtype];
> -    TCGv_i64 addr = tcg_temp_new_i64();
> +    TCGv_i64 addr = new_tmp_a64(s);
>
>      tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn),
>                       (a->imm * elements * (a->nreg + 1))

These changes to the load functions look like they should have been
in the previous patch ?

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 46/67] target/arm: Implement SVE load and broadcast quadword
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 46/67] target/arm: Implement SVE load and broadcast quadword Richard Henderson
@ 2018-02-27 13:36   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 13:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  9 ++++++++
>  2 files changed, 60 insertions(+)

> +static void trans_LD1RQ_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
> +{
> +    TCGv_i64 addr;
> +    int msz = dtype_msz(a->dtype);
> +
> +    if (a->rm == 31) {
> +        unallocated_encoding(s);
> +        return;
> +    }
> +
> +    addr = new_tmp_a64(s);
> +    tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), msz);
> +    tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
> +    do_ldrq(s, a->rd, a->pg, addr, msz);
> +}
> +
> +static void trans_LD1RQ_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
> +{
> +    TCGv_i64 addr = new_tmp_a64(s);
> +
> +    tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), a->imm * 16);

It confused me initially here that the calculation of the offset
for the +immediate and the +scalar cases isn't the same, but that
is indeed what the architecture does. Maybe
       /* Unlike LD1RQ_zprr, offset scaling is constant rather
        * than based on msz.
        */
?

> +    do_ldrq(s, a->rd, a->pg, addr, dtype_msz(a->dtype));
> +}
>

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 47/67] target/arm: Implement SVE integer convert to floating-point
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 47/67] target/arm: Implement SVE integer convert to floating-point Richard Henderson
@ 2018-02-27 13:47   ` Peter Maydell
  2018-02-27 13:51   ` Peter Maydell
  1 sibling, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 13:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 30 +++++++++++++++
>  target/arm/sve_helper.c    | 52 ++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 22 +++++++++++
>  4 files changed, 196 insertions(+)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 48/67] target/arm: Implement SVE floating-point arithmetic (predicated)
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 48/67] target/arm: Implement SVE floating-point arithmetic (predicated) Richard Henderson
@ 2018-02-27 13:50   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 13:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  77 ++++++++++++++++++++++++++++++++
>  target/arm/sve_helper.c    | 107 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c |  47 ++++++++++++++++++++
>  target/arm/sve.decode      |  17 +++++++
>  4 files changed, 248 insertions(+)

> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index a1e0ceb5fb..d80babfae7 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -2789,6 +2789,113 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc)
>      return predtest_ones(d, oprsz, esz_mask);
>  }
>
> +/* Fully general three-operand expander, controlled by a predicate,
> + * With the extra float_status parameter.

lower-case "w"

> @@ -3181,6 +3227,7 @@ static void do_zpz_ptr(DisasContext *s, int rd, int rn, int pg,
>                         vec_full_reg_offset(s, rn),
>                         pred_full_reg_offset(s, pg),
>                         status, vsz, vsz, 0, fn);
> +    tcg_temp_free_ptr(status);
>  }

Shouldn't this be squashed into an earlier patch?

otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 47/67] target/arm: Implement SVE integer convert to floating-point
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 47/67] target/arm: Implement SVE integer convert to floating-point Richard Henderson
  2018-02-27 13:47   ` Peter Maydell
@ 2018-02-27 13:51   ` Peter Maydell
  1 sibling, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 13:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---


> +/*
> + *** SVE Floating Point Unary Operations Prediated Group
> + */

Just noticed the typo: should be "Predicated".

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 49/67] target/arm: Implement SVE FP Multiply-Add Group
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 49/67] target/arm: Implement SVE FP Multiply-Add Group Richard Henderson
@ 2018-02-27 13:54   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 13:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 16 ++++++++++++++
>  target/arm/sve_helper.c    | 53 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 41 +++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 17 +++++++++++++++
>  4 files changed, 127 insertions(+)
>


Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 50/67] target/arm: Implement SVE Floating Point Accumulating Reduction Group
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 50/67] target/arm: Implement SVE Floating Point Accumulating Reduction Group Richard Henderson
@ 2018-02-27 13:59   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 13:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  7 ++++++
>  target/arm/sve_helper.c    | 56 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 42 ++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  5 +++++
>  4 files changed, 110 insertions(+)
>


Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 51/67] target/arm: Implement SVE load and broadcast element
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 51/67] target/arm: Implement SVE load and broadcast element Richard Henderson
@ 2018-02-27 14:15   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 14:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  5 +++++
>  target/arm/sve_helper.c    | 43 ++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 55 +++++++++++++++++++++++++++++++++++++++++++++-
>  target/arm/sve.decode      |  5 +++++
>  4 files changed, 107 insertions(+), 1 deletion(-)
>

>
> +/* Load and broadcast element.  */
> +static void trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
> +{
> +    unsigned vsz = vec_full_reg_size(s);
> +    unsigned psz = pred_full_reg_size(s);
> +    unsigned esz = dtype_esz[a->dtype];
> +    TCGLabel *over = gen_new_label();
> +    TCGv_i64 temp;
> +
> +    /* If the guarding predicate has no bits set, no load occurs.  */
> +    if (psz <= 8) {
> +        temp = tcg_temp_new_i64();
> +        tcg_gen_ld_i64(temp, cpu_env, pred_full_reg_offset(s, a->pg));
> +        tcg_gen_andi_i64(temp, temp,
> +                         deposit64(0, 0, psz * 8, pred_esz_masks[esz]));
> +        tcg_gen_brcondi_i64(TCG_COND_EQ, temp, 0, over);
> +        tcg_temp_free_i64(temp);
> +    } else {
> +        TCGv_i32 t32 = tcg_temp_new_i32();
> +        find_last_active(s, t32, esz, a->pg);
> +        tcg_gen_brcondi_i32(TCG_COND_LT, t32, 0, over);
> +        tcg_temp_free_i32(t32);
> +    }
> +
> +    /* Load the data.  */
> +    temp = tcg_temp_new_i64();
> +    tcg_gen_addi_i64(temp, cpu_reg_sp(s, a->rn), a->imm);

Isn't the immediate offset supposed to be scaled by mbytes ?

> +    tcg_gen_qemu_ld_i64(temp, temp, get_mem_index(s),
> +                        s->be_data | dtype_mop[a->dtype]);
> +
> +    /* Broadcast to *all* elements.  */
> +    tcg_gen_gvec_dup_i64(esz, vec_full_reg_offset(s, a->rd),
> +                         vsz, vsz, temp);
> +    tcg_temp_free_i64(temp);
> +
> +    /* Zero the inactive elements.  */
> +    gen_set_label(over);
> +    do_clr_inactive_zp(s, a->rd, a->pg, esz);
> +}
> +

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 52/67] target/arm: Implement SVE store vector/predicate register
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 52/67] target/arm: Implement SVE store vector/predicate register Richard Henderson
@ 2018-02-27 14:21   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 14:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 101 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |   6 +++
>  2 files changed, 107 insertions(+)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index b000a2482e..9c724980a0 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -3501,6 +3501,95 @@ static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
>      tcg_temp_free_i64(t0);
>  }
>
> +/* Similarly for stores.  */
> +static void do_str(DisasContext *s, uint32_t vofs, uint32_t len,
> +                   int rn, int imm)
> +{
> +    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
> +    uint32_t len_remain = len % 8;
> +    uint32_t nparts = len / 8 + ctpop8(len_remain);
> +    int midx = get_mem_index(s);
> +    TCGv_i64 addr, t0;
> +
> +    addr = tcg_temp_new_i64();
> +    t0 = tcg_temp_new_i64();
> +
> +    /* Note that unpredicated load/store of vector/predicate registers
> +     * are defined as a stream of bytes, which equates to little-endian
> +     * operations on larger quantities.  There is no nice way to force
> +     * a little-endian load for aarch64_be-linux-user out of line.

"store" in this case, I assume.

> +     *
> +     * Attempt to keep code expansion to a minimum by limiting the
> +     * amount of unrolling done.
> +     */
> +    if (nparts <= 4) {
> +        int i;
> +
> +        for (i = 0; i < len_align; i += 8) {
> +            tcg_gen_ld_i64(t0, cpu_env, vofs + i);
> +            tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + i);
> +            tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEQ);
> +        }
> +    } else {
> +        TCGLabel *loop = gen_new_label();
> +        TCGv_ptr i = TCGV_NAT_TO_PTR(glue(tcg_const_local_, ptr)(0));
> +        TCGv_ptr src;
> +
> +        gen_set_label(loop);
> +
> +        src = tcg_temp_new_ptr();
> +        tcg_gen_add_ptr(src, cpu_env, i);
> +        tcg_gen_ld_i64(t0, src, vofs);
> +
> +        /* Minimize the number of local temps that must be re-read from
> +         * the stack each iteration.  Instead, re-compute values other
> +         * than the loop counter.
> +         */
> +        tcg_gen_addi_ptr(src, i, imm);
> +#if UINTPTR_MAX == UINT32_MAX
> +        tcg_gen_extu_i32_i64(addr, TCGV_PTR_TO_NAT(src));
> +        tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, rn));
> +#else
> +        tcg_gen_add_i64(addr, TCGV_PTR_TO_NAT(src), cpu_reg_sp(s, rn));
> +#endif

We should be able to avoid the ifdef with more support for tcg_*_ptr
ops (similar to an earlier patch in the series).

> +        tcg_temp_free_ptr(src);
> +
> +        tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEQ);
> +
> +        tcg_gen_addi_ptr(i, i, 8);
> +
> +        glue(tcg_gen_brcondi_, ptr)(TCG_COND_LTU, TCGV_PTR_TO_NAT(i),
> +                                   len_align, loop);
> +        tcg_temp_free_ptr(i);
> +    }
> +
> +    /* Predicate register stores can be any multiple of 2.  */
> +    if (len_remain) {
> +        tcg_gen_ld_i64(t0, cpu_env, vofs + len_align);
> +        tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + len_align);
> +
> +        switch (len_remain) {
> +        case 2:
> +        case 4:
> +        case 8:
> +            tcg_gen_qemu_st_i64(t0, addr, midx, MO_LE | ctz32(len_remain));
> +            break;
> +
> +        case 6:
> +            tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUL);
> +            tcg_gen_addi_i64(addr, addr, 4);
> +            tcg_gen_shri_i64(t0, t0, 32);
> +            tcg_gen_qemu_st_i64(t0, addr, midx, MO_LEUW);
> +            break;
> +
> +        default:
> +            g_assert_not_reached();
> +        }
> +    }
> +    tcg_temp_free_i64(addr);
> +    tcg_temp_free_i64(t0);
> +}
> +
>  #undef ptr

thanks
-- PMM

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [Qemu-devel] [PATCH v2 53/67] target/arm: Implement SVE scatter stores
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 53/67] target/arm: Implement SVE scatter stores Richard Henderson
@ 2018-02-27 14:36   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 14:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 41 ++++++++++++++++++++++++++
>  target/arm/sve_helper.c    | 62 ++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 39 +++++++++++++++++++++++++
>  4 files changed, 213 insertions(+)


> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index 5d8e1481d7..edd9340c02 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -81,6 +81,7 @@
>  &rpri_load     rd pg rn imm dtype nreg
>  &rprr_store    rd pg rn rm msz esz nreg
>  &rpri_store    rd pg rn imm msz esz nreg
> +&rprr_scatter_store    rd pg rn rm esz msz xs scale
>
>  ###########################################################################
>  # Named instruction formats.  These are generally used to
> @@ -199,6 +200,8 @@
>  @rpri_store_msz     ....... msz:2 .. . imm:s4 ... pg:3 rn:5 rd:5    &rpri_store
>  @rprr_store_esz_n0  ....... ..    esz:2  rm:5 ... pg:3 rn:5 rd:5 \
>                     &rprr_store nreg=0
> +@rprr_scatter_store ....... msz:2 ..     rm:5 ... pg:3 rn:5 rd:5 \
> +                   &rprr_scatter_store
>
>  ###########################################################################
>  # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
> @@ -832,3 +835,39 @@ ST_zpri            1110010 .. nreg:2 1.... 111 ... ..... ..... \
>  # SVE store multiple structures (scalar plus scalar)         (nreg != 0)
>  ST_zprr                1110010 msz:2 nreg:2 ..... 011 ... ..... ..... \
>                 @rprr_store esz=%size_23
> +
> +# SVE 32-bit scatter store (scalar plus 32-bit scaled offsets)
> +# Require msz > 0 && msz <= esz.
> +ST1_zprz       1110010 .. 11 ..... 100 ... ..... ..... \
> +               @rprr_scatter_store xs=0 esz=2 scale=1
> +ST1_zprz       1110010 .. 11 ..... 110 ... ..... ..... \
> +               @rprr_scatter_store xs=1 esz=2 scale=1
> +
> +# SVE 32-bit scatter store (scalar plus 32-bit unscaled offsets)
> +# Require msz <= esz.
> +ST1_zprz       1110010 .. 10 ..... 100 ... ..... ..... \
> +               @rprr_scatter_store xs=0 esz=2 scale=0
> +ST1_zprz       1110010 .. 10 ..... 110 ... ..... ..... \
> +               @rprr_scatter_store xs=1 esz=2 scale=0
> +
> +# SVE 64-bit scatter store (scalar plus 64-bit scaled offset)
> +# Require msz > 0
> +ST1_zprz       1110010 .. 01 ..... 101 ... ..... ..... \
> +               @rprr_scatter_store xs=2 esz=3 scale=1
> +
> +# SVE 64-bit scatter store (scalar plus 64-bit unscaled offset)
> +ST1_zprz       1110010 .. 00 ..... 101 ... ..... ..... \
> +               @rprr_scatter_store xs=2 esz=3 scale=0
> +
> +# SVE 64-bit scatter store (scalar plus unpacked 32-bit scaled offset)
> +# Require msz > 0
> +ST1_zprz       1110010 .. 01 ..... 100 ... ..... ..... \
> +               @rprr_scatter_store xs=0 esz=3 scale=1
> +ST1_zprz       1110010 .. 01 ..... 110 ... ..... ..... \
> +               @rprr_scatter_store xs=1 esz=3 scale=1
> +
> +# SVE 64-bit scatter store (scalar plus unpacked 32-bit unscaled offset)
> +ST1_zprz       1110010 .. 00 ..... 100 ... ..... ..... \
> +               @rprr_scatter_store xs=0 esz=3 scale=0
> +ST1_zprz       1110010 .. 00 ..... 110 ... ..... ..... \
> +               @rprr_scatter_store xs=1 esz=3 scale=0


Could you write all these with the 'scale=n' part picked up from bit 21,
rather than one pattern for scale=0 and one for scale=1 ?

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 54/67] target/arm: Implement SVE prefetches
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 54/67] target/arm: Implement SVE prefetches Richard Henderson
@ 2018-02-27 14:43   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 14:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c |  9 +++++++++
>  target/arm/sve.decode      | 23 +++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index ca49b94924..63c7a0e8d8 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -3958,3 +3958,12 @@ static void trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
>      do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz,
>                 cpu_reg_sp(s, a->rn), fn);
>  }
> +
> +/*
> + * Prefetches
> + */
> +
> +static void trans_PRF(DisasContext *s, arg_PRF *a, uint32_t insn)
> +{
> +    /* Prefetch is a nop within QEMU.  */
> +}
> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index edd9340c02..f0144aa2d0 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -801,6 +801,29 @@ LD1RQ_zprr 1010010 .. 00 ..... 000 ... ..... ..... \
>  LD1RQ_zpri     1010010 .. 00 0.... 001 ... ..... ..... \
>                 @rpri_load_msz nreg=0
>
> +# SVE 32-bit gather prefetch (scalar plus 32-bit scaled offsets)
> +PRF            1000010 00 -1 ----- 0-- --- ----- 0 ----
> +
> +# SVE 32-bit gather prefetch (vector plus immediate)
> +PRF            1000010 -- 00 ----- 111 --- ----- 0 ----
> +
> +# SVE contiguous prefetch (scalar plus immediate)
> +PRF            1000010 11 1- ----- 0-- --- ----- 0 ----
> +
> +# SVE contiguous prefetch (scalar plus scalar)
> +PRF            1000010 -- 00 ----- 110 --- ----- 0 ----

This one needs something slightly more complicated, because
Rm == 11111 has to be UnallocatedEncoding.

I checked the others and they don't have any unallocated cases
lurking in their decode pseudocode.

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 55/67] target/arm: Implement SVE gather loads
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 55/67] target/arm: Implement SVE gather loads Richard Henderson
@ 2018-02-27 14:53   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 14:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 67 ++++++++++++++++++++++++++++++++
>  target/arm/sve_helper.c    | 75 +++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 53 +++++++++++++++++++++++++
>  4 files changed, 292 insertions(+)
>


Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 56/67] target/arm: Implement SVE scatter store vector immediate
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 56/67] target/arm: Implement SVE scatter store vector immediate Richard Henderson
@ 2018-02-27 15:02   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 79 +++++++++++++++++++++++++++++++---------------
>  target/arm/sve.decode      | 11 +++++++
>  2 files changed, 65 insertions(+), 25 deletions(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 6484ecd257..0241e8e707 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4011,31 +4011,33 @@ static void trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn)
>      tcg_temp_free_i64(imm);
>  }
>
> +/* Indexed by [xs][msz].  */
> +static gen_helper_gvec_mem_scatter * const scatter_store_fn32[2][3] = {
> +    { gen_helper_sve_stbs_zsu,
> +      gen_helper_sve_sths_zsu,
> +      gen_helper_sve_stss_zsu, },
> +    { gen_helper_sve_stbs_zss,
> +      gen_helper_sve_sths_zss,
> +      gen_helper_sve_stss_zss, },
> +};
> +
> +static gen_helper_gvec_mem_scatter * const scatter_store_fn64[3][4] = {
> +    { gen_helper_sve_stbd_zsu,
> +      gen_helper_sve_sthd_zsu,
> +      gen_helper_sve_stsd_zsu,
> +      gen_helper_sve_stdd_zsu, },
> +    { gen_helper_sve_stbd_zss,
> +      gen_helper_sve_sthd_zss,
> +      gen_helper_sve_stsd_zss,
> +      gen_helper_sve_stdd_zss, },
> +    { gen_helper_sve_stbd_zd,
> +      gen_helper_sve_sthd_zd,
> +      gen_helper_sve_stsd_zd,
> +      gen_helper_sve_stdd_zd, },
> +};
> +
>  static void trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
>  {
> -    /* Indexed by [xs][msz].  */
> -    static gen_helper_gvec_mem_scatter * const fn32[2][3] = {
> -        { gen_helper_sve_stbs_zsu,
> -          gen_helper_sve_sths_zsu,
> -          gen_helper_sve_stss_zsu, },
> -        { gen_helper_sve_stbs_zss,
> -          gen_helper_sve_sths_zss,
> -          gen_helper_sve_stss_zss, },
> -    };
> -    static gen_helper_gvec_mem_scatter * const fn64[3][4] = {
> -        { gen_helper_sve_stbd_zsu,
> -          gen_helper_sve_sthd_zsu,
> -          gen_helper_sve_stsd_zsu,
> -          gen_helper_sve_stdd_zsu, },
> -        { gen_helper_sve_stbd_zss,
> -          gen_helper_sve_sthd_zss,
> -          gen_helper_sve_stsd_zss,
> -          gen_helper_sve_stdd_zss, },
> -        { gen_helper_sve_stbd_zd,
> -          gen_helper_sve_sthd_zd,
> -          gen_helper_sve_stsd_zd,
> -          gen_helper_sve_stdd_zd, },
> -    };
>      gen_helper_gvec_mem_scatter *fn;
>
>      if (a->esz < a->msz || (a->msz == 0 && a->scale)) {
> @@ -4044,10 +4046,10 @@ static void trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
>      }
>      switch (a->esz) {
>      case MO_32:
> -        fn = fn32[a->xs][a->msz];
> +        fn = scatter_store_fn32[a->xs][a->msz];
>          break;
>      case MO_64:
> -        fn = fn64[a->xs][a->msz];
> +        fn = scatter_store_fn64[a->xs][a->msz];
>          break;
>      default:
>          g_assert_not_reached();

These bits would be better folded into the previous patches I think.

> @@ -4056,6 +4058,33 @@ static void trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
>                 cpu_reg_sp(s, a->rn), fn);
>  }
>
> +static void trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz *a, uint32_t insn)
> +{
> +    gen_helper_gvec_mem_scatter *fn = NULL;
> +    TCGv_i64 imm;
> +
> +    if (a->esz < a->msz) {
> +        unallocated_encoding(s);
> +        return;
> +    }
> +
> +    switch (a->esz) {
> +    case MO_32:
> +        fn = scatter_store_fn32[0][a->msz];
> +        break;
> +    case MO_64:
> +        fn = scatter_store_fn64[2][a->msz];
> +        break;
> +    }
> +    assert(fn != NULL);
> +
> +    /* Treat ST1_zpiz (zn[x] + imm) the same way as ST1_zprz (rn + zm[x])
> +       by loading the immediate into the scalar parameter.  */
> +    imm = tcg_const_i64(a->imm << a->msz);
> +    do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, fn);
> +    tcg_temp_free_i64(imm);
> +}
> +
>  /*
>   * Prefetches
>   */
> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index f85d82e009..6ccb4289fc 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -84,6 +84,7 @@
>  &rprr_gather_load      rd pg rn rm esz msz u ff xs scale
>  &rpri_gather_load      rd pg rn imm esz msz u ff
>  &rprr_scatter_store    rd pg rn rm esz msz xs scale
> +&rpri_scatter_store    rd pg rn imm esz msz
>
>  ###########################################################################
>  # Named instruction formats.  These are generally used to
> @@ -216,6 +217,8 @@
>                     &rprr_store nreg=0
>  @rprr_scatter_store ....... msz:2 ..     rm:5 ... pg:3 rn:5 rd:5 \
>                     &rprr_scatter_store
> +@rpri_scatter_store ....... msz:2 ..    imm:5 ... pg:3 rn:5 rd:5 \
> +                   &rpri_scatter_store
>
>  ###########################################################################
>  # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
> @@ -935,6 +938,14 @@ ST1_zprz   1110010 .. 01 ..... 101 ... ..... ..... \
>  ST1_zprz       1110010 .. 00 ..... 101 ... ..... ..... \
>                 @rprr_scatter_store xs=2 esz=3 scale=0
>
> +# SVE 64-bit scatter store (vector plus immediate)
> +ST1_zpiz       1110010 .. 10 ..... 101 ... ..... ..... \
> +               @rpri_scatter_store esz=3
> +
> +# SVE 32-bit scatter store (vector plus immediate)
> +ST1_zpiz       1110010 .. 11 ..... 101 ... ..... ..... \
> +               @rpri_scatter_store esz=2
> +
>  # SVE 64-bit scatter store (scalar plus unpacked 32-bit scaled offset)
>  # Require msz > 0
>  ST1_zprz       1110010 .. 01 ..... 100 ... ..... ..... \
> --
> 2.14.3

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 57/67] target/arm: Implement SVE floating-point compare vectors
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 57/67] target/arm: Implement SVE floating-point compare vectors Richard Henderson
@ 2018-02-27 15:04   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 49 +++++++++++++++++++++++++++++++++++
>  target/arm/sve_helper.c    | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 41 +++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 11 ++++++++
>  4 files changed, 165 insertions(+)


Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 58/67] target/arm: Implement SVE floating-point arithmetic with immediate
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 58/67] target/arm: Implement SVE floating-point arithmetic with immediate Richard Henderson
@ 2018-02-27 15:11   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 56 +++++++++++++++++++++++++++++++++++
>  target/arm/sve_helper.c    | 68 ++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 14 +++++++++
>  4 files changed, 211 insertions(+)

> +#define float16_two  make_float16(0x4000)
> +#define float32_two  make_float32(0x40000000)
> +#define float64_two  make_float64(0x4000000000000000ULL)

I think Alex's softfloat series puts these in the common
header now ?

Anyway
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 59/67] target/arm: Implement SVE Floating Point Multiply Indexed Group
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 59/67] target/arm: Implement SVE Floating Point Multiply Indexed Group Richard Henderson
@ 2018-02-27 15:18   ` Peter Maydell
  2018-02-27 16:29     ` Richard Henderson
  0 siblings, 1 reply; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper.h        | 14 ++++++++++
>  target/arm/translate-sve.c | 44 +++++++++++++++++++++++++++++++
>  target/arm/vec_helper.c    | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 19 ++++++++++++++
>  4 files changed, 141 insertions(+)

> diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
> index ad5c29cdd5..e711a3217d 100644
> --- a/target/arm/vec_helper.c
> +++ b/target/arm/vec_helper.c
> @@ -24,6 +24,22 @@
>  #include "fpu/softfloat.h"
>
>
> +/* Note that vector data is stored in host-endian 64-bit chunks,
> +   so addressing units smaller than that need a host-endian fixup.  */
> +#ifdef HOST_WORDS_BIGENDIAN
> +#define H1(x)   ((x) ^ 7)
> +#define H1_2(x) ((x) ^ 6)
> +#define H1_4(x) ((x) ^ 4)
> +#define H2(x)   ((x) ^ 3)
> +#define H4(x)   ((x) ^ 1)
> +#else
> +#define H1(x)   (x)
> +#define H1_2(x) (x)
> +#define H1_4(x) (x)
> +#define H2(x)   (x)
> +#define H4(x)   (x)
> +#endif
> +

I wasn't expecting to see these macros here. Don't we have them already?

Anyway
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 60/67] target/arm: Implement SVE FP Fast Reduction Group
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 60/67] target/arm: Implement SVE FP Fast Reduction Group Richard Henderson
@ 2018-02-27 15:24   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 35 ++++++++++++++++++++++++++
>  target/arm/sve_helper.c    | 61 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 55 +++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  8 ++++++
>  4 files changed, 159 insertions(+)


Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 61/67] target/arm: Implement SVE Floating Point Unary Operations - Unpredicated Group
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 61/67] target/arm: Implement SVE Floating Point Unary Operations - Unpredicated Group Richard Henderson
@ 2018-02-27 15:28   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper.h        |  8 ++++++++
>  target/arm/translate-sve.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>  target/arm/vec_helper.c    | 20 ++++++++++++++++++++
>  target/arm/sve.decode      |  5 +++++
>  4 files changed, 76 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 62/67] target/arm: Implement SVE FP Compare with Zero Group
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 62/67] target/arm: Implement SVE FP Compare with Zero Group Richard Henderson
@ 2018-02-27 15:31   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 42 ++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve_helper.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 10 ++++++++++
>  4 files changed, 138 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 63/67] target/arm: Implement SVE floating-point trig multiply-add coefficient
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 63/67] target/arm: Implement SVE floating-point trig multiply-add coefficient Richard Henderson
@ 2018-02-27 15:34   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  4 +++
>  target/arm/sve_helper.c    | 70 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 26 +++++++++++++++++
>  target/arm/sve.decode      |  3 ++
>  4 files changed, 103 insertions(+)

> +/* FP Trig Multiply-Add. */
> +
> +void HELPER(sve_ftmad_h)(void *vd, void *vn, void *vm, void *vs, uint32_t desc)
> +{
> +    static const float16 coeff[16] = {
> +        0x3c00, 0xb155, 0x2030, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
> +        0x3c00, 0xb800, 0x293a, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000,
> +    };

Comment that these are constants from the pseudocode, or whatever
we agreed for the earlier patch with constant tables...

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 64/67] target/arm: Implement SVE floating-point convert precision
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 64/67] target/arm: Implement SVE floating-point convert precision Richard Henderson
@ 2018-02-27 15:35   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 65/67] target/arm: Implement SVE floating-point convert to integer
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 65/67] target/arm: Implement SVE floating-point convert to integer Richard Henderson
@ 2018-02-27 15:36   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 30 ++++++++++++++++++++
>  target/arm/sve_helper.c    | 16 +++++++++++
>  target/arm/translate-sve.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 16 +++++++++++
>  4 files changed, 132 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 66/67] target/arm: Implement SVE floating-point round to integral value
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 66/67] target/arm: Implement SVE floating-point round to integral value Richard Henderson
@ 2018-02-27 15:39   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 14 ++++++++
>  target/arm/sve_helper.c    |  8 +++++
>  target/arm/translate-sve.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  9 ++++++
>  4 files changed, 111 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 67/67] target/arm: Implement SVE floating-point unary operations
  2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 67/67] target/arm: Implement SVE floating-point unary operations Richard Henderson
@ 2018-02-27 15:40   ` Peter Maydell
  0 siblings, 0 replies; 167+ messages in thread
From: Peter Maydell @ 2018-02-27 15:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, qemu-arm

On 17 February 2018 at 18:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 14 ++++++++++++++
>  target/arm/sve_helper.c    |  8 ++++++++
>  target/arm/translate-sve.c | 28 ++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  4 ++++
>  4 files changed, 54 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 59/67] target/arm: Implement SVE Floating Point Multiply Indexed Group
  2018-02-27 15:18   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-02-27 16:29     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-02-27 16:29 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, qemu-arm

On 02/27/2018 07:18 AM, Peter Maydell wrote:
> On 17 February 2018 at 18:23, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/helper.h        | 14 ++++++++++
>>  target/arm/translate-sve.c | 44 +++++++++++++++++++++++++++++++
>>  target/arm/vec_helper.c    | 64 ++++++++++++++++++++++++++++++++++++++++++++++
>>  target/arm/sve.decode      | 19 ++++++++++++++
>>  4 files changed, 141 insertions(+)
> 
>> diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
>> index ad5c29cdd5..e711a3217d 100644
>> --- a/target/arm/vec_helper.c
>> +++ b/target/arm/vec_helper.c
>> @@ -24,6 +24,22 @@
>>  #include "fpu/softfloat.h"
>>
>>
>> +/* Note that vector data is stored in host-endian 64-bit chunks,
>> +   so addressing units smaller than that need a host-endian fixup.  */
>> +#ifdef HOST_WORDS_BIGENDIAN
>> +#define H1(x)   ((x) ^ 7)
>> +#define H1_2(x) ((x) ^ 6)
>> +#define H1_4(x) ((x) ^ 4)
>> +#define H2(x)   ((x) ^ 3)
>> +#define H4(x)   ((x) ^ 1)
>> +#else
>> +#define H1(x)   (x)
>> +#define H1_2(x) (x)
>> +#define H1_4(x) (x)
>> +#define H2(x)   (x)
>> +#define H4(x)   (x)
>> +#endif
>> +
> 
> I wasn't expecting to see these macros here. Don't we have them already?

Not in this file.  Probably I should share these in a header...


r~


* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 02/67] target/arm: Introduce translate-a64.h
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 02/67] target/arm: Introduce translate-a64.h Richard Henderson
  2018-02-22 17:30   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-04-03  9:01   ` Alex Bennée
  1 sibling, 0 replies; 167+ messages in thread
From: Alex Bennée @ 2018-04-03  9:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm


Richard Henderson <richard.henderson@linaro.org> writes:

> Move some stuff that will be common to both translate-a64.c
> and translate-sve.c.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/translate-a64.h | 110 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-a64.c | 101 ++++++-----------------------------------
>  2 files changed, 123 insertions(+), 88 deletions(-)
>  create mode 100644 target/arm/translate-a64.h
>
> diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
> new file mode 100644
> index 0000000000..e519aee314
> --- /dev/null
> +++ b/target/arm/translate-a64.h
> @@ -0,0 +1,110 @@
> +/*
> + *  AArch64 translation, common definitions.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef TARGET_ARM_TRANSLATE_A64_H
> +#define TARGET_ARM_TRANSLATE_A64_H
> +
> +void unallocated_encoding(DisasContext *s);
> +
> +#define unsupported_encoding(s, insn)                                    \
> +    do {                                                                 \
> +        qemu_log_mask(LOG_UNIMP,                                         \
> +                      "%s:%d: unsupported instruction encoding 0x%08x "  \
> +                      "at pc=%016" PRIx64 "\n",                          \
> +                      __FILE__, __LINE__, insn, s->pc - 4);              \
> +        unallocated_encoding(s);                                         \
> +    } while (0)
> +
> +TCGv_i64 new_tmp_a64(DisasContext *s);
> +TCGv_i64 new_tmp_a64_zero(DisasContext *s);
> +TCGv_i64 cpu_reg(DisasContext *s, int reg);
> +TCGv_i64 cpu_reg_sp(DisasContext *s, int reg);
> +TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf);
> +TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf);
> +void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v);
> +TCGv_ptr get_fpstatus_ptr(bool);
> +bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
> +                            unsigned int imms, unsigned int immr);
> +uint64_t vfp_expand_imm(int size, uint8_t imm8);
> +
> +/* We should have at some point before trying to access an FP register
> + * done the necessary access check, so assert that
> + * (a) we did the check and
> + * (b) we didn't then just plough ahead anyway if it failed.
> + * Print the instruction pattern in the abort message so we can figure
> + * out what we need to fix if a user encounters this problem in the wild.
> + */
> +static inline void assert_fp_access_checked(DisasContext *s)
> +{
> +#ifdef CONFIG_DEBUG_TCG
> +    if (unlikely(!s->fp_access_checked || s->fp_excp_el)) {
> +        fprintf(stderr, "target-arm: FP access check missing for "
> +                "instruction 0x%08x\n", s->insn);
> +        abort();
> +    }
> +#endif
> +}
> +
> +/* Return the offset into CPUARMState of an element of specified
> + * size, 'element' places in from the least significant end of
> + * the FP/vector register Qn.
> + */
> +static inline int vec_reg_offset(DisasContext *s, int regno,
> +                                 int element, TCGMemOp size)
> +{
> +    int offs = 0;
> +#ifdef HOST_WORDS_BIGENDIAN
> +    /* This is complicated slightly because vfp.zregs[n].d[0] is
> +     * still the low half and vfp.zregs[n].d[1] the high half
> +     * of the 128 bit vector, even on big endian systems.
> +     * Calculate the offset assuming a fully bigendian 128 bits,
> +     * then XOR to account for the order of the two 64 bit halves.
> +     */
> +    offs += (16 - ((element + 1) * (1 << size)));
> +    offs ^= 8;
> +#else
> +    offs += element * (1 << size);
> +#endif
> +    offs += offsetof(CPUARMState, vfp.zregs[regno]);
> +    assert_fp_access_checked(s);
> +    return offs;
> +}
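
For illustration, the XOR trick can be exercised in a standalone sketch (hypothetical code, not part of the patch) with the host endianness made an explicit parameter:

```c
/* Byte offset of 'element' (of size 1 << size bytes) within a 16-byte
 * register stored as two uint64_t halves, d[0] = low and d[1] = high
 * regardless of host byte order.
 */
static int elem_offset(int element, int size, int host_big_endian)
{
    int offs;

    if (host_big_endian) {
        /* Offset assuming a fully big-endian 128-bit value... */
        offs = 16 - ((element + 1) * (1 << size));
        /* ...then XOR to swap the order of the two 64-bit halves. */
        offs ^= 8;
    } else {
        offs = element * (1 << size);
    }
    return offs;
}
```

64-bit elements land at the same offsets either way (element 0 in d[0], element 1 in d[1]), while byte element 0 sits at offset 7 on a big-endian host: the least significant byte of d[0].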
> +
> +/* Return the offset into CPUARMState of the "whole" vector register Qn.  */
> +static inline int vec_full_reg_offset(DisasContext *s, int regno)
> +{
> +    assert_fp_access_checked(s);
> +    return offsetof(CPUARMState, vfp.zregs[regno]);
> +}
> +
> +/* Return a newly allocated pointer to the vector register.  */
> +static inline TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
> +{
> +    TCGv_ptr ret = tcg_temp_new_ptr();
> +    tcg_gen_addi_ptr(ret, cpu_env, vec_full_reg_offset(s, regno));
> +    return ret;
> +}
> +
> +/* Return the byte size of the "whole" vector register, VL / 8.  */
> +static inline int vec_full_reg_size(DisasContext *s)
> +{
> +    return s->sve_len;
> +}
> +
> +bool disas_sve(DisasContext *, uint32_t);
> +
> +#endif /* TARGET_ARM_TRANSLATE_A64_H */
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 032cbfa17d..e0e7ebf68c 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -36,13 +36,13 @@
>  #include "exec/log.h"
>
>  #include "trace-tcg.h"
> +#include "translate-a64.h"
>
>  static TCGv_i64 cpu_X[32];
>  static TCGv_i64 cpu_pc;
>
>  /* Load/store exclusive handling */
>  static TCGv_i64 cpu_exclusive_high;
> -static TCGv_i64 cpu_reg(DisasContext *s, int reg);
>
>  static const char *regnames[] = {
>      "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7",
> @@ -392,22 +392,13 @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
>      }
>  }
>
> -static void unallocated_encoding(DisasContext *s)
> +void unallocated_encoding(DisasContext *s)
>  {
>      /* Unallocated and reserved encodings are uncategorized */
>      gen_exception_insn(s, 4, EXCP_UDEF, syn_uncategorized(),
>                         default_exception_el(s));
>  }
>
> -#define unsupported_encoding(s, insn)                                    \
> -    do {                                                                 \
> -        qemu_log_mask(LOG_UNIMP,                                         \
> -                      "%s:%d: unsupported instruction encoding 0x%08x "  \
> -                      "at pc=%016" PRIx64 "\n",                          \
> -                      __FILE__, __LINE__, insn, s->pc - 4);              \
> -        unallocated_encoding(s);                                         \
> -    } while (0)
> -
>  static void init_tmp_a64_array(DisasContext *s)
>  {
>  #ifdef CONFIG_DEBUG_TCG
> @@ -425,13 +416,13 @@ static void free_tmp_a64(DisasContext *s)
>      init_tmp_a64_array(s);
>  }
>
> -static TCGv_i64 new_tmp_a64(DisasContext *s)
> +TCGv_i64 new_tmp_a64(DisasContext *s)
>  {
>      assert(s->tmp_a64_count < TMP_A64_MAX);
>      return s->tmp_a64[s->tmp_a64_count++] = tcg_temp_new_i64();
>  }
>
> -static TCGv_i64 new_tmp_a64_zero(DisasContext *s)
> +TCGv_i64 new_tmp_a64_zero(DisasContext *s)
>  {
>      TCGv_i64 t = new_tmp_a64(s);
>      tcg_gen_movi_i64(t, 0);
> @@ -453,7 +444,7 @@ static TCGv_i64 new_tmp_a64_zero(DisasContext *s)
>   * to cpu_X[31] and ZR accesses to a temporary which can be discarded.
>   * This is the point of the _sp forms.
>   */
> -static TCGv_i64 cpu_reg(DisasContext *s, int reg)
> +TCGv_i64 cpu_reg(DisasContext *s, int reg)
>  {
>      if (reg == 31) {
>          return new_tmp_a64_zero(s);
> @@ -463,7 +454,7 @@ static TCGv_i64 cpu_reg(DisasContext *s, int reg)
>  }
>
>  /* register access for when 31 == SP */
> -static TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
> +TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
>  {
>      return cpu_X[reg];
>  }
> @@ -472,7 +463,7 @@ static TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
>   * representing the register contents. This TCGv is an auto-freed
>   * temporary so it need not be explicitly freed, and may be modified.
>   */
> -static TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
> +TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
>  {
>      TCGv_i64 v = new_tmp_a64(s);
>      if (reg != 31) {
> @@ -487,7 +478,7 @@ static TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
>      return v;
>  }
>
> -static TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
> +TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
>  {
>      TCGv_i64 v = new_tmp_a64(s);
>      if (sf) {
> @@ -498,72 +489,6 @@ static TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
>      return v;
>  }
>
> -/* We should have at some point before trying to access an FP register
> - * done the necessary access check, so assert that
> - * (a) we did the check and
> - * (b) we didn't then just plough ahead anyway if it failed.
> - * Print the instruction pattern in the abort message so we can figure
> - * out what we need to fix if a user encounters this problem in the wild.
> - */
> -static inline void assert_fp_access_checked(DisasContext *s)
> -{
> -#ifdef CONFIG_DEBUG_TCG
> -    if (unlikely(!s->fp_access_checked || s->fp_excp_el)) {
> -        fprintf(stderr, "target-arm: FP access check missing for "
> -                "instruction 0x%08x\n", s->insn);
> -        abort();
> -    }
> -#endif
> -}
> -
> -/* Return the offset into CPUARMState of an element of specified
> - * size, 'element' places in from the least significant end of
> - * the FP/vector register Qn.
> - */
> -static inline int vec_reg_offset(DisasContext *s, int regno,
> -                                 int element, TCGMemOp size)
> -{
> -    int offs = 0;
> -#ifdef HOST_WORDS_BIGENDIAN
> -    /* This is complicated slightly because vfp.zregs[n].d[0] is
> -     * still the low half and vfp.zregs[n].d[1] the high half
> -     * of the 128 bit vector, even on big endian systems.
> -     * Calculate the offset assuming a fully bigendian 128 bits,
> -     * then XOR to account for the order of the two 64 bit halves.
> -     */
> -    offs += (16 - ((element + 1) * (1 << size)));
> -    offs ^= 8;
> -#else
> -    offs += element * (1 << size);
> -#endif
> -    offs += offsetof(CPUARMState, vfp.zregs[regno]);
> -    assert_fp_access_checked(s);
> -    return offs;
> -}
> -
> -/* Return the offset info CPUARMState of the "whole" vector register Qn.  */
> -static inline int vec_full_reg_offset(DisasContext *s, int regno)
> -{
> -    assert_fp_access_checked(s);
> -    return offsetof(CPUARMState, vfp.zregs[regno]);
> -}
> -
> -/* Return a newly allocated pointer to the vector register.  */
> -static TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
> -{
> -    TCGv_ptr ret = tcg_temp_new_ptr();
> -    tcg_gen_addi_ptr(ret, cpu_env, vec_full_reg_offset(s, regno));
> -    return ret;
> -}
> -
> -/* Return the byte size of the "whole" vector register, VL / 8.  */
> -static inline int vec_full_reg_size(DisasContext *s)
> -{
> -    /* FIXME SVE: We should put the composite ZCR_EL* value into tb->flags.
> -       In the meantime this is just the AdvSIMD length of 128.  */
> -    return 128 / 8;
> -}
> -
>  /* Return the offset into CPUARMState of a slice (from
>   * the least significant end) of FP register Qn (ie
>   * Dn, Sn, Hn or Bn).
> @@ -620,7 +545,7 @@ static void clear_vec_high(DisasContext *s, bool is_q, int rd)
>      }
>  }
>
> -static void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v)
> +void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v)
>  {
>      unsigned ofs = fp_reg_offset(s, reg, MO_64);
>
> @@ -637,7 +562,7 @@ static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
>      tcg_temp_free_i64(tmp);
>  }
>
> -static TCGv_ptr get_fpstatus_ptr(bool is_f16)
> +TCGv_ptr get_fpstatus_ptr(bool is_f16)
>  {
>      TCGv_ptr statusptr = tcg_temp_new_ptr();
>      int offset;
> @@ -3130,8 +3055,8 @@ static inline uint64_t bitmask64(unsigned int length)
>   * value (ie should cause a guest UNDEF exception), and true if they are
>   * valid, in which case the decoded bit pattern is written to result.
>   */
> -static bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
> -                                   unsigned int imms, unsigned int immr)
> +bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
> +                            unsigned int imms, unsigned int immr)
>  {
>      uint64_t mask;
>      unsigned e, levels, s, r;
> @@ -5164,7 +5089,7 @@ static void disas_fp_3src(DisasContext *s, uint32_t insn)
>   * the range 01....1xx to 10....0xx, and the most significant 4 bits of
>   * the mantissa; see VFPExpandImm() in the v8 ARM ARM.
>   */
> -static uint64_t vfp_expand_imm(int size, uint8_t imm8)
> +uint64_t vfp_expand_imm(int size, uint8_t imm8)
>  {
>      uint64_t imm;
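
For reference while reading, here is a hypothetical standalone expansion of the double-precision case only, following the VFPExpandImm() pseudocode in the v8 ARM ARM; the real function also handles the 16- and 32-bit sizes:

```c
#include <stdint.h>

/* VFPExpandImm for 64-bit:
 * imm8<7> : NOT(imm8<6>) : Replicate(imm8<6>, 8) : imm8<5:0> : Zeros(48)
 */
static uint64_t vfp_expand_imm_double(uint8_t imm8)
{
    uint64_t imm = 0;

    imm |= (uint64_t)(imm8 >> 7) << 63;              /* sign          */
    imm |= (uint64_t)(((imm8 >> 6) & 1) ^ 1) << 62;  /* NOT(imm8<6>)  */
    imm |= ((imm8 & 0x40) ? 0xffull : 0) << 54;      /* imm8<6> x 8   */
    imm |= (uint64_t)(imm8 & 0x3f) << 48;            /* imm8<5:0>     */
    return imm;                                      /* low 48 bits 0 */
}
```

For example, imm8 = 0x70 expands to 0x3ff0000000000000 (1.0) and imm8 = 0x00 to 0x4000000000000000 (2.0).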


--
Alex Bennée

^ permalink raw reply	[flat|nested] 167+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 06/67] target/arm: Implement SVE predicate test
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 06/67] target/arm: Implement SVE predicate test Richard Henderson
  2018-02-22 18:38   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
@ 2018-04-03  9:16   ` Alex Bennée
  2018-04-06  1:27     ` Richard Henderson
  1 sibling, 1 reply; 167+ messages in thread
From: Alex Bennée @ 2018-04-03  9:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    | 21 +++++++++++++
>  target/arm/helper.h        |  1 +
>  target/arm/sve_helper.c    | 77 ++++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 62 +++++++++++++++++++++++++++++++++++++
>  target/arm/Makefile.objs   |  2 +-
>  target/arm/sve.decode      |  5 +++
>  6 files changed, 167 insertions(+), 1 deletion(-)
>  create mode 100644 target/arm/helper-sve.h
>  create mode 100644 target/arm/sve_helper.c
>
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> new file mode 100644
> index 0000000000..b6e91539ae
> --- /dev/null
> +++ b/target/arm/helper-sve.h
> @@ -0,0 +1,21 @@
> +/*
> + *  AArch64 SVE specific helper definitions
> + *
> + *  Copyright (c) 2018 Linaro, Ltd
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +DEF_HELPER_FLAGS_2(sve_predtest1, TCG_CALL_NO_WG, i32, i64, i64)
> +DEF_HELPER_FLAGS_3(sve_predtest, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
> diff --git a/target/arm/helper.h b/target/arm/helper.h
> index 6dd8504ec3..be3c2fcdc0 100644
> --- a/target/arm/helper.h
> +++ b/target/arm/helper.h
> @@ -567,4 +567,5 @@ DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>
>  #ifdef TARGET_AARCH64
>  #include "helper-a64.h"
> +#include "helper-sve.h"
>  #endif
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> new file mode 100644
> index 0000000000..7d13fd40ed
> --- /dev/null
> +++ b/target/arm/sve_helper.c
> @@ -0,0 +1,77 @@
> +/*
> + *  ARM SVE Operations
> + *
> + *  Copyright (c) 2018 Linaro
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "cpu.h"
> +#include "exec/exec-all.h"
> +#include "exec/cpu_ldst.h"
> +#include "exec/helper-proto.h"
> +#include "tcg/tcg-gvec-desc.h"
> +
> +
> +/* Return a value for NZCV as per the ARM PredTest pseudofunction.
> + *
> + * The return value has bit 31 set if N is set, bit 1 set if Z is clear,
> + * and bit 0 set if C is set.
> + *
> + * This is an iterative function, called for each Pd and Pg word
> + * moving forward.
> + */
> +
> +/* For no G bits set, NZCV = C.  */
> +#define PREDTEST_INIT  1
> +
> +static uint32_t iter_predtest_fwd(uint64_t d, uint64_t g, uint32_t flags)
> +{
> +    if (g) {
> +        /* Compute N from first D & G.
> +           Use bit 2 to signal first G bit seen.  */
> +        if (!(flags & 4)) {
> +            flags |= ((d & (g & -g)) != 0) << 31;
> +            flags |= 4;
> +        }
> +
> +        /* Accumulate Z from each D & G.  */
> +        flags |= ((d & g) != 0) << 1;
> +
> +        /* Compute C from last !(D & G).  Replace previous.  */
> +        flags = deposit32(flags, 0, 1, (d & pow2floor(g)) == 0);
> +    }
> +    return flags;
> +}
> +
> +/* The same for a single word predicate.  */
> +uint32_t HELPER(sve_predtest1)(uint64_t d, uint64_t g)
> +{
> +    return iter_predtest_fwd(d, g, PREDTEST_INIT);
> +}
> +
> +/* The same for a multi-word predicate.  */
> +uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
> +{
> +    uint32_t flags = PREDTEST_INIT;
> +    uint64_t *d = vd, *g = vg;
> +    uintptr_t i = 0;
> +
> +    do {
> +        flags = iter_predtest_fwd(d[i], g[i], flags);
> +    } while (++i < words);
> +
> +    return flags;
> +}
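
As a sanity check on the flag encoding, the iteration above can be reproduced host-side with stand-ins for the QEMU pow2floor() and deposit32() utilities (hypothetical test code, not part of the patch):

```c
#include <stdint.h>

/* Stand-in for pow2floor(): the highest set bit of v, or 0. */
static uint64_t pow2floor_u64(uint64_t v)
{
    while (v & (v - 1)) {
        v &= v - 1;     /* clear the lowest set bit */
    }
    return v;
}

/* Same algorithm as iter_predtest_fwd() above, for a single word,
 * starting from PREDTEST_INIT.
 */
static uint32_t predtest1_word(uint64_t d, uint64_t g)
{
    uint32_t flags = 1;         /* PREDTEST_INIT: no G bits -> C set */

    if (g) {
        /* N from the first (lowest) active element. */
        flags |= (uint32_t)((d & (g & -g)) != 0) << 31;
        flags |= 4;             /* first-G-seen marker */

        /* Z clear if any active element is set. */
        flags |= (uint32_t)((d & g) != 0) << 1;

        /* C from the last (highest) active element, replacing bit 0. */
        flags = (flags & ~1u) | ((d & pow2floor_u64(g)) == 0);
    }
    return flags;
}
```

A single element that is both active and set (d = g = 1) gives N set, Z clear, C clear: flags = 0x80000006, bit 2 being only the internal marker.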
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index c0cccfda6f..c2e7fac938 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -83,6 +83,43 @@ static void do_mov_z(DisasContext *s, int rd, int rn)
>      do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
>  }
>
> +/* Set the cpu flags as per a return from an SVE helper.  */
> +static void do_pred_flags(TCGv_i32 t)
> +{
> +    tcg_gen_mov_i32(cpu_NF, t);
> +    tcg_gen_andi_i32(cpu_ZF, t, 2);
> +    tcg_gen_andi_i32(cpu_CF, t, 1);
> +    tcg_gen_movi_i32(cpu_VF, 0);
> +}

Why bother returning a value from the helper to then spend time
shuffling it into env->cpu_FLAG when we could do this directly? Does
this aid code generation when flag values are queried?

Also from above:

> + * The return value has bit 31 set if N is set, bit 1 set if Z is clear,
> + * and bit 0 set if C is set.

So there is assumed knowledge in the encoding of cpu_NF here - maybe a
reference to cpu.h where this is codified.
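
For readers following along, the representation in question (documented in target/arm/cpu.h) is: cpu_NF holds N in bit 31, cpu_ZF is zero iff Z is set, cpu_CF holds C as 0 or 1, and cpu_VF holds V in bit 31. A hypothetical host-side mirror of do_pred_flags() makes the mapping explicit:

```c
#include <stdint.h>

/* QEMU ARM flag variables: NF bit 31 = N, ZF == 0 means Z set,
 * CF is 0 or 1, VF bit 31 = V.
 */
struct nzcv {
    uint32_t nf, zf, cf, vf;
};

/* Mirror of do_pred_flags(): split the helper's return value. */
static struct nzcv pred_flags(uint32_t t)
{
    struct nzcv f;

    f.nf = t;          /* bit 31 already holds N */
    f.zf = t & 2;      /* non-zero iff Z is clear */
    f.cf = t & 1;      /* C as 0/1 */
    f.vf = 0;          /* PTEST always clears V */
    return f;
}
```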

> +
> +/* Subroutines computing the ARM PredTest pseudofunction.  */
> +static void do_predtest1(TCGv_i64 d, TCGv_i64 g)
> +{
> +    TCGv_i32 t = tcg_temp_new_i32();
> +
> +    gen_helper_sve_predtest1(t, d, g);
> +    do_pred_flags(t);
> +    tcg_temp_free_i32(t);
> +}
> +
> +static void do_predtest(DisasContext *s, int dofs, int gofs, int words)
> +{
> +    TCGv_ptr dptr = tcg_temp_new_ptr();
> +    TCGv_ptr gptr = tcg_temp_new_ptr();
> +    TCGv_i32 t;
> +
> +    tcg_gen_addi_ptr(dptr, cpu_env, dofs);
> +    tcg_gen_addi_ptr(gptr, cpu_env, gofs);
> +    t = tcg_const_i32(words);
> +
> +    gen_helper_sve_predtest(t, dptr, gptr, t);
> +    tcg_temp_free_ptr(dptr);
> +    tcg_temp_free_ptr(gptr);
> +
> +    do_pred_flags(t);
> +    tcg_temp_free_i32(t);
> +}
> +
>  /*
>   *** SVE Logical - Unpredicated Group
>   */
> @@ -111,6 +148,31 @@ static void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
>      do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
>  }
>
> +/*
> + *** SVE Predicate Misc Group
> + */
> +
> +void trans_PTEST(DisasContext *s, arg_PTEST *a, uint32_t insn)
> +{
> +    int nofs = pred_full_reg_offset(s, a->rn);
> +    int gofs = pred_full_reg_offset(s, a->pg);
> +    int words = DIV_ROUND_UP(pred_full_reg_size(s), 8);
> +
> +    if (words == 1) {
> +        TCGv_i64 pn = tcg_temp_new_i64();
> +        TCGv_i64 pg = tcg_temp_new_i64();
> +
> +        tcg_gen_ld_i64(pn, cpu_env, nofs);
> +        tcg_gen_ld_i64(pg, cpu_env, gofs);
> +        do_predtest1(pn, pg);
> +
> +        tcg_temp_free_i64(pn);
> +        tcg_temp_free_i64(pg);
> +    } else {
> +        do_predtest(s, nofs, gofs, words);
> +    }
> +}
> +
>  /*
>   *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
>   */
> diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
> index 9934cf1d4d..452ac6f453 100644
> --- a/target/arm/Makefile.objs
> +++ b/target/arm/Makefile.objs
> @@ -19,4 +19,4 @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
>  	  "GEN", $(TARGET_DIR)$@)
>
>  target/arm/translate-sve.o: target/arm/decode-sve.inc.c
> -obj-$(TARGET_AARCH64) += translate-sve.o
> +obj-$(TARGET_AARCH64) += translate-sve.o sve_helper.o
> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index 0c6a7ba34d..7efaa8fe8e 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -56,6 +56,11 @@ ORR_zzz		00000100 01 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
>  EOR_zzz		00000100 10 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
>  BIC_zzz		00000100 11 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
>
> +### SVE Predicate Misc Group
> +
> +# SVE predicate test
> +PTEST		00100101 01010000 11 pg:4 0 rn:4 00000
> +
>  ### SVE Memory - 32-bit Gather and Unsized Contiguous Group
>
>  # SVE load predicate register


--
Alex Bennée

* Re: [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate
  2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate Richard Henderson
  2018-02-22 18:20   ` Peter Maydell
@ 2018-04-03  9:26   ` Alex Bennée
  2018-04-06  1:23     ` Richard Henderson
  1 sibling, 1 reply; 167+ messages in thread
From: Alex Bennée @ 2018-04-03  9:26 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 132 +++++++++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      |  22 +++++++-
>  2 files changed, 153 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index 50cf2a1fdd..c0cccfda6f 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -46,6 +46,19 @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
>   * Implement all of the translator functions referenced by the decoder.
>   */
>
> +/* Return the offset into CPUARMState of the predicate vector register Pn.
> + * Note for this purpose, FFR is P16.  */
> +static inline int pred_full_reg_offset(DisasContext *s, int regno)
> +{
> +    return offsetof(CPUARMState, vfp.pregs[regno]);
> +}

You don't use it yet but probably worth a:

static inline int ffr_full_reg_offset(DisasContext *s)
{
    return pred_full_reg_offset(s, 16);
}

here when you get to it to avoid the magic 16 appearing in the main code.

> +
> +/* Return the byte size of the whole predicate register, VL / 64.  */
> +static inline int pred_full_reg_size(DisasContext *s)
> +{
> +    return s->sve_len >> 3;
> +}
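
Since SVE predicates carry one bit per vector byte, the predicate register is always an eighth of the vector register. A trivial sketch of the size relations (VL in bits, as in the architecture):

```c
/* SVE size relations: a Z register is VL/8 bytes, a P register VL/64. */
static int zreg_bytes(int vl_bits) { return vl_bits / 8; }
static int preg_bytes(int vl_bits) { return vl_bits / 64; }
```

For example, VL = 256 gives 32-byte Z registers and 4-byte P registers, matching s->sve_len >> 3 above.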
> +
>  /* Invoke a vector expander on two Zregs.  */
>  static void do_vector2_z(DisasContext *s, GVecGen2Fn *gvec_fn,
>                           int esz, int rd, int rn)
> @@ -97,3 +110,122 @@ static void trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
>  {
>      do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
>  }
> +
> +/*
> + *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
> + */
> +
> +/* Subroutine loading a vector register at VOFS of LEN bytes.
> + * The load should begin at the address Rn + IMM.
> + */
> +
> +#if UINTPTR_MAX == UINT32_MAX
> +# define ptr i32
> +#else
> +# define ptr i64
> +#endif

This seems superfluous; don't we already have TCGv_ptr for this reason?

> +
> +static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
> +                   int rn, int imm)
> +{
> +    uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
> +    uint32_t len_remain = len % 8;
> +    uint32_t nparts = len / 8 + ctpop8(len_remain);
> +    int midx = get_mem_index(s);
> +    TCGv_i64 addr, t0, t1;
> +
> +    addr = tcg_temp_new_i64();
> +    t0 = tcg_temp_new_i64();
> +
> +    /* Note that unpredicated load/store of vector/predicate registers
> +     * are defined as a stream of bytes, which equates to little-endian
> +     * operations on larger quantities.  There is no nice way to force
> +     * a little-endian load for aarch64_be-linux-user out of line.
> +     *
> +     * Attempt to keep code expansion to a minimum by limiting the
> +     * amount of unrolling done.
> +     */
> +    if (nparts <= 4) {
> +        int i;
> +
> +        for (i = 0; i < len_align; i += 8) {
> +            tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + i);
> +            tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEQ);
> +            tcg_gen_st_i64(t0, cpu_env, vofs + i);
> +        }
> +    } else {
> +        TCGLabel *loop = gen_new_label();
> +        TCGv_ptr i = TCGV_NAT_TO_PTR(glue(tcg_const_local_, ptr)(0));
> +        TCGv_ptr dest;
> +
> +        gen_set_label(loop);
> +
> +        /* Minimize the number of local temps that must be re-read from
> +         * the stack each iteration.  Instead, re-compute values other
> +         * than the loop counter.
> +         */
> +        dest = tcg_temp_new_ptr();
> +        tcg_gen_addi_ptr(dest, i, imm);
> +#if UINTPTR_MAX == UINT32_MAX
> +        tcg_gen_extu_i32_i64(addr, TCGV_PTR_TO_NAT(dest));
> +        tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, rn));
> +#else
> +        tcg_gen_add_i64(addr, TCGV_PTR_TO_NAT(dest), cpu_reg_sp(s, rn));
> +#endif
> +
> +        tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEQ);
> +
> +        tcg_gen_add_ptr(dest, cpu_env, i);
> +        tcg_gen_addi_ptr(i, i, 8);
> +        tcg_gen_st_i64(t0, dest, vofs);
> +        tcg_temp_free_ptr(dest);
> +
> +        glue(tcg_gen_brcondi_, ptr)(TCG_COND_LTU, TCGV_PTR_TO_NAT(i),
> +                                    len_align, loop);

If this is the only use for ptr I wonder if it would just make more
sense to #if/else here.

> +        tcg_temp_free_ptr(i);
> +    }
> +
> +    /* Predicate register loads can be any multiple of 2.
> +     * Note that we still store the entire 64-bit unit into cpu_env.
> +     */
> +    if (len_remain) {
> +        tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + len_align);
> +
> +        switch (len_remain) {
> +        case 2:
> +        case 4:
> +        case 8:
> +            tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LE | ctz32(len_remain));
> +            break;
> +
> +        case 6:
> +            t1 = tcg_temp_new_i64();
> +            tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEUL);
> +            tcg_gen_addi_i64(addr, addr, 4);
> +            tcg_gen_qemu_ld_i64(t1, addr, midx, MO_LEUW);
> +            tcg_gen_deposit_i64(t0, t0, t1, 32, 32);
> +            tcg_temp_free_i64(t1);
> +            break;
> +
> +        default:
> +            g_assert_not_reached();
> +        }
> +        tcg_gen_st_i64(t0, cpu_env, vofs + len_align);
> +    }
> +    tcg_temp_free_i64(addr);
> +    tcg_temp_free_i64(t0);
> +}
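
The len_remain == 6 path can be checked standalone: a 4-byte and a 2-byte little-endian load, with the second deposited at bit 32, reproduce a 6-byte little-endian value (hypothetical code, composing bytes explicitly to stay host-endian-neutral):

```c
#include <stdint.h>

/* 6-byte little-endian load as the generated code performs it:
 * a 32-bit LE load, then a 16-bit LE load deposited at bits [47:32].
 */
static uint64_t load6_le(const uint8_t *p)
{
    uint64_t t0 = (uint64_t)p[0] | (uint64_t)p[1] << 8
                | (uint64_t)p[2] << 16 | (uint64_t)p[3] << 24;
    uint64_t t1 = (uint64_t)p[4] | (uint64_t)p[5] << 8;

    /* Equivalent of tcg_gen_deposit_i64(t0, t0, t1, 32, 32): */
    return (t0 & 0xffffffffu) | (t1 << 32);
}
```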
> +
> +#undef ptr
> +
> +static void trans_LDR_zri(DisasContext *s, arg_rri *a, uint32_t insn)
> +{
> +    int size = vec_full_reg_size(s);
> +    do_ldr(s, vec_full_reg_offset(s, a->rd), size, a->rn, a->imm * size);
> +}
> +
> +static void trans_LDR_pri(DisasContext *s, arg_rri *a, uint32_t insn)
> +{
> +    int size = pred_full_reg_size(s);
> +    do_ldr(s, pred_full_reg_offset(s, a->rd), size, a->rn, a->imm * size);
> +}
> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index 2c13a6024a..0c6a7ba34d 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -19,11 +19,17 @@
>  # This file is processed by scripts/decodetree.py
>  #
>
> +###########################################################################
> +# Named fields.  These are primarily for disjoint fields.
> +
> +%imm9_16_10	16:s6 10:3
> +
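
For reference, the %imm9_16_10 declaration describes a disjoint field: a signed 6-bit piece at bit 16 forms the high part and a 3-bit piece at bit 10 the low part of a 9-bit signed immediate. A hypothetical extraction equivalent:

```c
#include <stdint.h>

/* Equivalent of "%imm9_16_10  16:s6 10:3":
 * imm9 = sextract(insn, 16, 6) * 8 + extract(insn, 10, 3).
 */
static int32_t imm9_16_10(uint32_t insn)
{
    /* Sign-extend bits [21:16] (arithmetic right shift assumed). */
    int32_t hi = (int32_t)(insn << 10) >> 26;
    int32_t lo = (insn >> 10) & 7;

    return hi * 8 + lo;
}
```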
>  ###########################################################################
>  # Named attribute sets.  These are used to make nice(er) names
>  # when creating helpers common to those for the individual
>  # instruction patterns.
>
> +&rri		rd rn imm
>  &rrr_esz	rd rn rm esz
>
>  ###########################################################################
> @@ -31,7 +37,13 @@
>  # reduce the amount of duplication between instruction patterns.
>
>  # Three operand with unused vector element size
> -@rd_rn_rm_e0	........ ... rm:5  ... ...  rn:5 rd:5		&rrr_esz esz=0
> +@rd_rn_rm_e0	........ ... rm:5 ... ... rn:5 rd:5		&rrr_esz esz=0
> +
> +# Basic Load/Store with 9-bit immediate offset
> +@pd_rn_i9	........ ........ ...... rn:5 . rd:4	\
> +		&rri imm=%imm9_16_10
> +@rd_rn_i9	........ ........ ...... rn:5 rd:5	\
> +		&rri imm=%imm9_16_10
>
>  ###########################################################################
>  # Instruction patterns.  Grouped according to the SVE encodingindex.xhtml.
> @@ -43,3 +55,11 @@ AND_zzz		00000100 00 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
>  ORR_zzz		00000100 01 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
>  EOR_zzz		00000100 10 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
>  BIC_zzz		00000100 11 1 ..... 001 100 ..... .....		@rd_rn_rm_e0
> +
> +### SVE Memory - 32-bit Gather and Unsized Contiguous Group
> +
> +# SVE load predicate register
> +LDR_pri		10000101 10 ...... 000 ... ..... 0 ....		@pd_rn_i9
> +
> +# SVE load vector register
> +LDR_zri		10000101 10 ...... 010 ... ..... .....		@rd_rn_i9


--
Alex Bennée

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 00/67] target/arm: Scalable Vector Extension
  2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
                   ` (67 preceding siblings ...)
  2018-02-23 17:05 ` [Qemu-devel] [Qemu-arm] [PATCH v2 00/67] target/arm: Scalable Vector Extension Alex Bennée
@ 2018-04-03 15:41 ` Alex Bennée
  68 siblings, 0 replies; 167+ messages in thread
From: Alex Bennée @ 2018-04-03 15:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm


Richard Henderson <richard.henderson@linaro.org> writes:

> This is 99% of the instruction set.  There are a few things missing,
> notably first-fault and non-fault loads (even these are decoded, but
> simply treated as normal loads for now).

I've finished my quick pass; apart from the individual comments, I think
it looks pretty good.

--
Alex Bennée

* Re: [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate
  2018-04-03  9:26   ` Alex Bennée
@ 2018-04-06  1:23     ` Richard Henderson
  2018-04-06 13:03       ` Alex Bennée
  0 siblings, 1 reply; 167+ messages in thread
From: Richard Henderson @ 2018-04-06  1:23 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, qemu-arm

On 04/03/2018 07:26 PM, Alex Bennée wrote:
> You don't use it yet but probably worth a:
> 
> static inline int ffr_full_reg_offset(DisasContext *s)
> {
>     return pred_full_reg_offset(s, 16);
> }
> 
> here when you get to it to avoid the magic 16 appearing in the main code.

Hum.  Most of the places where ffr is touched are in sve.decode.
I could certainly enhance the grammar there to allow a symbol
there instead of a number.

But I don't think treating ffr differently from a regular pr
register, as above, is a good idea.  At best I would use

  pred_full_reg_offset(s, FFR_PRED_NUM)

or something.


r~

* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 06/67] target/arm: Implement SVE predicate test
  2018-04-03  9:16   ` Alex Bennée
@ 2018-04-06  1:27     ` Richard Henderson
  0 siblings, 0 replies; 167+ messages in thread
From: Richard Henderson @ 2018-04-06  1:27 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, qemu-arm

On 04/03/2018 07:16 PM, Alex Bennée wrote:
>> +/* Set the cpu flags as per a return from an SVE helper.  */
>> +static void do_pred_flags(TCGv_i32 t)
>> +{
>> +    tcg_gen_mov_i32(cpu_NF, t);
>> +    tcg_gen_andi_i32(cpu_ZF, t, 2);
>> +    tcg_gen_andi_i32(cpu_CF, t, 1);
>> +    tcg_gen_movi_i32(cpu_VF, 0);
>> +}
> 
> Why bother returning a value from the helper to then spend time
> shuffling it into env->cpu_FLAG when we could do this directly? Does
> this aid code generation when flag values are queried?

It means that the helper itself clobbers no TCG global temps, and so does not
invalidate any of the guest integer registers that might be live in host registers.

The arithmetic above is approximately as efficient as plain moves, so I don't
see this as "spending time shuffling" per se.

> Also from above:
> 
>> + * The return value has bit 31 set if N is set, bit 1 set if Z is clear,
>> + * and bit 0 set if C is set.
> 
> So there is assumed knowledge in the encoding of cpu_NF here - maybe a
> reference to cpu.h where this is codified.

I suppose, sure.


r~

* Re: [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate
  2018-04-06  1:23     ` Richard Henderson
@ 2018-04-06 13:03       ` Alex Bennée
  0 siblings, 0 replies; 167+ messages in thread
From: Alex Bennée @ 2018-04-06 13:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-arm


Richard Henderson <richard.henderson@linaro.org> writes:

> On 04/03/2018 07:26 PM, Alex Bennée wrote:
>> You don't use it yet but probably worth a:
>>
>> static inline int ffr_full_reg_offset(DisasContext *s)
>> {
>>     return pred_full_reg_offset(s, 16);
>> }
>>
>> here when you get to it to avoid the magic 16 appearing in the main code.
>
> Hum.  Most of the places that ffr is touched is in sve.decode.
> I could certainly enhance the grammar there to allow a symbol
> there instead of a number.
>
> But I don't think treating ffr differently from a regular pr
> register, as above, is a good idea.  At best I would use
>
>   pred_full_reg_offset(s, FFR_PRED_NUM)

That would be a fine alternative ;-)

--
Alex Bennée

end of thread, other threads:[~2018-04-06 13:03 UTC | newest]

Thread overview: 167+ messages
2018-02-17 18:22 [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 01/67] target/arm: Enable SVE for aarch64-linux-user Richard Henderson
2018-02-22 17:28   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-22 19:27     ` Richard Henderson
2018-02-23 17:00   ` Alex Bennée
2018-02-23 18:47     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 02/67] target/arm: Introduce translate-a64.h Richard Henderson
2018-02-22 17:30   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-04-03  9:01   ` Alex Bennée
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 03/67] target/arm: Add SVE decode skeleton Richard Henderson
2018-02-22 18:00   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-23 11:40   ` Peter Maydell
2018-02-23 11:43     ` Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 04/67] target/arm: Implement SVE Bitwise Logical - Unpredicated Group Richard Henderson
2018-02-22 18:04   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-22 19:28     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate Richard Henderson
2018-02-22 18:20   ` Peter Maydell
2018-02-22 19:31     ` Richard Henderson
2018-04-03  9:26   ` Alex Bennée
2018-04-06  1:23     ` Richard Henderson
2018-04-06 13:03       ` Alex Bennée
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 06/67] target/arm: Implement SVE predicate test Richard Henderson
2018-02-22 18:38   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-04-03  9:16   ` Alex Bennée
2018-04-06  1:27     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 07/67] target/arm: Implement SVE Predicate Logical Operations Group Richard Henderson
2018-02-22 18:55   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-22 19:37     ` Richard Henderson
2018-02-23  9:56       ` Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 08/67] target/arm: Implement SVE Predicate Misc Group Richard Henderson
2018-02-23 11:22   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 09/67] target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group Richard Henderson
2018-02-23 11:35   ` Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 10/67] target/arm: Implement SVE Integer Reduction Group Richard Henderson
2018-02-23 11:50   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 11/67] target/arm: Implement SVE bitwise shift by immediate (predicated) Richard Henderson
2018-02-23 12:03   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 12/67] target/arm: Implement SVE bitwise shift by vector (predicated) Richard Henderson
2018-02-23 12:50   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 13/67] target/arm: Implement SVE bitwise shift by wide elements (predicated) Richard Henderson
2018-02-23 12:57   ` Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 14/67] target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group Richard Henderson
2018-02-23 13:08   ` Peter Maydell
2018-02-23 17:25     ` Richard Henderson
2018-02-23 17:30       ` Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 15/67] target/arm: Implement SVE Integer Multiply-Add Group Richard Henderson
2018-02-23 13:12   ` Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 16/67] target/arm: Implement SVE Integer Arithmetic - Unpredicated Group Richard Henderson
2018-02-23 13:16   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 17/67] target/arm: Implement SVE Index Generation Group Richard Henderson
2018-02-23 13:22   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 18/67] target/arm: Implement SVE Stack Allocation Group Richard Henderson
2018-02-23 13:25   ` Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 19/67] target/arm: Implement SVE Bitwise Shift - Unpredicated Group Richard Henderson
2018-02-23 13:28   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 20/67] target/arm: Implement SVE Compute Vector Address Group Richard Henderson
2018-02-23 13:34   ` Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 21/67] target/arm: Implement SVE floating-point exponential accelerator Richard Henderson
2018-02-23 13:48   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-23 17:29     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 22/67] target/arm: Implement SVE floating-point trig select coefficient Richard Henderson
2018-02-23 13:54   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 23/67] target/arm: Implement SVE Element Count Group Richard Henderson
2018-02-23 14:06   ` Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 24/67] target/arm: Implement SVE Bitwise Immediate Group Richard Henderson
2018-02-23 14:10   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 25/67] target/arm: Implement SVE Integer Wide Immediate - Predicated Group Richard Henderson
2018-02-23 14:18   ` Peter Maydell
2018-02-23 17:31     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 26/67] target/arm: Implement SVE Permute - Extract Group Richard Henderson
2018-02-23 14:24   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-23 17:46     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 27/67] target/arm: Implement SVE Permute - Unpredicated Group Richard Henderson
2018-02-23 14:34   ` Peter Maydell
2018-02-23 18:58     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 28/67] target/arm: Implement SVE Permute - Predicates Group Richard Henderson
2018-02-23 15:15   ` Peter Maydell
2018-02-23 19:59     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 29/67] target/arm: Implement SVE Permute - Interleaving Group Richard Henderson
2018-02-23 15:22   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 30/67] target/arm: Implement SVE compress active elements Richard Henderson
2018-02-23 15:25   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 31/67] target/arm: Implement SVE conditionally broadcast/extract element Richard Henderson
2018-02-23 15:44   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-23 20:15     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 32/67] target/arm: Implement SVE copy to vector (predicated) Richard Henderson
2018-02-23 15:45   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 33/67] target/arm: Implement SVE reverse within elements Richard Henderson
2018-02-23 15:50   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-23 20:21     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 34/67] target/arm: Implement SVE vector splice (predicated) Richard Henderson
2018-02-23 15:52   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 35/67] target/arm: Implement SVE Select Vectors Group Richard Henderson
2018-02-23 16:21   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 36/67] target/arm: Implement SVE Integer Compare - " Richard Henderson
2018-02-23 16:29   ` Peter Maydell
2018-02-23 20:57     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 37/67] target/arm: Implement SVE Integer Compare - Immediate Group Richard Henderson
2018-02-23 16:32   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 38/67] target/arm: Implement SVE Partition Break Group Richard Henderson
2018-02-23 16:41   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-23 20:59     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 39/67] target/arm: Implement SVE Predicate Count Group Richard Henderson
2018-02-23 16:48   ` Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 40/67] target/arm: Implement SVE Integer Compare - Scalars Group Richard Henderson
2018-02-23 17:00   ` Peter Maydell
2018-02-23 21:06     ` Richard Henderson
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 41/67] target/arm: Implement FDUP/DUP Richard Henderson
2018-02-23 17:12   ` Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 42/67] target/arm: Implement SVE Integer Wide Immediate - Unpredicated Group Richard Henderson
2018-02-23 17:18   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:22 ` [Qemu-devel] [PATCH v2 43/67] target/arm: Implement SVE Floating Point Arithmetic " Richard Henderson
2018-02-23 17:25   ` Peter Maydell
2018-02-23 21:15     ` Richard Henderson
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 44/67] target/arm: Implement SVE Memory Contiguous Load Group Richard Henderson
2018-02-27 12:16   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 45/67] target/arm: Implement SVE Memory Contiguous Store Group Richard Henderson
2018-02-27 13:22   ` Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 46/67] target/arm: Implement SVE load and broadcast quadword Richard Henderson
2018-02-27 13:36   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 47/67] target/arm: Implement SVE integer convert to floating-point Richard Henderson
2018-02-27 13:47   ` Peter Maydell
2018-02-27 13:51   ` Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 48/67] target/arm: Implement SVE floating-point arithmetic (predicated) Richard Henderson
2018-02-27 13:50   ` Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 49/67] target/arm: Implement SVE FP Multiply-Add Group Richard Henderson
2018-02-27 13:54   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 50/67] target/arm: Implement SVE Floating Point Accumulating Reduction Group Richard Henderson
2018-02-27 13:59   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 51/67] target/arm: Implement SVE load and broadcast element Richard Henderson
2018-02-27 14:15   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 52/67] target/arm: Implement SVE store vector/predicate register Richard Henderson
2018-02-27 14:21   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 53/67] target/arm: Implement SVE scatter stores Richard Henderson
2018-02-27 14:36   ` Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 54/67] target/arm: Implement SVE prefetches Richard Henderson
2018-02-27 14:43   ` Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 55/67] target/arm: Implement SVE gather loads Richard Henderson
2018-02-27 14:53   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 56/67] target/arm: Implement SVE scatter store vector immediate Richard Henderson
2018-02-27 15:02   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 57/67] target/arm: Implement SVE floating-point compare vectors Richard Henderson
2018-02-27 15:04   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 58/67] target/arm: Implement SVE floating-point arithmetic with immediate Richard Henderson
2018-02-27 15:11   ` Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 59/67] target/arm: Implement SVE Floating Point Multiply Indexed Group Richard Henderson
2018-02-27 15:18   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-27 16:29     ` Richard Henderson
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 60/67] target/arm: Implement SVE FP Fast Reduction Group Richard Henderson
2018-02-27 15:24   ` Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 61/67] target/arm: Implement SVE Floating Point Unary Operations - Unpredicated Group Richard Henderson
2018-02-27 15:28   ` Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 62/67] target/arm: Implement SVE FP Compare with Zero Group Richard Henderson
2018-02-27 15:31   ` Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 63/67] target/arm: Implement SVE floating-point trig multiply-add coefficient Richard Henderson
2018-02-27 15:34   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 64/67] target/arm: Implement SVE floating-point convert precision Richard Henderson
2018-02-27 15:35   ` Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 65/67] target/arm: Implement SVE floating-point convert to integer Richard Henderson
2018-02-27 15:36   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 66/67] target/arm: Implement SVE floating-point round to integral value Richard Henderson
2018-02-27 15:39   ` [Qemu-devel] [Qemu-arm] " Peter Maydell
2018-02-17 18:23 ` [Qemu-devel] [PATCH v2 67/67] target/arm: Implement SVE floating-point unary operations Richard Henderson
2018-02-27 15:40   ` Peter Maydell
2018-02-23 17:05 ` [Qemu-devel] [Qemu-arm] [PATCH v2 00/67] target/arm: Scalable Vector Extension Alex Bennée
2018-04-03 15:41 ` Alex Bennée
