* [PATCH 00/55] target/arm: First slice of MVE implementation
@ 2021-06-07 16:57 Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

This patch series provides an initial slice of the MVE
implementation. (MVE is "vector instructions for M-profile", also
known as Helium).

This is not complete support by a long way -- it covers only about 35%
of the decode patterns for MVE, and it implements only the slow-path
"we need predication, drop out to a helper function" versions of
insns. I'm sending it out now for two reasons:

 * if there's something I need to change about the general structure
   or the way I'm implementing insns, I want to know now rather than
   after I've implemented the other two thirds of the ISA

 * if I hold onto the whole patchset until I've got a complete MVE
   implementation it'll be 150+ patches, 10000 lines of code, and
   a nightmare to code review

The series covers:
 * framework for MVE decode, including infrastructure for
   handling predication, PSR.ECI, etc
 * tail-predication forms of low-overhead-loop insns (LCTP, WLSTP, LETP)
 * basic (non-gather) loads and stores
 * pretty much all the integer 2-operand vector and scalar insns
 * most of the integer 1-operand insns
 * a handful of other insns

(Unfortunately the v8M Arm ARM does not provide a nice, neatly
separated list of encodings the way the SVE2 XML does.  I ended up
just pulling all the decode patterns out of the Arm ARM insn
descriptions and then hand-sorting them into what looked like common
formats. So the insns implemented aren't following a 100% logical
order.)

As noted above, the implementation here is purely the slow-path
fully-generic "call helpers that can handle predication". I do
want to implement a fast-path for "we know we have no predication,
so we can generate inline vector code", but I'd like to do that
as a series of followup patches once the main MVE code has landed.
That will (a) make it easier to review, I hope, (b) mean we get to
"at least functional" MVE sooner, and (c) allow people to bisect
any regressions to the "add fastpath" patch.

Almost nothing in this patch series is "live code", because no CPU sets
the ID register bits to turn on MVE.  The exception is the handling of
PSR.ECI/ICI, which is enabled at least as far as the ICI bits go for
M-profile CPUs (thus fixing a previously missing corner case: trying
to execute a non-continuable insn with non-zero ICI should now
fault).
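
"Live" here would mean a CPU model advertising MVE in its ID
registers (I believe via the MVFR1.MVE field). A minimal sketch of
the gating idiom the series uses throughout:

    /* sketch: every MVE trans function starts with a check like this */
    if (!dc_isar_feature(aa32_mve, s)) {
        return false;   /* MVE not advertised in the ID regs: UNDEF */
    }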

My view is that if these patches get through code review we're better
off with them in upstream git rather than outside it; I'm open to
arguments to the contrary.

Patch 1 is RTH's recently posted tcg_remove_ops_after() patch,
which we need for the PSR.ECI handling (which indeed is the
justification for having that new function in the first place).

You can also get this patchset here:
 https://git.linaro.org/people/peter.maydell/qemu-arm.git mve-drop-1

thanks
-- PMM

Peter Maydell (54):
  target/arm: Enable FPSCR.QC bit for MVE
  target/arm: Handle VPR semantics in existing code
  target/arm: Add handling for PSR.ECI/ICI
  target/arm: Let vfp_access_check() handle late NOCP checks
  target/arm: Implement MVE LCTP
  target/arm: Implement MVE WLSTP insn
  target/arm: Implement MVE DLSTP
  target/arm: Implement MVE LETP insn
  target/arm: Add framework for MVE decode
  target/arm: Implement MVE VLDR/VSTR (non-widening forms)
  target/arm: Implement widening/narrowing MVE VLDR/VSTR insns
  target/arm: Implement MVE VCLZ
  target/arm: Implement MVE VCLS
  bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
  target/arm: Implement MVE VREV16, VREV32, VREV64
  target/arm: Implement MVE VMVN (register)
  target/arm: Implement MVE VABS
  target/arm: Implement MVE VNEG
  target/arm: Implement MVE VDUP
  target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR
  target/arm: Implement MVE VADD, VSUB, VMUL
  target/arm: Implement MVE VMULH
  target/arm: Implement MVE VRMULH
  target/arm: Implement MVE VMAX, VMIN
  target/arm: Implement MVE VABD
  target/arm: Implement MVE VHADD, VHSUB
  target/arm: Implement MVE VMULL
  target/arm: Implement MVE VMLALDAV
  target/arm: Implement MVE VMLSLDAV
  include/qemu/int128.h: Add function to create Int128 from int64_t
  target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
  target/arm: Implement MVE VADD (scalar)
  target/arm: Implement MVE VSUB, VMUL (scalar)
  target/arm: Implement MVE VHADD, VHSUB (scalar)
  target/arm: Implement MVE VBRSR
  target/arm: Implement MVE VPST
  target/arm: Implement MVE VQADD and VQSUB
  target/arm: Implement MVE VQDMULH and VQRDMULH (scalar)
  target/arm: Implement MVE VQDMULL scalar
  target/arm: Implement MVE VQDMULH, VQRDMULH (vector)
  target/arm: Implement MVE VQADD, VQSUB (vector)
  target/arm: Implement MVE VQSHL (vector)
  target/arm: Implement MVE VQRSHL
  target/arm: Implement MVE VSHL insn
  target/arm: Implement MVE VRSHL
  target/arm: Implement MVE VQDMLADH and VQRDMLADH
  target/arm: Implement MVE VQDMLSDH and VQRDMLSDH
  target/arm: Implement MVE VQDMULL (vector)
  target/arm: Implement MVE VRHADD
  target/arm: Implement MVE VADC, VSBC
  target/arm: Implement MVE VCADD
  target/arm: Implement MVE VHCADD
  target/arm: Implement MVE VADDV
  target/arm: Make VMOV scalar <-> gpreg beatwise for MVE

Richard Henderson (1):
  tcg: Introduce tcg_remove_ops_after

 include/qemu/bitops.h         |   29 +
 include/qemu/int128.h         |   10 +
 include/tcg/tcg.h             |    1 +
 target/arm/helper-mve.h       |  357 +++++++++
 target/arm/helper.h           |    2 +
 target/arm/internals.h        |   11 +
 target/arm/translate-a32.h    |    4 +
 target/arm/translate.h        |   19 +
 target/arm/mve.decode         |  261 +++++++
 target/arm/t32.decode         |   15 +-
 target/arm/m_helper.c         |   54 +-
 target/arm/mve_helper.c       | 1343 +++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c       |   20 -
 target/arm/translate-m-nocp.c |   16 +-
 target/arm/translate-mve.c    |  865 +++++++++++++++++++++
 target/arm/translate-vfp.c    |  152 +++-
 target/arm/translate.c        |  301 +++++++-
 target/arm/vfp_helper.c       |    3 +-
 tcg/tcg.c                     |   13 +
 target/arm/meson.build        |    3 +
 20 files changed, 3408 insertions(+), 71 deletions(-)
 create mode 100644 target/arm/helper-mve.h
 create mode 100644 target/arm/mve.decode
 create mode 100644 target/arm/mve_helper.c
 create mode 100644 target/arm/translate-mve.c

-- 
2.20.1




* [PATCH 01/55] tcg: Introduce tcg_remove_ops_after
From: Peter Maydell @ 2021-06-07 16:57 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

From: Richard Henderson <richard.henderson@linaro.org>

Introduce a function to remove everything emitted
since a given point.
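
A hedged sketch of the intended usage pattern (the M-profile ECI
handling later in this series uses it in exactly this shape):

    /* sketch: bookmark the op stream, emit speculatively, then rewind */
    TCGOp *mark = tcg_last_op();   /* remember the current tail op */
    /* ... generate ops that may turn out to be unwanted ... */
    tcg_remove_ops_after(mark);    /* discard everything after the mark */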

Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210604212747.959028-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/tcg/tcg.h |  1 +
 tcg/tcg.c         | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 74cb3453083..6895246fab5 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -1081,6 +1081,7 @@ TCGOp *tcg_emit_op(TCGOpcode opc);
 void tcg_op_remove(TCGContext *s, TCGOp *op);
 TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *op, TCGOpcode opc);
 TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc);
+void tcg_remove_ops_after(TCGOp *op);
 
 void tcg_optimize(TCGContext *s);
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0dc271aac9f..262dbba1fde 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2649,6 +2649,19 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
 #endif
 }
 
+void tcg_remove_ops_after(TCGOp *op)
+{
+    TCGContext *s = tcg_ctx;
+
+    while (true) {
+        TCGOp *last = tcg_last_op();
+        if (last == op) {
+            return;
+        }
+        tcg_op_remove(s, last);
+    }
+}
+
 static TCGOp *tcg_op_alloc(TCGOpcode opc)
 {
     TCGContext *s = tcg_ctx;
-- 
2.20.1




* [PATCH 02/55] target/arm: Enable FPSCR.QC bit for MVE
From: Peter Maydell @ 2021-06-07 16:57 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

MVE has an FPSCR.QC bit similar to the A-profile Neon one; when MVE
is implemented, make the bit writeable, both in the generic "load and
store FPSCR" helper functions and in the code for handling the NZCVQC
sysreg, which we had previously left as "TODO when we implement MVE".
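
For context, the QC state is kept across the four vfp.qc[] words so
that vector code can set it cheaply; only "zero vs non-zero of the
array as a whole" is significant. A hedged sketch of the read-back
side (the exact helper body may differ):

    /* sketch: FPSCR.QC reads back as "any vfp.qc[] word non-zero" */
    if (env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3]) {
        fpscr |= FPCR_QC;
    }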

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-vfp.c | 32 +++++++++++++++++++++++---------
 target/arm/vfp_helper.c    |  3 ++-
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index d01e465821b..22a619eb2c5 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -784,10 +784,19 @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
     {
         TCGv_i32 fpscr;
         tmp = loadfn(s, opaque);
-        /*
-         * TODO: when we implement MVE, write the QC bit.
-         * For non-MVE, QC is RES0.
-         */
+        if (dc_isar_feature(aa32_mve, s)) {
+            /* QC is only present for MVE; otherwise RES0 */
+            TCGv_i32 qc = tcg_temp_new_i32();
+            TCGv_i32 zero;
+            tcg_gen_andi_i32(qc, tmp, FPCR_QC);
+            store_cpu_field(qc, vfp.qc[0]);
+            zero = tcg_const_i32(0);
+            store_cpu_field(zero, vfp.qc[1]);
+            zero = tcg_const_i32(0);
+            store_cpu_field(zero, vfp.qc[2]);
+            zero = tcg_const_i32(0);
+            store_cpu_field(zero, vfp.qc[3]);
+        }
         tcg_gen_andi_i32(tmp, tmp, FPCR_NZCV_MASK);
         fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
         tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
@@ -869,6 +878,11 @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
         break;
     }
 
+    if (regno == ARM_VFP_FPSCR_NZCVQC && !dc_isar_feature(aa32_mve, s)) {
+        /* QC is RES0 without MVE, so NZCVQC simplifies to NZCV */
+        regno = QEMU_VFP_FPSCR_NZCV;
+    }
+
     switch (regno) {
     case ARM_VFP_FPSCR:
         tmp = tcg_temp_new_i32();
@@ -876,11 +890,11 @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
         storefn(s, opaque, tmp);
         break;
     case ARM_VFP_FPSCR_NZCVQC:
-        /*
-         * TODO: MVE has a QC bit, which we probably won't store
-         * in the xregs[] field. For non-MVE, where QC is RES0,
-         * we can just fall through to the FPSCR_NZCV case.
-         */
+        tmp = tcg_temp_new_i32();
+        gen_helper_vfp_get_fpscr(tmp, cpu_env);
+        tcg_gen_andi_i32(tmp, tmp, FPCR_NZCVQC_MASK);
+        storefn(s, opaque, tmp);
+        break;
     case QEMU_VFP_FPSCR_NZCV:
         /*
          * Read just NZCV; this is a special case to avoid the
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 496f0034772..8a716600592 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -220,7 +220,8 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
                                      FPCR_LTPSIZE_LENGTH);
     }
 
-    if (arm_feature(env, ARM_FEATURE_NEON)) {
+    if (arm_feature(env, ARM_FEATURE_NEON) ||
+        cpu_isar_feature(aa32_mve, cpu)) {
         /*
          * The bit we set within fpscr_q is arbitrary; the register as a
          * whole being zero/non-zero is what counts.
-- 
2.20.1




* [PATCH 03/55] target/arm: Handle VPR semantics in existing code
From: Peter Maydell @ 2021-06-07 16:57 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

When MVE is supported, the VPR register has a place on the exception
stack frame in a previously reserved slot just above the FPSCR.
It must also be zeroed in various situations when we invalidate
FPU context.

Update the code which handles the stack frames (exception entry and
exit code, VLLDM, and VLSTM) to save/restore VPR.

Update code which invalidates FP registers (mostly also exception
entry and exit code, but also VSCCLRM and the code in
full_vfp_access_check() that corresponds to the ExecuteFPCheck()
pseudocode) to zero VPR.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/m_helper.c         | 54 +++++++++++++++++++++++++++++------
 target/arm/translate-m-nocp.c |  5 +++-
 target/arm/translate-vfp.c    |  9 ++++--
 3 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index 074c5434550..7a1e35ab5b6 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -378,7 +378,7 @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
             uint32_t shi = extract64(dn, 32, 32);
 
             if (i >= 16) {
-                faddr += 8; /* skip the slot for the FPSCR */
+                faddr += 8; /* skip the slot for the FPSCR/VPR */
             }
             stacked_ok = stacked_ok &&
                 v7m_stack_write(cpu, faddr, slo, mmu_idx, STACK_LAZYFP) &&
@@ -388,6 +388,11 @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
         stacked_ok = stacked_ok &&
             v7m_stack_write(cpu, fpcar + 0x40,
                             vfp_get_fpscr(env), mmu_idx, STACK_LAZYFP);
+        if (cpu_isar_feature(aa32_mve, cpu)) {
+            stacked_ok = stacked_ok &&
+                v7m_stack_write(cpu, fpcar + 0x44,
+                                env->v7m.vpr, mmu_idx, STACK_LAZYFP);
+        }
     }
 
     /*
@@ -410,16 +415,19 @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
     env->v7m.fpccr[is_secure] &= ~R_V7M_FPCCR_LSPACT_MASK;
 
     if (ts) {
-        /* Clear s0 to s31 and the FPSCR */
+        /* Clear s0 to s31 and the FPSCR and VPR */
         int i;
 
         for (i = 0; i < 32; i += 2) {
             *aa32_vfp_dreg(env, i / 2) = 0;
         }
         vfp_set_fpscr(env, 0);
+        if (cpu_isar_feature(aa32_mve, cpu)) {
+            env->v7m.vpr = 0;
+        }
     }
     /*
-     * Otherwise s0 to s15 and FPSCR are UNKNOWN; we choose to leave them
+     * Otherwise s0 to s15, FPSCR and VPR are UNKNOWN; we choose to leave them
      * unchanged.
      */
 }
@@ -1044,6 +1052,7 @@ static void v7m_update_fpccr(CPUARMState *env, uint32_t frameptr,
 void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
 {
     /* fptr is the value of Rn, the frame pointer we store the FP regs to */
+    ARMCPU *cpu = env_archcpu(env);
     bool s = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
     bool lspact = env->v7m.fpccr[s] & R_V7M_FPCCR_LSPACT_MASK;
     uintptr_t ra = GETPC();
@@ -1092,9 +1101,12 @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
             cpu_stl_data_ra(env, faddr + 4, shi, ra);
         }
         cpu_stl_data_ra(env, fptr + 0x40, vfp_get_fpscr(env), ra);
+        if (cpu_isar_feature(aa32_mve, cpu)) {
+            cpu_stl_data_ra(env, fptr + 0x44, env->v7m.vpr, ra);
+        }
 
         /*
-         * If TS is 0 then s0 to s15 and FPSCR are UNKNOWN; we choose to
+         * If TS is 0 then s0 to s15, FPSCR and VPR are UNKNOWN; we choose to
          * leave them unchanged, matching our choice in v7m_preserve_fp_state.
          */
         if (ts) {
@@ -1102,6 +1114,9 @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
                 *aa32_vfp_dreg(env, i / 2) = 0;
             }
             vfp_set_fpscr(env, 0);
+            if (cpu_isar_feature(aa32_mve, cpu)) {
+                env->v7m.vpr = 0;
+            }
         }
     } else {
         v7m_update_fpccr(env, fptr, false);
@@ -1112,6 +1127,7 @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr)
 
 void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
 {
+    ARMCPU *cpu = env_archcpu(env);
     uintptr_t ra = GETPC();
 
     /* fptr is the value of Rn, the frame pointer we load the FP regs from */
@@ -1144,7 +1160,7 @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
             uint32_t faddr = fptr + 4 * i;
 
             if (i >= 16) {
-                faddr += 8; /* skip the slot for the FPSCR */
+                faddr += 8; /* skip the slot for the FPSCR and VPR */
             }
 
             slo = cpu_ldl_data_ra(env, faddr, ra);
@@ -1155,6 +1171,9 @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr)
         }
         fpscr = cpu_ldl_data_ra(env, fptr + 0x40, ra);
         vfp_set_fpscr(env, fpscr);
+        if (cpu_isar_feature(aa32_mve, cpu)) {
+            env->v7m.vpr = cpu_ldl_data_ra(env, fptr + 0x44, ra);
+        }
     }
 
     env->v7m.control[M_REG_S] |= R_V7M_CONTROL_FPCA_MASK;
@@ -1298,7 +1317,7 @@ static bool v7m_push_stack(ARMCPU *cpu)
                     uint32_t shi = extract64(dn, 32, 32);
 
                     if (i >= 16) {
-                        faddr += 8; /* skip the slot for the FPSCR */
+                        faddr += 8; /* skip the slot for the FPSCR and VPR */
                     }
                     stacked_ok = stacked_ok &&
                         v7m_stack_write(cpu, faddr, slo,
@@ -1309,11 +1328,19 @@ static bool v7m_push_stack(ARMCPU *cpu)
                 stacked_ok = stacked_ok &&
                     v7m_stack_write(cpu, frameptr + 0x60,
                                     vfp_get_fpscr(env), mmu_idx, STACK_NORMAL);
+                if (cpu_isar_feature(aa32_mve, cpu)) {
+                    stacked_ok = stacked_ok &&
+                        v7m_stack_write(cpu, frameptr + 0x64,
+                                        env->v7m.vpr, mmu_idx, STACK_NORMAL);
+                }
                 if (cpacr_pass) {
                     for (i = 0; i < ((framesize == 0xa8) ? 32 : 16); i += 2) {
                         *aa32_vfp_dreg(env, i / 2) = 0;
                     }
                     vfp_set_fpscr(env, 0);
+                    if (cpu_isar_feature(aa32_mve, cpu)) {
+                        env->v7m.vpr = 0;
+                    }
                 }
             } else {
                 /* Lazy stacking enabled, save necessary info to stack later */
@@ -1536,13 +1563,16 @@ static void do_v7m_exception_exit(ARMCPU *cpu)
                     v7m_exception_taken(cpu, excret, true, false);
                 }
             }
-            /* Clear s0..s15 and FPSCR; TODO also VPR when MVE is implemented */
+            /* Clear s0..s15, FPSCR and VPR */
             int i;
 
             for (i = 0; i < 16; i += 2) {
                 *aa32_vfp_dreg(env, i / 2) = 0;
             }
             vfp_set_fpscr(env, 0);
+            if (cpu_isar_feature(aa32_mve, cpu)) {
+                env->v7m.vpr = 0;
+            }
         }
     }
 
@@ -1771,7 +1801,7 @@ static void do_v7m_exception_exit(ARMCPU *cpu)
                     uint32_t faddr = frameptr + 0x20 + 4 * i;
 
                     if (i >= 16) {
-                        faddr += 8; /* Skip the slot for the FPSCR */
+                        faddr += 8; /* Skip the slot for the FPSCR and VPR */
                     }
 
                     pop_ok = pop_ok &&
@@ -1790,6 +1820,11 @@ static void do_v7m_exception_exit(ARMCPU *cpu)
                 if (pop_ok) {
                     vfp_set_fpscr(env, fpscr);
                 }
+                if (cpu_isar_feature(aa32_mve, cpu)) {
+                    pop_ok = pop_ok &&
+                        v7m_stack_read(cpu, &env->v7m.vpr,
+                                       frameptr + 0x64, mmu_idx);
+                }
                 if (!pop_ok) {
                     /*
                      * These regs are 0 if security extension present;
@@ -1799,6 +1834,9 @@ static void do_v7m_exception_exit(ARMCPU *cpu)
                         *aa32_vfp_dreg(env, i / 2) = 0;
                     }
                     vfp_set_fpscr(env, 0);
+                    if (cpu_isar_feature(aa32_mve, cpu)) {
+                        env->v7m.vpr = 0;
+                    }
                 }
             }
         }
diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
index d47eb8e1535..365810e582d 100644
--- a/target/arm/translate-m-nocp.c
+++ b/target/arm/translate-m-nocp.c
@@ -173,7 +173,10 @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
         btmreg++;
     }
     assert(btmreg == topreg + 1);
-    /* TODO: when MVE is implemented, zero VPR here */
+    if (dc_isar_feature(aa32_mve, s)) {
+        TCGv_i32 z32 = tcg_const_i32(0);
+        store_cpu_field(z32, v7m.vpr);
+    }
     return true;
 }
 
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index 22a619eb2c5..c3504bd3b86 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -180,8 +180,8 @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
 
         if (s->v7m_new_fp_ctxt_needed) {
             /*
-             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA
-             * and the FPSCR.
+             * Create new FP context by updating CONTROL.FPCA, CONTROL.SFPA,
+             * the FPSCR, and VPR.
              */
             TCGv_i32 control, fpscr;
             uint32_t bits = R_V7M_CONTROL_FPCA_MASK;
@@ -189,6 +189,11 @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
             fpscr = load_cpu_field(v7m.fpdscr[s->v8m_secure]);
             gen_helper_vfp_set_fpscr(cpu_env, fpscr);
             tcg_temp_free_i32(fpscr);
+            if (dc_isar_feature(aa32_mve, s)) {
+                TCGv_i32 z32 = tcg_const_i32(0);
+                store_cpu_field(z32, v7m.vpr);
+            }
+
             /*
              * We don't need to arrange to end the TB, because the only
              * parts of FPSCR which we cache in the TB flags are the VECLEN
-- 
2.20.1




* [PATCH 04/55] target/arm: Add handling for PSR.ECI/ICI
From: Peter Maydell @ 2021-06-07 16:57 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

On A-profile, PSR bits [15:10][26:25] are always the IT state bits.
On M-profile, some of the reserved encodings of the IT state are
instead used to indicate partial progress through instructions that were
interrupted partway through by an exception and can be resumed.

These resumable instructions fall into two categories:

(1) load/store multiple instructions, where these bits are called
"ICI" and specify the register in the ldm/stm list where execution
should resume.  (Specifically: LDM, STM, VLDM, VSTM, VLLDM, VLSTM,
CLRM, VSCCLRM.)

(2) MVE instructions subject to beatwise execution, where these bits
are called "ECI" and specify which beats in this and possibly also
the following MVE insn have been executed.

There are also a few insns (LE, LETP, and BKPT) which do not use the
ICI/ECI bits but must leave them alone.

Otherwise, we should raise an INVSTATE UsageFault for any attempt to
execute an insn with non-zero ICI/ECI bits.

So far we have been able to ignore ECI/ICI, because the architecture
allows the IMPDEF choice of "always restart load/store multiple from
the beginning regardless of ICI state", so the only thing we have
been missing is that we don't raise the INVSTATE fault for bad guest
code.  However, MVE requires that we honour ECI bits and do not
re-execute beats of an insn that have already been executed.

Add the support in the decoder for handling ECI/ICI:
 * identify the ECI/ICI case in the CONDEXEC TB flags
 * when a load/store multiple insn succeeds, it updates the ECI/ICI
   state (both in DisasContext and in the CPU state), and sets a flag
   to say that the ECI/ICI state was handled
 * if we find that the insn we just decoded did not handle the
   ECI/ICI state, we delete all the code that we just generated for
it and instead emit the code to raise the INVSTATE UsageFault.  This allows
   us to avoid having to update every non-MVE non-LDM/STM insn to
   make it check for "is ECI/ICI set?".

We continue with our existing IMPDEF choice of not caring about the
ICI state for the load/store multiples and simply restarting them
from the beginning.  Because we don't allow interrupts in the middle
of an insn, the only way we would see this state is if the guest set
ICI manually on return from an exception handler, so it's a corner
case which doesn't merit optimisation.

ICI update for LDM/STM is simple -- it always zeroes the state.  ECI
update for MVE beatwise insns will be a little more complex, since
the ECI state may include information for the following insn.
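
In outline, the flow added to the translator looks like this (a
condensed sketch of the real code below, not a verbatim quote):

    if (dc->eci) {
        dc->eci_handled = false;
        dc->insn_eci_rewind = tcg_last_op();   /* rewind marker */
    }
    /* ... decode via disas_thumb_insn() / disas_thumb2_insn() ... */
    if (dc->eci && !dc->eci_handled) {
        /* insn was not valid with ECI/ICI set: drop its code */
        tcg_remove_ops_after(dc->insn_eci_rewind);
        /* ... and generate the INVSTATE UsageFault instead */
    }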

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a32.h    |   1 +
 target/arm/translate.h        |   9 +++
 target/arm/translate-m-nocp.c |  11 ++++
 target/arm/translate-vfp.c    |   6 ++
 target/arm/translate.c        | 113 ++++++++++++++++++++++++++++++++--
 5 files changed, 135 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index c997f4e3216..c946ac440ce 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -44,6 +44,7 @@ long vfp_reg_offset(bool dp, unsigned reg);
 long neon_full_reg_offset(unsigned reg);
 long neon_element_offset(int reg, int element, MemOp memop);
 void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
+void clear_eci_state(DisasContext *s);
 
 static inline TCGv_i32 load_cpu_offset(int offset)
 {
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 12c28b0d32c..2821b325e33 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -21,6 +21,15 @@ typedef struct DisasContext {
     /* Thumb-2 conditional execution bits.  */
     int condexec_mask;
     int condexec_cond;
+    /* M-profile ECI/ICI exception-continuable instruction state */
+    int eci;
+    /*
+     * trans_ functions for insns which are continuable should set this true
+     * after decode (ie after any UNDEF checks)
+     */
+    bool eci_handled;
+    /* TCG op to rewind to if this turns out to be an invalid ECI state */
+    TCGOp *insn_eci_rewind;
     int thumb;
     int sctlr_b;
     MemOp be_data;
diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
index 365810e582d..09b3be4ed31 100644
--- a/target/arm/translate-m-nocp.c
+++ b/target/arm/translate-m-nocp.c
@@ -75,8 +75,12 @@ static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
         unallocated_encoding(s);
         return true;
     }
+
+    s->eci_handled = true;
+
     /* If no fpu, NOP. */
     if (!dc_isar_feature(aa32_vfp, s)) {
+        clear_eci_state(s);
         return true;
     }
 
@@ -88,6 +92,8 @@ static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
     }
     tcg_temp_free_i32(fptr);
 
+    clear_eci_state(s);
+
     /* End the TB, because we have updated FP control bits */
     s->base.is_jmp = DISAS_UPDATE_EXIT;
     return true;
@@ -110,8 +116,11 @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
         return true;
     }
 
+    s->eci_handled = true;
+
     if (!dc_isar_feature(aa32_vfp_simd, s)) {
         /* NOP if we have neither FP nor MVE */
+        clear_eci_state(s);
         return true;
     }
 
@@ -177,6 +186,8 @@ static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
         TCGv_i32 z32 = tcg_const_i32(0);
         store_cpu_field(z32, v7m.vpr);
     }
+
+    clear_eci_state(s);
     return true;
 }
 
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index c3504bd3b86..3a56639e708 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -1564,6 +1564,8 @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
         return false;
     }
 
+    s->eci_handled = true;
+
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -1613,6 +1615,7 @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
         tcg_temp_free_i32(addr);
     }
 
+    clear_eci_state(s);
     return true;
 }
 
@@ -1647,6 +1650,8 @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
         return false;
     }
 
+    s->eci_handled = true;
+
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -1703,6 +1708,7 @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
         tcg_temp_free_i32(addr);
     }
 
+    clear_eci_state(s);
     return true;
 }
 
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 8e0e55c1e0f..1a7a32c1be4 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -309,6 +309,20 @@ static inline bool is_singlestepping(DisasContext *s)
     return s->base.singlestep_enabled || s->ss_active;
 }
 
+void clear_eci_state(DisasContext *s)
+{
+    /*
+     * Clear any ECI/ICI state: used when a load multiple/store
+     * multiple insn executes.
+     */
+    if (s->eci) {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        tcg_gen_movi_i32(tmp, 0);
+        store_cpu_field(tmp, condexec_bits);
+        s->eci = 0;
+    }
+}
+
 static void gen_smul_dual(TCGv_i32 a, TCGv_i32 b)
 {
     TCGv_i32 tmp1 = tcg_temp_new_i32();
@@ -6203,6 +6217,8 @@ static bool trans_BKPT(DisasContext *s, arg_BKPT *a)
     if (!ENABLE_ARCH_5) {
         return false;
     }
+    /* BKPT is OK with ECI set and leaves it untouched */
+    s->eci_handled = true;
     if (arm_dc_feature(s, ARM_FEATURE_M) &&
         semihosting_enabled() &&
 #ifndef CONFIG_USER_ONLY
@@ -7767,6 +7783,8 @@ static bool op_stm(DisasContext *s, arg_ldst_block *a, int min_n)
         return true;
     }
 
+    s->eci_handled = true;
+
     addr = op_addr_block_pre(s, a, n);
     mem_idx = get_mem_index(s);
 
@@ -7793,6 +7811,7 @@ static bool op_stm(DisasContext *s, arg_ldst_block *a, int min_n)
     }
 
     op_addr_block_post(s, a, addr, n);
+    clear_eci_state(s);
     return true;
 }
 
@@ -7847,6 +7866,8 @@ static bool do_ldm(DisasContext *s, arg_ldst_block *a, int min_n)
         return true;
     }
 
+    s->eci_handled = true;
+
     addr = op_addr_block_pre(s, a, n);
     mem_idx = get_mem_index(s);
     loaded_base = false;
@@ -7897,6 +7918,7 @@ static bool do_ldm(DisasContext *s, arg_ldst_block *a, int min_n)
         /* Must exit loop to check un-masked IRQs */
         s->base.is_jmp = DISAS_EXIT;
     }
+    clear_eci_state(s);
     return true;
 }
 
@@ -7952,6 +7974,8 @@ static bool trans_CLRM(DisasContext *s, arg_CLRM *a)
         return false;
     }
 
+    s->eci_handled = true;
+
     zero = tcg_const_i32(0);
     for (i = 0; i < 15; i++) {
         if (extract32(a->list, i, 1)) {
@@ -7969,6 +7993,7 @@ static bool trans_CLRM(DisasContext *s, arg_CLRM *a)
         tcg_temp_free_i32(maskreg);
     }
     tcg_temp_free_i32(zero);
+    clear_eci_state(s);
     return true;
 }
 
@@ -8150,6 +8175,9 @@ static bool trans_LE(DisasContext *s, arg_LE *a)
         return false;
     }
 
+    /* LE/LETP is OK with ECI set and leaves it untouched */
+    s->eci_handled = true;
+
     if (!a->f) {
         /* Not loop-forever. If LR <= 1 this is the last loop: do nothing. */
         arm_gen_condlabel(s);
@@ -8775,8 +8803,28 @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     dc->thumb = EX_TBFLAG_AM32(tb_flags, THUMB);
     dc->be_data = EX_TBFLAG_ANY(tb_flags, BE_DATA) ? MO_BE : MO_LE;
     condexec = EX_TBFLAG_AM32(tb_flags, CONDEXEC);
-    dc->condexec_mask = (condexec & 0xf) << 1;
-    dc->condexec_cond = condexec >> 4;
+    /*
+     * the CONDEXEC TB flags are CPSR bits [15:10][26:25]. On A-profile this
+     * is always the IT bits. On M-profile, some of the reserved encodings
+     * of IT are used instead to indicate either ICI or ECI, which
+     * indicate partial progress of a restartable insn that was interrupted
+     * partway through by an exception:
+     *  * if CONDEXEC[3:0] != 0b0000 : CONDEXEC is IT bits
+     *  * if CONDEXEC[3:0] == 0b0000 : CONDEXEC is ICI or ECI bits
+     * In all cases CONDEXEC == 0 means "not in IT block or restartable
+     * insn, behave normally".
+     */
+    if (condexec & 0xf) {
+        dc->condexec_mask = (condexec & 0xf) << 1;
+        dc->condexec_cond = condexec >> 4;
+        dc->eci = 0;
+    } else {
+        dc->condexec_mask = 0;
+        dc->condexec_cond = 0;
+        if (arm_feature(env, ARM_FEATURE_M)) {
+            dc->eci = condexec >> 4;
+        }
+    }
 
     core_mmu_idx = EX_TBFLAG_ANY(tb_flags, MMUIDX);
     dc->mmu_idx = core_to_arm_mmu_idx(env, core_mmu_idx);
@@ -8898,10 +8946,19 @@ static void arm_tr_tb_start(DisasContextBase *dcbase, CPUState *cpu)
 static void arm_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
 {
     DisasContext *dc = container_of(dcbase, DisasContext, base);
+    /*
+     * The ECI/ICI bits share PSR bits with the IT bits, so we
+     * need to reconstitute the bits from the split-out DisasContext
+     * fields here.
+     */
+    uint32_t condexec_bits;
 
-    tcg_gen_insn_start(dc->base.pc_next,
-                       (dc->condexec_cond << 4) | (dc->condexec_mask >> 1),
-                       0);
+    if (dc->eci) {
+        condexec_bits = dc->eci << 4;
+    } else {
+        condexec_bits = (dc->condexec_cond << 4) | (dc->condexec_mask >> 1);
+    }
+    tcg_gen_insn_start(dc->base.pc_next, condexec_bits, 0);
     dc->insn_start = tcg_last_op();
 }
 
@@ -9067,6 +9124,41 @@ static void thumb_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
     }
     dc->insn = insn;
 
+    if (dc->eci) {
+        /*
+         * For M-profile continuable instructions, ECI/ICI handling
+         * falls into these cases:
+         *  - interrupt-continuable instructions
+         *     These are the various load/store multiple insns (both
+         *     integer and fp). The ICI bits indicate the register
+         *     where the load/store can resume. We make the IMPDEF
+         *     choice to always do "instruction restart", ie ignore
+         *     the ICI value and always execute the ldm/stm from the
+         *     start. So all we need to do is zero PSR.ICI if the
+         *     insn executes.
+         *  - MVE instructions subject to beat-wise execution
+         *     Here the ECI bits indicate which beats have already been
+         *     executed, and we must honour this. Each insn of this
+         *     type will handle it correctly. We will update PSR.ECI
+         *     in the helper function for the insn (some ECI values
+         *     mean that the following insn also has been partially
+         *     executed).
+         *  - Special cases which don't advance ECI
+         *     The insns LE, LETP and BKPT leave the ECI/ICI state
+         *     bits untouched.
+         *  - all other insns (the common case)
+         *     Non-zero ECI/ICI means an INVSTATE UsageFault.
+         *     We place a rewind-marker here. Insns in the previous
+         *     three categories will set a flag in the DisasContext.
+         *     If the flag isn't set after we call disas_thumb_insn()
+         *     or disas_thumb2_insn() then we know we have a "some other
+         *     insn" case. We will rewind to the marker (ie throwing away
+         *     all the generated code) and instead emit "take exception".
+         */
+        dc->eci_handled = false;
+        dc->insn_eci_rewind = tcg_last_op();
+    }
+
     if (dc->condexec_mask && !thumb_insn_is_unconditional(dc, insn)) {
         uint32_t cond = dc->condexec_cond;
 
@@ -9095,6 +9187,17 @@ static void thumb_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
         }
     }
 
+    if (dc->eci && !dc->eci_handled) {
+        /*
+         * Insn wasn't valid for ECI/ICI at all: undo what we
+         * just generated and instead emit an exception
+         */
+        tcg_remove_ops_after(dc->insn_eci_rewind);
+        dc->condjmp = 0;
+        gen_exception_insn(dc, dc->pc_curr, EXCP_INVSTATE, syn_uncategorized(),
+                           default_exception_el(dc));
+    }
+
     arm_post_translate_insn(dc);
 
     /* Thumb is a variable-length ISA.  Stop translation when the next insn
-- 
2.20.1




* [PATCH 05/55] target/arm: Let vfp_access_check() handle late NOCP checks
From: Peter Maydell @ 2021-06-07 16:57 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

In commit a3494d4671797c we reworked the M-profile handling of the
checks for when the NOCP exception should be raised because the FPU
is disabled, so that (in line with the architecture) the NOCP check
is done early over a large range of the encoding space, and takes
precedence over UNDEF exceptions.  As part of this, we removed the
code from full_vfp_access_check() which raised an exception there for
M-profile with the FPU disabled, because it was no longer reachable.

For MVE, some instructions which are outside the "coprocessor space"
region of the encoding space must nonetheless do "is the FPU enabled"
checks and possibly raise a NOCP exception.  (In particular this
covers the MVE-specific low-overhead branch insns LCTP, DLSTP and
WLSTP.) To support these insns, reinstate the code in
full_vfp_access_check(), so that their trans functions can call
vfp_access_check() and get the correct behaviour.
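
As a hedged illustration of the resulting call pattern (patch 6's
LCTP is the first user), a trans function for one of these insns does:

    /* sketch: an MVE low-overhead-branch insn outside m-nocp.decode */
    if (!dc_isar_feature(aa32_mve, s)) {
        return false;          /* UNDEF */
    }
    if (!vfp_access_check(s)) {
        return true;           /* NOCP exception already generated */
    }
    /* ... translate the insn ... */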

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-vfp.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index 3a56639e708..6a572591ce9 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -143,11 +143,21 @@ static void gen_preserve_fp_state(DisasContext *s)
 static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
 {
     if (s->fp_excp_el) {
-        /* M-profile handled this earlier, in disas_m_nocp() */
-        assert (!arm_dc_feature(s, ARM_FEATURE_M));
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                           syn_fp_access_trap(1, 0xe, false),
-                           s->fp_excp_el);
+        if (arm_dc_feature(s, ARM_FEATURE_M)) {
+            /*
+             * M-profile mostly catches the "FPU disabled" case early, in
+             * disas_m_nocp(), but a few insns (eg LCTP, WLSTP, DLSTP)
+             * which do coprocessor-checks are outside the large ranges of
+             * the encoding space handled by the patterns in m-nocp.decode,
+             * and for them we may need to raise NOCP here.
+             */
+            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+                               syn_uncategorized(), s->fp_excp_el);
+        } else {
+            gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+                               syn_fp_access_trap(1, 0xe, false),
+                               s->fp_excp_el);
+        }
         return false;
     }
 
-- 
2.20.1




* [PATCH 06/55] target/arm: Implement MVE LCTP
From: Peter Maydell @ 2021-06-07 16:57 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE LCTP instruction.

We put its decode and implementation with the other
low-overhead-branch insns because, although it is only present if MVE
is implemented, it is logically in the same group as the other LOB
insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/t32.decode  |  2 ++
 target/arm/translate.c | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 8b2c487fa7a..087e514e0ac 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -674,5 +674,7 @@ BL               1111 0. .......... 11.1 ............         @branch24
     DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001
     WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm
     LE           1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
+
+    LCTP         1111 0 0000 000     1111 1110 0000 0000 0001
   ]
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 1a7a32c1be4..2f6c012f672 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8192,6 +8192,30 @@ static bool trans_LE(DisasContext *s, arg_LE *a)
     return true;
 }
 
+static bool trans_LCTP(DisasContext *s, arg_LCTP *a)
+{
+    /*
+     * M-profile Loop Clear with Tail Predication. Since our implementation
+     * doesn't cache branch information, all we need to do is reset
+     * FPSCR.LTPSIZE to 4.
+     */
+    TCGv_i32 ltpsize;
+
+    if (!dc_isar_feature(aa32_lob, s) ||
+        !dc_isar_feature(aa32_mve, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    ltpsize = tcg_const_i32(4);
+    store_cpu_field(ltpsize, v7m.ltpsize);
+    return true;
+}
+
+
 static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
 {
     TCGv_i32 addr, tmp;
-- 
2.20.1




* [PATCH 07/55] target/arm: Implement MVE WLSTP insn
From: Peter Maydell @ 2021-06-07 16:57 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE WLSTP insn; this is like the existing WLS insn,
except that it specifies a size value which is used to set
FPSCR.LTPSIZE.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/t32.decode  |  8 ++++++--
 target/arm/translate.c | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 087e514e0ac..4f0c686a3c3 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -672,8 +672,12 @@ BL               1111 0. .......... 11.1 ............         @branch24
     %lob_imm 1:10 11:1 !function=times_2
 
     DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001
-    WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm
-    LE           1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
+    WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm size=4
+    {
+      # This is WLSTP
+      WLS        1111 0 0000 0 size:2 rn:4 1100 . .......... 1 imm=%lob_imm
+      LE         1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
+    }
 
     LCTP         1111 0 0000 000     1111 1110 0000 0000 0001
   ]
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 2f6c012f672..79ec185dd83 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8135,7 +8135,11 @@ static bool trans_WLS(DisasContext *s, arg_WLS *a)
         return false;
     }
     if (a->rn == 13 || a->rn == 15) {
-        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        /*
+         * For WLSTP rn == 15 is a related encoding (LE); the
+         * other cases caught by this condition are all
+         * CONSTRAINED UNPREDICTABLE: we choose to UNDEF
+         */
         return false;
     }
     if (s->condexec_mask) {
@@ -8148,10 +8152,40 @@ static bool trans_WLS(DisasContext *s, arg_WLS *a)
          */
         return false;
     }
+    if (a->size != 4) {
+        /* WLSTP */
+        if (!dc_isar_feature(aa32_mve, s)) {
+            return false;
+        }
+        /*
+         * We need to check that the FPU is enabled here, but mustn't
+         * call vfp_access_check() to do that because we don't want to
+         * do the lazy state preservation in the "loop count is zero" case.
+         * Do the check-and-raise-exception by hand.
+         */
+        if (s->fp_excp_el) {
+            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+                               syn_uncategorized(), s->fp_excp_el);
+            return true;
+        }
+    }
+
     nextlabel = gen_new_label();
     tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_R[a->rn], 0, nextlabel);
     tmp = load_reg(s, a->rn);
     store_reg(s, 14, tmp);
+    if (a->size != 4) {
+        /*
+         * WLSTP: set FPSCR.LTPSIZE. This requires that we do the
+         * lazy state preservation, new FP context creation, etc,
+         * that vfp_access_check() does. We know that the actual
+         * access check will succeed (ie it won't generate code that
+         * throws an exception) because we did that check by hand earlier.
+         */
+        bool ok = vfp_access_check(s);
+        assert(ok);
+        tmp = tcg_const_i32(a->size);
+        store_cpu_field(tmp, v7m.ltpsize);
+    }
     gen_jmp_tb(s, s->base.pc_next, 1);
 
     gen_set_label(nextlabel);
-- 
2.20.1




* [PATCH 08/55] target/arm: Implement MVE DLSTP
From: Peter Maydell @ 2021-06-07 16:57 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE DLSTP insn; this is like the existing DLS
insn, except that it must do an FPU access check and it
sets LTPSIZE to the value specified in the insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/t32.decode  |  9 ++++++---
 target/arm/translate.c | 23 +++++++++++++++++++++--
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 4f0c686a3c3..8e1ca7d64a9 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -671,14 +671,17 @@ BL               1111 0. .......... 11.1 ............         @branch24
     # LE and WLS immediate
     %lob_imm 1:10 11:1 !function=times_2
 
-    DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001
+    DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001 size=4
     WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm size=4
     {
       # This is WLSTP
       WLS        1111 0 0000 0 size:2 rn:4 1100 . .......... 1 imm=%lob_imm
       LE         1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
     }
-
-    LCTP         1111 0 0000 000     1111 1110 0000 0000 0001
+    {
+      # This is DLSTP
+      DLS        1111 0 0000 0 size:2 rn:4 1110 0000 0000 0001
+      LCTP       1111 0 0000 000     1111 1110 0000 0000 0001
+    }
   ]
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 79ec185dd83..976c665be9c 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8115,13 +8115,32 @@ static bool trans_DLS(DisasContext *s, arg_DLS *a)
         return false;
     }
     if (a->rn == 13 || a->rn == 15) {
-        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        /*
+         * For DLSTP rn == 15 is a related encoding (LCTP); the
+         * other cases caught by this condition are all
+         * CONSTRAINED UNPREDICTABLE: we choose to UNDEF
+         */
         return false;
     }
 
-    /* Not a while loop, no tail predication: just set LR to the count */
+    if (a->size != 4) {
+        /* DLSTP */
+        if (!dc_isar_feature(aa32_mve, s)) {
+            return false;
+        }
+        if (!vfp_access_check(s)) {
+            return true;
+        }
+    }
+
+    /* Not a while loop: set LR to the count, and set LTPSIZE for DLSTP */
     tmp = load_reg(s, a->rn);
     store_reg(s, 14, tmp);
+    if (a->size != 4) {
+        /* DLSTP: set FPSCR.LTPSIZE */
+        tmp = tcg_const_i32(a->size);
+        store_cpu_field(tmp, v7m.ltpsize);
+    }
     return true;
 }
 
-- 
2.20.1




* [PATCH 09/55] target/arm: Implement MVE LETP insn
From: Peter Maydell @ 2021-06-07 16:57 UTC
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE LETP insn.  This is like the existing LE loop-end
insn, but it must perform an FPU-enabled check, and on loop-exit it
resets LTPSIZE to 4.

To accommodate the requirement to do something on loop-exit, we drop
the use of condlabel and instead manage both the TB exits manually,
in the same way we already do in trans_WLS().

The other MVE-specific change to the LE insn is that we must raise an
INVSTATE UsageFault if LTPSIZE is not 4.
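
(For concreteness: the loop-decrement value is 1 << (4 - LTPSIZE)
elements, so LTPSIZE 0 (byte elements) decrements LR by 16 per
iteration, LTPSIZE 2 (word elements) by 1 << (4 - 2) = 4, and the
LTPSIZE 4 case decrements by 1, matching the plain LE behaviour.)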

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
This amounts to a complete rewrite of trans_LE()...
---
 target/arm/t32.decode  |   2 +-
 target/arm/translate.c | 104 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 97 insertions(+), 9 deletions(-)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 8e1ca7d64a9..4115e08ce99 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -676,7 +676,7 @@ BL               1111 0. .......... 11.1 ............         @branch24
     {
       # This is WLSTP
       WLS        1111 0 0000 0 size:2 rn:4 1100 . .......... 1 imm=%lob_imm
-      LE         1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
+      LE         1111 0 0000 0 f:1 tp:1 1111 1100 . .......... 1 imm=%lob_imm
     }
     {
       # This is DLSTP
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 976c665be9c..6d70c89961a 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8223,25 +8223,113 @@ static bool trans_LE(DisasContext *s, arg_LE *a)
      * any faster.
      */
     TCGv_i32 tmp;
+    TCGLabel *loopend;
+    bool fpu_active;
 
     if (!dc_isar_feature(aa32_lob, s)) {
         return false;
     }
+    if (a->f && a->tp) {
+        return false;
+    }
+    if (s->condexec_mask) {
+        /*
+         * LE in an IT block is CONSTRAINED UNPREDICTABLE;
+         * we choose to UNDEF, because otherwise our use of
+         * gen_goto_tb(1) would clash with the use of TB exit 1
+         * in the dc->condjmp condition-failed codepath in
+         * arm_tr_tb_stop() and we'd get an assertion.
+         */
+        return false;
+    }
+    if (a->tp) {
+        /* LETP */
+        if (!dc_isar_feature(aa32_mve, s)) {
+            return false;
+        }
+        if (!vfp_access_check(s)) {
+            s->eci_handled = true;
+            return true;
+        }
+    }
 
     /* LE/LETP is OK with ECI set and leaves it untouched */
     s->eci_handled = true;
 
-    if (!a->f) {
-        /* Not loop-forever. If LR <= 1 this is the last loop: do nothing. */
-        arm_gen_condlabel(s);
-        tcg_gen_brcondi_i32(TCG_COND_LEU, cpu_R[14], 1, s->condlabel);
-        /* Decrement LR */
-        tmp = load_reg(s, 14);
-        tcg_gen_addi_i32(tmp, tmp, -1);
-        store_reg(s, 14, tmp);
+    /*
+     * With MVE, LTPSIZE might not be 4, and we must emit an INVSTATE
+     * UsageFault exception for the LE insn in that case. Note that we
+     * are not directly checking FPSCR.LTPSIZE but instead check the
+     * pseudocode LTPSIZE() function, which returns 4 if the FPU is
+     * not currently active (ie ActiveFPState() returns false). We
+     * can identify not-active purely from our TB state flags, as the
+     * FPU is active only if:
+     *  the FPU is enabled
+     *  AND lazy state preservation is not active
+     *  AND we do not need a new fp context (this is the ASPEN/FPCA check)
+     *
+     * Usually we don't need to care about this distinction between
+     * LTPSIZE and FPSCR.LTPSIZE, because the code in vfp_access_check()
+     * will either take an exception or clear the conditions that make
+     * the FPU not active. But LE is an unusual case of a non-FP insn
+     * that looks at LTPSIZE.
+     */
+    fpu_active = !s->fp_excp_el && !s->v7m_lspact && !s->v7m_new_fp_ctxt_needed;
+
+    if (!a->tp && dc_isar_feature(aa32_mve, s) && fpu_active) {
+        /* Need to do a runtime check for LTPSIZE != 4 */
+        TCGLabel *skipexc = gen_new_label();
+        tmp = load_cpu_field(v7m.ltpsize);
+        tcg_gen_brcondi_i32(TCG_COND_EQ, tmp, 4, skipexc);
+        tcg_temp_free_i32(tmp);
+        gen_exception_insn(s, s->pc_curr, EXCP_INVSTATE, syn_uncategorized(),
+                           default_exception_el(s));
+        gen_set_label(skipexc);
+    }
+
+    if (a->f) {
+        /* Loop-forever: just jump back to the loop start */
+        gen_jmp(s, read_pc(s) - a->imm);
+        return true;
+    }
+
+    /*
+     * Not loop-forever. If LR <= loop-decrement-value this is the last loop.
+     * For LE, we know at this point that LTPSIZE must be 4 and the
+     * loop decrement value is 1. For LETP we need to calculate the decrement
+     * value from LTPSIZE.
+     */
+    loopend = gen_new_label();
+    if (!a->tp) {
+        tcg_gen_brcondi_i32(TCG_COND_LEU, cpu_R[14], 1, loopend);
+        tcg_gen_addi_i32(cpu_R[14], cpu_R[14], -1);
+    } else {
+        /*
+         * Decrement by 1 << (4 - LTPSIZE). We need to use a TCG local
+         * so that decr stays live after the brcondi.
+         */
+        TCGv_i32 decr = tcg_temp_local_new_i32();
+        TCGv_i32 ltpsize = load_cpu_field(v7m.ltpsize);
+        tcg_gen_sub_i32(decr, tcg_constant_i32(4), ltpsize);
+        tcg_gen_shl_i32(decr, tcg_constant_i32(1), decr);
+        tcg_temp_free_i32(ltpsize);
+
+        tcg_gen_brcond_i32(TCG_COND_LEU, cpu_R[14], decr, loopend);
+
+        tcg_gen_sub_i32(cpu_R[14], cpu_R[14], decr);
+        tcg_temp_free_i32(decr);
     }
     /* Jump back to the loop start */
     gen_jmp(s, read_pc(s) - a->imm);
+
+    gen_set_label(loopend);
+    if (a->tp) {
+        /* Exits from tail-pred loops must reset LTPSIZE to 4 */
+        tmp = tcg_const_i32(4);
+        store_cpu_field(tmp, v7m.ltpsize);
+    }
+    /* End TB, continuing to following insn */
+    gen_jmp_tb(s, s->base.pc_next, 1);
     return true;
 }
 
-- 
2.20.1
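
A standalone sketch of the loop-count arithmetic above (illustration only,
not part of the patch): LR counts remaining elements, and each LETP
iteration consumes 1 << (4 - LTPSIZE) of them, because a 16-byte vector
holds that many elements of (1 << LTPSIZE) bytes each. The loop exits
after the pass in which LR <= decrement; the final partial pass itself is
handled by the tail-predication masking, not by the branch.

    #include <stdio.h>

    int main(void)
    {
        unsigned ltpsize = 1;            /* 16-bit elements, 8 per vector */
        unsigned decr = 1u << (4 - ltpsize);
        unsigned lr = 19;                /* elements left to process */

        for (;;) {
            printf("loop body runs with LR = %u\n", lr);
            if (lr <= decr) {            /* last (possibly partial) pass */
                break;
            }
            lr -= decr;                  /* LETP decrements, branches back */
        }
        /* 19 elements, 8 per pass: LR goes 19, 11, 3 */
        return 0;
    }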




* [PATCH 10/55] target/arm: Add framework for MVE decode
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (8 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 09/55] target/arm: Implement MVE LETP insn Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-08  3:59   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening forms) Peter Maydell
                   ` (45 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Add the framework for decoding MVE insns, with the necessary new
files and the meson.build rules, but no actual content yet.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a32.h |  1 +
 target/arm/mve.decode      | 20 ++++++++++++++++++++
 target/arm/translate-mve.c | 29 +++++++++++++++++++++++++++++
 target/arm/translate.c     |  1 +
 target/arm/meson.build     |  2 ++
 5 files changed, 53 insertions(+)
 create mode 100644 target/arm/mve.decode
 create mode 100644 target/arm/translate-mve.c

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index c946ac440ce..0a0053949f5 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -22,6 +22,7 @@
 
 /* Prototypes for autogenerated disassembler functions */
 bool disas_m_nocp(DisasContext *dc, uint32_t insn);
+bool disas_mve(DisasContext *dc, uint32_t insn);
 bool disas_vfp(DisasContext *s, uint32_t insn);
 bool disas_vfp_uncond(DisasContext *s, uint32_t insn);
 bool disas_neon_dp(DisasContext *s, uint32_t insn);
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
new file mode 100644
index 00000000000..c8492bb5763
--- /dev/null
+++ b/target/arm/mve.decode
@@ -0,0 +1,20 @@
+# M-profile MVE instruction descriptions
+#
+#  Copyright (c) 2021 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
new file mode 100644
index 00000000000..c54d5cb7305
--- /dev/null
+++ b/target/arm/translate-mve.c
@@ -0,0 +1,29 @@
+/*
+ *  ARM translation: M-profile MVE instructions
+
+ *  Copyright (c) 2021 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
+#include "exec/exec-all.h"
+#include "exec/gen-icount.h"
+#include "translate.h"
+#include "translate-a32.h"
+
+/* Include the generated decoder */
+#include "decode-mve.c.inc"
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 6d70c89961a..ee17125465b 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8919,6 +8919,7 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
     if (disas_t32(s, insn) ||
         disas_vfp_uncond(s, insn) ||
         disas_neon_shared(s, insn) ||
+        disas_mve(s, insn) ||
         ((insn >> 28) == 0xe && disas_vfp(s, insn))) {
         return;
     }
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 5bfaf43b500..2b50be3f862 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -6,6 +6,7 @@ gen = [
   decodetree.process('vfp.decode', extra_args: '--decode=disas_vfp'),
   decodetree.process('vfp-uncond.decode', extra_args: '--decode=disas_vfp_uncond'),
   decodetree.process('m-nocp.decode', extra_args: '--decode=disas_m_nocp'),
+  decodetree.process('mve.decode', extra_args: '--decode=disas_mve'),
   decodetree.process('a32.decode', extra_args: '--static-decode=disas_a32'),
   decodetree.process('a32-uncond.decode', extra_args: '--static-decode=disas_a32_uncond'),
   decodetree.process('t32.decode', extra_args: '--static-decode=disas_t32'),
@@ -27,6 +28,7 @@ arm_ss.add(files(
   'tlb_helper.c',
   'translate.c',
   'translate-m-nocp.c',
+  'translate-mve.c',
   'translate-neon.c',
   'translate-vfp.c',
   'vec_helper.c',
-- 
2.20.1
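
For readers who have not used decodetree before: the generated disas_mve()
tries each pattern in mve.decode against the 32-bit insn and, on a match,
calls a hand-written trans_<NAME>() function with the named fields already
extracted. A hypothetical pattern (VFOO is invented for illustration; it
is not an insn added by this series) such as

    VFOO     1111 1110 0000 rd:4 rm:4 0000 0000 0000

would require a fragment like this in translate-mve.c:

    /* Called by the generated decoder with the extracted fields */
    static bool trans_VFOO(DisasContext *s, arg_VFOO *a)
    {
        /*
         * a->rd and a->rm hold the decoded fields. Returning true
         * claims the encoding; returning false lets the caller try
         * the other decoders in disas_thumb2_insn().
         */
        return false;
    }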




* [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening forms)
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (9 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 10/55] target/arm: Add framework for MVE decode Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-08 21:33   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 12/55] target/arm: Implement widening/narrowing MVE VLDR/VSTR insns Peter Maydell
                   ` (44 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the forms of the MVE VLDR and VSTR insns which perform
non-widening loads of bytes, halfwords or words from memory into
vector elements of the same width (encodings T5, T6, T7).

(At the moment we know for MVE and M-profile in general that
vfp_access_check() can never return false, but we include the
conventional return-true-on-failure check for consistency
with non-M-profile translation code.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/{translate-mve.c => helper-mve.h} |  21 +--
 target/arm/helper.h                          |   2 +
 target/arm/internals.h                       |  11 ++
 target/arm/mve.decode                        |  22 +++
 target/arm/mve_helper.c                      | 188 +++++++++++++++++++
 target/arm/translate-mve.c                   | 124 +++++++++++-
 target/arm/meson.build                       |   1 +
 7 files changed, 355 insertions(+), 14 deletions(-)
 copy target/arm/{translate-mve.c => helper-mve.h} (61%)
 create mode 100644 target/arm/mve_helper.c

diff --git a/target/arm/translate-mve.c b/target/arm/helper-mve.h
similarity index 61%
copy from target/arm/translate-mve.c
copy to target/arm/helper-mve.h
index c54d5cb7305..9e3b0b09afd 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/helper-mve.h
@@ -1,6 +1,6 @@
 /*
- *  ARM translation: M-profile MVE instructions
-
+ *  M-profile MVE specific helper definitions
+ *
  *  Copyright (c) 2021 Linaro, Ltd.
  *
  * This library is free software; you can redistribute it and/or
@@ -16,14 +16,9 @@
  * You should have received a copy of the GNU Lesser General Public
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
-
-#include "qemu/osdep.h"
-#include "tcg/tcg-op.h"
-#include "tcg/tcg-op-gvec.h"
-#include "exec/exec-all.h"
-#include "exec/gen-icount.h"
-#include "translate.h"
-#include "translate-a32.h"
-
-/* Include the generated decoder */
-#include "decode-mve.c.inc"
+DEF_HELPER_FLAGS_3(mve_vldrb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vldrh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vldrw, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vstrb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vstrh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vstrw, TCG_CALL_NO_WG, void, env, ptr, i32)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index dc6eb96d439..db87d7d5376 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -1019,3 +1019,5 @@ DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
 #include "helper-a64.h"
 #include "helper-sve.h"
 #endif
+
+#include "helper-mve.h"
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 886db56b580..3ba86e8af81 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1202,4 +1202,15 @@ static inline uint64_t useronly_maybe_clean_ptr(uint32_t desc, uint64_t ptr)
     return ptr;
 }
 
+/* Values for M-profile PSR.ECI for MVE insns */
+enum MVEECIState {
+    ECI_NONE = 0, /* No completed beats */
+    ECI_A0 = 1, /* Completed: A0 */
+    ECI_A0A1 = 2, /* Completed: A0, A1 */
+    /* 3 is reserved */
+    ECI_A0A1A2 = 4, /* Completed: A0, A1, A2 */
+    ECI_A0A1A2B0 = 5, /* Completed: A0, A1, A2, B0 */
+    /* All other values reserved */
+};
+
 #endif
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index c8492bb5763..858a161fd7e 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -18,3 +18,25 @@
 #
 # This file is processed by scripts/decodetree.py
 #
+
+%qd 22:1 13:3
+
+&vldr_vstr rn qd imm p a w size l
+
+@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd
+
+# Vector loads and stores
+
+# Non-widening loads/stores (P=0 W=0 is 'related encoding')
+VLDR_VSTR        1110110 0 a:1 . 1   . .... ... 111100 .......   @vldr_vstr \
+                 size=0 p=0 w=1
+VLDR_VSTR        1110110 0 a:1 . 1   . .... ... 111101 .......   @vldr_vstr \
+                 size=1 p=0 w=1
+VLDR_VSTR        1110110 0 a:1 . 1   . .... ... 111110 .......   @vldr_vstr \
+                 size=2 p=0 w=1
+VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111100 .......   @vldr_vstr \
+                 size=0 p=1
+VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
+                 size=1 p=1
+VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
+                 size=2 p=1
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
new file mode 100644
index 00000000000..575afce8fee
--- /dev/null
+++ b/target/arm/mve_helper.c
@@ -0,0 +1,188 @@
+/*
+ * M-profile MVE Operations
+ *
+ * Copyright (c) 2021 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "internals.h"
+#include "exec/helper-proto.h"
+#include "exec/cpu_ldst.h"
+#include "exec/exec-all.h"
+
+/*
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing units smaller than that need a host-endian fixup.
+ */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x)  ((x) ^ 7)
+#define H2(x)  ((x) ^ 3)
+#define H4(x)  ((x) ^ 1)
+#else
+#define H1(x)  (x)
+#define H2(x)  (x)
+#define H4(x)  (x)
+#endif
+
+static uint16_t mve_element_mask(CPUARMState *env)
+{
+    /*
+     * Return the mask of which elements in the MVE vector should be
+     * updated. This is a combination of multiple things:
+     *  (1) by default, we update every lane in the vector
+     *  (2) VPT predication stores its state in the VPR register;
+     *  (3) low-overhead-branch tail predication will mask out part of
+     *      the vector on the final iteration of the loop
+     *  (4) if EPSR.ECI is set then we must execute only some beats
+     *      of the insn
+     * We combine all these into a 16-bit result with the same semantics
+     * as VPR.P0: 0 to mask the lane, 1 if it is active.
+     * 8-bit vector ops will look at all bits of the result;
+     * 16-bit ops will look at bits 0, 2, 4, ...;
+     * 32-bit ops will look at bits 0, 4, 8 and 12.
+     * Compare pseudocode GetCurInstrBeat(), though that only returns
+     * the 4-bit slice of the mask corresponding to a single beat.
+     */
+    uint16_t mask = extract32(env->v7m.vpr, R_V7M_VPR_P0_SHIFT,
+                              R_V7M_VPR_P0_LENGTH);
+
+    if (!(env->v7m.vpr & R_V7M_VPR_MASK01_MASK)) {
+        mask |= 0xff;
+    }
+    if (!(env->v7m.vpr & R_V7M_VPR_MASK23_MASK)) {
+        mask |= 0xff00;
+    }
+
+    if (env->v7m.ltpsize < 4 &&
+        env->regs[14] <= (1 << (4 - env->v7m.ltpsize))) {
+        /*
+         * Tail predication active, and this is the last loop iteration.
+         * The element size is (1 << ltpsize), and we only want to process
+         * loopcount elements, so we want to retain the least significant
+         * (loopcount * esize) predicate bits and zero out bits above that.
+         */
+        int masklen = env->regs[14] << env->v7m.ltpsize;
+        assert(masklen <= 16);
+        mask &= MAKE_64BIT_MASK(0, masklen);
+    }
+
+    if ((env->condexec_bits & 0xf) == 0) {
+        /*
+         * ECI bits indicate which beats are already executed;
+         * we handle this by effectively predicating them out.
+         */
+        int eci = env->condexec_bits >> 4;
+        switch (eci) {
+        case ECI_NONE:
+            break;
+        case ECI_A0:
+            mask &= 0xfff0;
+            break;
+        case ECI_A0A1:
+            mask &= 0xff00;
+            break;
+        case ECI_A0A1A2:
+        case ECI_A0A1A2B0:
+            mask &= 0xf000;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    return mask;
+}
+
+static void mve_advance_vpt(CPUARMState *env)
+{
+    /* Advance the VPT and ECI state if necessary */
+    uint32_t vpr = env->v7m.vpr;
+    unsigned mask01, mask23;
+
+    if ((env->condexec_bits & 0xf) == 0) {
+        env->condexec_bits = (env->condexec_bits == (ECI_A0A1A2B0 << 4)) ?
+            (ECI_A0 << 4) : (ECI_NONE << 4);
+    }
+
+    if (!(vpr & (R_V7M_VPR_MASK01_MASK | R_V7M_VPR_MASK23_MASK))) {
+        /* VPT not enabled, nothing to do */
+        return;
+    }
+
+    mask01 = extract32(vpr, R_V7M_VPR_MASK01_SHIFT, R_V7M_VPR_MASK01_LENGTH);
+    mask23 = extract32(vpr, R_V7M_VPR_MASK23_SHIFT, R_V7M_VPR_MASK23_LENGTH);
+    if (mask01 > 8) {
+        /* high bit set, but not 0b1000: invert the relevant half of P0 */
+        vpr ^= 0xff;
+    }
+    if (mask23 > 8) {
+        /* high bit set, but not 0b1000: invert the relevant half of P0 */
+        vpr ^= 0xff00;
+    }
+    vpr = deposit32(vpr, R_V7M_VPR_MASK01_SHIFT, R_V7M_VPR_MASK01_LENGTH,
+                    mask01 << 1);
+    vpr = deposit32(vpr, R_V7M_VPR_MASK23_SHIFT, R_V7M_VPR_MASK23_LENGTH,
+                    mask23 << 1);
+    env->v7m.vpr = vpr;
+}
+
+
+#define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE, H)                      \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
+    {                                                                   \
+        TYPE *d = vd;                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned b, e;                                                  \
+        /*                                                              \
+         * R_SXTM allows the dest reg to become UNKNOWN for abandoned   \
+         * beats so we don't care if we update part of the dest and     \
+         * then take an exception.                                      \
+         */                                                             \
+        for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                   \
+            if (mask & (1 << b)) {                                      \
+                d[H(e)] = cpu_##LDTYPE##_data_ra(env, addr, GETPC());   \
+            }                                                           \
+            addr += MSIZE;                                              \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+#define DO_VSTR(OP, MSIZE, STTYPE, ESIZE, TYPE, H)                      \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
+    {                                                                   \
+        TYPE *d = vd;                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned b, e;                                                  \
+        for (b = 0, e = 0; b < 16; b += ESIZE, e++) {                   \
+            if (mask & (1 << b)) {                                      \
+                cpu_##STTYPE##_data_ra(env, addr, d[H(e)], GETPC());    \
+            }                                                           \
+            addr += MSIZE;                                              \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+DO_VLDR(vldrb, 1, ldub, 1, uint8_t, H1)
+DO_VLDR(vldrh, 2, lduw, 2, uint16_t, H2)
+DO_VLDR(vldrw, 4, ldl, 4, uint32_t, H4)
+
+DO_VSTR(vstrb, 1, stb, 1, uint8_t, H1)
+DO_VSTR(vstrh, 2, stw, 2, uint16_t, H2)
+DO_VSTR(vstrw, 4, stl, 4, uint32_t, H4)
+
+#undef DO_VLDR
+#undef DO_VSTR
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index c54d5cb7305..e8bb2372ad9 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -1,6 +1,6 @@
 /*
  *  ARM translation: M-profile MVE instructions
-
+ *
  *  Copyright (c) 2021 Linaro, Ltd.
  *
  * This library is free software; you can redistribute it and/or
@@ -27,3 +27,125 @@
 
 /* Include the generated decoder */
 #include "decode-mve.c.inc"
+
+typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+
+/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
+static inline long mve_qreg_offset(unsigned reg)
+{
+    return offsetof(CPUARMState, vfp.zregs[reg].d[0]);
+}
+
+static TCGv_ptr mve_qreg_ptr(unsigned reg)
+{
+    TCGv_ptr ret = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(ret, cpu_env, mve_qreg_offset(reg));
+    return ret;
+}
+
+static bool mve_eci_check(DisasContext *s)
+{
+    /*
+     * This is a beatwise insn: check that ECI is valid (not a
+     * reserved value) and note that we are handling it.
+     * Return true if OK, false if we generated an exception.
+     */
+    s->eci_handled = true;
+    switch (s->eci) {
+    case ECI_NONE:
+    case ECI_A0:
+    case ECI_A0A1:
+    case ECI_A0A1A2:
+    case ECI_A0A1A2B0:
+        return true;
+    default:
+        /* Reserved value: INVSTATE UsageFault */
+        gen_exception_insn(s, s->pc_curr, EXCP_INVSTATE, syn_uncategorized(),
+                           default_exception_el(s));
+        return false;
+    }
+}
+
+static void mve_update_eci(DisasContext *s)
+{
+    /*
+     * The helper function will always update the CPUState field,
+     * so we only need to update the DisasContext field.
+     */
+    if (s->eci) {
+        s->eci = (s->eci == ECI_A0A1A2B0) ? ECI_A0 : ECI_NONE;
+    }
+}
+
+static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
+{
+    TCGv_i32 addr;
+    uint32_t offset;
+    TCGv_ptr qreg;
+
+    if (!dc_isar_feature(aa32_mve, s)) {
+        return false;
+    }
+
+    if (a->qd > 7 || !fn) {
+        return false;
+    }
+
+    /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+    if (a->rn == 15 || (a->rn == 13 && a->w)) {
+        return false;
+    }
+
+    if (!mve_eci_check(s)) {
+        return true;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    offset = a->imm << a->size;
+    if (!a->a) {
+        offset = -offset;
+    }
+    addr = load_reg(s, a->rn);
+    if (a->p) {
+        tcg_gen_addi_i32(addr, addr, offset);
+    }
+
+    qreg = mve_qreg_ptr(a->qd);
+    fn(cpu_env, qreg, addr);
+    tcg_temp_free_ptr(qreg);
+
+    /*
+     * Writeback always happens after the last beat of the insn,
+     * regardless of predication
+     */
+    if (a->w) {
+        if (!a->p) {
+            tcg_gen_addi_i32(addr, addr, offset);
+        }
+        store_reg(s, a->rn, addr);
+    } else {
+        tcg_temp_free_i32(addr);
+    }
+    mve_update_eci(s);
+    return true;
+}
+
+static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
+{
+    MVEGenLdStFn *ldfns[] = {
+        gen_helper_mve_vldrb,
+        gen_helper_mve_vldrh,
+        gen_helper_mve_vldrw,
+        NULL,
+    };
+    MVEGenLdStFn *stfns[] = {
+        gen_helper_mve_vstrb,
+        gen_helper_mve_vstrh,
+        gen_helper_mve_vstrw,
+        NULL,
+    };
+    return do_ldst(s, a, a->l ? ldfns[a->size] : stfns[a->size]);
+}
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 2b50be3f862..25a02bf2769 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -23,6 +23,7 @@ arm_ss.add(files(
   'helper.c',
   'iwmmxt_helper.c',
   'm_helper.c',
+  'mve_helper.c',
   'neon_helper.c',
   'op_helper.c',
   'tlb_helper.c',
-- 
2.20.1
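
To make the combination of predication sources in mve_element_mask()
concrete, here is a minimal standalone restatement of one case (the values
are chosen purely for illustration): a 32-bit-element op, which looks at
mask bits 0, 4, 8 and 12, with VPR.P0 = 0x00ff and the VPT masks active,
tail predication with LTPSIZE = 2 and LR = 3, and ECI reporting beat A0
already executed.

    #include <stdint.h>
    #include <assert.h>

    int main(void)
    {
        uint16_t mask = 0x00ff;                   /* VPR.P0 */
        unsigned ltpsize = 2, lr = 3;

        if (lr <= (1u << (4 - ltpsize))) {        /* final loop iteration */
            mask &= (1u << (lr << ltpsize)) - 1;  /* keep low 12 bits */
        }
        mask &= 0xfff0;                           /* ECI_A0: beat A0 done */

        assert(mask == 0x00f0);    /* only the second 32-bit lane is live */
        return 0;
    }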




* [PATCH 12/55] target/arm: Implement widening/narrowing MVE VLDR/VSTR insns
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (10 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening forms) Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-08 21:46   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 13/55] target/arm: Implement MVE VCLZ Peter Maydell
                   ` (43 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the variants of MVE VLDR (encodings T1, T2) which perform
"widening" loads where bytes or halfwords are loaded from memory and
zero or sign-extended into halfword or word length vector elements,
and the narrowing MVE VSTR (encodings T1, T2) where bytes or
halfwords are stored from halfword or word elements.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 10 ++++++++++
 target/arm/mve.decode      | 25 +++++++++++++++++++++++--
 target/arm/mve_helper.c    | 10 ++++++++++
 target/arm/translate-mve.c | 18 ++++++++++++++++++
 4 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 9e3b0b09afd..e47d4164ae7 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -22,3 +22,13 @@ DEF_HELPER_FLAGS_3(mve_vldrw, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrb, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrh, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrw, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vldrb_sh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vldrb_sw, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vldrb_uh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vldrb_uw, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vldrh_sw, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vldrh_uw, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 858a161fd7e..3bc5f034531 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -21,12 +21,33 @@
 
 %qd 22:1 13:3
 
-&vldr_vstr rn qd imm p a w size l
+&vldr_vstr rn qd imm p a w size l u
 
-@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd
+@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
+# Note that both Rn and Qd are 3 bits only (no D bit)
+@vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
 
 # Vector loads and stores
 
+# Widening loads and narrowing stores:
+# for these P=0 W=0 is 'related encoding'; sz=11 is 'related encoding'
+# This means we need to expand out to multiple patterns for P, W, SZ.
+# For stores the U bit must be 0 but we catch that in the trans_ function.
+# The naming scheme here is "VLDSTB_H == in-memory byte load/store to/from
+# signed halfword element in register", etc.
+VLDSTB_H         111 . 110 0 a:1 0 1   . 0 ... ... 0 111 01 ....... @vldst_wn \
+                 p=0 w=1 size=1
+VLDSTB_H         111 . 110 1 a:1 0 w:1 . 0 ... ... 0 111 01 ....... @vldst_wn \
+                 p=1 size=1
+VLDSTB_W         111 . 110 0 a:1 0 1   . 0 ... ... 0 111 10 ....... @vldst_wn \
+                 p=0 w=1 size=2
+VLDSTB_W         111 . 110 1 a:1 0 w:1 . 0 ... ... 0 111 10 ....... @vldst_wn \
+                 p=1 size=2
+VLDSTH_W         111 . 110 0 a:1 0 1   . 1 ... ... 0 111 10 ....... @vldst_wn \
+                 p=0 w=1 size=2
+VLDSTH_W         111 . 110 1 a:1 0 w:1 . 1 ... ... 0 111 10 ....... @vldst_wn \
+                 p=1 size=2
+
 # Non-widening loads/stores (P=0 W=0 is 'related encoding')
 VLDR_VSTR        1110110 0 a:1 . 1   . .... ... 111100 .......   @vldr_vstr \
                  size=0 p=0 w=1
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 575afce8fee..6a2fc1c37cd 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -180,9 +180,19 @@ DO_VLDR(vldrb, 1, ldub, 1, uint8_t, H1)
 DO_VLDR(vldrh, 2, lduw, 2, uint16_t, H2)
 DO_VLDR(vldrw, 4, ldl, 4, uint32_t, H4)
 
+DO_VLDR(vldrb_sh, 1, ldsb, 2, int16_t, H2)
+DO_VLDR(vldrb_sw, 1, ldsb, 4, int32_t, H4)
+DO_VLDR(vldrb_uh, 1, ldub, 2, uint16_t, H2)
+DO_VLDR(vldrb_uw, 1, ldub, 4, uint32_t, H4)
+DO_VLDR(vldrh_sw, 2, ldsw, 4, int32_t, H4)
+DO_VLDR(vldrh_uw, 2, lduw, 4, uint32_t, H4)
+
 DO_VSTR(vstrb, 1, stb, 1, uint8_t, H1)
 DO_VSTR(vstrh, 2, stw, 2, uint16_t, H2)
 DO_VSTR(vstrw, 4, stl, 4, uint32_t, H4)
+DO_VSTR(vstrb_h, 1, stb, 2, int16_t, H2)
+DO_VSTR(vstrb_w, 1, stb, 4, int32_t, H4)
+DO_VSTR(vstrh_w, 2, stw, 4, int32_t, H4)
 
 #undef DO_VLDR
 #undef DO_VSTR
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index e8bb2372ad9..14206893d5f 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -149,3 +149,21 @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
     };
     return do_ldst(s, a, a->l ? ldfns[a->size] : stfns[a->size]);
 }
+
+#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST)                          \
+    static bool trans_##OP(DisasContext *s, arg_VLDR_VSTR *a)           \
+    {                                                                   \
+        MVEGenLdStFn *ldfns[] = {                                       \
+            gen_helper_mve_##SLD,                                       \
+            gen_helper_mve_##ULD,                                       \
+        };                                                              \
+        MVEGenLdStFn *stfns[] = {                                       \
+            gen_helper_mve_##ST,                                        \
+            NULL,                                                       \
+        };                                                              \
+        return do_ldst(s, a, a->l ? ldfns[a->u] : stfns[a->u]);         \
+    }
+
+DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
+DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
+DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
-- 
2.20.1
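
For the unpredicated case the effect of one of these insns is easy to
state in plain C; this sketch (illustration only, ignoring the helper's
mask and address bookkeeping) shows what vldrb_sh computes: eight
consecutive bytes are loaded and each is sign-extended into a 16-bit
vector element.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const int8_t mem[8] = { 1, -1, 2, -2, 3, -3, 127, -128 };
        int16_t q[8];

        for (int e = 0; e < 8; e++) {
            q[e] = mem[e];              /* implicit sign extension */
        }
        for (int e = 0; e < 8; e++) {
            printf("q[%d] = %d\n", e, q[e]);
        }
        return 0;
    }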




* [PATCH 13/55] target/arm: Implement MVE VCLZ
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (11 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 12/55] target/arm: Implement widening/narrowing MVE VLDR/VSTR insns Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-08 22:10   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 14/55] target/arm: Implement MVE VCLS Peter Maydell
                   ` (42 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VCLZ insn (and the necessary machinery
for MVE 1-input vector ops).

Note that for non-load instructions predication is always performed
at byte-level granularity regardless of element size (R_ZLSJ),
and so the masking logic here differs from that used in the VLDR
and VSTR helpers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  4 ++++
 target/arm/mve.decode      |  8 +++++++
 target/arm/mve_helper.c    | 48 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 43 ++++++++++++++++++++++++++++++++++
 4 files changed, 103 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index e47d4164ae7..c5c1315b161 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -32,3 +32,7 @@ DEF_HELPER_FLAGS_3(mve_vldrh_uw, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vclzb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vclzh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vclzw, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 3bc5f034531..24999bf703e 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -20,13 +20,17 @@
 #
 
 %qd 22:1 13:3
+%qm 5:1 1:3
 
 &vldr_vstr rn qd imm p a w size l u
+&1op qd qm size
 
 @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
 @vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
 
+@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
+
 # Vector loads and stores
 
 # Widening loads and narrowing stores:
@@ -61,3 +65,7 @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
                  size=1 p=1
 VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
                  size=2 p=1
+
+# Vector miscellaneous
+
+VCLZ             1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 6a2fc1c37cd..b7c44f57c09 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -196,3 +196,51 @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t, H4)
 
 #undef DO_VLDR
 #undef DO_VSTR
+
+/*
+ * Take the bottom bits of mask (which has one bit per byte) and
+ * convert them to a mask with 1s in each byte that is predicated on.
+ */
+static uint8_t mask_to_bytemask1(uint16_t mask)
+{
+    return (mask & 1) ? 0xff : 0;
+}
+
+static uint16_t mask_to_bytemask2(uint16_t mask)
+{
+    static const uint16_t masks[] = { 0x0000, 0x00ff, 0xff00, 0xffff };
+    return masks[mask & 3];
+}
+
+static uint32_t mask_to_bytemask4(uint16_t mask)
+{
+    static const uint32_t masks[] = {
+        0x00000000, 0x000000ff, 0x0000ff00, 0x0000ffff,
+        0x00ff0000, 0x00ff00ff, 0x00ffff00, 0x00ffffff,
+        0xff000000, 0xff0000ff, 0xff00ff00, 0xff00ffff,
+        0xffff0000, 0xffff00ff, 0xffffff00, 0xffffffff,
+    };
+    return masks[mask & 0xf];
+}
+
+#define DO_1OP(OP, ESIZE, TYPE, H, FN)                                  \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
+    {                                                                   \
+        TYPE *d = vd, *m = vm;                                          \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            TYPE r = FN(m[H(e)]);                                       \
+            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
+            d[H(e)] &= ~bytemask;                                       \
+            d[H(e)] |= (r & bytemask);                                  \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+#define DO_CLZ_B(N)   (clz32(N) - 24)
+#define DO_CLZ_H(N)   (clz32(N) - 16)
+
+DO_1OP(vclzb, 1, uint8_t, H1, DO_CLZ_B)
+DO_1OP(vclzh, 2, uint16_t, H2, DO_CLZ_H)
+DO_1OP(vclzw, 4, uint32_t, H4, clz32)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 14206893d5f..6bbc2df35c1 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -29,6 +29,7 @@
 #include "decode-mve.c.inc"
 
 typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 
 /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
 static inline long mve_qreg_offset(unsigned reg)
@@ -167,3 +168,45 @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
 DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
 DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
 DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
+
+static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
+{
+    TCGv_ptr qd, qm;
+
+    if (!dc_isar_feature(aa32_mve, s)) {
+        return false;
+    }
+    if (a->qd > 7 || a->qm > 7 || !fn) {
+        return false;
+    }
+
+    if (!mve_eci_check(s)) {
+        return true;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    qd = mve_qreg_ptr(a->qd);
+    qm = mve_qreg_ptr(a->qm);
+    fn(cpu_env, qd, qm);
+    tcg_temp_free_ptr(qd);
+    tcg_temp_free_ptr(qm);
+    mve_update_eci(s);
+    return true;
+}
+
+#define DO_1OP(INSN, FN)                                        \
+    static bool trans_##INSN(DisasContext *s, arg_1op *a)       \
+    {                                                           \
+        MVEGenOneOpFn *fns[] = {                                \
+            gen_helper_mve_##FN##b,                             \
+            gen_helper_mve_##FN##h,                             \
+            gen_helper_mve_##FN##w,                             \
+            NULL,                                               \
+        };                                                      \
+        return do_1op(s, a, fns[a->size]);                      \
+    }
+
+DO_1OP(VCLZ, vclz)
-- 
2.20.1
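
A worked example of the byte-granular merge in DO_1OP (values arbitrary,
illustration only): a halfword lane whose low predicate bit is set but
whose high predicate bit is clear gets only its low byte written, which is
exactly the R_ZLSJ behaviour the commit message describes.

    #include <stdint.h>
    #include <assert.h>

    int main(void)
    {
        /* mask_to_bytemask2() lookup for low predicate bits 0b01 */
        static const uint16_t masks[] = { 0x0000, 0x00ff, 0xff00, 0xffff };
        uint16_t bytemask = masks[0x01 & 3];

        uint16_t d = 0x1234;            /* old destination lane */
        uint16_t r = 0xabcd;            /* newly computed result */

        d = (d & ~bytemask) | (r & bytemask);
        assert(d == 0x12cd);            /* only the low byte was updated */
        return 0;
    }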




* [PATCH 14/55] target/arm: Implement MVE VCLS
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (12 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 13/55] target/arm: Implement MVE VCLZ Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-08 22:12   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 15/55] bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations Peter Maydell
                   ` (41 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VCLS insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 4 ++++
 target/arm/mve.decode      | 1 +
 target/arm/mve_helper.c    | 7 +++++++
 target/arm/translate-mve.c | 1 +
 4 files changed, 13 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index c5c1315b161..bdd6675ea14 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -33,6 +33,10 @@ DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
 
+DEF_HELPER_FLAGS_3(mve_vclsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vclsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
 DEF_HELPER_FLAGS_3(mve_vclzb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vclzh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vclzw, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 24999bf703e..adceef91597 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -68,4 +68,5 @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
 
 # Vector miscellaneous
 
+VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 VCLZ             1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index b7c44f57c09..071c9070593 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -238,6 +238,13 @@ static uint32_t mask_to_bytemask4(uint16_t mask)
         mve_advance_vpt(env);                                           \
     }
 
+#define DO_CLS_B(N)   (clrsb32(N) - 24)
+#define DO_CLS_H(N)   (clrsb32(N) - 16)
+
+DO_1OP(vclsb, 1, int8_t, H1, DO_CLS_B)
+DO_1OP(vclsh, 2, int16_t, H2, DO_CLS_H)
+DO_1OP(vclsw, 4, int32_t, H4, clrsb32)
+
 #define DO_CLZ_B(N)   (clz32(N) - 24)
 #define DO_CLZ_H(N)   (clz32(N) - 16)
 
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 6bbc2df35c1..3c6897548a2 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -210,3 +210,4 @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
     }
 
 DO_1OP(VCLZ, vclz)
+DO_1OP(VCLS, vcls)
-- 
2.20.1
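
For reference, clrsb32() returns the number of bits below the most
significant bit that are identical to it; the -24/-16 adjustments rebase
that count from the sign-extended 32-bit value to the 8-bit or 16-bit
element width. A portable restatement (illustration only):

    #include <stdint.h>
    #include <assert.h>

    /* Count the leading redundant sign bits of a 32-bit value */
    static int clrsb32(int32_t x)
    {
        int sign = (x >> 31) & 1;
        int n = 0;
        for (int i = 30; i >= 0 && (((x >> i) & 1) == sign); i--) {
            n++;
        }
        return n;
    }

    int main(void)
    {
        int8_t b = -8;                  /* 0b11111000 */
        assert(clrsb32(b) - 24 == 4);   /* VCLS.8 gives 4 */
        return 0;
    }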




* [PATCH 15/55] bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (13 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 14/55] target/arm: Implement MVE VCLS Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-08  6:53   ` Philippe Mathieu-Daudé
  2021-06-08 22:14   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 16/55] target/arm: Implement MVE VREV16, VREV32, VREV64 Peter Maydell
                   ` (40 subsequent siblings)
  55 siblings, 2 replies; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Currently the ARM SVE helper code locally defines some utility
functions for swapping 16-bit halfwords within 32-bit or 64-bit
values and for swapping 32-bit words within 64-bit values,
parallel to the byte-swapping bswap16/32/64 functions.

We want these also for the ARM MVE code, and they're potentially
generally useful for other targets, so move them to bitops.h.
(We don't put them in bswap.h with the bswap* functions because
they are implemented in terms of the rotate operations also
defined in bitops.h, and including bitops.h from bswap.h seems
better avoided.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/qemu/bitops.h   | 29 +++++++++++++++++++++++++++++
 target/arm/sve_helper.c | 20 --------------------
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index a72f69fea85..03213ce952c 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -291,6 +291,35 @@ static inline uint64_t ror64(uint64_t word, unsigned int shift)
     return (word >> shift) | (word << ((64 - shift) & 63));
 }
 
+/**
+ * hswap32 - swap 16-bit halfwords within a 32-bit value
+ * @h: value to swap
+ */
+static inline uint32_t hswap32(uint32_t h)
+{
+    return rol32(h, 16);
+}
+
+/**
+ * hswap64 - swap 16-bit halfwords within a 64-bit value
+ * @h: value to swap
+ */
+static inline uint64_t hswap64(uint64_t h)
+{
+    uint64_t m = 0x0000ffff0000ffffull;
+    h = rol64(h, 32);
+    return ((h & m) << 16) | ((h >> 16) & m);
+}
+
+/**
+ * wswap64 - swap 32-bit words within a 64-bit value
+ * @h: value to swap
+ */
+static inline uint64_t wswap64(uint64_t h)
+{
+    return rol64(h, 32);
+}
+
 /**
  * extract32:
  * @value: the value to extract the bit field from
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 46a957b6fb0..15aa0a74982 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -247,26 +247,6 @@ static inline uint64_t expand_pred_s(uint8_t byte)
     return word[byte & 0x11];
 }
 
-/* Swap 16-bit words within a 32-bit word.  */
-static inline uint32_t hswap32(uint32_t h)
-{
-    return rol32(h, 16);
-}
-
-/* Swap 16-bit words within a 64-bit word.  */
-static inline uint64_t hswap64(uint64_t h)
-{
-    uint64_t m = 0x0000ffff0000ffffull;
-    h = rol64(h, 32);
-    return ((h & m) << 16) | ((h >> 16) & m);
-}
-
-/* Swap 32-bit words within a 64-bit word.  */
-static inline uint64_t wswap64(uint64_t h)
-{
-    return rol64(h, 32);
-}
-
 #define LOGICAL_PPPP(NAME, FUNC) \
 void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc)  \
 {                                                                         \
-- 
2.20.1
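
A worked example of the rotate-and-mask construction in hswap64()
(illustration only): rol64(h, 32) swaps the two 32-bit words, and the
mask-and-shift step then swaps the 16-bit halfwords within each word, so
the four halfwords end up fully reversed.

    #include <stdint.h>
    #include <assert.h>

    static uint64_t rol64(uint64_t word, unsigned int shift)
    {
        return (word << shift) | (word >> ((64 - shift) & 63));
    }

    static uint64_t hswap64(uint64_t h)
    {
        uint64_t m = 0x0000ffff0000ffffull;
        h = rol64(h, 32);
        return ((h & m) << 16) | ((h >> 16) & m);
    }

    int main(void)
    {
        assert(hswap64(0x0011223344556677ull) == 0x6677445522330011ull);
        return 0;
    }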




* [PATCH 16/55] target/arm: Implement MVE VREV16, VREV32, VREV64
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (14 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 15/55] bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-08 22:23   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 17/55] target/arm: Implement MVE VMVN (register) Peter Maydell
                   ` (39 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE instructions VREV16, VREV32 and VREV64.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  7 +++++++
 target/arm/mve.decode      |  4 ++++
 target/arm/mve_helper.c    | 13 +++++++++++++
 target/arm/translate-mve.c | 33 +++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index bdd6675ea14..4c89387587d 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -40,3 +40,10 @@ DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vclzb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vclzh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vclzw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vrev16b, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vrev32b, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vrev32h, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vrev64b, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vrev64h, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vrev64w, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index adceef91597..16ee511a5cb 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -70,3 +70,7 @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
 VCLZ             1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
+
+VREV16           1111 1111 1 . 11 .. 00 ... 0 0001 01 . 0 ... 0 @1op
+VREV32           1111 1111 1 . 11 .. 00 ... 0 0000 11 . 0 ... 0 @1op
+VREV64           1111 1111 1 . 11 .. 00 ... 0 0000 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 071c9070593..055606b905f 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -223,6 +223,12 @@ static uint32_t mask_to_bytemask4(uint16_t mask)
     return masks[mask & 0xf];
 }
 
+static uint64_t mask_to_bytemask8(uint16_t mask)
+{
+    return mask_to_bytemask4(mask) |
+        ((uint64_t)mask_to_bytemask4(mask >> 4) << 32);
+}
+
 #define DO_1OP(OP, ESIZE, TYPE, H, FN)                                  \
     void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
     {                                                                   \
@@ -251,3 +257,10 @@ DO_1OP(vclsw, 4, int32_t, H4, clrsb32)
 DO_1OP(vclzb, 1, uint8_t, H1, DO_CLZ_B)
 DO_1OP(vclzh, 2, uint16_t, H2, DO_CLZ_H)
 DO_1OP(vclzw, 4, uint32_t, H4, clz32)
+
+DO_1OP(vrev16b, 2, uint16_t, H2, bswap16)
+DO_1OP(vrev32b, 4, uint32_t, H4, bswap32)
+DO_1OP(vrev32h, 4, uint32_t, H4, hswap32)
+DO_1OP(vrev64b, 8, uint64_t, , bswap64)
+DO_1OP(vrev64h, 8, uint64_t, , hswap64)
+DO_1OP(vrev64w, 8, uint64_t, , wswap64)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 3c6897548a2..6f3d4796072 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -211,3 +211,36 @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
 
 DO_1OP(VCLZ, vclz)
 DO_1OP(VCLS, vcls)
+
+static bool trans_VREV16(DisasContext *s, arg_1op *a)
+{
+    MVEGenOneOpFn *fns[] = {
+        gen_helper_mve_vrev16b,
+        NULL,
+        NULL,
+        NULL,
+    };
+    return do_1op(s, a, fns[a->size]);
+}
+
+static bool trans_VREV32(DisasContext *s, arg_1op *a)
+{
+    MVEGenOneOpFn *fns[] = {
+        gen_helper_mve_vrev32b,
+        gen_helper_mve_vrev32h,
+        NULL,
+        NULL,
+    };
+    return do_1op(s, a, fns[a->size]);
+}
+
+static bool trans_VREV64(DisasContext *s, arg_1op *a)
+{
+    MVEGenOneOpFn *fns[] = {
+        gen_helper_mve_vrev64b,
+        gen_helper_mve_vrev64h,
+        gen_helper_mve_vrev64w,
+        NULL,
+    };
+    return do_1op(s, a, fns[a->size]);
+}
-- 
2.20.1
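
The mapping from insn to helper may be easier to see with concrete lanes
(illustration only): VREV64.16 reverses the order of the 16-bit elements
within each 64-bit region, which on the host-endian 64-bit chunk
representation is exactly hswap64() applied to each chunk.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* 16-bit elements e0..e3 of one 64-bit region, e0 first */
        uint16_t e[4] = { 0xaaaa, 0xbbbb, 0xcccc, 0xdddd };
        uint16_t r[4];

        for (int i = 0; i < 4; i++) {
            r[i] = e[3 - i];            /* element i takes element 3-i */
        }
        for (int i = 0; i < 4; i++) {
            printf("r[%d] = 0x%04x\n", i, r[i]);  /* dddd cccc bbbb aaaa */
        }
        return 0;
    }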




* [PATCH 17/55] target/arm: Implement MVE VMVN (register)
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (15 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 16/55] target/arm: Implement MVE VREV16, VREV32, VREV64 Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-08 22:27   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 18/55] target/arm: Implement MVE VABS Peter Maydell
                   ` (38 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VMVN (register) operation.  Note that for
predication this operation is byte-by-byte.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 2 ++
 target/arm/mve.decode      | 3 +++
 target/arm/mve_helper.c    | 4 ++++
 target/arm/translate-mve.c | 5 +++++
 4 files changed, 14 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 4c89387587d..f1dc52f7a50 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -47,3 +47,5 @@ DEF_HELPER_FLAGS_3(mve_vrev32h, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vrev64b, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vrev64h, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vrev64w, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vmvn, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 16ee511a5cb..ff8afb682fb 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -30,6 +30,7 @@
 @vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
 
 @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
+@1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
 
 # Vector loads and stores
 
@@ -74,3 +75,5 @@ VCLZ             1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
 VREV16           1111 1111 1 . 11 .. 00 ... 0 0001 01 . 0 ... 0 @1op
 VREV32           1111 1111 1 . 11 .. 00 ... 0 0000 11 . 0 ... 0 @1op
 VREV64           1111 1111 1 . 11 .. 00 ... 0 0000 01 . 0 ... 0 @1op
+
+VMVN             1111 1111 1 . 11 00 00 ... 0 0101 11 . 0 ... 0 @1op_nosz
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 055606b905f..2aacc733166 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -264,3 +264,7 @@ DO_1OP(vrev32h, 4, uint32_t, H4, hswap32)
 DO_1OP(vrev64b, 8, uint64_t, , bswap64)
 DO_1OP(vrev64h, 8, uint64_t, , hswap64)
 DO_1OP(vrev64w, 8, uint64_t, , wswap64)
+
+#define DO_NOT(N) (~(N))
+
+DO_1OP(vmvn, 1, uint8_t, H1, DO_NOT)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 6f3d4796072..6e5c3df7179 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -244,3 +244,8 @@ static bool trans_VREV64(DisasContext *s, arg_1op *a)
     };
     return do_1op(s, a, fns[a->size]);
 }
+
+static bool trans_VMVN(DisasContext *s, arg_1op *a)
+{
+    return do_1op(s, a, gen_helper_mve_vmvn);
+}
-- 
2.20.1




* [PATCH 18/55] target/arm: Implement MVE VABS
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (16 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 17/55] target/arm: Implement MVE VMVN (register) Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-08 22:34   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 19/55] target/arm: Implement MVE VNEG Peter Maydell
                   ` (37 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VABS insn (both integer and floating point forms).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  6 ++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 10 ++++++++++
 target/arm/translate-mve.c | 15 +++++++++++++++
 4 files changed, 34 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index f1dc52f7a50..76508d5dd71 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -49,3 +49,9 @@ DEF_HELPER_FLAGS_3(mve_vrev64h, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vrev64w, TCG_CALL_NO_WG, void, env, ptr, ptr)
 
 DEF_HELPER_FLAGS_3(mve_vmvn, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vabsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vabsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vfabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vfabss, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index ff8afb682fb..66963dc1847 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -77,3 +77,6 @@ VREV32           1111 1111 1 . 11 .. 00 ... 0 0000 11 . 0 ... 0 @1op
 VREV64           1111 1111 1 . 11 .. 00 ... 0 0000 01 . 0 ... 0 @1op
 
 VMVN             1111 1111 1 . 11 00 00 ... 0 0101 11 . 0 ... 0 @1op_nosz
+
+VABS             1111 1111 1 . 11 .. 01 ... 0 0011 01 . 0 ... 0 @1op
+VABS_fp          1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 2aacc733166..2ab05e66dfc 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -268,3 +268,13 @@ DO_1OP(vrev64w, 8, uint64_t, , wswap64)
 #define DO_NOT(N) (~(N))
 
 DO_1OP(vmvn, 1, uint8_t, H1, DO_NOT)
+
+#define DO_ABS(N) ((N) < 0 ? -(N) : (N))
+#define DO_FABS(N)    ((N) & ((__typeof(N))-1 >> 1))
+
+DO_1OP(vabsb, 1, int8_t, H1, DO_ABS)
+DO_1OP(vabsh, 2, int16_t, H2, DO_ABS)
+DO_1OP(vabsw, 4, int32_t, H4, DO_ABS)
+
+DO_1OP(vfabsh, 2, uint16_t, H2, DO_FABS)
+DO_1OP(vfabss, 4, uint32_t, H4, DO_FABS)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 6e5c3df7179..badd4da2cbf 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -211,6 +211,7 @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
 
 DO_1OP(VCLZ, vclz)
 DO_1OP(VCLS, vcls)
+DO_1OP(VABS, vabs)
 
 static bool trans_VREV16(DisasContext *s, arg_1op *a)
 {
@@ -249,3 +250,17 @@ static bool trans_VMVN(DisasContext *s, arg_1op *a)
 {
     return do_1op(s, a, gen_helper_mve_vmvn);
 }
+
+static bool trans_VABS_fp(DisasContext *s, arg_1op *a)
+{
+    MVEGenOneOpFn *fns[] = {
+        NULL,
+        gen_helper_mve_vfabsh,
+        gen_helper_mve_vfabss,
+        NULL,
+    };
+    if (!dc_isar_feature(aa32_mve_fp, s)) {
+        return false;
+    }
+    return do_1op(s, a, fns[a->size]);
+}
-- 
2.20.1
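
Since DO_FABS is pure bit manipulation it needs no FP context: for a
16-bit lane, ((__typeof(N))-1 >> 1) evaluates to 0x7fff, which keeps
everything except the sign bit (the next patch's DO_FNEG uses the
complementary constant to flip that bit instead). A standalone sketch,
illustration only:

    #include <stdint.h>
    #include <assert.h>

    int main(void)
    {
        uint16_t h = 0xc500;                    /* float16 -5.0 */
        uint16_t abs_mask = (uint16_t)-1 >> 1;  /* 0x7fff */
        uint16_t sign_bit = ~abs_mask;          /* 0x8000 */

        assert((h & abs_mask) == 0x4500);       /* VABS: clear sign bit */
        assert((h ^ sign_bit) == 0x4500);       /* VNEG: flip sign bit */
        return 0;
    }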




* [PATCH 19/55] target/arm: Implement MVE VNEG
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (17 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 18/55] target/arm: Implement MVE VABS Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-08 22:40   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 20/55] target/arm: Implement MVE VDUP Peter Maydell
                   ` (36 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VNEG insn (both integer and floating point forms).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  6 ++++++
 target/arm/mve.decode      |  2 ++
 target/arm/mve_helper.c    | 10 ++++++++++
 target/arm/translate-mve.c | 15 +++++++++++++++
 4 files changed, 33 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 76508d5dd71..733a54d2e3c 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -55,3 +55,9 @@ DEF_HELPER_FLAGS_3(mve_vabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vabsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vfabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vfabss, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vnegb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 66963dc1847..82cc0abcb82 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -80,3 +80,5 @@ VMVN             1111 1111 1 . 11 00 00 ... 0 0101 11 . 0 ... 0 @1op_nosz
 
 VABS             1111 1111 1 . 11 .. 01 ... 0 0011 01 . 0 ... 0 @1op
 VABS_fp          1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
+VNEG             1111 1111 1 . 11 .. 01 ... 0 0011 11 . 0 ... 0 @1op
+VNEG_fp          1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 2ab05e66dfc..b14826c05a7 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -278,3 +278,13 @@ DO_1OP(vabsw, 4, int32_t, H4, DO_ABS)
 
 DO_1OP(vfabsh, 2, uint16_t, H2, DO_FABS)
 DO_1OP(vfabss, 4, uint32_t, H4, DO_FABS)
+
+#define DO_NEG(N)    (-(N))
+#define DO_FNEG(N)    ((N) ^ ~((__typeof(N))-1 >> 1))
+
+DO_1OP(vnegb, 1, int8_t, H1, DO_NEG)
+DO_1OP(vnegh, 2, int16_t, H2, DO_NEG)
+DO_1OP(vnegw, 4, int32_t, H4, DO_NEG)
+
+DO_1OP(vfnegh, 2, uint16_t, H2, DO_FNEG)
+DO_1OP(vfnegs, 4, uint32_t, H4, DO_FNEG)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index badd4da2cbf..086cac9f0cd 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -212,6 +212,7 @@ static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
 DO_1OP(VCLZ, vclz)
 DO_1OP(VCLS, vcls)
 DO_1OP(VABS, vabs)
+DO_1OP(VNEG, vneg)
 
 static bool trans_VREV16(DisasContext *s, arg_1op *a)
 {
@@ -264,3 +265,17 @@ static bool trans_VABS_fp(DisasContext *s, arg_1op *a)
     }
     return do_1op(s, a, fns[a->size]);
 }
+
+static bool trans_VNEG_fp(DisasContext *s, arg_1op *a)
+{
+    MVEGenOneOpFn *fns[] = {
+        NULL,
+        gen_helper_mve_vfnegh,
+        gen_helper_mve_vfnegs,
+        NULL,
+    };
+    if (!dc_isar_feature(aa32_mve_fp, s)) {
+        return false;
+    }
+    return do_1op(s, a, fns[a->size]);
+}
-- 
2.20.1



* [PATCH 20/55] target/arm: Implement MVE VDUP
@ 2021-06-07 16:57 ` Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VDUP insn, which duplicates a value from
a general-purpose register into every lane of a vector
register (subject to predication).
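
As an illustration (not part of the patch): the helper performs a
predicated write that merges byte by byte, so only bytes whose
predicate bits are set take the duplicated value. A minimal scalar
model of that merge; the function name and constants are invented for
the example:

    #include <inttypes.h>
    #include <stdio.h>

    /* keep old bytes where bytemask is 0, take new bytes where it is 1s */
    static uint32_t merge_lane(uint32_t old, uint32_t val, uint32_t bytemask)
    {
        return (old & ~bytemask) | (val & bytemask);
    }

    int main(void)
    {
        /* low two bytes predicated on, high two bytes left untouched */
        printf("0x%08" PRIx32 "\n",
               merge_lane(0x11223344, 0xaabbccdd, 0x0000ffff));
        return 0;    /* prints 0x1122ccdd */
    }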

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  4 ++++
 target/arm/mve.decode      | 10 +++++++++
 target/arm/mve_helper.c    | 18 ++++++++++++++++
 target/arm/translate-mve.c | 43 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 75 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 733a54d2e3c..ece9c481367 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -33,6 +33,10 @@ DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
 
+DEF_HELPER_FLAGS_3(mve_vdupb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vduph, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vdupw, TCG_CALL_NO_WG, void, env, ptr, i32)
+
 DEF_HELPER_FLAGS_3(mve_vclsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vclsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 82cc0abcb82..09849917f5a 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -21,6 +21,7 @@
 
 %qd 22:1 13:3
 %qm 5:1 1:3
+%qn 7:1 17:3
 
 &vldr_vstr rn qd imm p a w size l u
 &1op qd qm size
@@ -82,3 +83,12 @@ VABS             1111 1111 1 . 11 .. 01 ... 0 0011 01 . 0 ... 0 @1op
 VABS_fp          1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
 VNEG             1111 1111 1 . 11 .. 01 ... 0 0011 11 . 0 ... 0 @1op
 VNEG_fp          1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
+
+&vdup qd rt size
+# Qd is in the fields usually named Qn
+@vdup            .... .... . . .. ... . rt:4 .... . . . . .... qd=%qn &vdup
+
+# B and E bits encode size, which we decode here to the usual size values
+VDUP             1110 1110 1 1 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=0
+VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 1 1 0000 @vdup size=1
+VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index b14826c05a7..a5ed4e01e33 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -229,6 +229,24 @@ static uint64_t mask_to_bytemask8(uint16_t mask)
         ((uint64_t)mask_to_bytemask4(mask >> 4) << 32);
 }
 
+#define DO_VDUP(OP, ESIZE, TYPE, H)                                     \
+    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t val)     \
+    {                                                                   \
+        TYPE *d = vd;                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
+            d[H(e)] &= ~bytemask;                                       \
+            d[H(e)] |= (val & bytemask);                                \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+DO_VDUP(vdupb, 1, uint8_t, H1)
+DO_VDUP(vduph, 2, uint16_t, H2)
+DO_VDUP(vdupw, 4, uint32_t, H4)
+
 #define DO_1OP(OP, ESIZE, TYPE, H, FN)                                  \
     void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
     {                                                                   \
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 086cac9f0cd..b4fc4054fe1 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -169,6 +169,49 @@ DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
 DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
 DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
 
+static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
+{
+    TCGv_ptr qd;
+    TCGv_i32 rt;
+
+    if (!dc_isar_feature(aa32_mve, s)) {
+        return false;
+    }
+    if (a->qd > 7) {
+        return false;
+    }
+    if (a->rt == 13 || a->rt == 15) {
+        /* UNPREDICTABLE; we choose to UNDEF */
+        return false;
+    }
+    if (!mve_eci_check(s)) {
+        return true;
+    }
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    qd = mve_qreg_ptr(a->qd);
+    rt = load_reg(s, a->rt);
+    switch (a->size) {
+    case 0:
+        gen_helper_mve_vdupb(cpu_env, qd, rt);
+        break;
+    case 1:
+        gen_helper_mve_vduph(cpu_env, qd, rt);
+        break;
+    case 2:
+        gen_helper_mve_vdupw(cpu_env, qd, rt);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    tcg_temp_free_ptr(qd);
+    tcg_temp_free_i32(rt);
+    mve_update_eci(s);
+    return true;
+}
+
 static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
 {
     TCGv_ptr qd, qm;
-- 
2.20.1



* [PATCH 21/55] target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR
@ 2021-06-07 16:57 ` Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE vector logical operations that operate
on two vector registers.
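
This patch also introduces the DO_2OP helper pattern that the rest of
the two-operand insns reuse. A simplified scalar model of what it does
(the real helper merges via a byte mask rather than branching; the
names here are invented for the example):

    #include <stdint.h>

    /* model of a predicated byte-wise 2-op such as VAND: one predicate
     * bit per byte of the 16-byte vector */
    void vand_model(uint8_t *d, const uint8_t *n, const uint8_t *m,
                    uint16_t mask)
    {
        int e;
        for (e = 0; e < 16; e++, mask >>= 1) {
            if (mask & 1) {
                d[e] = n[e] & m[e];
            }                        /* else leave d[e] unchanged */
        }
    }

Because bitwise operations give the same answer whatever the lane
size, the decode uses a single no-size format and one byte-wise helper
per operation is enough.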

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  6 ++++++
 target/arm/mve.decode      |  9 +++++++++
 target/arm/mve_helper.c    | 28 ++++++++++++++++++++++++++
 target/arm/translate-mve.c | 41 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 84 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index ece9c481367..ad09170c9cf 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -65,3 +65,9 @@ DEF_HELPER_FLAGS_3(mve_vnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vand, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vbic, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 09849917f5a..332e0b8d1d6 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -25,6 +25,7 @@
 
 &vldr_vstr rn qd imm p a w size l u
 &1op qd qm size
+&2op qd qm qn size
 
 @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -32,6 +33,7 @@
 
 @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
 @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
+@2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
 
 # Vector loads and stores
 
@@ -68,6 +70,13 @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
 VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
                  size=2 p=1
 
+# Vector 2-op
+VAND             1110 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
+VBIC             1110 1111 0 . 01 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
+VORR             1110 1111 0 . 10 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
+VORN             1110 1111 0 . 11 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
+VEOR             1111 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index a5ed4e01e33..6b3d4dbf2da 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -306,3 +306,31 @@ DO_1OP(vnegw, 4, int32_t, H4, DO_NEG)
 
 DO_1OP(vfnegh, 2, uint16_t, H2, DO_FNEG)
 DO_1OP(vfnegs, 4, uint32_t, H4, DO_FNEG)
+
+#define DO_2OP(OP, ESIZE, TYPE, H, FN)                                  \
+    void HELPER(glue(mve_, OP))(CPUARMState *env,                       \
+                                void *vd, void *vn, void *vm)           \
+    {                                                                   \
+        TYPE *d = vd, *n = vn, *m = vm;                                 \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            TYPE r = FN(n[H(e)], m[H(e)]);                              \
+            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
+            d[H(e)] &= ~bytemask;                                       \
+            d[H(e)] |= (r & bytemask);                                  \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+#define DO_AND(N, M)  ((N) & (M))
+#define DO_BIC(N, M)  ((N) & ~(M))
+#define DO_ORR(N, M)  ((N) | (M))
+#define DO_ORN(N, M)  ((N) | ~(M))
+#define DO_EOR(N, M)  ((N) ^ (M))
+
+DO_2OP(vand, 1, uint8_t, H1, DO_AND)
+DO_2OP(vbic, 1, uint8_t, H1, DO_BIC)
+DO_2OP(vorr, 1, uint8_t, H1, DO_ORR)
+DO_2OP(vorn, 1, uint8_t, H1, DO_ORN)
+DO_2OP(veor, 1, uint8_t, H1, DO_EOR)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index b4fc4054fe1..0e0fa252364 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -30,6 +30,7 @@
 
 typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
+typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
 
 /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
 static inline long mve_qreg_offset(unsigned reg)
@@ -322,3 +323,43 @@ static bool trans_VNEG_fp(DisasContext *s, arg_1op *a)
     }
     return do_1op(s, a, fns[a->size]);
 }
+
+static bool do_2op(DisasContext *s, arg_2op *a, MVEGenTwoOpFn fn)
+{
+    TCGv_ptr qd, qn, qm;
+
+    if (!dc_isar_feature(aa32_mve, s)) {
+        return false;
+    }
+    if (a->qd > 7 || a->qn > 7 || a->qm > 7 || !fn) {
+        return false;
+    }
+    if (!mve_eci_check(s)) {
+        return true;
+    }
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    qd = mve_qreg_ptr(a->qd);
+    qn = mve_qreg_ptr(a->qn);
+    qm = mve_qreg_ptr(a->qm);
+    fn(cpu_env, qd, qn, qm);
+    tcg_temp_free_ptr(qd);
+    tcg_temp_free_ptr(qn);
+    tcg_temp_free_ptr(qm);
+    mve_update_eci(s);
+    return true;
+}
+
+#define DO_LOGIC(INSN, HELPER)                                  \
+    static bool trans_##INSN(DisasContext *s, arg_2op *a)       \
+    {                                                           \
+        return do_2op(s, a, HELPER);                            \
+    }
+
+DO_LOGIC(VAND, gen_helper_mve_vand)
+DO_LOGIC(VBIC, gen_helper_mve_vbic)
+DO_LOGIC(VORR, gen_helper_mve_vorr)
+DO_LOGIC(VORN, gen_helper_mve_vorn)
+DO_LOGIC(VEOR, gen_helper_mve_veor)
-- 
2.20.1



* [PATCH 22/55] target/arm: Implement MVE VADD, VSUB, VMUL
@ 2021-06-07 16:57 ` Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VADD, VSUB and VMUL insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 12 ++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 14 ++++++++++++++
 target/arm/translate-mve.c | 16 ++++++++++++++++
 4 files changed, 47 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index ad09170c9cf..b7e9af2461e 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -71,3 +71,15 @@ DEF_HELPER_FLAGS_4(mve_vbic, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vsubb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vsubh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vsubw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vmulb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 332e0b8d1d6..f7d1d303f17 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -33,6 +33,7 @@
 
 @1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
 @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
+@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
 @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
 
 # Vector loads and stores
@@ -77,6 +78,10 @@ VORR             1110 1111 0 . 10 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 VORN             1110 1111 0 . 11 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 VEOR             1111 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 
+VADD             1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
+VSUB             1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
+VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 6b3d4dbf2da..39ab684c0c3 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -323,6 +323,12 @@ DO_1OP(vfnegs, 4, uint32_t, H4, DO_FNEG)
         mve_advance_vpt(env);                                           \
     }
 
+/* provide unsigned 2-op helpers for all sizes */
+#define DO_2OP_U(OP, FN)                        \
+    DO_2OP(OP##b, 1, uint8_t, H1, FN)           \
+    DO_2OP(OP##h, 2, uint16_t, H2, FN)          \
+    DO_2OP(OP##w, 4, uint32_t, H4, FN)
+
 #define DO_AND(N, M)  ((N) & (M))
 #define DO_BIC(N, M)  ((N) & ~(M))
 #define DO_ORR(N, M)  ((N) | (M))
@@ -334,3 +340,11 @@ DO_2OP(vbic, 1, uint8_t, H1, DO_BIC)
 DO_2OP(vorr, 1, uint8_t, H1, DO_ORR)
 DO_2OP(vorn, 1, uint8_t, H1, DO_ORN)
 DO_2OP(veor, 1, uint8_t, H1, DO_EOR)
+
+#define DO_ADD(N, M) ((N) + (M))
+#define DO_SUB(N, M) ((N) - (M))
+#define DO_MUL(N, M) ((N) * (M))
+
+DO_2OP_U(vadd, DO_ADD)
+DO_2OP_U(vsub, DO_SUB)
+DO_2OP_U(vmul, DO_MUL)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 0e0fa252364..1b2c8cd5ff7 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -363,3 +363,19 @@ DO_LOGIC(VBIC, gen_helper_mve_vbic)
 DO_LOGIC(VORR, gen_helper_mve_vorr)
 DO_LOGIC(VORN, gen_helper_mve_vorn)
 DO_LOGIC(VEOR, gen_helper_mve_veor)
+
+#define DO_2OP(INSN, FN) \
+    static bool trans_##INSN(DisasContext *s, arg_2op *a)       \
+    {                                                           \
+        MVEGenTwoOpFn *fns[] = {                                \
+            gen_helper_mve_##FN##b,                             \
+            gen_helper_mve_##FN##h,                             \
+            gen_helper_mve_##FN##w,                             \
+            NULL,                                               \
+        };                                                      \
+        return do_2op(s, a, fns[a->size]);                      \
+    }
+
+DO_2OP(VADD, vadd)
+DO_2OP(VSUB, vsub)
+DO_2OP(VMUL, vmul)
-- 
2.20.1



* [PATCH 23/55] target/arm: Implement MVE VMULH
@ 2021-06-07 16:57 ` Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VMULH insn, which performs a vector
multiply and returns the high half of the result.
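
As an illustration (not QEMU code), a standalone sketch of the 8-bit
case, showing why widening before the multiply produces the correct
high half for both signednesses:

    #include <stdint.h>
    #include <stdio.h>

    static int8_t mulh_s8(int8_t n, int8_t m)
    {
        return ((int16_t)n * m) >> 8;   /* arithmetic shift of the product */
    }

    static uint8_t mulh_u8(uint8_t n, uint8_t m)
    {
        return ((uint16_t)n * m) >> 8;
    }

    int main(void)
    {
        /* -16256 >> 8 and 65025 >> 8 */
        printf("%d %d\n", mulh_s8(-128, 127), mulh_u8(255, 255));
        return 0;                       /* prints: -64 254 */
    }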

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  7 +++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 38 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index b7e9af2461e..17219df3159 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -83,3 +83,10 @@ DEF_HELPER_FLAGS_4(mve_vsubw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vmulhsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulhsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index f7d1d303f17..ca4c27209da 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -82,6 +82,9 @@ VADD             1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
 VSUB             1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
 VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
 
+VMULH_S          111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 39ab684c0c3..45b1b121ce6 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -348,3 +348,29 @@ DO_2OP(veor, 1, uint8_t, H1, DO_EOR)
 DO_2OP_U(vadd, DO_ADD)
 DO_2OP_U(vsub, DO_SUB)
 DO_2OP_U(vmul, DO_MUL)
+
+/*
+ * These work for both signed and unsigned source types; the h and w
+ * multiplies are done unsigned, so they cannot suffer signed overflow.
+ */
+static inline uint8_t do_mulh_b(int32_t n, int32_t m)
+{
+    return (n * m) >> 8;
+}
+
+static inline uint16_t do_mulh_h(int32_t n, int32_t m)
+{
+    return ((uint32_t)n * (uint32_t)m) >> 16;
+}
+
+static inline uint32_t do_mulh_w(int64_t n, int64_t m)
+{
+    return ((uint64_t)n * (uint64_t)m) >> 32;
+}
+
+DO_2OP(vmulhsb, 1, int8_t, H1, do_mulh_b)
+DO_2OP(vmulhsh, 2, int16_t, H2, do_mulh_h)
+DO_2OP(vmulhsw, 4, int32_t, H4, do_mulh_w)
+DO_2OP(vmulhub, 1, uint8_t, H1, do_mulh_b)
+DO_2OP(vmulhuh, 2, uint16_t, H2, do_mulh_h)
+DO_2OP(vmulhuw, 4, uint32_t, H4, do_mulh_w)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 1b2c8cd5ff7..edea30ba1d7 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -379,3 +379,5 @@ DO_LOGIC(VEOR, gen_helper_mve_veor)
 DO_2OP(VADD, vadd)
 DO_2OP(VSUB, vsub)
 DO_2OP(VMUL, vmul)
+DO_2OP(VMULH_S, vmulhs)
+DO_2OP(VMULH_U, vmulhu)
-- 
2.20.1



* [PATCH 24/55] target/arm: Implement MVE VRMULH
@ 2021-06-07 16:57 ` Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VRMULH insn, which performs a rounding multiply
and then returns the high half.
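
The rounding amounts to adding half of the discarded low part before
the shift. A worked 8-bit example (illustrative only; the helper name
is invented):

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t rmulh_u8(uint8_t n, uint8_t m)
    {
        return ((uint16_t)n * m + (1u << 7)) >> 8;   /* round to nearest */
    }

    int main(void)
    {
        /* 199 * 199 = 39601 = 0x9ab1; low byte 0xb1 >= 0x80, so round up */
        printf("%d\n", rmulh_u8(199, 199));  /* prints 155 (truncation: 154) */
        return 0;
    }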

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  7 +++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 22 ++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 34 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 17219df3159..38d084429b8 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -90,3 +90,10 @@ DEF_HELPER_FLAGS_4(mve_vmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vrmulhsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrmulhsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index ca4c27209da..4ab6c9dba90 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -85,6 +85,9 @@ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
 VMULH_S          111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 
+VRMULH_S         111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+VRMULH_U         111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 45b1b121ce6..20d96b86f5a 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -368,9 +368,31 @@ static inline uint32_t do_mulh_w(int64_t n, int64_t m)
     return ((uint64_t)n * (uint64_t)m) >> 32;
 }
 
+static inline uint8_t do_rmulh_b(int32_t n, int32_t m)
+{
+    return (n * m + (1U << 7)) >> 8;
+}
+
+static inline uint16_t do_rmulh_h(int32_t n, int32_t m)
+{
+    return ((uint32_t)n * (uint32_t)m + (1U << 15)) >> 16;
+}
+
+static inline uint32_t do_rmulh_w(int64_t n, int64_t m)
+{
+    return ((uint64_t)n * (uint64_t)m + (1U << 31)) >> 32;
+}
+
 DO_2OP(vmulhsb, 1, int8_t, H1, do_mulh_b)
 DO_2OP(vmulhsh, 2, int16_t, H2, do_mulh_h)
 DO_2OP(vmulhsw, 4, int32_t, H4, do_mulh_w)
 DO_2OP(vmulhub, 1, uint8_t, H1, do_mulh_b)
 DO_2OP(vmulhuh, 2, uint16_t, H2, do_mulh_h)
 DO_2OP(vmulhuw, 4, uint32_t, H4, do_mulh_w)
+
+DO_2OP(vrmulhsb, 1, int8_t, H1, do_rmulh_b)
+DO_2OP(vrmulhsh, 2, int16_t, H2, do_rmulh_h)
+DO_2OP(vrmulhsw, 4, int32_t, H4, do_rmulh_w)
+DO_2OP(vrmulhub, 1, uint8_t, H1, do_rmulh_b)
+DO_2OP(vrmulhuh, 2, uint16_t, H2, do_rmulh_h)
+DO_2OP(vrmulhuw, 4, uint32_t, H4, do_rmulh_w)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index edea30ba1d7..7e9d852c6ff 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -381,3 +381,5 @@ DO_2OP(VSUB, vsub)
 DO_2OP(VMUL, vmul)
 DO_2OP(VMULH_S, vmulhs)
 DO_2OP(VMULH_U, vmulhu)
+DO_2OP(VRMULH_S, vrmulhs)
+DO_2OP(VRMULH_U, vrmulhu)
-- 
2.20.1



* [PATCH 25/55] target/arm: Implement MVE VMAX, VMIN
@ 2021-06-07 16:57 ` Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VMAX and VMIN insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 14 ++++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 14 ++++++++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 37 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 38d084429b8..bc9dcde5dba 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -97,3 +97,17 @@ DEF_HELPER_FLAGS_4(mve_vrmulhsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vrmulhub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vrmulhuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vrmulhuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vmaxsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vminsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 4ab6c9dba90..42d5504500c 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -88,6 +88,11 @@ VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 VRMULH_S         111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 VRMULH_U         111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 
+VMAX_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
+VMAX_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
+VMIN_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
+VMIN_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 20d96b86f5a..f53551c7de5 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -329,6 +329,12 @@ DO_1OP(vfnegs, 4, uint32_t, H4, DO_FNEG)
     DO_2OP(OP##h, 2, uint16_t, H2, FN)          \
     DO_2OP(OP##w, 4, uint32_t, H4, FN)
 
+/* provide signed 2-op helpers for all sizes */
+#define DO_2OP_S(OP, FN)                        \
+    DO_2OP(OP##b, 1, int8_t, H1, FN)            \
+    DO_2OP(OP##h, 2, int16_t, H2, FN)           \
+    DO_2OP(OP##w, 4, int32_t, H4, FN)
+
 #define DO_AND(N, M)  ((N) & (M))
 #define DO_BIC(N, M)  ((N) & ~(M))
 #define DO_ORR(N, M)  ((N) | (M))
@@ -396,3 +402,11 @@ DO_2OP(vrmulhsw, 4, int32_t, H4, do_rmulh_w)
 DO_2OP(vrmulhub, 1, uint8_t, H1, do_rmulh_b)
 DO_2OP(vrmulhuh, 2, uint16_t, H2, do_rmulh_h)
 DO_2OP(vrmulhuw, 4, uint32_t, H4, do_rmulh_w)
+
+#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
+#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
+
+DO_2OP_S(vmaxs, DO_MAX)
+DO_2OP_U(vmaxu, DO_MAX)
+DO_2OP_S(vmins, DO_MIN)
+DO_2OP_U(vminu, DO_MIN)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 7e9d852c6ff..c12b0174b82 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -383,3 +383,7 @@ DO_2OP(VMULH_S, vmulhs)
 DO_2OP(VMULH_U, vmulhu)
 DO_2OP(VRMULH_S, vrmulhs)
 DO_2OP(VRMULH_U, vrmulhu)
+DO_2OP(VMAX_S, vmaxs)
+DO_2OP(VMAX_U, vmaxu)
+DO_2OP(VMIN_S, vmins)
+DO_2OP(VMIN_U, vminu)
-- 
2.20.1



* [PATCH 26/55] target/arm: Implement MVE VABD
@ 2021-06-07 16:57 ` Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VABD insn.
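
The helper computes the absolute difference by subtracting the smaller
operand from the larger, which needs no abs() call and works unchanged
for unsigned element types. A minimal sketch of the unsigned byte case
(the function name is invented for the example):

    #include <stdint.h>

    uint8_t vabd_u8(uint8_t n, uint8_t m)
    {
        return n >= m ? n - m : m - n;   /* always fits in 8 bits */
    }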

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 7 +++++++
 target/arm/mve.decode      | 3 +++
 target/arm/mve_helper.c    | 5 +++++
 target/arm/translate-mve.c | 2 ++
 4 files changed, 17 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index bc9dcde5dba..bfe2057592f 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -111,3 +111,10 @@ DEF_HELPER_FLAGS_4(mve_vminsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vminub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vminuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vminuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vabdsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vabdsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vabdsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vabdub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vabduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vabduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 42d5504500c..087d3db2a31 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -93,6 +93,9 @@ VMAX_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
 VMIN_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
 VMIN_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
 
+VABD_S           111 0 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
+VABD_U           111 1 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index f53551c7de5..f026a9969d6 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -410,3 +410,8 @@ DO_2OP_S(vmaxs, DO_MAX)
 DO_2OP_U(vmaxu, DO_MAX)
 DO_2OP_S(vmins, DO_MIN)
 DO_2OP_U(vminu, DO_MIN)
+
+#define DO_ABD(N, M)  ((N) >= (M) ? (N) - (M) : (M) - (N))
+
+DO_2OP_S(vabds, DO_ABD)
+DO_2OP_U(vabdu, DO_ABD)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index c12b0174b82..a732612a86f 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -387,3 +387,5 @@ DO_2OP(VMAX_S, vmaxs)
 DO_2OP(VMAX_U, vmaxu)
 DO_2OP(VMIN_S, vmins)
 DO_2OP(VMIN_U, vminu)
+DO_2OP(VABD_S, vabds)
+DO_2OP(VABD_U, vabdu)
-- 
2.20.1



* [PATCH 27/55] target/arm: Implement MVE VHADD, VHSUB
@ 2021-06-07 16:57 ` Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VHADD and VHSUB insns, which perform an addition
or subtraction and then halve the result.
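
Because the result is halved, doing the arithmetic in a wider type
means the intermediate sum or difference can never overflow. A scalar
sketch of the unsigned byte case (illustrative; the helpers in this
patch widen to 64 bits so that the same functions also cover 32-bit
elements):

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t vhadd_u8(uint8_t n, uint8_t m)
    {
        /* at most 9 significant bits, so 16-bit arithmetic cannot overflow */
        return ((uint16_t)n + m) >> 1;
    }

    int main(void)
    {
        printf("%d\n", vhadd_u8(255, 255));   /* prints 255, not 127 */
        return 0;
    }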

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 14 ++++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 25 +++++++++++++++++++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 48 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index bfe2057592f..7b22990c3ba 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -118,3 +118,17 @@ DEF_HELPER_FLAGS_4(mve_vabdsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vabdub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vabduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vabduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vhaddsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhaddsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhaddsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vhsubsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhsubsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhsubsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 087d3db2a31..241d1c44c19 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -96,6 +96,11 @@ VMIN_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
 VABD_S           111 0 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
 VABD_U           111 1 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
 
+VHADD_S          111 0 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
+VHADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
+VHSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
+VHSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index f026a9969d6..5982f6bf5eb 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -415,3 +415,28 @@ DO_2OP_U(vminu, DO_MIN)
 
 DO_2OP_S(vabds, DO_ABD)
 DO_2OP_U(vabdu, DO_ABD)
+
+static inline uint32_t do_vhadd_u(uint32_t n, uint32_t m)
+{
+    return ((uint64_t)n + m) >> 1;
+}
+
+static inline int32_t do_vhadd_s(int32_t n, int32_t m)
+{
+    return ((int64_t)n + m) >> 1;
+}
+
+static inline uint32_t do_vhsub_u(uint32_t n, uint32_t m)
+{
+    return ((uint64_t)n - m) >> 1;
+}
+
+static inline int32_t do_vhsub_s(int32_t n, int32_t m)
+{
+    return ((int64_t)n - m) >> 1;
+}
+
+DO_2OP_S(vhadds, do_vhadd_s)
+DO_2OP_U(vhaddu, do_vhadd_u)
+DO_2OP_S(vhsubs, do_vhsub_s)
+DO_2OP_U(vhsubu, do_vhsub_u)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index a732612a86f..c22b739f36e 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -389,3 +389,7 @@ DO_2OP(VMIN_S, vmins)
 DO_2OP(VMIN_U, vminu)
 DO_2OP(VABD_S, vabds)
 DO_2OP(VABD_U, vabdu)
+DO_2OP(VHADD_S, vhadds)
+DO_2OP(VHADD_U, vhaddu)
+DO_2OP(VHSUB_S, vhsubs)
+DO_2OP(VHSUB_U, vhsubu)
-- 
2.20.1



* [PATCH 28/55] target/arm: Implement MVE VMULL
@ 2021-06-07 16:57 ` Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VMULL insn, which multiplies pairs of single-width
integer elements to produce double-width results.
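
A scalar model of the bottom/top pairing for the signed 8-to-16-bit
case (predication omitted; the array and function names are invented
for the example):

    #include <stdint.h>

    /* bottom form (top == 0) reads the even-numbered narrow elements,
     * top form (top == 1) the odd-numbered ones */
    void vmull_s8_model(int16_t d[8], const int8_t n[16],
                        const int8_t m[16], int top)
    {
        int e;
        for (e = 0; e < 8; e++) {
            d[e] = (int16_t)n[2 * e + top] * m[2 * e + top];
        }
    }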

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 14 ++++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 35 +++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 58 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 7b22990c3ba..66d31633cef 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -132,3 +132,17 @@ DEF_HELPER_FLAGS_4(mve_vhsubsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vhsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vhsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vhsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vmullbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmullbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmullbsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmullbub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmullbuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmullbuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vmulltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulltsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 241d1c44c19..5a480d61cd6 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -101,6 +101,11 @@ VHADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
 VHSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
 VHSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
 
+VMULL_BS         111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
+VMULL_BU         111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
+VMULL_TS         111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
+VMULL_TU         111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 5982f6bf5eb..2d0c6998caa 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -335,6 +335,27 @@ DO_1OP(vfnegs, 4, uint32_t, H4, DO_FNEG)
     DO_2OP(OP##h, 2, int16_t, H2, FN)           \
     DO_2OP(OP##w, 4, int32_t, H4, FN)
 
+/*
+ * "Long" operations where two half-sized inputs (taken from either the
+ * top or the bottom of the input vector) produce a double-width result.
+ * Here TYPE and H are for the input, and LESIZE, LTYPE, LH for the output.
+ */
+#define DO_2OP_L(OP, TOP, TYPE, H, LESIZE, LTYPE, LH, FN)               \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \
+    {                                                                   \
+        LTYPE *d = vd;                                                  \
+        TYPE *n = vn, *m = vm;                                          \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned le;                                                    \
+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
+            LTYPE r = FN((LTYPE)n[H(le * 2 + TOP)], m[H(le * 2 + TOP)]); \
+            uint64_t bytemask = mask_to_bytemask##LESIZE(mask);         \
+            d[LH(le)] &= ~bytemask;                                     \
+            d[LH(le)] |= (r & bytemask);                                \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
 #define DO_AND(N, M)  ((N) & (M))
 #define DO_BIC(N, M)  ((N) & ~(M))
 #define DO_ORR(N, M)  ((N) | (M))
@@ -355,6 +376,20 @@ DO_2OP_U(vadd, DO_ADD)
 DO_2OP_U(vsub, DO_SUB)
 DO_2OP_U(vmul, DO_MUL)
 
+DO_2OP_L(vmullbsb, 0, int8_t, H1, 2, int16_t, H2, DO_MUL)
+DO_2OP_L(vmullbsh, 0, int16_t, H2, 4, int32_t, H4, DO_MUL)
+DO_2OP_L(vmullbsw, 0, int32_t, H4, 8, int64_t, , DO_MUL)
+DO_2OP_L(vmullbub, 0, uint8_t, H1, 2, uint16_t, H2, DO_MUL)
+DO_2OP_L(vmullbuh, 0, uint16_t, H2, 4, uint32_t, H4, DO_MUL)
+DO_2OP_L(vmullbuw, 0, uint32_t, H4, 8, uint64_t, , DO_MUL)
+
+DO_2OP_L(vmulltsb, 1, int8_t, H1, 2, int16_t, H2, DO_MUL)
+DO_2OP_L(vmulltsh, 1, int16_t, H2, 4, int32_t, H4, DO_MUL)
+DO_2OP_L(vmulltsw, 1, int32_t, H4, 8, int64_t, , DO_MUL)
+DO_2OP_L(vmulltub, 1, uint8_t, H1, 2, uint16_t, H2, DO_MUL)
+DO_2OP_L(vmulltuh, 1, uint16_t, H2, 4, uint32_t, H4, DO_MUL)
+DO_2OP_L(vmulltuw, 1, uint32_t, H4, 8, uint64_t, , DO_MUL)
+
 /*
  * Because the computation type is at least twice as large as required,
  * these work for both signed and unsigned source types.
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index c22b739f36e..ccff7fc0ecf 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -393,3 +393,7 @@ DO_2OP(VHADD_S, vhadds)
 DO_2OP(VHADD_U, vhaddu)
 DO_2OP(VHSUB_S, vhsubs)
 DO_2OP(VHSUB_U, vhsubu)
+DO_2OP(VMULL_BS, vmullbs)
+DO_2OP(VMULL_BU, vmullbu)
+DO_2OP(VMULL_TS, vmullts)
+DO_2OP(VMULL_TU, vmulltu)
-- 
2.20.1



* [PATCH 29/55] target/arm: Implement MVE VMLALDAV
@ 2021-06-07 16:57 ` Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VMLALDAV insn, which multiplies pairs of integer
elements, accumulating the products into a 64-bit result in a pair
of general-purpose registers.
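
Ignoring predication and the X (exchange) variant, the operation is a
dot product accumulated into a 64-bit value that is split across the
RdaLo/RdaHi register pair. A scalar sketch of the signed 16-bit case
(names invented for the example):

    #include <stdint.h>

    /* acc arrives as ((uint64_t)rdahi << 32) | rdalo and returns likewise */
    int64_t vmlaldavsh_model(const int16_t n[8], const int16_t m[8],
                             int64_t acc)
    {
        int e;
        for (e = 0; e < 8; e++) {
            acc += (int64_t)n[e] * m[e];
        }
        return acc;
    }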

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |   8 +++
 target/arm/translate.h     |  10 ++++
 target/arm/mve.decode      |  15 ++++++
 target/arm/mve_helper.c    |  32 ++++++++++++
 target/arm/translate-mve.c | 100 +++++++++++++++++++++++++++++++++++++
 5 files changed, 165 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 66d31633cef..1013f6912da 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -146,3 +146,11 @@ DEF_HELPER_FLAGS_4(mve_vmulltsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlaldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+
+DEF_HELPER_FLAGS_4(mve_vmlaldavuh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlaldavuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 2821b325e33..99c917c571a 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -136,6 +136,11 @@ static inline int negate(DisasContext *s, int x)
     return -x;
 }
 
+static inline int plus_1(DisasContext *s, int x)
+{
+    return x + 1;
+}
+
 static inline int plus_2(DisasContext *s, int x)
 {
     return x + 2;
@@ -151,6 +156,11 @@ static inline int times_4(DisasContext *s, int x)
     return x * 4;
 }
 
+static inline int times_2_plus_1(DisasContext *s, int x)
+{
+    return x * 2 + 1;
+}
+
 static inline int arm_dc_feature(DisasContext *dc, int feature)
 {
     return (dc->features & (1ULL << feature)) != 0;
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 5a480d61cd6..bde54d05bb9 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -130,3 +130,18 @@ VNEG_fp          1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
 VDUP             1110 1110 1 1 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=0
 VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 1 1 0000 @vdup size=1
 VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
+
+# multiply-add long dual accumulate
+# rdahi: bits [3:1] from insn, bit 0 is 1
+# rdalo: bits [3:1] from insn, bit 0 is 0
+%rdahi 20:3 !function=times_2_plus_1
+%rdalo 13:3 !function=times_2
+# size bit is 0 for 16 bit, 1 for 32 bit
+%size_16 16:1 !function=plus_1
+
+&vmlaldav rdahi rdalo size qn qm x a
+
+@vmlaldav        .... .... . ... ... . ... . .... .... qm:3 . \
+                 qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
+VMLALDAV_S       1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
+VMLALDAV_U       1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 2d0c6998caa..3c7a0bac3c7 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -475,3 +475,35 @@ DO_2OP_S(vhadds, do_vhadd_s)
 DO_2OP_U(vhaddu, do_vhadd_u)
 DO_2OP_S(vhsubs, do_vhsub_s)
 DO_2OP_U(vhsubu, do_vhsub_u)
+
+
+/*
+ * Multiply add long dual accumulate ops.
+ */
+#define DO_LDAV(OP, ESIZE, TYPE, H, XCHG, EVENACC, ODDACC)              \
+    uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
+                                    void *vm, uint64_t a)               \
+    {                                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        TYPE *n = vn, *m = vm;                                          \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            if (mask & 1) {                                             \
+                if (e & 1) {                                            \
+                    a ODDACC (int64_t)n[H(e - 1 * XCHG)] * m[H(e)];     \
+                } else {                                                \
+                    a EVENACC (int64_t)n[H(e + 1 * XCHG)] * m[H(e)];    \
+                }                                                       \
+            }                                                           \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+        return a;                                                       \
+    }
+
+DO_LDAV(vmlaldavsh, 2, int16_t, H2, false, +=, +=)
+DO_LDAV(vmlaldavxsh, 2, int16_t, H2, true, +=, +=)
+DO_LDAV(vmlaldavsw, 4, int32_t, H4, false, +=, +=)
+DO_LDAV(vmlaldavxsw, 4, int32_t, H4, true, +=, +=)
+
+DO_LDAV(vmlaldavuh, 2, uint16_t, H2, false, +=, +=)
+DO_LDAV(vmlaldavuw, 4, uint32_t, H4, false, +=, +=)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index ccff7fc0ecf..03d9496f17d 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -31,6 +31,7 @@
 typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
+typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
 
 /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
 static inline long mve_qreg_offset(unsigned reg)
@@ -79,6 +80,22 @@ static void mve_update_eci(DisasContext *s)
     }
 }
 
+static bool mve_skip_first_beat(DisasContext *s)
+{
+    /* Return true if PSR.ECI says we must skip the first beat of this insn */
+    switch (s->eci) {
+    case ECI_NONE:
+        return false;
+    case ECI_A0:
+    case ECI_A0A1:
+    case ECI_A0A1A2:
+    case ECI_A0A1A2B0:
+        return true;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
 {
     TCGv_i32 addr;
@@ -397,3 +414,86 @@ DO_2OP(VMULL_BS, vmullbs)
 DO_2OP(VMULL_BU, vmullbu)
 DO_2OP(VMULL_TS, vmullts)
 DO_2OP(VMULL_TU, vmulltu)
+
+static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
+                             MVEGenDualAccOpFn *fn)
+{
+    TCGv_ptr qn, qm;
+    TCGv_i64 rda;
+    TCGv_i32 rdalo, rdahi;
+
+    if (!fn || !dc_isar_feature(aa32_mve, s)) {
+        return false;
+    }
+    /*
+     * rdahi == 13 is UNPREDICTABLE; rdahi == 15 is a related
+     * encoding; rdalo always has bit 0 clear so cannot be 13 or 15.
+     */
+    if (a->rdahi == 13 || a->rdahi == 15) {
+        return false;
+    }
+    if (a->qn > 7 || a->qm > 7) {
+        return false;
+    }
+    if (!mve_eci_check(s)) {
+        return true;
+    }
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    qn = mve_qreg_ptr(a->qn);
+    qm = mve_qreg_ptr(a->qm);
+
+    /*
+     * This insn is subject to beat-wise execution. Partial execution
+     * of an A=0 (no-accumulate) insn which does not execute the first
+     * beat must start with the current rda value, not 0.
+     */
+    if (a->a || mve_skip_first_beat(s)) {
+        rda = tcg_temp_new_i64();
+        rdalo = load_reg(s, a->rdalo);
+        rdahi = load_reg(s, a->rdahi);
+        tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
+        tcg_temp_free_i32(rdalo);
+        tcg_temp_free_i32(rdahi);
+    } else {
+        rda = tcg_const_i64(0);
+    }
+
+    fn(rda, cpu_env, qn, qm, rda);
+    tcg_temp_free_ptr(qn);
+    tcg_temp_free_ptr(qm);
+
+    rdalo = tcg_temp_new_i32();
+    rdahi = tcg_temp_new_i32();
+    tcg_gen_extrl_i64_i32(rdalo, rda);
+    tcg_gen_extrh_i64_i32(rdahi, rda);
+    store_reg(s, a->rdalo, rdalo);
+    store_reg(s, a->rdahi, rdahi);
+    tcg_temp_free_i64(rda);
+    mve_update_eci(s);
+    return true;
+}
+
+static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
+{
+    MVEGenDualAccOpFn *fns[4][2] = {
+        { NULL, NULL },
+        { gen_helper_mve_vmlaldavsh, gen_helper_mve_vmlaldavxsh },
+        { gen_helper_mve_vmlaldavsw, gen_helper_mve_vmlaldavxsw },
+        { NULL, NULL },
+    };
+    return do_long_dual_acc(s, a, fns[a->size][a->x]);
+}
+
+static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
+{
+    MVEGenDualAccOpFn *fns[4][2] = {
+        { NULL, NULL },
+        { gen_helper_mve_vmlaldavuh, NULL },
+        { gen_helper_mve_vmlaldavuw, NULL },
+        { NULL, NULL },
+    };
+    return do_long_dual_acc(s, a, fns[a->size][a->x]);
+}
-- 
2.20.1



* [PATCH 30/55] target/arm: Implement MVE VMLSLDAV
@ 2021-06-07 16:57 ` Peter Maydell
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE insn VMLSLDAV, which multiplies pairs of source
elements, alternately adding and subtracting the products, and
accumulates the result into a 64-bit value in a pair of
general-purpose registers.
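
As a rough sketch of the arithmetic only (this is not the QEMU helper
below; rda, n, m and elems are placeholder names, and predication and
beat-wise execution are ignored):

    int64_t acc = rda;                  /* incoming accumulator (A=1 form) */
    for (int e = 0; e < elems; e++) {
        int64_t prod = (int64_t)n[e] * m[e];
        acc += (e & 1) ? -prod : prod;  /* even lanes add, odd lanes subtract */
    }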

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  5 +++++
 target/arm/mve.decode      |  2 ++
 target/arm/mve_helper.c    |  5 +++++
 target/arm/translate-mve.c | 11 +++++++++++
 4 files changed, 23 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 1013f6912da..7789da1986b 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -154,3 +154,8 @@ DEF_HELPER_FLAGS_4(mve_vmlaldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 
 DEF_HELPER_FLAGS_4(mve_vmlaldavuh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+
+DEF_HELPER_FLAGS_4(mve_vmlsldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlsldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlsldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vmlsldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index bde54d05bb9..1be2d6b270f 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -145,3 +145,5 @@ VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
                  qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
 VMLALDAV_S       1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
 VMLALDAV_U       1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
+
+VMLSLDAV         1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 3c7a0bac3c7..1c22e2777d9 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -507,3 +507,8 @@ DO_LDAV(vmlaldavxsw, 4, int32_t, H4, true, +=, +=)
 
 DO_LDAV(vmlaldavuh, 2, uint16_t, H2, false, +=, +=)
 DO_LDAV(vmlaldavuw, 4, uint32_t, H4, false, +=, +=)
+
+DO_LDAV(vmlsldavsh, 2, int16_t, H2, false, +=, -=)
+DO_LDAV(vmlsldavxsh, 2, int16_t, H2, true, +=, -=)
+DO_LDAV(vmlsldavsw, 4, int32_t, H4, false, +=, -=)
+DO_LDAV(vmlsldavxsw, 4, int32_t, H4, true, +=, -=)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 03d9496f17d..66d713a24e2 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -497,3 +497,14 @@ static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
     };
     return do_long_dual_acc(s, a, fns[a->size][a->x]);
 }
+
+static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
+{
+    MVEGenDualAccOpFn *fns[4][2] = {
+        { NULL, NULL },
+        { gen_helper_mve_vmlsldavsh, gen_helper_mve_vmlsldavxsh },
+        { gen_helper_mve_vmlsldavsw, gen_helper_mve_vmlsldavxsw },
+        { NULL, NULL },
+    };
+    return do_long_dual_acc(s, a, fns[a->size][a->x]);
+}
-- 
2.20.1




* [PATCH 31/55] include/qemu/int128.h: Add function to create Int128 from int64_t
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (29 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 30/55] target/arm: Implement MVE VMLSLDAV Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-08  6:45   ` Philippe Mathieu-Daudé
  2021-06-09  0:51   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 32/55] target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH Peter Maydell
                   ` (24 subsequent siblings)
  55 siblings, 2 replies; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

int128_make64() creates an Int128 from an unsigned 64-bit value; add
a function int128_makes64() creating an Int128 from a signed 64-bit
value.
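
The difference only matters for values with bit 63 set; for example,
using the existing and the new function together:

    Int128 x = int128_make64(0xffffffffffffffffULL); /* high half 0: 2^64 - 1 */
    Int128 y = int128_makes64(-1);                   /* high half ~0: -1      */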

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/qemu/int128.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 52fc2384211..64500385e37 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -11,6 +11,11 @@ static inline Int128 int128_make64(uint64_t a)
     return a;
 }
 
+static inline Int128 int128_makes64(int64_t a)
+{
+    return a;
+}
+
 static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
 {
     return (__uint128_t)hi << 64 | lo;
@@ -167,6 +172,11 @@ static inline Int128 int128_make64(uint64_t a)
     return (Int128) { a, 0 };
 }
 
+static inline Int128 int128_makes64(int64_t a)
+{
+    return (Int128) { a, a >> 63 };
+}
+
 static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
 {
     return (Int128) { lo, hi };
-- 
2.20.1




* [PATCH 32/55] target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (30 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 31/55] include/qemu/int128.h: Add function to create Int128 from int64_t Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-09  1:05   ` Richard Henderson
  2021-06-07 16:57 ` [PATCH 33/55] target/arm: Implement MVE VADD (scalar) Peter Maydell
                   ` (23 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VRMLALDAVH and VRMLSLDAVH insns, which accumulate
the results of a rounded multiply of pairs of elements into a 72-bit
accumulator, returning the top 64 bits in a pair of general-purpose
registers.
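
As a pseudo-code sketch for a single pair of 32-bit elements (acc and
rda are placeholder names; acc stands for the wide intermediate):

    acc = (rda << 8) + (int64_t)n * m + (1 << 7); /* accumulate, round at bit 7 */
    rda = acc >> 8;                               /* keep the top 64 bits */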

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  7 +++++++
 target/arm/mve_helper.c    | 35 +++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 24 ++++++++++++++++++++++++
 4 files changed, 74 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 7789da1986b..723bef4a83a 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -159,3 +159,11 @@ DEF_HELPER_FLAGS_4(mve_vmlsldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlsldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlsldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlsldavxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+
+DEF_HELPER_FLAGS_4(mve_vrmlaldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vrmlaldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+
+DEF_HELPER_FLAGS_4(mve_vrmlaldavhuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+
+DEF_HELPER_FLAGS_4(mve_vrmlsldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+DEF_HELPER_FLAGS_4(mve_vrmlsldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 1be2d6b270f..ac68f072bbe 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -143,7 +143,14 @@ VDUP             1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
 
 @vmlaldav        .... .... . ... ... . ... . .... .... qm:3 . \
                  qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
+@vmlaldav_nosz   .... .... . ... ... . ... . .... .... qm:3 . \
+                 qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
 VMLALDAV_S       1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
 VMLALDAV_U       1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
 
 VMLSLDAV         1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav
+
+VRMLALDAVH_S     1110 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
+VRMLALDAVH_U     1111 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
+
+VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_nosz
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 1c22e2777d9..b22a7535308 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/int128.h"
 #include "cpu.h"
 #include "internals.h"
 #include "exec/helper-proto.h"
@@ -512,3 +513,37 @@ DO_LDAV(vmlsldavsh, 2, int16_t, H2, false, +=, -=)
 DO_LDAV(vmlsldavxsh, 2, int16_t, H2, true, +=, -=)
 DO_LDAV(vmlsldavsw, 4, int32_t, H4, false, +=, -=)
 DO_LDAV(vmlsldavxsw, 4, int32_t, H4, true, +=, -=)
+
+/*
+ * Rounding multiply add long dual accumulate high: we must keep
+ * a 72-bit internal accumulator value and return the top 64 bits.
+ */
+#define DO_LDAVH(OP, ESIZE, TYPE, LTYPE, H, XCHG, EVENACC, ODDACC, TO128) \
+    uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
+                                    void *vm, uint64_t a)               \
+    {                                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        TYPE *n = vn, *m = vm;                                          \
+        Int128 acc = int128_make64(0);                                  \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            if (mask & 1) {                                             \
+                if (e & 1) {                                            \
+                    acc = ODDACC(acc, TO128((LTYPE)n[H(e - 1 * XCHG)] * m[H(e)])); \
+                } else {                                                \
+                    acc = EVENACC(acc, TO128((LTYPE)n[H(e + 1 * XCHG)] * m[H(e)])); \
+                }                                                       \
+                acc = int128_add(acc, int128_make64(1 << 7));           \
+            }                                                           \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+        return a + int128_getlo(int128_rshift(acc, 8));                 \
+    }
+
+DO_LDAVH(vrmlaldavhsw, 4, int32_t, int64_t, H4, false, int128_add, int128_add, int128_makes64)
+DO_LDAVH(vrmlaldavhxsw, 4, int32_t, int64_t, H4, true, int128_add, int128_add, int128_makes64)
+
+DO_LDAVH(vrmlaldavhuw, 4, uint32_t, uint64_t, H4, false, int128_add, int128_add, int128_make64)
+
+DO_LDAVH(vrmlsldavhsw, 4, int32_t, int64_t, H4, false, int128_add, int128_sub, int128_makes64)
+DO_LDAVH(vrmlsldavhxsw, 4, int32_t, int64_t, H4, true, int128_add, int128_sub, int128_makes64)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 66d713a24e2..6792fca798d 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -508,3 +508,27 @@ static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
     };
     return do_long_dual_acc(s, a, fns[a->size][a->x]);
 }
+
+static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
+{
+    MVEGenDualAccOpFn *fns[] = {
+        gen_helper_mve_vrmlaldavhsw, gen_helper_mve_vrmlaldavhxsw,
+    };
+    return do_long_dual_acc(s, a, fns[a->x]);
+}
+
+static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
+{
+    MVEGenDualAccOpFn *fns[] = {
+        gen_helper_mve_vrmlaldavhuw, NULL,
+    };
+    return do_long_dual_acc(s, a, fns[a->x]);
+}
+
+static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
+{
+    MVEGenDualAccOpFn *fns[] = {
+        gen_helper_mve_vrmlsldavhsw, gen_helper_mve_vrmlsldavhxsw,
+    };
+    return do_long_dual_acc(s, a, fns[a->x]);
+}
-- 
2.20.1




* [PATCH 33/55] target/arm: Implement MVE VADD (scalar)
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (31 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 32/55] target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH Peter Maydell
@ 2021-06-07 16:57 ` Peter Maydell
  2021-06-09 17:58   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 34/55] target/arm: Implement MVE VSUB, VMUL (scalar) Peter Maydell
                   ` (22 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:57 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the scalar form of the MVE VADD insn. This takes the
scalar operand from a general-purpose register and adds it to every
element of the vector.
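
Conceptually the per-element operation is just this (predication
omitted; regs, d, n and elems are placeholder names):

    uint32_t m = regs[rm];       /* scalar duplicated across all lanes */
    for (int e = 0; e < elems; e++) {
        d[e] = n[e] + m;
    }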

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  4 ++++
 target/arm/mve.decode      |  7 ++++++
 target/arm/mve_helper.c    | 25 +++++++++++++++++++
 target/arm/translate-mve.c | 49 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 723bef4a83a..d2626810aaf 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -147,6 +147,10 @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index ac68f072bbe..0ee7a727081 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -26,6 +26,7 @@
 &vldr_vstr rn qd imm p a w size l u
 &1op qd qm size
 &2op qd qm qn size
+&2scalar qd qn rm size
 
 @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -36,6 +37,8 @@
 @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
 @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
 
+@2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
+
 # Vector loads and stores
 
 # Widening loads and narrowing stores:
@@ -154,3 +157,7 @@ VRMLALDAVH_S     1110 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_no
 VRMLALDAVH_U     1111 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
 
 VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_nosz
+
+# Scalar operations
+
+VADD_scalar      1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index b22a7535308..8d9811c5473 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -478,6 +478,31 @@ DO_2OP_S(vhsubs, do_vhsub_s)
 DO_2OP_U(vhsubu, do_vhsub_u)
 
 
+#define DO_2OP_SCALAR(OP, ESIZE, TYPE, H, FN)                           \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
+                                uint32_t rm)                            \
+    {                                                                   \
+        TYPE *d = vd, *n = vn;                                          \
+        TYPE m = rm;                                                    \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            TYPE r = FN(n[H(e)], m);                                    \
+            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
+            d[H(e)] &= ~bytemask;                                       \
+            d[H(e)] |= (r & bytemask);                                  \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+/* provide unsigned 2-op scalar helpers for all sizes */
+#define DO_2OP_SCALAR_U(OP, FN)                 \
+    DO_2OP_SCALAR(OP##b, 1, uint8_t, H1, FN)    \
+    DO_2OP_SCALAR(OP##h, 2, uint16_t, H2, FN)   \
+    DO_2OP_SCALAR(OP##w, 4, uint32_t, H4, FN)
+
+DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
+
 /*
  * Multiply add long dual accumulate ops.
  */
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 6792fca798d..89e5aa50284 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -31,6 +31,7 @@
 typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
+typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
 
 /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
@@ -415,6 +416,54 @@ DO_2OP(VMULL_BU, vmullbu)
 DO_2OP(VMULL_TS, vmullts)
 DO_2OP(VMULL_TU, vmulltu)
 
+static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
+                          MVEGenTwoOpScalarFn fn)
+{
+    TCGv_ptr qd, qn;
+    TCGv_i32 rm;
+
+    if (!dc_isar_feature(aa32_mve, s)) {
+        return false;
+    }
+    if (a->qd > 7 || a->qn > 7 || !fn) {
+        return false;
+    }
+    if (a->rm == 13 || a->rm == 15) {
+        /* UNPREDICTABLE */
+        return false;
+    }
+    if (!mve_eci_check(s)) {
+        return true;
+    }
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    qd = mve_qreg_ptr(a->qd);
+    qn = mve_qreg_ptr(a->qn);
+    rm = load_reg(s, a->rm);
+    fn(cpu_env, qd, qn, rm);
+    tcg_temp_free_i32(rm);
+    tcg_temp_free_ptr(qd);
+    tcg_temp_free_ptr(qn);
+    mve_update_eci(s);
+    return true;
+}
+
+#define DO_2OP_SCALAR(INSN, FN) \
+    static bool trans_##INSN(DisasContext *s, arg_2scalar *a)   \
+    {                                                           \
+        MVEGenTwoOpScalarFn *fns[] = {                          \
+            gen_helper_mve_##FN##b,                             \
+            gen_helper_mve_##FN##h,                             \
+            gen_helper_mve_##FN##w,                             \
+            NULL,                                               \
+        };                                                      \
+        return do_2op_scalar(s, a, fns[a->size]);               \
+    }
+
+DO_2OP_SCALAR(VADD_scalar, vadd_scalar)
+
 static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
                              MVEGenDualAccOpFn *fn)
 {
-- 
2.20.1




* [PATCH 34/55] target/arm: Implement MVE VSUB, VMUL (scalar)
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (32 preceding siblings ...)
  2021-06-07 16:57 ` [PATCH 33/55] target/arm: Implement MVE VADD (scalar) Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 18:00   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 35/55] target/arm: Implement MVE VHADD, VHSUB (scalar) Peter Maydell
                   ` (21 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the scalar forms of the MVE VSUB and VMUL insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 8 ++++++++
 target/arm/mve.decode      | 2 ++
 target/arm/mve_helper.c    | 2 ++
 target/arm/translate-mve.c | 2 ++
 4 files changed, 14 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index d2626810aaf..4d39527e201 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -151,6 +151,14 @@ DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vsub_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vsub_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vsub_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vmul_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vmul_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vmul_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 0ee7a727081..af5fba78ce2 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -161,3 +161,5 @@ VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_no
 # Scalar operations
 
 VADD_scalar      1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
+VSUB_scalar      1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
+VMUL_scalar      1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 8d9811c5473..8892a713287 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -502,6 +502,8 @@ DO_2OP_U(vhsubu, do_vhsub_u)
     DO_2OP_SCALAR(OP##w, 4, uint32_t, H4, FN)
 
 DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
+DO_2OP_SCALAR_U(vsub_scalar, DO_SUB)
+DO_2OP_SCALAR_U(vmul_scalar, DO_MUL)
 
 /*
  * Multiply add long dual accumulate ops.
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 89e5aa50284..c03528d1973 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -463,6 +463,8 @@ static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
     }
 
 DO_2OP_SCALAR(VADD_scalar, vadd_scalar)
+DO_2OP_SCALAR(VSUB_scalar, vsub_scalar)
+DO_2OP_SCALAR(VMUL_scalar, vmul_scalar)
 
 static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
                              MVEGenDualAccOpFn *fn)
-- 
2.20.1




* [PATCH 35/55] target/arm: Implement MVE VHADD, VHSUB (scalar)
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (33 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 34/55] target/arm: Implement MVE VSUB, VMUL (scalar) Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 18:02   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 36/55] target/arm: Implement MVE VBRSR Peter Maydell
                   ` (20 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the scalar variants of the MVE VHADD and VHSUB insns,
which add or subtract the scalar to or from each element and halve
the result.
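
The do_vhadd_s() and friends used here are the helpers added with the
vector forms; the idea is presumably to widen before the add or
subtract so that the halving loses no bits, along these lines:

    /* minimal sketch of a signed halving add */
    static inline int32_t vhadd_sketch(int32_t n, int32_t m)
    {
        return ((int64_t)n + m) >> 1;
    }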

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 16 ++++++++++++++++
 target/arm/mve.decode      |  4 ++++
 target/arm/mve_helper.c    |  8 ++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 32 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 4d39527e201..5853bd34687 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -159,6 +159,22 @@ DEF_HELPER_FLAGS_4(mve_vmul_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vmul_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vmul_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vhadds_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vhadds_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vhadds_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vhaddu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vhaddu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vhaddu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vhsubs_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vhsubs_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vhsubs_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vhsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vhsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vhsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index af5fba78ce2..5c332b04a7c 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -163,3 +163,7 @@ VRMLSLDAVH       1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_no
 VADD_scalar      1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
 VSUB_scalar      1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
 VMUL_scalar      1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
+VHADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
+VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
+VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
+VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 8892a713287..dbcf4c24949 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -500,10 +500,18 @@ DO_2OP_U(vhsubu, do_vhsub_u)
     DO_2OP_SCALAR(OP##b, 1, uint8_t, H1, FN)    \
     DO_2OP_SCALAR(OP##h, 2, uint16_t, H2, FN)   \
     DO_2OP_SCALAR(OP##w, 4, uint32_t, H4, FN)
+#define DO_2OP_SCALAR_S(OP, FN)                 \
+    DO_2OP_SCALAR(OP##b, 1, int8_t, H1, FN)     \
+    DO_2OP_SCALAR(OP##h, 2, int16_t, H2, FN)    \
+    DO_2OP_SCALAR(OP##w, 4, int32_t, H4, FN)
 
 DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
 DO_2OP_SCALAR_U(vsub_scalar, DO_SUB)
 DO_2OP_SCALAR_U(vmul_scalar, DO_MUL)
+DO_2OP_SCALAR_S(vhadds_scalar, do_vhadd_s)
+DO_2OP_SCALAR_U(vhaddu_scalar, do_vhadd_u)
+DO_2OP_SCALAR_S(vhsubs_scalar, do_vhsub_s)
+DO_2OP_SCALAR_U(vhsubu_scalar, do_vhsub_u)
 
 /*
  * Multiply add long dual accumulate ops.
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index c03528d1973..8dfc52d8027 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -465,6 +465,10 @@ static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
 DO_2OP_SCALAR(VADD_scalar, vadd_scalar)
 DO_2OP_SCALAR(VSUB_scalar, vsub_scalar)
 DO_2OP_SCALAR(VMUL_scalar, vmul_scalar)
+DO_2OP_SCALAR(VHADD_S_scalar, vhadds_scalar)
+DO_2OP_SCALAR(VHADD_U_scalar, vhaddu_scalar)
+DO_2OP_SCALAR(VHSUB_S_scalar, vhsubs_scalar)
+DO_2OP_SCALAR(VHSUB_U_scalar, vhsubu_scalar)
 
 static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
                              MVEGenDualAccOpFn *fn)
-- 
2.20.1




* [PATCH 36/55] target/arm: Implement MVE VBRSR
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (34 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 35/55] target/arm: Implement MVE VHADD, VHSUB (scalar) Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 18:08   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 37/55] target/arm: Implement MVE VPST Peter Maydell
                   ` (19 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VBRSR insn, which bit-reverses a specified number
of low-order bits in each element, setting the remaining bits to zero.
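
For example, for a byte element with n = 0x06 and m = 3, the helper
below computes:

    revbit8(0x06) = 0x60           /* reverse all 8 bits */
    0x60 >> (8 - 3) = 0x03         /* keep only the reversed low 3 bits */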

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  4 ++++
 target/arm/mve.decode      |  1 +
 target/arm/mve_helper.c    | 43 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  1 +
 4 files changed, 49 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 5853bd34687..1f77a661b9b 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -175,6 +175,10 @@ DEF_HELPER_FLAGS_4(mve_vhsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vhsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vhsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 5c332b04a7c..a3dbdb72a5c 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -167,3 +167,4 @@ VHADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
+VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index dbcf4c24949..25426fae992 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -513,6 +513,49 @@ DO_2OP_SCALAR_U(vhaddu_scalar, do_vhadd_u)
 DO_2OP_SCALAR_S(vhsubs_scalar, do_vhsub_s)
 DO_2OP_SCALAR_U(vhsubu_scalar, do_vhsub_u)
 
+static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
+{
+    m &= 0xff;
+    if (m == 0) {
+        return 0;
+    }
+    n = revbit8(n);
+    if (m < 8) {
+        n >>= 8 - m;
+    }
+    return n;
+}
+
+static inline uint32_t do_vbrsrh(uint32_t n, uint32_t m)
+{
+    m &= 0xff;
+    if (m == 0) {
+        return 0;
+    }
+    n = revbit16(n);
+    if (m < 16) {
+        n >>= 16 - m;
+    }
+    return n;
+}
+
+static inline uint32_t do_vbrsrw(uint32_t n, uint32_t m)
+{
+    m &= 0xff;
+    if (m == 0) {
+        return 0;
+    }
+    n = revbit32(n);
+    if (m < 32) {
+        n >>= 32 - m;
+    }
+    return n;
+}
+
+DO_2OP_SCALAR(vbrsrb, 1, uint8_t, H1, do_vbrsrb)
+DO_2OP_SCALAR(vbrsrh, 2, uint16_t, H2, do_vbrsrh)
+DO_2OP_SCALAR(vbrsrw, 4, uint32_t, H4, do_vbrsrw)
+
 /*
  * Multiply add long dual accumulate ops.
  */
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 8dfc52d8027..b7bf7d0960f 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -469,6 +469,7 @@ DO_2OP_SCALAR(VHADD_S_scalar, vhadds_scalar)
 DO_2OP_SCALAR(VHADD_U_scalar, vhaddu_scalar)
 DO_2OP_SCALAR(VHSUB_S_scalar, vhsubs_scalar)
 DO_2OP_SCALAR(VHSUB_U_scalar, vhsubu_scalar)
+DO_2OP_SCALAR(VBRSR, vbrsr)
 
 static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
                              MVEGenDualAccOpFn *fn)
-- 
2.20.1




* [PATCH 37/55] target/arm: Implement MVE VPST
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (35 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 36/55] target/arm: Implement MVE VBRSR Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 18:23   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 38/55] target/arm: Implement MVE VQADD and VQSUB Peter Maydell
                   ` (18 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VPST insn, which sets the predicate mask
fields in the VPR to the immediate value encoded in the insn.
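
The 4-bit mask value comes from insn bit 22 and bits 15:13 (the
%mask_22_13 field below); schematically, the generated decoder
computes roughly:

    mask = (extract32(insn, 22, 1) << 3) | extract32(insn, 13, 3);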

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/mve.decode      |  4 +++
 target/arm/translate-mve.c | 59 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index a3dbdb72a5c..e189e2de648 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -168,3 +168,7 @@ VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
+
+# Predicate operations
+%mask_22_13      22:1 13:3
+VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index b7bf7d0960f..45a71a22853 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -588,3 +588,62 @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
     };
     return do_long_dual_acc(s, a, fns[a->x]);
 }
+
+static bool trans_VPST(DisasContext *s, arg_VPST *a)
+{
+    TCGv_i32 vpr, mask;
+
+    /* mask == 0 is a "related encoding" */
+    if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
+        return false;
+    }
+    if (!mve_eci_check(s)) {
+        return true;
+    }
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+    /*
+     * Set the VPR mask fields. We take advantage of MASK01 and MASK23
+     * being adjacent fields in the register.
+     *
+     * This insn is not predicated, but it is subject to beat-wise
+     * execution, and the mask is updated on the odd-numbered beats.
+     * So if PSR.ECI says we should skip beat 1, we mustn't update the
+     * 01 mask field.
+     */
+    vpr = load_cpu_field(v7m.vpr);
+    switch (s->eci) {
+    case ECI_NONE:
+    case ECI_A0:
+        /* Update both 01 and 23 fields */
+        mask = tcg_const_i32(a->mask | (a->mask << 4));
+        tcg_gen_deposit_i32(vpr, vpr, mask, R_V7M_VPR_MASK01_SHIFT,
+                            R_V7M_VPR_MASK01_LENGTH + R_V7M_VPR_MASK23_LENGTH);
+        break;
+    case ECI_A0A1:
+    case ECI_A0A1A2:
+    case ECI_A0A1A2B0:
+        /* Update only the 23 mask field */
+        mask = tcg_const_i32(a->mask);
+        tcg_gen_deposit_i32(vpr, vpr, mask, R_V7M_VPR_MASK23_SHIFT,
+                            R_V7M_VPR_MASK23_LENGTH);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    store_cpu_field(vpr, v7m.vpr);
+    tcg_temp_free_i32(mask);
+
+    if (s->eci) {
+        TCGv_i32 eci;
+        mve_update_eci(s);
+        /*
+         * Update ECI in CPUState (since we didn't call a helper
+         * that will call mve_advance_vpt()).
+         */
+        eci = tcg_const_i32(s->eci << 4);
+        store_cpu_field(eci, condexec_bits);
+    }
+    return true;
+}
-- 
2.20.1




* [PATCH 38/55] target/arm: Implement MVE VQADD and VQSUB
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (36 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 37/55] target/arm: Implement MVE VPST Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 18:46   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 39/55] target/arm: Implement MVE VQDMULH and VQRDMULH (scalar) Peter Maydell
                   ` (17 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VQADD and VQSUB insns, which perform saturating
addition or subtraction of a scalar to or from each element.  Note
that individual bytes of each result element are used or discarded
according to the predicate mask, but FPSCR.QC is set only if the
predicate mask bit for the lowest byte of the element is set.
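
For example, in an unsigned byte lane (using the helper macros added
below, where sat is the per-element saturation flag):

    DO_UQADD_B(250, 10, &sat)   /* 260 exceeds UINT8_MAX: result 255, sat true */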

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 16 ++++++++++
 target/arm/mve.decode      |  5 +++
 target/arm/mve_helper.c    | 62 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  4 +++
 4 files changed, 87 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 1f77a661b9b..a1acc44e40e 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -175,6 +175,22 @@ DEF_HELPER_FLAGS_4(mve_vhsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vhsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vhsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vqadds_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqadds_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqadds_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqaddu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqaddu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqaddu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqsubs_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqsubs_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqsubs_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index e189e2de648..c85227c675a 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -167,6 +167,11 @@ VHADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
+
+VQADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
+VQADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
+VQSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
+VQSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
 VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 
 # Predicate operations
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 25426fae992..41c4f2033f6 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -477,6 +477,33 @@ DO_2OP_U(vhaddu, do_vhadd_u)
 DO_2OP_S(vhsubs, do_vhsub_s)
 DO_2OP_U(vhsubu, do_vhsub_u)
 
+static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
+{
+    if (val > max) {
+        *s = true;
+        return max;
+    } else if (val < min) {
+        *s = true;
+        return min;
+    }
+    return val;
+}
+
+#define DO_SQADD_B(n, m, s) do_sat_bhw((int64_t)n + m, INT8_MIN, INT8_MAX, s)
+#define DO_SQADD_H(n, m, s) do_sat_bhw((int64_t)n + m, INT16_MIN, INT16_MAX, s)
+#define DO_SQADD_W(n, m, s) do_sat_bhw((int64_t)n + m, INT32_MIN, INT32_MAX, s)
+
+#define DO_UQADD_B(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT8_MAX, s)
+#define DO_UQADD_H(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT16_MAX, s)
+#define DO_UQADD_W(n, m, s) do_sat_bhw((int64_t)n + m, 0, UINT32_MAX, s)
+
+#define DO_SQSUB_B(n, m, s) do_sat_bhw((int64_t)n - m, INT8_MIN, INT8_MAX, s)
+#define DO_SQSUB_H(n, m, s) do_sat_bhw((int64_t)n - m, INT16_MIN, INT16_MAX, s)
+#define DO_SQSUB_W(n, m, s) do_sat_bhw((int64_t)n - m, INT32_MIN, INT32_MAX, s)
+
+#define DO_UQSUB_B(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT8_MAX, s)
+#define DO_UQSUB_H(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT16_MAX, s)
+#define DO_UQSUB_W(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT32_MAX, s)
 
 #define DO_2OP_SCALAR(OP, ESIZE, TYPE, H, FN)                           \
     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
@@ -495,6 +522,27 @@ DO_2OP_U(vhsubu, do_vhsub_u)
         mve_advance_vpt(env);                                           \
     }
 
+#define DO_2OP_SAT_SCALAR(OP, ESIZE, TYPE, H, FN)                       \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
+                                uint32_t rm)                            \
+    {                                                                   \
+        TYPE *d = vd, *n = vn;                                          \
+        TYPE m = rm;                                                    \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            bool sat = false;                                           \
+            TYPE r = FN(n[H(e)], m, &sat);                              \
+            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
+            d[H(e)] &= ~bytemask;                                       \
+            d[H(e)] |= (r & bytemask);                                  \
+            if (sat && (mask & 1)) {                                    \
+                env->vfp.qc[0] = 1;                                     \
+            }                                                           \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
 /* provide unsigned 2-op scalar helpers for all sizes */
 #define DO_2OP_SCALAR_U(OP, FN)                 \
     DO_2OP_SCALAR(OP##b, 1, uint8_t, H1, FN)    \
@@ -513,6 +561,20 @@ DO_2OP_SCALAR_U(vhaddu_scalar, do_vhadd_u)
 DO_2OP_SCALAR_S(vhsubs_scalar, do_vhsub_s)
 DO_2OP_SCALAR_U(vhsubu_scalar, do_vhsub_u)
 
+DO_2OP_SAT_SCALAR(vqaddu_scalarb, 1, uint8_t, H1, DO_UQADD_B)
+DO_2OP_SAT_SCALAR(vqaddu_scalarh, 2, uint16_t, H2, DO_UQADD_H)
+DO_2OP_SAT_SCALAR(vqaddu_scalarw, 4, uint32_t, H4, DO_UQADD_W)
+DO_2OP_SAT_SCALAR(vqadds_scalarb, 1, int8_t, H1, DO_SQADD_B)
+DO_2OP_SAT_SCALAR(vqadds_scalarh, 2, int16_t, H2, DO_SQADD_H)
+DO_2OP_SAT_SCALAR(vqadds_scalarw, 4, int32_t, H4, DO_SQADD_W)
+
+DO_2OP_SAT_SCALAR(vqsubu_scalarb, 1, uint8_t, H1, DO_UQSUB_B)
+DO_2OP_SAT_SCALAR(vqsubu_scalarh, 2, uint16_t, H2, DO_UQSUB_H)
+DO_2OP_SAT_SCALAR(vqsubu_scalarw, 4, uint32_t, H4, DO_UQSUB_W)
+DO_2OP_SAT_SCALAR(vqsubs_scalarb, 1, int8_t, H1, DO_SQSUB_B)
+DO_2OP_SAT_SCALAR(vqsubs_scalarh, 2, int16_t, H2, DO_SQSUB_H)
+DO_2OP_SAT_SCALAR(vqsubs_scalarw, 4, int32_t, H4, DO_SQSUB_W)
+
 static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
 {
     m &= 0xff;
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 45a71a22853..254ff2a01b2 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -469,6 +469,10 @@ DO_2OP_SCALAR(VHADD_S_scalar, vhadds_scalar)
 DO_2OP_SCALAR(VHADD_U_scalar, vhaddu_scalar)
 DO_2OP_SCALAR(VHSUB_S_scalar, vhsubs_scalar)
 DO_2OP_SCALAR(VHSUB_U_scalar, vhsubu_scalar)
+DO_2OP_SCALAR(VQADD_S_scalar, vqadds_scalar)
+DO_2OP_SCALAR(VQADD_U_scalar, vqaddu_scalar)
+DO_2OP_SCALAR(VQSUB_S_scalar, vqsubs_scalar)
+DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
 DO_2OP_SCALAR(VBRSR, vbrsr)
 
 static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
-- 
2.20.1




* [PATCH 39/55] target/arm: Implement MVE VQDMULH and VQRDMULH (scalar)
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (37 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 38/55] target/arm: Implement MVE VQADD and VQSUB Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 18:58   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 40/55] target/arm: Implement MVE VQDMULL scalar Peter Maydell
                   ` (16 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VQDMULH and VQRDMULH scalar insns, which multiply
each element by the scalar, double the product, optionally round
(VQRDMULH), then take the high half and saturate.
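
For the 16-bit case the folded rounding constant comes from the
identity (both sides evaluated at 64-bit precision):

    (2 * n * m + (1 << 15)) >> 16  ==  (n * m + (1 << 14)) >> 15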

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 25 +++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 38 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index a1acc44e40e..9bab04305a7 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -191,6 +191,14 @@ DEF_HELPER_FLAGS_4(mve_vqsubu_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vqsubu_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vqsubu_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index c85227c675a..47ce6ebb83b 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -174,6 +174,9 @@ VQSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
 VQSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
 VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 
+VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
+VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
+
 # Predicate operations
 %mask_22_13      22:1 13:3
 VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 41c4f2033f6..6e2da6ac8bc 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -505,6 +505,24 @@ static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
 #define DO_UQSUB_H(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT16_MAX, s)
 #define DO_UQSUB_W(n, m, s) do_sat_bhw((int64_t)n - m, 0, UINT32_MAX, s)
 
+/*
+ * For QDMULH and QRDMULH we simplify "double and shift by esize" into
+ * "shift by esize-1", adjusting the QRDMULH rounding constant to match.
+ */
+#define DO_QDMULH_B(n, m, s) do_sat_bhw(((int64_t)n * m) >> 7, \
+                                        INT8_MIN, INT8_MAX, s)
+#define DO_QDMULH_H(n, m, s) do_sat_bhw(((int64_t)n * m) >> 15, \
+                                        INT16_MIN, INT16_MAX, s)
+#define DO_QDMULH_W(n, m, s) do_sat_bhw(((int64_t)n * m) >> 31, \
+                                        INT32_MIN, INT32_MAX, s)
+
+#define DO_QRDMULH_B(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 6)) >> 7, \
+                                         INT8_MIN, INT8_MAX, s)
+#define DO_QRDMULH_H(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 14)) >> 15, \
+                                         INT16_MIN, INT16_MAX, s)
+#define DO_QRDMULH_W(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 30)) >> 31, \
+                                         INT32_MIN, INT32_MAX, s)
+
 #define DO_2OP_SCALAR(OP, ESIZE, TYPE, H, FN)                           \
     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                 uint32_t rm)                            \
@@ -575,6 +593,13 @@ DO_2OP_SAT_SCALAR(vqsubs_scalarb, 1, int8_t, H1, DO_SQSUB_B)
 DO_2OP_SAT_SCALAR(vqsubs_scalarh, 2, int16_t, H2, DO_SQSUB_H)
 DO_2OP_SAT_SCALAR(vqsubs_scalarw, 4, int32_t, H4, DO_SQSUB_W)
 
+DO_2OP_SAT_SCALAR(vqdmulh_scalarb, 1, int8_t, H1, DO_QDMULH_B)
+DO_2OP_SAT_SCALAR(vqdmulh_scalarh, 2, int16_t, H2, DO_QDMULH_H)
+DO_2OP_SAT_SCALAR(vqdmulh_scalarw, 4, int32_t, H4, DO_QDMULH_W)
+DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, H1, DO_QRDMULH_B)
+DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, H2, DO_QRDMULH_H)
+DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, H4, DO_QRDMULH_W)
+
 static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
 {
     m &= 0xff;
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 254ff2a01b2..4d08067c1e2 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -473,6 +473,8 @@ DO_2OP_SCALAR(VQADD_S_scalar, vqadds_scalar)
 DO_2OP_SCALAR(VQADD_U_scalar, vqaddu_scalar)
 DO_2OP_SCALAR(VQSUB_S_scalar, vqsubs_scalar)
 DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
+DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
+DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
 DO_2OP_SCALAR(VBRSR, vbrsr)
 
 static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
-- 
2.20.1




* [PATCH 40/55] target/arm: Implement MVE VQDMULL scalar
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (38 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 39/55] target/arm: Implement MVE VQDMULH and VQRDMULH (scalar) Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 19:11   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 41/55] target/arm: Implement MVE VQDMULH, VQRDMULH (vector) Peter Maydell
                   ` (15 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VQDMULL scalar insn. This multiplies the top or
bottom half of each element by the scalar, then doubles and saturates
to a double-width result.

Note that this encoding overlaps with VQADD and VQSUB; it uses
what in VQADD and VQSUB would be the 'size=0b11' encoding.
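
For instance, in the extreme 32x32->64 case:

    INT32_MIN * INT32_MIN == 2^62   /* largest product: fits in int64_t    */
    2 * 2^62 == 2^63                /* exceeds INT64_MAX, so must saturate */

which is why do_qdmullw() below checks against INT64_MAX / 2 before
doubling.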

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  5 +++
 target/arm/mve.decode      | 23 +++++++++++---
 target/arm/mve_helper.c    | 65 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 30 ++++++++++++++++++
 4 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 9bab04305a7..55c4e41deff 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -203,6 +203,11 @@ DEF_HELPER_FLAGS_4(mve_vbrsrb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vbrsrh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vbrsrw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 47ce6ebb83b..a71ad7252bf 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -23,6 +23,9 @@
 %qm 5:1 1:3
 %qn 7:1 17:3
 
+# VQDMULL has size in bit 28: 0 for 16 bit, 1 for 32 bit
+%size_28 28:1 !function=plus_1
+
 &vldr_vstr rn qd imm p a w size l u
 &1op qd qm size
 &2op qd qm qn size
@@ -38,6 +41,7 @@
 @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
 
 @2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
+@2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 
 # Vector loads and stores
 
@@ -168,15 +172,26 @@ VHADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
 VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 VHSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
 
-VQADD_S_scalar   1110 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
-VQADD_U_scalar   1111 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
-VQSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
-VQSUB_U_scalar   1111 1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
+{
+  VQADD_S_scalar  1110  1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
+  VQADD_U_scalar  1111  1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
+  VQDMULLB_scalar 111 . 1110 0 . 11 ... 0 ... 0 1111 . 110 .... @2scalar_nosz \
+                  size=%size_28
+}
+
+{
+  VQSUB_S_scalar  1110  1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
+  VQSUB_U_scalar  1111  1110 0 . .. ... 0 ... 1 1111 . 110 .... @2scalar
+  VQDMULLT_scalar 111 . 1110 0 . 11 ... 0 ... 1 1111 . 110 .... @2scalar_nosz \
+                  size=%size_28
+}
+
 VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 
 VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 
+
 # Predicate operations
 %mask_22_13      22:1 13:3
 VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 6e2da6ac8bc..97529531ed0 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -600,6 +600,71 @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, H1, DO_QRDMULH_B)
 DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, H2, DO_QRDMULH_H)
 DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, H4, DO_QRDMULH_W)
 
+/*
+ * Long saturating scalar ops. As with DO_2OP_L, TYPE and H are for the
+ * input (smaller) type and LESIZE, LTYPE, LH for the output (long) type.
+ * SATMASK specifies which bits of the predicate mask matter for determining
+ * whether to propagate a saturation indication into FPSCR.QC -- for
+ * the 16x16->32 case we must check only the bit corresponding to the T or B
+ * half that we used, but for the 32x32->64 case we propagate if the mask
+ * bit is set for either half.
+ */
+#define DO_2OP_SAT_SCALAR_L(OP, TOP, TYPE, H, LESIZE, LTYPE, LH, FN, SATMASK) \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
+                                uint32_t rm)                            \
+    {                                                                   \
+        LTYPE *d = vd;                                                  \
+        TYPE *n = vn;                                                   \
+        TYPE m = rm;                                                    \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned le;                                                    \
+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
+            bool sat = false;                                           \
+            LTYPE r = FN((LTYPE)n[H(le * 2 + TOP)], m, &sat);           \
+            uint64_t bytemask = mask_to_bytemask##LESIZE(mask);         \
+            d[LH(le)] &= ~bytemask;                                     \
+            d[LH(le)] |= (r & bytemask);                                \
+            if (sat && (mask & SATMASK)) {                              \
+                env->vfp.qc[0] = 1;                                     \
+            }                                                           \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+static inline int32_t do_qdmullh(int16_t n, int16_t m, bool *sat)
+{
+    int64_t r = ((int64_t)n * m) * 2;
+    return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat);
+}
+
+static inline int64_t do_qdmullw(int32_t n, int32_t m, bool *sat)
+{
+    /* The multiply can't overflow, but the doubling might */
+    int64_t r = (int64_t)n * m;
+    if (r > INT64_MAX / 2) {
+        *sat = true;
+        return INT64_MAX;
+    } else if (r < INT64_MIN / 2) {
+        *sat = true;
+        return INT64_MIN;
+    } else {
+        return r * 2;
+    }
+}
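+
+/*
+ * Worked example of the overflow check above: with n = m = 0x40000000,
+ * r = 2^60, which is <= INT64_MAX / 2, so the doubling to 2^61 is
+ * safe; with n = m = INT32_MIN, r = 2^62 > INT64_MAX / 2, so we
+ * saturate to INT64_MAX and report saturation.
+ */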
+
+#define SATMASK16B 1
+#define SATMASK16T (1 << 2)
+#define SATMASK32 ((1 << 4) | 1)
+
+DO_2OP_SAT_SCALAR_L(vqdmullb_scalarh, 0, int16_t, H2, 4, int32_t, H4, \
+                    do_qdmullh, SATMASK16B)
+DO_2OP_SAT_SCALAR_L(vqdmullb_scalarw, 0, int32_t, H4, 8, int64_t, , \
+                    do_qdmullw, SATMASK32)
+DO_2OP_SAT_SCALAR_L(vqdmullt_scalarh, 1, int16_t, H2, 4, int32_t, H4, \
+                    do_qdmullh, SATMASK16T)
+DO_2OP_SAT_SCALAR_L(vqdmullt_scalarw, 1, int32_t, H4, 8, int64_t, , \
+                    do_qdmullw, SATMASK32)
+
 static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
 {
     m &= 0xff;
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 4d08067c1e2..2bb7482e6af 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -477,6 +477,36 @@ DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
 DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
 DO_2OP_SCALAR(VBRSR, vbrsr)
 
+static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
+{
+    MVEGenTwoOpScalarFn *fns[] = {
+        NULL,
+        gen_helper_mve_vqdmullb_scalarh,
+        gen_helper_mve_vqdmullb_scalarw,
+        NULL,
+    };
+    if (a->qd == a->qn && a->size == MO_32) {
+        /* UNPREDICTABLE; we choose to undef */
+        return false;
+    }
+    return do_2op_scalar(s, a, fns[a->size]);
+}
+
+static bool trans_VQDMULLT_scalar(DisasContext *s, arg_2scalar *a)
+{
+    MVEGenTwoOpScalarFn *fns[] = {
+        NULL,
+        gen_helper_mve_vqdmullt_scalarh,
+        gen_helper_mve_vqdmullt_scalarw,
+        NULL,
+    };
+    if (a->qd == a->qn && a->size == MO_32) {
+        /* UNPREDICTABLE; we choose to undef */
+        return false;
+    }
+    return do_2op_scalar(s, a, fns[a->size]);
+}
+
 static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
                              MVEGenDualAccOpFn *fn)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 41/55] target/arm: Implement MVE VQDMULH, VQRDMULH (vector)
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (39 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 40/55] target/arm: Implement MVE VQDMULL scalar Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 19:13   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 42/55] target/arm: Implement MVE VQADD, VQSUB (vector) Peter Maydell
                   ` (14 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the vector forms of the MVE VQDMULH and VQRDMULH insns.
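
As a reference for the element arithmetic (a standalone sketch, not
the helper as committed; the function name is illustrative):

    #include <stdint.h>
    #include <stdbool.h>

    /*
     * VQDMULH on one 16-bit lane: double the product, saturate to
     * 32 bits, take the high half. Only INT16_MIN * INT16_MIN can
     * overflow; VQRDMULH also adds a rounding constant (1 << 15 on
     * the doubled product) before taking the high half.
     */
    static int16_t qdmulh16(int16_t n, int16_t m, bool *sat)
    {
        int64_t r = ((int64_t)n * m) * 2;
        if (r > INT32_MAX) {
            *sat = true;
            r = INT32_MAX;
        }
        return r >> 16;
    }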

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 27 +++++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 40 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 55c4e41deff..a7eddf3d488 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -147,6 +147,14 @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vqdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqrdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index a71ad7252bf..9860d43f73c 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -113,6 +113,9 @@ VMULL_BU         111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
 VMULL_TS         111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
 VMULL_TU         111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
 
+VQDMULH          1110 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
+VQRDMULH         1111 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 97529531ed0..7d65bcef56c 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -357,6 +357,25 @@ DO_1OP(vfnegs, 4, uint32_t, H4, DO_FNEG)
         mve_advance_vpt(env);                                           \
     }
 
+#define DO_2OP_SAT(OP, ESIZE, TYPE, H, FN)                              \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \
+    {                                                                   \
+        TYPE *d = vd, *n = vn, *m = vm;                                 \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            bool sat = false;                                           \
+            TYPE r = FN(n[H(e)], m[H(e)], &sat);                        \
+            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
+            d[H(e)] &= ~bytemask;                                       \
+            d[H(e)] |= (r & bytemask);                                  \
+            if (sat && (mask & 1)) {                                    \
+                env->vfp.qc[0] = 1;                                     \
+            }                                                           \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
 #define DO_AND(N, M)  ((N) & (M))
 #define DO_BIC(N, M)  ((N) & ~(M))
 #define DO_ORR(N, M)  ((N) | (M))
@@ -523,6 +542,14 @@ static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
 #define DO_QRDMULH_W(n, m, s) do_sat_bhw(((int64_t)n * m + (1 << 30)) >> 31, \
                                          INT32_MIN, INT32_MAX, s)
 
+DO_2OP_SAT(vqdmulhb, 1, int8_t, H1, DO_QDMULH_B)
+DO_2OP_SAT(vqdmulhh, 2, int16_t, H2, DO_QDMULH_H)
+DO_2OP_SAT(vqdmulhw, 4, int32_t, H4, DO_QDMULH_W)
+
+DO_2OP_SAT(vqrdmulhb, 1, int8_t, H1, DO_QRDMULH_B)
+DO_2OP_SAT(vqrdmulhh, 2, int16_t, H2, DO_QRDMULH_H)
+DO_2OP_SAT(vqrdmulhw, 4, int32_t, H4, DO_QRDMULH_W)
+
 #define DO_2OP_SCALAR(OP, ESIZE, TYPE, H, FN)                           \
     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                 uint32_t rm)                            \
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 2bb7482e6af..213a90b59b6 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -415,6 +415,8 @@ DO_2OP(VMULL_BS, vmullbs)
 DO_2OP(VMULL_BU, vmullbu)
 DO_2OP(VMULL_TS, vmullts)
 DO_2OP(VMULL_TU, vmulltu)
+DO_2OP(VQDMULH, vqdmulh)
+DO_2OP(VQRDMULH, vqrdmulh)
 
 static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                           MVEGenTwoOpScalarFn fn)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 42/55] target/arm: Implement MVE VQADD, VQSUB (vector)
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (40 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 41/55] target/arm: Implement MVE VQDMULH, VQRDMULH (vector) Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 19:15   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 43/55] target/arm: Implement MVE VQSHL (vector) Peter Maydell
                   ` (13 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the vector forms of the MVE VQADD and VQSUB insns.
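
The per-lane operation is ordinary saturating arithmetic; a minimal
sketch of the signed 32-bit add (illustrative name, not necessarily
how the DO_SQADD_* macros are written):

    #include <stdint.h>
    #include <stdbool.h>

    static int32_t sqadd32(int32_t a, int32_t b, bool *sat)
    {
        int64_t r = (int64_t)a + b;   /* widen so the add can't wrap */
        if (r > INT32_MAX) {
            *sat = true;              /* helper folds this into FPSCR.QC */
            return INT32_MAX;
        }
        if (r < INT32_MIN) {
            *sat = true;
            return INT32_MIN;
        }
        return r;
    }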

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 16 ++++++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 14 ++++++++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 39 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index a7eddf3d488..9801a39a984 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -155,6 +155,22 @@ DEF_HELPER_FLAGS_4(mve_vqrdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqrdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqrdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vqaddsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqaddsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqaddsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqsubsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqsubsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqsubsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 9860d43f73c..80fa647c08f 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -116,6 +116,11 @@ VMULL_TU         111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
 VQDMULH          1110 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
 VQRDMULH         1111 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
 
+VQADD_S          111 0 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
+VQADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
+VQSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
+VQSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 7d65bcef56c..d3562f80026 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -550,6 +550,20 @@ DO_2OP_SAT(vqrdmulhb, 1, int8_t, H1, DO_QRDMULH_B)
 DO_2OP_SAT(vqrdmulhh, 2, int16_t, H2, DO_QRDMULH_H)
 DO_2OP_SAT(vqrdmulhw, 4, int32_t, H4, DO_QRDMULH_W)
 
+DO_2OP_SAT(vqaddub, 1, uint8_t, H1, DO_UQADD_B)
+DO_2OP_SAT(vqadduh, 2, uint16_t, H2, DO_UQADD_H)
+DO_2OP_SAT(vqadduw, 4, uint32_t, H4, DO_UQADD_W)
+DO_2OP_SAT(vqaddsb, 1, int8_t, H1, DO_SQADD_B)
+DO_2OP_SAT(vqaddsh, 2, int16_t, H2, DO_SQADD_H)
+DO_2OP_SAT(vqaddsw, 4, int32_t, H4, DO_SQADD_W)
+
+DO_2OP_SAT(vqsubub, 1, uint8_t, H1, DO_UQSUB_B)
+DO_2OP_SAT(vqsubuh, 2, uint16_t, H2, DO_UQSUB_H)
+DO_2OP_SAT(vqsubuw, 4, uint32_t, H4, DO_UQSUB_W)
+DO_2OP_SAT(vqsubsb, 1, int8_t, H1, DO_SQSUB_B)
+DO_2OP_SAT(vqsubsh, 2, int16_t, H2, DO_SQSUB_H)
+DO_2OP_SAT(vqsubsw, 4, int32_t, H4, DO_SQSUB_W)
+
 #define DO_2OP_SCALAR(OP, ESIZE, TYPE, H, FN)                           \
     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                 uint32_t rm)                            \
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 213a90b59b6..957e7e48fab 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -417,6 +417,10 @@ DO_2OP(VMULL_TS, vmullts)
 DO_2OP(VMULL_TU, vmulltu)
 DO_2OP(VQDMULH, vqdmulh)
 DO_2OP(VQRDMULH, vqrdmulh)
+DO_2OP(VQADD_S, vqadds)
+DO_2OP(VQADD_U, vqaddu)
+DO_2OP(VQSUB_S, vqsubs)
+DO_2OP(VQSUB_U, vqsubu)
 
 static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                           MVEGenTwoOpScalarFn fn)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 43/55] target/arm: Implement MVE VQSHL (vector)
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (41 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 42/55] target/arm: Implement MVE VQADD, VQSUB (vector) Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 19:26   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 44/55] target/arm: Implement MVE VQRSHL Peter Maydell
                   ` (12 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VQSHL insn (encoding T4, which is the
vector-shift-by-vector version).

The DO_SQSHL_OP and DO_UQSHL_OP macros here are derived from
the neon_helper.c code for qshl_u{8,16,32} and qshl_s{8,16,32}.
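
As a usage sketch of the lane semantics (invoking the macro below
directly; it relies on GCC statement expressions, as the rest of the
file does):

    bool sat = false;
    int8_t r = DO_SQSHL_OP((int8_t)0x40, 2, &sat);
    /*
     * 0x40 << 2 does not fit in int8_t, so r == 0x7f (INT8_MAX) and
     * sat is set; a negative count shifts right instead, so
     * DO_SQSHL_OP((int8_t)0x40, -2, &sat) yields 0x10 without
     * touching sat.
     */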

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  8 +++++
 target/arm/mve.decode      | 12 +++++++
 target/arm/mve_helper.c    | 73 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 95 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 9801a39a984..352b6a46a5e 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -171,6 +171,14 @@ DEF_HELPER_FLAGS_4(mve_vqsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vqshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 80fa647c08f..2c37e265765 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -40,6 +40,15 @@
 @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
 @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
 
+# The _rev suffix indicates that Vn and Vm are reversed. This is
+# the case for shifts. In the Arm ARM these insns are documented
+# with the Vm and Vn fields in their usual places, but in the
+# assembly the operands are listed "backwards", ie in the order
+# Qd, Qm, Qn where other insns use Qd, Qn, Qm. For QEMU we choose
+# to consider Vm and Vn as being in different fields in the insn.
+# This gives us consistency with A64 and Neon.
+@2op_rev .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qn qn=%qm
+
 @2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 @2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 
@@ -121,6 +130,9 @@ VQADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
 VQSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
 VQSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
 
+VQSHL_S          111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
+VQSHL_U          111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index d3562f80026..7ac41cb1460 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -376,6 +376,18 @@ DO_1OP(vfnegs, 4, uint32_t, H4, DO_FNEG)
         mve_advance_vpt(env);                                           \
     }
 
+/* provide unsigned 2-op helpers for all sizes */
+#define DO_2OP_SAT_U(OP, FN)                    \
+    DO_2OP_SAT(OP##b, 1, uint8_t, H1, FN)       \
+    DO_2OP_SAT(OP##h, 2, uint16_t, H2, FN)      \
+    DO_2OP_SAT(OP##w, 4, uint32_t, H4, FN)
+
+/* provide signed 2-op helpers for all sizes */
+#define DO_2OP_SAT_S(OP, FN)                    \
+    DO_2OP_SAT(OP##b, 1, int8_t, H1, FN)        \
+    DO_2OP_SAT(OP##h, 2, int16_t, H2, FN)       \
+    DO_2OP_SAT(OP##w, 4, int32_t, H4, FN)
+
 #define DO_AND(N, M)  ((N) & (M))
 #define DO_BIC(N, M)  ((N) & ~(M))
 #define DO_ORR(N, M)  ((N) | (M))
@@ -564,6 +576,67 @@ DO_2OP_SAT(vqsubsb, 1, int8_t, H1, DO_SQSUB_B)
 DO_2OP_SAT(vqsubsh, 2, int16_t, H2, DO_SQSUB_H)
 DO_2OP_SAT(vqsubsw, 4, int32_t, H4, DO_SQSUB_W)
 
+#define DO_SQSHL_OP(src1, src2, satp)                           \
+    ({                                                          \
+        int8_t tmp;                                             \
+        typeof(src1) dest;                                      \
+        tmp = (int8_t)src2;                                     \
+        if (tmp >= (ssize_t)sizeof(src1) * 8) {                 \
+            if (src1) {                                         \
+                *satp = true;                                   \
+                dest = (uint32_t)1 << (sizeof(src1) * 8 - 1);   \
+                if (src1 > 0) {                                 \
+                    dest--;                                     \
+                }                                               \
+            } else {                                            \
+                dest = src1;                                    \
+            }                                                   \
+        } else if (tmp <= -(ssize_t)sizeof(src1) * 8) {         \
+            dest = src1 >> 31;                                  \
+        } else if (tmp < 0) {                                   \
+            dest = src1 >> -tmp;                                \
+        } else {                                                \
+            dest = src1 << tmp;                                 \
+            if ((dest >> tmp) != src1) {                        \
+                *satp = true;                                   \
+                dest = (uint32_t)1 << (sizeof(src1) * 8 - 1);   \
+                if (src1 > 0) {                                 \
+                    dest--;                                     \
+                }                                               \
+            }                                                   \
+        }                                                       \
+        dest;                                                   \
+    })
+
+#define DO_UQSHL_OP(src1, src2, satp)                   \
+    ({                                                  \
+        int8_t tmp;                                     \
+        typeof(src1) dest;                              \
+        tmp = (int8_t)src2;                             \
+        if (tmp >= (ssize_t)sizeof(src1) * 8) {         \
+            if (src1) {                                 \
+                *satp = true;                           \
+                dest = ~0;                              \
+            } else {                                    \
+                dest = 0;                               \
+            }                                           \
+        } else if (tmp <= -(ssize_t)sizeof(src1) * 8) { \
+            dest = 0;                                   \
+        } else if (tmp < 0) {                           \
+            dest = src1 >> -tmp;                        \
+        } else {                                        \
+            dest = src1 << tmp;                         \
+            if ((dest >> tmp) != src1) {                \
+                *satp = true;                           \
+                dest = ~0;                              \
+            }                                           \
+        }                                               \
+        dest;                                           \
+    })
+
+DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
+DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
+
 #define DO_2OP_SCALAR(OP, ESIZE, TYPE, H, FN)                           \
     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                 uint32_t rm)                            \
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 957e7e48fab..998f47fb94e 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -421,6 +421,8 @@ DO_2OP(VQADD_S, vqadds)
 DO_2OP(VQADD_U, vqaddu)
 DO_2OP(VQSUB_S, vqsubs)
 DO_2OP(VQSUB_U, vqsubu)
+DO_2OP(VQSHL_S, vqshls)
+DO_2OP(VQSHL_U, vqshlu)
 
 static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                           MVEGenTwoOpScalarFn fn)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 44/55] target/arm: Implement MVE VQRSHL
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (42 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 43/55] target/arm: Implement MVE VQSHL (vector) Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 19:29   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 45/55] target/arm: Implement MVE VSHL insn Peter Maydell
                   ` (11 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VQRSHL (vector) insn.  Again, the code to perform
the actual shifts is borrowed from neon_helper.c.
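
The rounding differs from plain VQSHL only for right (negative) shift
counts, where 1 << (-1 - shift) is added before shifting; for
instance (a sketch invoking the macro below directly):

    bool sat = false;
    uint8_t r = DO_UQRSHL_OP((uint8_t)7, -2, &sat);
    /* r == 2: (7 + (1 << 1)) >> 2, i.e. 7/4 rounded to nearest */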

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |   8 +++
 target/arm/mve.decode      |   3 +
 target/arm/mve_helper.c    | 127 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |   2 +
 4 files changed, 140 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 352b6a46a5e..a2f9916b24e 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -179,6 +179,14 @@ DEF_HELPER_FLAGS_4(mve_vqshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vqrshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqrshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 2c37e265765..e78eab6d659 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -133,6 +133,9 @@ VQSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
 VQSHL_S          111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 VQSHL_U          111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 
+VQRSHL_S         111 0 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
+VQRSHL_U         111 1 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 7ac41cb1460..b7f9af4067b 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -637,6 +637,133 @@ DO_2OP_SAT(vqsubsw, 4, int32_t, H4, DO_SQSUB_W)
 DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
 DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
 
+#define DO_UQRSHL_OP(src1, src2, satp)                  \
+    ({                                                  \
+        int8_t tmp;                                     \
+        typeof(src1) dest;                              \
+        tmp = (int8_t)src2;                             \
+        if (tmp >= (ssize_t)sizeof(src1) * 8) {         \
+            if (src1) {                                 \
+                *satp = true;                           \
+                dest = ~0;                              \
+            } else {                                    \
+                dest = 0;                               \
+            }                                           \
+        } else if (tmp < -(ssize_t)sizeof(src1) * 8) {  \
+            dest = 0;                                   \
+        } else if (tmp == -(ssize_t)sizeof(src1) * 8) { \
+            dest = src1 >> (sizeof(src1) * 8 - 1);      \
+        } else if (tmp < 0) {                           \
+            dest = (src1 + (1 << (-1 - tmp))) >> -tmp;  \
+        } else {                                        \
+            dest = src1 << tmp;                         \
+            if ((dest >> tmp) != src1) {                \
+                *satp = true;                           \
+                dest = ~0;                              \
+            }                                           \
+        }                                               \
+        dest;                                           \
+    })
+
+/*
+ * The addition of the rounding constant may overflow, so we use an
+ * intermediate 64 bit accumulator for the 32-bit version.
+ */
+#define DO_UQRSHL32_OP(src1, src2, satp)                                \
+    ({                                                                  \
+        uint32_t dest;                                                  \
+        uint32_t val = src1;                                            \
+        int8_t shift = (int8_t)src2;                                    \
+        if (shift >= 32) {                                              \
+            if (val) {                                                  \
+                *satp = true;                                           \
+                dest = ~0;                                              \
+            } else {                                                    \
+                dest = 0;                                               \
+            }                                                           \
+        } else if (shift < -32) {                                       \
+            dest = 0;                                                   \
+        } else if (shift == -32) {                                      \
+            dest = val >> 31;                                           \
+        } else if (shift < 0) {                                         \
+            uint64_t big_dest = ((uint64_t)val + (1 << (-1 - shift)));  \
+            dest = big_dest >> -shift;                                  \
+        } else {                                                        \
+            dest = val << shift;                                        \
+            if ((dest >> shift) != val) {                               \
+                *satp = true;                                           \
+                dest = ~0;                                              \
+            }                                                           \
+        }                                                               \
+        dest;                                                           \
+    })
+
+#define DO_SQRSHL_OP(src1, src2, satp)                                  \
+    ({                                                                  \
+        int8_t tmp;                                                     \
+        typeof(src1) dest;                                              \
+        tmp = (int8_t)src2;                                             \
+        if (tmp >= (ssize_t)sizeof(src1) * 8) {                         \
+            if (src1) {                                                 \
+                *satp = true;                                           \
+                dest = (typeof(dest))(1 << (sizeof(src1) * 8 - 1));     \
+                if (src1 > 0) {                                         \
+                    dest--;                                             \
+                }                                                       \
+            } else {                                                    \
+                dest = 0;                                               \
+            }                                                           \
+        } else if (tmp <= -(ssize_t)sizeof(src1) * 8) {                 \
+            dest = 0;                                                   \
+        } else if (tmp < 0) {                                           \
+            dest = (src1 + (1 << (-1 - tmp))) >> -tmp;                  \
+        } else {                                                        \
+            dest = src1 << tmp;                                         \
+            if ((dest >> tmp) != src1) {                                \
+                *satp = true;                                           \
+                dest = (uint32_t)(1 << (sizeof(src1) * 8 - 1));         \
+                if (src1 > 0) {                                         \
+                    dest--;                                             \
+                }                                                       \
+            }                                                           \
+        }                                                               \
+        dest;                                                           \
+    })
+
+#define DO_SQRSHL32_OP(src1, src2, satp)                                \
+    ({                                                                  \
+        int32_t dest;                                                   \
+        int32_t val = (int32_t)src1;                                    \
+        int8_t shift = (int8_t)src2;                                    \
+        if (shift >= 32) {                                              \
+            if (val) {                                                  \
+                *satp = true;                                           \
+                dest = (val >> 31) ^ ~(1U << 31);                       \
+            } else {                                                    \
+                dest = 0;                                               \
+            }                                                           \
+        } else if (shift <= -32) {                                      \
+            dest = 0;                                                   \
+        } else if (shift < 0) {                                         \
+            int64_t big_dest = ((int64_t)val + (1 << (-1 - shift)));    \
+            dest = big_dest >> -shift;                                  \
+        } else {                                                        \
+            dest = val << shift;                                        \
+            if ((dest >> shift) != val) {                               \
+                *satp = true;                                           \
+                dest = (val >> 31) ^ ~(1U << 31);                       \
+            }                                                           \
+        }                                                               \
+        dest;                                                           \
+    })
+
+DO_2OP_SAT(vqrshlub, 1, uint8_t, H1, DO_UQRSHL_OP)
+DO_2OP_SAT(vqrshluh, 2, uint16_t, H2, DO_UQRSHL_OP)
+DO_2OP_SAT(vqrshluw, 4, uint32_t, H4, DO_UQRSHL32_OP)
+DO_2OP_SAT(vqrshlsb, 1, int8_t, H1, DO_SQRSHL_OP)
+DO_2OP_SAT(vqrshlsh, 2, int16_t, H2, DO_SQRSHL_OP)
+DO_2OP_SAT(vqrshlsw, 4, int32_t, H4, DO_SQRSHL32_OP)
+
 #define DO_2OP_SCALAR(OP, ESIZE, TYPE, H, FN)                           \
     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                 uint32_t rm)                            \
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 998f47fb94e..bea561726ea 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -423,6 +423,8 @@ DO_2OP(VQSUB_S, vqsubs)
 DO_2OP(VQSUB_U, vqsubu)
 DO_2OP(VQSHL_S, vqshls)
 DO_2OP(VQSHL_U, vqshlu)
+DO_2OP(VQRSHL_S, vqrshls)
+DO_2OP(VQRSHL_U, vqrshlu)
 
 static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                           MVEGenTwoOpScalarFn fn)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 45/55] target/arm: Implement MVE VSHL insn
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (43 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 44/55] target/arm: Implement MVE VQRSHL Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 19:40   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 46/55] target/arm: Implement MVE VRSHL Peter Maydell
                   ` (10 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VSHL insn (vector form).
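
The shift count is taken as a signed byte: positive counts shift
left, negative counts shift right, and out-of-range counts flush to
zero (or to a replicated sign bit for signed right shifts). Some
sample values for the helper added below:

    do_sshl(-8, -2, 32);   /* == -2: arithmetic right shift by 2 */
    do_sshl(-8, 40, 32);   /* ==  0: count >= esize shifts everything out */
    do_sshl(-8, -40, 32);  /* == -1: count <= -esize leaves the sign bit */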

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 30 ++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 43 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index a2f9916b24e..6ef01d367b4 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -171,6 +171,14 @@ DEF_HELPER_FLAGS_4(mve_vqsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vqshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index e78eab6d659..ebf156b46b5 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -130,6 +130,9 @@ VQADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 1 ... 0 @2op
 VQSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
 VQSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
 
+VSHL_S           111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
+VSHL_U           111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
+
 VQSHL_S          111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 VQSHL_U          111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index b7f9af4067b..c95d5a0fd8e 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -508,6 +508,36 @@ DO_2OP_U(vhaddu, do_vhadd_u)
 DO_2OP_S(vhsubs, do_vhsub_s)
 DO_2OP_U(vhsubu, do_vhsub_u)
 
+static inline uint32_t do_ushl(uint32_t n, int8_t shift, int esize)
+{
+    if (shift >= esize || shift <= -esize) {
+        return 0;
+    } else if (shift < 0) {
+        return n >> -shift;
+    } else {
+        return n << shift;
+    }
+}
+
+static inline int32_t do_sshl(int32_t n, int8_t shift, int esize)
+{
+    if (shift >= esize) {
+        return 0;
+    } else if (shift <= -esize) {
+        return n >> (esize - 1);
+    } else if (shift < 0) {
+        return n >> -shift;
+    } else {
+        return n << shift;
+    }
+}
+
+#define DO_VSHLS(N, M) do_sshl(N, M, sizeof(N) * 8)
+#define DO_VSHLU(N, M) do_ushl(N, M, sizeof(N) * 8)
+
+DO_2OP_S(vshls, DO_VSHLS)
+DO_2OP_U(vshlu, DO_VSHLU)
+
 static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
 {
     if (val > max) {
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index bea561726ea..6eaa99bc0f5 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -421,6 +421,8 @@ DO_2OP(VQADD_S, vqadds)
 DO_2OP(VQADD_U, vqaddu)
 DO_2OP(VQSUB_S, vqsubs)
 DO_2OP(VQSUB_U, vqsubu)
+DO_2OP(VSHL_S, vshls)
+DO_2OP(VSHL_U, vshlu)
 DO_2OP(VQSHL_S, vqshls)
 DO_2OP(VQSHL_U, vqshlu)
 DO_2OP(VQRSHL_S, vqrshls)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 46/55] target/arm: Implement MVE VRSHL
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (44 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 45/55] target/arm: Implement MVE VSHL insn Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 19:43   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 47/55] target/arm: Implement MVE VQDMLADH and VQRDMLADH Peter Maydell
                   ` (9 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VRSHL insn (vector form).
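
The 64-bit intermediate in the helpers matters for the widest lanes;
for example:

    do_urshl(0xffffffffu, -1, 32);
    /*
     * == 0x80000000: (0xffffffff + 1) >> 1, computed at 64 bits; a
     * 32-bit addition of the rounding constant would wrap to 0 first.
     */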

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  3 +++
 target/arm/mve_helper.c    | 36 ++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 49 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 6ef01d367b4..6939cf84c57 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -179,6 +179,14 @@ DEF_HELPER_FLAGS_4(mve_vshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vrshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vrshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vqshlsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqshlsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqshlsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index ebf156b46b5..c30fb2c1536 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -133,6 +133,9 @@ VQSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 1 ... 0 @2op
 VSHL_S           111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
 VSHL_U           111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 0 ... 0 @2op_rev
 
+VRSHL_S          111 0 1111 0 . .. ... 0 ... 0 0101 . 1 . 0 ... 0 @2op_rev
+VRSHL_U          111 1 1111 0 . .. ... 0 ... 0 0101 . 1 . 0 ... 0 @2op_rev
+
 VQSHL_S          111 0 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 VQSHL_U          111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index c95d5a0fd8e..9c23e6b9b28 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -538,6 +538,42 @@ static inline int32_t do_sshl(int32_t n, int8_t shift, int esize)
 DO_2OP_S(vshls, DO_VSHLS)
 DO_2OP_U(vshlu, DO_VSHLU)
 
+static inline uint32_t do_urshl(uint32_t n, int8_t shift, int esize)
+{
+    if (shift >= esize || shift < -esize) {
+        return 0;
+    } else if (shift == -esize) {
+        return n >> (esize - 1);
+    } else if (shift < 0) {
+        /* Use 64 bit intermediate: adding the rounding const might overflow */
+        uint64_t r = (uint64_t)n + (1 << (-1 - shift));
+        return r >> -shift;
+    } else {
+        return n << shift;
+    }
+}
+
+static inline int32_t do_srshl(int32_t n, int8_t shift, int esize)
+{
+    if (shift >= esize || shift < -esize) {
+        return 0;
+    } else if (shift == -esize) {
+        return 0; /* rounds to zero for all inputs at this count */
+    } else if (shift < 0) {
+        /* Use 64 bit intermediate: adding the rounding const might overflow */
+        int64_t r = (int64_t)n + (1 << (-1 - shift));
+        return r >> -shift;
+    } else {
+        return n << shift;
+    }
+}
+
+#define DO_VRSHLS(N, M) do_srshl(N, M, sizeof(N) * 8)
+#define DO_VRSHLU(N, M) do_urshl(N, M, sizeof(N) * 8)
+
+DO_2OP_S(vrshls, DO_VRSHLS)
+DO_2OP_U(vrshlu, DO_VRSHLU)
+
 static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
 {
     if (val > max) {
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 6eaa99bc0f5..6bc32379172 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -423,6 +423,8 @@ DO_2OP(VQSUB_S, vqsubs)
 DO_2OP(VQSUB_U, vqsubu)
 DO_2OP(VSHL_S, vshls)
 DO_2OP(VSHL_U, vshlu)
+DO_2OP(VRSHL_S, vrshls)
+DO_2OP(VRSHL_U, vrshlu)
 DO_2OP(VQSHL_S, vqshls)
 DO_2OP(VQSHL_U, vqshlu)
 DO_2OP(VQRSHL_S, vqrshls)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 47/55] target/arm: Implement MVE VQDMLADH and VQRDMLADH
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (45 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 46/55] target/arm: Implement MVE VRSHL Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 20:05   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 48/55] target/arm: Implement MVE VQDMLSDH and VQRDMLSDH Peter Maydell
                   ` (8 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VQDMLADH and VQRDMLADH insns.  These multiply
elements, and then add pairs of products, double, possibly round,
saturate and return the high half of the result.
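
For instance, on 16-bit lanes the non-exchanging, non-rounding form
computes, per pair of lanes (a sketch invoking the helper added
below):

    bool sat = false;
    int16_t r = do_vqdmladh_h(0x7fff, 0x7fff, 0x7fff, 0x7fff, 0, &sat);
    /*
     * (0x7fff * 0x7fff + 0x7fff * 0x7fff) * 2 == 0xfffc0004, which
     * exceeds INT32_MAX, so the result saturates: r == 0x7fff (the
     * high half of INT32_MAX) and sat is set.
     */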

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 16 +++++++
 target/arm/mve.decode      |  5 +++
 target/arm/mve_helper.c    | 87 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  4 ++
 4 files changed, 112 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 6939cf84c57..c62066d94aa 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -203,6 +203,22 @@ DEF_HELPER_FLAGS_4(mve_vqrshlub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqrshluh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqrshluw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vqdmladhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmladhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmladhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqdmladhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmladhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmladhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqrdmladhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmladhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmladhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqrdmladhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmladhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmladhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index c30fb2c1536..d267c8838eb 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -142,6 +142,11 @@ VQSHL_U          111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
 VQRSHL_S         111 0 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
 VQRSHL_U         111 1 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
 
+VQDMLADH         1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 0 @2op
+VQDMLADHX        1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
+VQRDMLADH        1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
+VQRDMLADHX       1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 9c23e6b9b28..03701d32dcb 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -830,6 +830,93 @@ DO_2OP_SAT(vqrshlsb, 1, int8_t, H1, DO_SQRSHL_OP)
 DO_2OP_SAT(vqrshlsh, 2, int16_t, H2, DO_SQRSHL_OP)
 DO_2OP_SAT(vqrshlsw, 4, int32_t, H4, DO_SQRSHL32_OP)
 
+/*
+ * Multiply add dual returning high half
+ * The 'FN' here takes four inputs A, B, C, D, a 0/1 indicator of
+ * whether to add the rounding constant, and the pointer to the
+ * saturation flag, and should do "(A * B + C * D) * 2 + rounding constant",
+ * saturate to twice the input size and return the high half; or
+ * (A * B - C * D) etc for VQDMLSDH.
+ */
+#define DO_VQDMLADH_OP(OP, ESIZE, TYPE, H, XCHG, ROUND, FN)             \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
+                                void *vm)                               \
+    {                                                                   \
+        TYPE *d = vd, *n = vn, *m = vm;                                 \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            bool sat = false;                                           \
+            if ((e & 1) == XCHG) {                                      \
+                TYPE r = FN(n[H(e)], m[H(e - XCHG)],                    \
+                            n[H(e + (1 - 2 * XCHG))], m[H(e + (1 - XCHG))], \
+                            ROUND, &sat);                               \
+                uint64_t bytemask = mask_to_bytemask##ESIZE(mask);      \
+                d[H(e)] &= ~bytemask;                                   \
+                d[H(e)] |= (r & bytemask);                              \
+                if (sat && (mask & 1)) {                                \
+                    env->vfp.qc[0] = 1;                                 \
+                }                                                       \
+            }                                                           \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+static int8_t do_vqdmladh_b(int8_t a, int8_t b, int8_t c, int8_t d,
+                            int round, bool *sat)
+{
+    int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (round << 7);
+    return do_sat_bhw(r, INT16_MIN, INT16_MAX, sat) >> 8;
+}
+
+static int16_t do_vqdmladh_h(int16_t a, int16_t b, int16_t c, int16_t d,
+                             int round, bool *sat)
+{
+    int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (round << 15);
+    return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat) >> 16;
+}
+
+static int32_t do_vqdmladh_w(int32_t a, int32_t b, int32_t c, int32_t d,
+                             int round, bool *sat)
+{
+    int64_t m1 = (int64_t)a * b;
+    int64_t m2 = (int64_t)c * d;
+    int64_t r;
+    /*
+     * Architecturally we should do the entire add, double, round
+     * and then check for saturation. We do three saturating adds,
+     * but we need to be careful about the order. If the first
+     * m1 + m2 saturates then it's impossible for the *2+rc to
+     * bring it back into the non-saturated range. However, if
+     * m1 + m2 is negative then it's possible that doing the doubling
+     * would take the intermediate result below INT64_MIN and the
+     * addition of the rounding constant then brings it back in range.
+     * So we add half the rounding constant before doubling rather
+     * than adding the rounding constant after the doubling.
+     */
+    if (sadd64_overflow(m1, m2, &r) ||
+        sadd64_overflow(r, (round << 30), &r) ||
+        sadd64_overflow(r, r, &r)) {
+        *sat = true;
+        return r < 0 ? INT32_MAX : INT32_MIN;
+    }
+    return r >> 32;
+}
+
+DO_VQDMLADH_OP(vqdmladhb, 1, int8_t, H1, 0, 0, do_vqdmladh_b)
+DO_VQDMLADH_OP(vqdmladhh, 2, int16_t, H2, 0, 0, do_vqdmladh_h)
+DO_VQDMLADH_OP(vqdmladhw, 4, int32_t, H4, 0, 0, do_vqdmladh_w)
+DO_VQDMLADH_OP(vqdmladhxb, 1, int8_t, H1, 1, 0, do_vqdmladh_b)
+DO_VQDMLADH_OP(vqdmladhxh, 2, int16_t, H2, 1, 0, do_vqdmladh_h)
+DO_VQDMLADH_OP(vqdmladhxw, 4, int32_t, H4, 1, 0, do_vqdmladh_w)
+
+DO_VQDMLADH_OP(vqrdmladhb, 1, int8_t, H1, 0, 1, do_vqdmladh_b)
+DO_VQDMLADH_OP(vqrdmladhh, 2, int16_t, H2, 0, 1, do_vqdmladh_h)
+DO_VQDMLADH_OP(vqrdmladhw, 4, int32_t, H4, 0, 1, do_vqdmladh_w)
+DO_VQDMLADH_OP(vqrdmladhxb, 1, int8_t, H1, 1, 1, do_vqdmladh_b)
+DO_VQDMLADH_OP(vqrdmladhxh, 2, int16_t, H2, 1, 1, do_vqdmladh_h)
+DO_VQDMLADH_OP(vqrdmladhxw, 4, int32_t, H4, 1, 1, do_vqdmladh_w)
+
 #define DO_2OP_SCALAR(OP, ESIZE, TYPE, H, FN)                           \
     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                 uint32_t rm)                            \
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 6bc32379172..7c25802bf53 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -429,6 +429,10 @@ DO_2OP(VQSHL_S, vqshls)
 DO_2OP(VQSHL_U, vqshlu)
 DO_2OP(VQRSHL_S, vqrshls)
 DO_2OP(VQRSHL_U, vqrshlu)
+DO_2OP(VQDMLADH, vqdmladh)
+DO_2OP(VQDMLADHX, vqdmladhx)
+DO_2OP(VQRDMLADH, vqrdmladh)
+DO_2OP(VQRDMLADHX, vqrdmladhx)
 
 static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                           MVEGenTwoOpScalarFn fn)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 48/55] target/arm: Implement MVE VQDMLSDH and VQRDMLSDH
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (46 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 47/55] target/arm: Implement MVE VQDMLADH and VQRDMLADH Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 20:08   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 49/55] target/arm: Implement MVE VQDMULL (vector) Peter Maydell
                   ` (7 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VQDMLSDH and VQRDMLSDH insns, which are
like VQDMLADH and VQRDMLADH except that products are subtracted
rather than added.
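
A non-saturating example of the subtracting form on 16-bit lanes
(a sketch invoking the helper added below):

    bool sat = false;
    int16_t r = do_vqdmlsdh_h(0x4000, 0x4000, 0x2000, 0x2000, 0, &sat);
    /*
     * (0x10000000 - 0x04000000) * 2 == 0x18000000, in range, so
     * r == 0x1800 (the high half) and sat stays false.
     */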

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-mve.h    | 16 ++++++++++++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 44 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  4 ++++
 4 files changed, 69 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index c62066d94aa..e25299b229e 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -219,6 +219,22 @@ DEF_HELPER_FLAGS_4(mve_vqrdmladhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqrdmladhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqrdmladhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmlsdhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index d267c8838eb..fa4fb1b2038 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -147,6 +147,11 @@ VQDMLADHX        1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
 VQRDMLADH        1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
 VQRDMLADHX       1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
 
+VQDMLSDH         1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 0 @2op
+VQDMLSDHX        1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
+VQRDMLSDH        1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
+VQRDMLSDHX       1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 03701d32dcb..ed0da8097dc 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -903,6 +903,36 @@ static int32_t do_vqdmladh_w(int32_t a, int32_t b, int32_t c, int32_t d,
     return r >> 32;
 }
 
+static int8_t do_vqdmlsdh_b(int8_t a, int8_t b, int8_t c, int8_t d,
+                            int round, bool *sat)
+{
+    int64_t r = ((int64_t)a * b - (int64_t)c * d) * 2 + (round << 7);
+    return do_sat_bhw(r, INT16_MIN, INT16_MAX, sat) >> 8;
+}
+
+static int16_t do_vqdmlsdh_h(int16_t a, int16_t b, int16_t c, int16_t d,
+                             int round, bool *sat)
+{
+    int64_t r = ((int64_t)a * b - (int64_t)c * d) * 2 + (round << 15);
+    return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat) >> 16;
+}
+
+static int32_t do_vqdmlsdh_w(int32_t a, int32_t b, int32_t c, int32_t d,
+                             int round, bool *sat)
+{
+    int64_t m1 = (int64_t)a * b;
+    int64_t m2 = (int64_t)c * d;
+    int64_t r;
+    /* The same ordering issue as in do_vqdmladh_w applies here too */
+    if (ssub64_overflow(m1, m2, &r) ||
+        sadd64_overflow(r, (round << 30), &r) ||
+        sadd64_overflow(r, r, &r)) {
+        *sat = true;
+        return r < 0 ? INT32_MAX : INT32_MIN;
+    }
+    return r >> 32;
+}
+
 DO_VQDMLADH_OP(vqdmladhb, 1, int8_t, H1, 0, 0, do_vqdmladh_b)
 DO_VQDMLADH_OP(vqdmladhh, 2, int16_t, H2, 0, 0, do_vqdmladh_h)
 DO_VQDMLADH_OP(vqdmladhw, 4, int32_t, H4, 0, 0, do_vqdmladh_w)
@@ -917,6 +947,20 @@ DO_VQDMLADH_OP(vqrdmladhxb, 1, int8_t, H1, 1, 1, do_vqdmladh_b)
 DO_VQDMLADH_OP(vqrdmladhxh, 2, int16_t, H2, 1, 1, do_vqdmladh_h)
 DO_VQDMLADH_OP(vqrdmladhxw, 4, int32_t, H4, 1, 1, do_vqdmladh_w)
 
+DO_VQDMLADH_OP(vqdmlsdhb, 1, int8_t, H1, 0, 0, do_vqdmlsdh_b)
+DO_VQDMLADH_OP(vqdmlsdhh, 2, int16_t, H2, 0, 0, do_vqdmlsdh_h)
+DO_VQDMLADH_OP(vqdmlsdhw, 4, int32_t, H4, 0, 0, do_vqdmlsdh_w)
+DO_VQDMLADH_OP(vqdmlsdhxb, 1, int8_t, H1, 1, 0, do_vqdmlsdh_b)
+DO_VQDMLADH_OP(vqdmlsdhxh, 2, int16_t, H2, 1, 0, do_vqdmlsdh_h)
+DO_VQDMLADH_OP(vqdmlsdhxw, 4, int32_t, H4, 1, 0, do_vqdmlsdh_w)
+
+DO_VQDMLADH_OP(vqrdmlsdhb, 1, int8_t, H1, 0, 1, do_vqdmlsdh_b)
+DO_VQDMLADH_OP(vqrdmlsdhh, 2, int16_t, H2, 0, 1, do_vqdmlsdh_h)
+DO_VQDMLADH_OP(vqrdmlsdhw, 4, int32_t, H4, 0, 1, do_vqdmlsdh_w)
+DO_VQDMLADH_OP(vqrdmlsdhxb, 1, int8_t, H1, 1, 1, do_vqdmlsdh_b)
+DO_VQDMLADH_OP(vqrdmlsdhxh, 2, int16_t, H2, 1, 1, do_vqdmlsdh_h)
+DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, H4, 1, 1, do_vqdmlsdh_w)
+
 #define DO_2OP_SCALAR(OP, ESIZE, TYPE, H, FN)                           \
     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
                                 uint32_t rm)                            \
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 7c25802bf53..0048aec1e9e 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -433,6 +433,10 @@ DO_2OP(VQDMLADH, vqdmladh)
 DO_2OP(VQDMLADHX, vqdmladhx)
 DO_2OP(VQRDMLADH, vqrdmladh)
 DO_2OP(VQRDMLADHX, vqrdmladhx)
+DO_2OP(VQDMLSDH, vqdmlsdh)
+DO_2OP(VQDMLSDHX, vqdmlsdhx)
+DO_2OP(VQRDMLSDH, vqrdmlsdh)
+DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
 
 static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                           MVEGenTwoOpScalarFn fn)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 49/55] target/arm: Implement MVE VQDMULL (vector)
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (47 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 48/55] target/arm: Implement MVE VQDMLSDH and VQRDMLSDH Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 20:20   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 50/55] target/arm: Implement MVE VRHADD Peter Maydell
                   ` (6 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the vector form of the MVE VQDMULL insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
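A quick standalone model of one result lane, illustrative only (the name
is invented; it assumes the saturating doubling semantics of the
do_qdmullh helper used below): each 32-bit result is the doubled product
of two 16-bit inputs, and the only input pair that can saturate is
INT16_MIN * INT16_MIN. The B and T variants differ just in whether they
read the even (bottom) or odd (top) 16-bit elements of each 32-bit
source lane.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Standalone model of one VQDMULL halfword-to-word result lane */
static int32_t model_qdmull_h(int16_t n, int16_t m, bool *sat)
{
    int64_t r = (int64_t)n * m * 2;
    if (r > INT32_MAX) {    /* only INT16_MIN * INT16_MIN lands here */
        *sat = true;
        return INT32_MAX;
    }
    return r;
}

int main(void)
{
    bool sat = false;
    int32_t r = model_qdmull_h(INT16_MIN, INT16_MIN, &sat);
    printf("r=%d sat=%d\n", r, sat);   /* r=2147483647 sat=1 */
    return 0;
}
---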
 target/arm/helper-mve.h    |  5 +++++
 target/arm/mve.decode      |  5 +++++
 target/arm/mve_helper.c    | 30 ++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 30 ++++++++++++++++++++++++++++++
 4 files changed, 70 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index e25299b229e..ffddbd72377 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -235,6 +235,11 @@ DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqrdmlsdhxw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vqdmullbh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmullbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmullth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vqdmulltw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index fa4fb1b2038..3a2a7e75a3a 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -39,6 +39,8 @@
 @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
 @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
 @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
+@2op_sz28 .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn \
+     size=%size_28
 
 # The _rev suffix indicates that Vn and Vm are reversed. This is
 # the case for shifts. In the Arm ARM these insns are documented
@@ -152,6 +154,9 @@ VQDMLSDHX        1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
 VQRDMLSDH        1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
 VQRDMLSDHX       1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
 
+VQDMULLB         111 . 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 1 @2op_sz28
+VQDMULLT         111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index ed0da8097dc..68a2339feae 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1103,6 +1103,36 @@ DO_2OP_SAT_SCALAR_L(vqdmullt_scalarh, 1, int16_t, H2, 4, int32_t, H4, \
 DO_2OP_SAT_SCALAR_L(vqdmullt_scalarw, 1, int32_t, H4, 8, int64_t, , \
                     do_qdmullw, SATMASK32)
 
+/*
+ * Long saturating ops
+ */
+#define DO_2OP_SAT_L(OP, TOP, TYPE, H, LESIZE, LTYPE, LH, FN, SATMASK)  \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
+                                void *vm)                               \
+    {                                                                   \
+        LTYPE *d = vd;                                                  \
+        TYPE *n = vn, *m = vm;                                          \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned le;                                                    \
+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
+            bool sat = false;                                           \
+            LTYPE op1 = n[H(le * 2 + TOP)], op2 = m[H(le * 2 + TOP)];   \
+            LTYPE r = FN(op1, op2, &sat);                               \
+            uint64_t bytemask = mask_to_bytemask##LESIZE(mask);         \
+            d[LH(le)] &= ~bytemask;                                     \
+            d[LH(le)] |= (r & bytemask);                                \
+            if (sat && (mask & SATMASK)) {                              \
+                env->vfp.qc[0] = 1;                                     \
+            }                                                           \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+DO_2OP_SAT_L(vqdmullbh, 0, int16_t, H2, 4, int32_t, H4, do_qdmullh, SATMASK16B)
+DO_2OP_SAT_L(vqdmullbw, 0, int32_t, H4, 8, int64_t, , do_qdmullw, SATMASK32)
+DO_2OP_SAT_L(vqdmullth, 1, int16_t, H2, 4, int32_t, H4, do_qdmullh, SATMASK16T)
+DO_2OP_SAT_L(vqdmulltw, 1, int32_t, H4, 8, int64_t, , do_qdmullw, SATMASK32)
+
 static inline uint32_t do_vbrsrb(uint32_t n, uint32_t m)
 {
     m &= 0xff;
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 0048aec1e9e..b227b72e5b6 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -438,6 +438,36 @@ DO_2OP(VQDMLSDHX, vqdmlsdhx)
 DO_2OP(VQRDMLSDH, vqrdmlsdh)
 DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
 
+static bool trans_VQDMULLB(DisasContext *s, arg_2op *a)
+{
+    MVEGenTwoOpFn *fns[] = {
+        NULL,
+        gen_helper_mve_vqdmullbh,
+        gen_helper_mve_vqdmullbw,
+        NULL,
+    };
+    if (a->size == MO_32 && (a->qd == a->qm || a->qd == a->qn)) {
+        /* UNPREDICTABLE; we choose to undef */
+        return false;
+    }
+    return do_2op(s, a, fns[a->size]);
+}
+
+static bool trans_VQDMULLT(DisasContext *s, arg_2op *a)
+{
+    MVEGenTwoOpFn *fns[] = {
+        NULL,
+        gen_helper_mve_vqdmullth,
+        gen_helper_mve_vqdmulltw,
+        NULL,
+    };
+    if (a->size == MO_32 && (a->qd == a->qm || a->qd == a->qn)) {
+        /* UNPREDICTABLE; we choose to undef */
+        return false;
+    }
+    return do_2op(s, a, fns[a->size]);
+}
+
 static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                           MVEGenTwoOpScalarFn fn)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 50/55] target/arm: Implement MVE VRHADD
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (48 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 49/55] target/arm: Implement MVE VQDMULL (vector) Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 20:24   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 51/55] target/arm: Implement MVE VADC, VSBC Peter Maydell
                   ` (5 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VRHADD insn, which performs a rounded halving
addition.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
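The per-lane operation is simple enough to model standalone
(illustrative only; the name is invented, mirroring the DO_RHADD_S macro
below): the sum plus a rounding bias of 1 is computed widened, so it
cannot wrap, and is then halved with an arithmetic shift.

#include <stdint.h>
#include <stdio.h>

/* Standalone model of one signed-byte VRHADD lane */
static int8_t model_rhadd_s8(int8_t n, int8_t m)
{
    /* widen before adding so e.g. 127 + 127 + 1 cannot overflow */
    return ((int64_t)n + m + 1) >> 1;
}

int main(void)
{
    /* (127 + 1 + 1) >> 1 == 64, (-3 + -4 + 1) >> 1 == -3 */
    printf("%d %d\n", model_rhadd_s8(127, 1), model_rhadd_s8(-3, -4));
    return 0;
}
---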
 target/arm/helper-mve.h    | 8 ++++++++
 target/arm/mve.decode      | 3 +++
 target/arm/mve_helper.c    | 6 ++++++
 target/arm/translate-mve.c | 2 ++
 4 files changed, 19 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index ffddbd72377..cd2cc6252f8 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -240,6 +240,14 @@ DEF_HELPER_FLAGS_4(mve_vqdmullbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqdmullth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqdmulltw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vrhaddsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrhaddsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrhaddsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vrhaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrhadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vrhadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 3a2a7e75a3a..6b969902df0 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -157,6 +157,9 @@ VQRDMLSDHX       1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
 VQDMULLB         111 . 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 1 @2op_sz28
 VQDMULLT         111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
 
+VRHADD_S         111 0 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
+VRHADD_U         111 1 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 68a2339feae..c9434479604 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -574,6 +574,12 @@ static inline int32_t do_srshl(int32_t n, int8_t shift, int esize)
 DO_2OP_S(vrshls, DO_VRSHLS)
 DO_2OP_U(vrshlu, DO_VRSHLU)
 
+#define DO_RHADD_S(N, M) (((int64_t)(N) + (M) + 1) >> 1)
+#define DO_RHADD_U(N, M) (((uint64_t)(N) + (M) + 1) >> 1)
+
+DO_2OP_S(vrhadds, DO_RHADD_S)
+DO_2OP_U(vrhaddu, DO_RHADD_U)
+
 static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
 {
     if (val > max) {
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index b227b72e5b6..9a88583385f 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -437,6 +437,8 @@ DO_2OP(VQDMLSDH, vqdmlsdh)
 DO_2OP(VQDMLSDHX, vqdmlsdhx)
 DO_2OP(VQRDMLSDH, vqrdmlsdh)
 DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
+DO_2OP(VRHADD_S, vrhadds)
+DO_2OP(VRHADD_U, vrhaddu)
 
 static bool trans_VQDMULLB(DisasContext *s, arg_2op *a)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 51/55] target/arm: Implement MVE VADC, VSBC
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (49 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 50/55] target/arm: Implement MVE VRHADD Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 21:06   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 52/55] target/arm: Implement MVE VCADD Peter Maydell
                   ` (4 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VADC and VSBC insns.  These perform an
add-with-carry or subtract-with-carry of the 32-bit elements in each
lane of the input vectors, where the carry-out of each add is the
carry-in of the next.  The initial carry input is either 1 or is from
FPSCR.C; the carry out at the end is written back to FPSCR.C.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
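A standalone model of the carry chain, illustrative only (invented name,
and it ignores predication, which the helper below also handles via the
beat mask): each 32-bit lane's add produces the carry-in for the next
lane, and the final carry becomes FPSCR.C. VSBC is the same chain with
the bits of the second operand inverted, i.e. subtract-with-borrow.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Standalone model of an unpredicated VADC over the 4 word lanes */
static bool model_vadc(uint32_t d[4], const uint32_t n[4],
                       const uint32_t m[4], bool carry_in)
{
    uint64_t carry = carry_in;
    for (int e = 0; e < 4; e++) {
        uint64_t r = (uint64_t)n[e] + m[e] + carry;
        d[e] = (uint32_t)r;
        carry = r >> 32;
    }
    return carry;   /* written back to FPSCR.C */
}

int main(void)
{
    uint32_t n[4] = { 0xffffffff, 0, 0, 0 };
    uint32_t m[4] = { 1, 0, 0, 0 };
    uint32_t d[4];
    bool c = model_vadc(d, n, m, false);
    /* lane 0 wraps to 0; its carry ripples into lane 1 */
    printf("%u %u %u %u C=%d\n", d[0], d[1], d[2], d[3], c);
    return 0;
}
---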
 target/arm/helper-mve.h    |  3 ++
 target/arm/mve.decode      |  6 ++++
 target/arm/mve_helper.c    | 30 +++++++++++++++++
 target/arm/translate-mve.c | 69 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 108 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index cd2cc6252f8..686e5d9a39b 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -248,6 +248,9 @@ DEF_HELPER_FLAGS_4(mve_vrhaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vrhadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vrhadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_5(mve_vadc, TCG_CALL_NO_WG, i32, env, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(mve_vsbc, TCG_CALL_NO_WG, i32, env, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 6b969902df0..6a4aae7a1fc 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -30,6 +30,7 @@
 &1op qd qm size
 &2op qd qm qn size
 &2scalar qd qn rm size
+&vadc qd qm qn i
 
 @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -42,6 +43,8 @@
 @2op_sz28 .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn \
      size=%size_28
 
+@vadc .... .... .... .... ... i:1 .... .... .... &vadc qd=%qd qm=%qm qn=%qn
+
 # The _rev suffix indicates that Vn and Vm are reversed. This is
 # the case for shifts. In the Arm ARM these insns are documented
 # with the Vm and Vn fields in their usual places, but in the
@@ -160,6 +163,9 @@ VQDMULLT         111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
 VRHADD_S         111 0 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
 VRHADD_U         111 1 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
 
+VADC             1110 1110 0 . 11 ... 0 ... . 1111 . 0 . 0 ... 0 @vadc
+VSBC             1111 1110 0 . 11 ... 0 ... . 1111 . 0 . 0 ... 0 @vadc
+
 # Vector miscellaneous
 
 VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index c9434479604..e07f12c8389 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -580,6 +580,36 @@ DO_2OP_U(vrshlu, DO_VRSHLU)
 DO_2OP_S(vrhadds, DO_RHADD_S)
 DO_2OP_U(vrhaddu, DO_RHADD_U)
 
+#define DO_VADC(OP, INV)                                                \
+    uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,         \
+                                    void *vn, void *vm, uint32_t nzcv)  \
+    {                                                                   \
+        uint32_t *d = vd, *n = vn, *m = vm;                             \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        int carry = (nzcv & FPCR_C) ? 1 : 0;                            \
+        /* If we do no additions at all the flags are preserved */      \
+        bool updates_flags = (mask & 0x1111) != 0;                      \
+        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
+            uint64_t r = (uint64_t)n[H4(e)] + INV(m[H4(e)]) + carry;    \
+            if (mask & 1) {                                             \
+                carry = r >> 32;                                        \
+            }                                                           \
+            uint64_t bytemask = mask_to_bytemask4(mask);                \
+            d[H4(e)] &= ~bytemask;                                      \
+            d[H4(e)] |= (r & bytemask);                                 \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+        if (updates_flags) {                                            \
+            nzcv = carry ? FPCR_C : 0;                                  \
+        }                                                               \
+        return nzcv;                                                    \
+    }
+
+/* VSBC differs only in inverting op2 before the addition */
+DO_VADC(vadc, )
+DO_VADC(vsbc, DO_NOT)
+
 static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
 {
     if (val > max) {
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 9a88583385f..2ed499a6de2 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -33,6 +33,7 @@ typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
+typedef void MVEGenADCFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 
 /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
 static inline long mve_qreg_offset(unsigned reg)
@@ -737,3 +738,71 @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
     }
     return true;
 }
+
+static bool do_vadc(DisasContext *s, arg_vadc *a, MVEGenADCFn fn,
+                    uint32_t fixed_carry)
+{
+    /*
+     * VADC and VSBC: these perform an add-with-carry or subtract-with-carry
+     * of the 32-bit elements in each lane of the input vectors, where the
+     * carry-out of each add is the carry-in of the next.  The initial carry
+     * input is either fixed (for the I variant: 0 for VADCI, 1 for VSBCI,
+     * passed in as fixed_carry) or is from FPSCR.C; the carry out at the
+     * end is written back to FPSCR.C.
+     */
+
+    TCGv_ptr qd, qn, qm;
+    TCGv_i32 nzcv, fpscr;
+
+    if (!dc_isar_feature(aa32_mve, s)) {
+        return false;
+    }
+    if (a->qd > 7 || a->qn > 7 || a->qm > 7 || !fn) {
+        return false;
+    }
+    if (!mve_eci_check(s)) {
+        return true;
+    }
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * This insn is subject to beat-wise execution.  Partial execution
+     * of an I=1 (initial carry input fixed) insn which does not
+     * execute the first beat must start with the current FPSCR.NZCV
+     * value, not the fixed constant input.
+     */
+    if (a->i && !mve_skip_first_beat(s)) {
+        /* Carry input is 0 (VADCI) or 1 (VSBCI), NZV zeroed */
+        nzcv = tcg_const_i32(fixed_carry);
+    } else {
+        /* Carry input from existing NZCV flag values */
+        nzcv = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
+        tcg_gen_andi_i32(nzcv, nzcv, FPCR_NZCV_MASK);
+    }
+    qd = mve_qreg_ptr(a->qd);
+    qn = mve_qreg_ptr(a->qn);
+    qm = mve_qreg_ptr(a->qm);
+    fn(nzcv, cpu_env, qd, qn, qm, nzcv);
+    fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
+    tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
+    tcg_gen_or_i32(fpscr, fpscr, nzcv);
+    store_cpu_field(fpscr, vfp.xregs[ARM_VFP_FPSCR]);
+    tcg_temp_free_i32(nzcv);
+    tcg_temp_free_ptr(qd);
+    tcg_temp_free_ptr(qn);
+    tcg_temp_free_ptr(qm);
+    mve_update_eci(s);
+    return true;
+}
+
+static bool trans_VADC(DisasContext *s, arg_vadc *a)
+{
+    return do_vadc(s, a, gen_helper_mve_vadc, 0);
+}
+
+static bool trans_VSBC(DisasContext *s, arg_vadc *a)
+{
+    return do_vadc(s, a, gen_helper_mve_vsbc, FPCR_C);
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 52/55] target/arm: Implement MVE VCADD
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (50 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 51/55] target/arm: Implement MVE VADC, VSBC Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-09 21:16   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 53/55] target/arm: Implement MVE VHCADD Peter Maydell
                   ` (3 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VCADD insn, which performs a complex add with
rotate.  Note that the size=0b11 encoding is VSBC.

The architecture grants some leeway for the "destination and Vm
source overlap" case at size MO_32, but we choose not to make use
of it, instead always calculating all 16 bytes' worth of results
before setting the destination register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
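To make the rotation concrete, a standalone model of the word-size VCADD
with rot=90, illustrative only (invented name; it follows the same
even/odd lane pairing as DO_VCADD below, but assumes distinct
destination and source arrays, so it skips the patch's
compute-all-results-first step). Lanes hold (real, imag) pairs, and
rotating the second operand by 90 degrees maps (re, im) to (-im, re)
before the element-wise add; the example values are small enough that
wrapping overflow does not arise.

#include <stdint.h>
#include <stdio.h>

/* Standalone model of VCADD.i32 #90 across one Q register */
static void model_vcadd90_w(int32_t d[4], const int32_t n[4],
                            const int32_t m[4])
{
    for (int e = 0; e < 4; e += 2) {
        d[e] = n[e] - m[e + 1];         /* real - rotated imag */
        d[e + 1] = n[e + 1] + m[e];     /* imag + rotated real */
    }
}

int main(void)
{
    int32_t n[4] = { 1, 2, 3, 4 };
    int32_t m[4] = { 10, 20, 30, 40 };
    int32_t d[4];
    model_vcadd90_w(d, n, m);
    printf("%d %d %d %d\n", d[0], d[1], d[2], d[3]);   /* -19 12 -37 34 */
    return 0;
}
---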
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  7 ++++++-
 target/arm/mve_helper.c    | 31 +++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  7 +++++++
 4 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 686e5d9a39b..6e345470cbb 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -251,6 +251,14 @@ DEF_HELPER_FLAGS_4(mve_vrhadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_5(mve_vadc, TCG_CALL_NO_WG, i32, env, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(mve_vsbc, TCG_CALL_NO_WG, i32, env, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vcadd90b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcadd90h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcadd90w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vcadd270b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcadd270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcadd270w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 6a4aae7a1fc..c0979f3941b 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -164,7 +164,12 @@ VRHADD_S         111 0 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
 VRHADD_U         111 1 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
 
 VADC             1110 1110 0 . 11 ... 0 ... . 1111 . 0 . 0 ... 0 @vadc
-VSBC             1111 1110 0 . 11 ... 0 ... . 1111 . 0 . 0 ... 0 @vadc
+
+{
+  VCADD90        1111 1110 0 . .. ... 0 ... 0 1111 . 0 . 0 ... 0 @2op
+  VCADD270       1111 1110 0 . .. ... 0 ... 1 1111 . 0 . 0 ... 0 @2op
+  VSBC           1111 1110 0 . 11 ... 0 ... . 1111 . 0 . 0 ... 0 @vadc
+}
 
 # Vector miscellaneous
 
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index e07f12c8389..2c8ef25b208 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -610,6 +610,37 @@ DO_2OP_U(vrhaddu, DO_RHADD_U)
 DO_VADC(vadc, )
 DO_VADC(vsbc, DO_NOT)
 
+#define DO_VCADD(OP, ESIZE, TYPE, H, FN0, FN1)                          \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \
+    {                                                                   \
+        TYPE *d = vd, *n = vn, *m = vm;                                 \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned e;                                                     \
+        TYPE r[16 / ESIZE];                                             \
+        /* Calculate all results first to avoid overwriting inputs */   \
+        for (e = 0; e < 16 / ESIZE; e++) {                              \
+            if (!(e & 1)) {                                             \
+                r[e] = FN0(n[H(e)], m[H(e + 1)]);                       \
+            } else {                                                    \
+                r[e] = FN1(n[H(e)], m[H(e - 1)]);                       \
+            }                                                           \
+        }                                                               \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
+            d[H(e)] &= ~bytemask;                                       \
+            d[H(e)] |= (r[e] & bytemask);                               \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+#define DO_VCADD_ALL(OP, FN0, FN1)              \
+    DO_VCADD(OP##b, 1, int8_t, H1, FN0, FN1)    \
+    DO_VCADD(OP##h, 2, int16_t, H2, FN0, FN1)   \
+    DO_VCADD(OP##w, 4, int32_t, H4, FN0, FN1)
+
+DO_VCADD_ALL(vcadd90, DO_SUB, DO_ADD)
+DO_VCADD_ALL(vcadd270, DO_ADD, DO_SUB)
+
 static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
 {
     if (val > max) {
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 2ed499a6de2..8e3989b0176 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -440,6 +440,13 @@ DO_2OP(VQRDMLSDH, vqrdmlsdh)
 DO_2OP(VQRDMLSDHX, vqrdmlsdhx)
 DO_2OP(VRHADD_S, vrhadds)
 DO_2OP(VRHADD_U, vrhaddu)
+/*
+ * VCADD Qd == Qm at size MO_32 is UNPREDICTABLE; we choose not to diagnose
+ * so we can reuse the DO_2OP macro. (Our implementation calculates the
+ * "expected" results in this case.)
+ */
+DO_2OP(VCADD90, vcadd90)
+DO_2OP(VCADD270, vcadd270)
 
 static bool trans_VQDMULLB(DisasContext *s, arg_2op *a)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 53/55] target/arm: Implement MVE VHCADD
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (51 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 52/55] target/arm: Implement MVE VCADD Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-10  3:50   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 54/55] target/arm: Implement MVE VADDV Peter Maydell
                   ` (2 subsequent siblings)
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VHCADD insn, which is similar to VCADD
but performs a halving step. This one overlaps with VADC.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
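The halving step itself is just the VCADD add/subtract done widened and
then arithmetically shifted; a tiny standalone illustration of the
signed-word case (names invented, mirroring DO_HADD/DO_HSUB below):

#include <stdint.h>
#include <stdio.h>

/* Standalone models of the halving add/sub used by VHCADD (word case) */
static int32_t model_hadd_w(int32_t a, int32_t b)
{
    return ((int64_t)a + b) >> 1;   /* widen first: no wraparound */
}

static int32_t model_hsub_w(int32_t a, int32_t b)
{
    return ((int64_t)a - b) >> 1;
}

int main(void)
{
    /* widened sums like INT32_MAX + 3 halve without wrapping */
    printf("%d %d\n", model_hadd_w(INT32_MAX, 3), model_hsub_w(-7, 2));
    return 0;
}
---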
 target/arm/helper-mve.h    | 8 ++++++++
 target/arm/mve.decode      | 6 +++++-
 target/arm/mve_helper.c    | 5 +++++
 target/arm/translate-mve.c | 4 +++-
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 6e345470cbb..3f056e67871 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -259,6 +259,14 @@ DEF_HELPER_FLAGS_4(mve_vcadd270b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vcadd270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vcadd270w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vhcadd90b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhcadd90h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhcadd90w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vhcadd270b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhcadd270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vhcadd270w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index c0979f3941b..23ae12b7a38 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -163,7 +163,11 @@ VQDMULLT         111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
 VRHADD_S         111 0 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
 VRHADD_U         111 1 1111 0 . .. ... 0 ... 0 0001 . 1 . 0 ... 0 @2op
 
-VADC             1110 1110 0 . 11 ... 0 ... . 1111 . 0 . 0 ... 0 @vadc
+{
+  VHCADD90       1110 1110 0 . .. ... 0 ... 0 1111 . 0 . 0 ... 0 @2op
+  VHCADD270      1110 1110 0 . .. ... 0 ... 1 1111 . 0 . 0 ... 0 @2op
+  VADC           1110 1110 0 . 11 ... 0 ... . 1111 . 0 . 0 ... 0 @vadc
+}
 
 {
   VCADD90        1111 1110 0 . .. ... 0 ... 0 1111 . 0 . 0 ... 0 @2op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 2c8ef25b208..3477d2bb191 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -638,8 +638,13 @@ DO_VADC(vsbc, DO_NOT)
     DO_VCADD(OP##h, 2, int16_t, H1, FN0, FN1)   \
     DO_VCADD(OP##w, 4, int32_t, H1, FN0, FN1)
 
+#define DO_HADD(N, M) (((int64_t)(N) + (int64_t)(M)) >> 1)
+#define DO_HSUB(N, M) (((int64_t)(N) - (int64_t)(M)) >> 1)
+
 DO_VCADD_ALL(vcadd90, DO_SUB, DO_ADD)
 DO_VCADD_ALL(vcadd270, DO_ADD, DO_SUB)
+DO_VCADD_ALL(vhcadd90, DO_HSUB, DO_HADD)
+DO_VCADD_ALL(vhcadd270, DO_HADD, DO_HSUB)
 
 static inline int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
 {
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 8e3989b0176..b2020bd90b1 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -443,10 +443,12 @@ DO_2OP(VRHADD_U, vrhaddu)
 /*
  * VCADD Qd == Qm at size MO_32 is UNPREDICTABLE; we choose not to diagnose
  * so we can reuse the DO_2OP macro. (Our implementation calculates the
- * "expected" results in this case.)
+ * "expected" results in this case.) Similarly for VHCADD.
  */
 DO_2OP(VCADD90, vcadd90)
 DO_2OP(VCADD270, vcadd270)
+DO_2OP(VHCADD90, vhcadd90)
+DO_2OP(VHCADD270, vhcadd270)
 
 static bool trans_VQDMULLB(DisasContext *s, arg_2op *a)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 54/55] target/arm: Implement MVE VADDV
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (52 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 53/55] target/arm: Implement MVE VHCADD Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-10 14:06   ` Richard Henderson
  2021-06-07 16:58 ` [PATCH 55/55] target/arm: Make VMOV scalar <-> gpreg beatwise for MVE Peter Maydell
  2021-06-09 14:33 ` [PATCH 00/55] target/arm: First slice of MVE implementation no-reply
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

Implement the MVE VADDV insn, which performs an addition
across vector lanes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
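The across-vector operation reduces to a masked sum; here is a
standalone model of the unsigned-byte case, illustrative only (invented
name; the mask models MVE predication with one bit per vector byte, and
is all-ones when the insn is unpredicated):

#include <stdint.h>
#include <stdio.h>

/* Standalone model of VADDV.u8: sum the active byte lanes into a
 * 32-bit accumulator, one mask bit per byte of the vector. */
static uint32_t model_vaddv_u8(uint32_t ra, const uint8_t m[16],
                               uint16_t mask)
{
    for (int e = 0; e < 16; e++, mask >>= 1) {
        if (mask & 1) {
            ra += m[e];
        }
    }
    return ra;
}

int main(void)
{
    uint8_t v[16];
    for (int i = 0; i < 16; i++) {
        v[i] = i;
    }
    /* all lanes active: 0 + 1 + ... + 15 == 120 */
    printf("%u\n", model_vaddv_u8(0, v, 0xffff));
    return 0;
}
---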
 target/arm/helper-mve.h    |  7 ++++++
 target/arm/mve.decode      |  2 ++
 target/arm/mve_helper.c    | 24 +++++++++++++++++++
 target/arm/translate-mve.c | 48 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 81 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 3f056e67871..c1ef44d5927 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -348,3 +348,10 @@ DEF_HELPER_FLAGS_4(mve_vrmlaldavhuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 
 DEF_HELPER_FLAGS_4(mve_vrmlsldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vrmlsldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
+
+DEF_HELPER_FLAGS_3(mve_vaddvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vaddvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vaddvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 23ae12b7a38..bfbf8cf4252 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -253,6 +253,8 @@ VBRSR            1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
 VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
 
+# Vector add across vector
+VADDV            111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
 
 # Predicate operations
 %mask_22_13      22:1 13:3
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 3477d2bb191..191eb3f58aa 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1317,3 +1317,27 @@ DO_LDAVH(vrmlaldavhuw, 4, uint32_t, H4, false, int128_add, int128_add, int128_ma
 
 DO_LDAVH(vrmlsldavhsw, 4, int32_t, H4, false, int128_add, int128_sub, int128_makes64)
 DO_LDAVH(vrmlsldavhxsw, 4, int32_t, H4, true, int128_add, int128_sub, int128_makes64)
+
+/* Vector add across vector */
+#define DO_VADDV(OP, ESIZE, TYPE, H)                            \
+    uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
+                                    uint32_t ra)                \
+    {                                                           \
+        uint16_t mask = mve_element_mask(env);                  \
+        unsigned e;                                             \
+        TYPE *m = vm;                                           \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
+            if (mask & 1) {                                     \
+                ra += m[H(e)];                                  \
+            }                                                   \
+        }                                                       \
+        mve_advance_vpt(env);                                   \
+        return ra;                                              \
+    }
+
+DO_VADDV(vaddvsb, 1, int8_t, H1)
+DO_VADDV(vaddvsh, 2, int16_t, H2)
+DO_VADDV(vaddvsw, 4, int32_t, H4)
+DO_VADDV(vaddvub, 1, uint8_t, H1)
+DO_VADDV(vaddvuh, 2, uint16_t, H2)
+DO_VADDV(vaddvuw, 4, uint32_t, H4)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index b2020bd90b1..1794c50d0e8 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -34,6 +34,7 @@ typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
 typedef void MVEGenADCFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
 
 /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
 static inline long mve_qreg_offset(unsigned reg)
@@ -815,3 +816,50 @@ static bool trans_VSBC(DisasContext *s, arg_vadc *a)
 {
     return do_vadc(s, a, gen_helper_mve_vsbc, FPCR_C);
 }
+
+static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
+{
+    /* VADDV: vector add across vector */
+    MVEGenVADDVFn *fns[4][2] = {
+        { gen_helper_mve_vaddvsb, gen_helper_mve_vaddvub },
+        { gen_helper_mve_vaddvsh, gen_helper_mve_vaddvuh },
+        { gen_helper_mve_vaddvsw, gen_helper_mve_vaddvuw },
+        { NULL, NULL }
+    };
+    TCGv_ptr qm;
+    TCGv_i32 rda;
+
+    if (!dc_isar_feature(aa32_mve, s)) {
+        return false;
+    }
+    if (a->size == 3) {
+        return false;
+    }
+    if (!mve_eci_check(s)) {
+        return true;
+    }
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * This insn is subject to beat-wise execution. Partial execution
+     * of an A=0 (no-accumulate) insn which does not execute the first
+     * beat must start with the current value of Rda, not zero.
+     */
+    if (a->a || mve_skip_first_beat(s)) {
+        /* Accumulate input from Rda */
+        rda = load_reg(s, a->rda);
+    } else {
+        /* Accumulate starting at zero */
+        rda = tcg_const_i32(0);
+    }
+
+    qm = mve_qreg_ptr(a->qm);
+    fns[a->size][a->u](rda, cpu_env, qm, rda);
+    store_reg(s, a->rda, rda);
+    tcg_temp_free_ptr(qm);
+
+    mve_update_eci(s);
+    return true;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH 55/55] target/arm: Make VMOV scalar <-> gpreg beatwise for MVE
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (53 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 54/55] target/arm: Implement MVE VADDV Peter Maydell
@ 2021-06-07 16:58 ` Peter Maydell
  2021-06-10 14:14   ` Richard Henderson
  2021-06-09 14:33 ` [PATCH 00/55] target/arm: First slice of MVE implementation no-reply
  55 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-07 16:58 UTC (permalink / raw)
  To: qemu-arm, qemu-devel; +Cc: Richard Henderson

In a CPU with MVE, the VMOV (vector lane to general-purpose register)
and VMOV (general-purpose register to vector lane) insns are not
predicated, but they are subject to beatwise execution if they
are not in an IT block.

Since our implementation always executes all 4 beats in one tick,
this means only that we need to handle PSR.ECI:
 * we must do the usual check for bad ECI state
 * we must advance ECI state if the insn succeeds
 * if ECI says we should not be executing the beat corresponding
   to the lane of the vector register being accessed then we
   should skip performing the move

Note that if PSR.ECI is non-zero then we cannot be in an IT block.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
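The beat-skip rule can be checked with a standalone model, illustrative
only (invented names, mirroring mve_skip_vmov() below). Each beat covers
4 bytes of the 16-byte Q register and ECI records which leading beats
already executed; 'size' is log2 of the element size in bytes, so
(index << size) plus 8 for the odd D register gives the byte offset the
VMOV touches.

#include <stdbool.h>
#include <stdio.h>

enum { ECI_NONE, ECI_A0, ECI_A0A1, ECI_A0A1A2, ECI_A0A1A2B0 };

/* Standalone model of the VMOV scalar <-> gpreg beat-skip decision */
static bool model_skip_vmov(int eci, int vn, int index, int size)
{
    int ofs = (index << size) + ((vn & 1) * 8);  /* byte offset into Qn */

    switch (eci) {
    case ECI_A0:
        return ofs < 4;
    case ECI_A0A1:
        return ofs < 8;
    case ECI_A0A1A2:
    case ECI_A0A1A2B0:
        return ofs < 12;
    default:
        return false;
    }
}

int main(void)
{
    /* beat 0 already ran: a 32-bit access to bytes 0..3 is skipped... */
    printf("%d\n", model_skip_vmov(ECI_A0, 0, 0, 2));  /* 1 */
    /* ...but one to bytes 4..7 still executes */
    printf("%d\n", model_skip_vmov(ECI_A0, 0, 1, 2));  /* 0 */
    return 0;
}
---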
 target/arm/translate-a32.h |  2 +
 target/arm/translate-mve.c |  4 +-
 target/arm/translate-vfp.c | 85 +++++++++++++++++++++++++++++++++++---
 3 files changed, 83 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index 0a0053949f5..6d384fc7966 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -46,6 +46,8 @@ long neon_full_reg_offset(unsigned reg);
 long neon_element_offset(int reg, int element, MemOp memop);
 void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
 void clear_eci_state(DisasContext *s);
+bool mve_eci_check(DisasContext *s);
+void mve_update_eci(DisasContext *s);
 
 static inline TCGv_i32 load_cpu_offset(int offset)
 {
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 1794c50d0e8..b62e355a1a3 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -49,7 +49,7 @@ static TCGv_ptr mve_qreg_ptr(unsigned reg)
     return ret;
 }
 
-static bool mve_eci_check(DisasContext *s)
+bool mve_eci_check(DisasContext *s)
 {
     /*
      * This is a beatwise insn: check that ECI is valid (not a
@@ -72,7 +72,7 @@ static bool mve_eci_check(DisasContext *s)
     }
 }
 
-static void mve_update_eci(DisasContext *s)
+void mve_update_eci(DisasContext *s)
 {
     /*
      * The helper function will always update the CPUState field,
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index 6a572591ce9..b5bb8230cd9 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -553,6 +553,48 @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
     return true;
 }
 
+static bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
+{
+    /*
+     * In a CPU with MVE, the VMOV (vector lane to general-purpose register)
+     * and VMOV (general-purpose register to vector lane) insns are not
+     * predicated, but they are subject to beatwise execution if they are
+     * not in an IT block.
+     *
+     * Since our implementation always executes all 4 beats in one tick,
+     * this means only that if PSR.ECI says we should not be executing
+     * the beat corresponding to the lane of the vector register being
+     * accessed then we should skip performing the move, and that we need
+     * to do the usual check for bad ECI state and advance of ECI state.
+     *
+     * Note that if PSR.ECI is non-zero then we cannot be in an IT block.
+     *
+     * Return true if this VMOV scalar <-> gpreg should be skipped because
+     * the MVE PSR.ECI state says we skip the beat where the store happens.
+     */
+
+    /* Calculate the byte offset into Qn which we're going to access */
+    int ofs = (index << size) + ((vn & 1) * 8);
+
+    if (!dc_isar_feature(aa32_mve, s)) {
+        return false;
+    }
+
+    switch (s->eci) {
+    case ECI_NONE:
+        return false;
+    case ECI_A0:
+        return ofs < 4;
+    case ECI_A0A1:
+        return ofs < 8;
+    case ECI_A0A1A2:
+    case ECI_A0A1A2B0:
+        return ofs < 12;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 {
     /* VMOV scalar to general purpose register */
@@ -575,14 +617,30 @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
         return false;
     }
 
+    if (dc_isar_feature(aa32_mve, s)) {
+        if (!mve_eci_check(s)) {
+            return true;
+        }
+    }
+
     if (!vfp_access_check(s)) {
         return true;
     }
 
-    tmp = tcg_temp_new_i32();
-    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
-    store_reg(s, a->rt, tmp);
+    if (!mve_skip_vmov(s, a->vn, a->index, a->size)) {
+        tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, a->index,
+                            a->size | (a->u ? 0 : MO_SIGN));
+        store_reg(s, a->rt, tmp);
+    }
 
+    if (dc_isar_feature(aa32_mve, s)) {
+        TCGv_i32 eci;
+
+        mve_update_eci(s);
+        eci = tcg_const_i32(s->eci << 4);
+        store_cpu_field(eci, condexec_bits);
+    }
     return true;
 }
 
@@ -608,14 +666,29 @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
         return false;
     }
 
+    if (dc_isar_feature(aa32_mve, s)) {
+        if (!mve_eci_check(s)) {
+            return true;
+        }
+    }
+
     if (!vfp_access_check(s)) {
         return true;
     }
 
-    tmp = load_reg(s, a->rt);
-    write_neon_element32(tmp, a->vn, a->index, a->size);
-    tcg_temp_free_i32(tmp);
+    if (!mve_skip_vmov(s, a->vn, a->index, a->size)) {
+        tmp = load_reg(s, a->rt);
+        write_neon_element32(tmp, a->vn, a->index, a->size);
+        tcg_temp_free_i32(tmp);
+    }
 
+    if (dc_isar_feature(aa32_mve, s)) {
+        TCGv_i32 eci;
+
+        mve_update_eci(s);
+        eci = tcg_const_i32(s->eci << 4);
+        store_cpu_field(eci, condexec_bits);
+    }
     return true;
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH 02/55] target/arm: Enable FPSCR.QC bit for MVE
  2021-06-07 16:57 ` [PATCH 02/55] target/arm: Enable FPSCR.QC bit for MVE Peter Maydell
@ 2021-06-07 19:02   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-07 19:02 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> MVE has an FPSCR.QC bit similar to the A-profile Neon one; when MVE
> is implemented make the bit writeable, both in the generic "load and
> store FPSCR" helper functions and in the code for handling the NZCVQC
> sysreg which we had previously left as "TODO when we implement MVE".
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/translate-vfp.c | 32 +++++++++++++++++++++++---------
>   target/arm/vfp_helper.c    |  3 ++-
>   2 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
> index d01e465821b..22a619eb2c5 100644
> --- a/target/arm/translate-vfp.c
> +++ b/target/arm/translate-vfp.c
> @@ -784,10 +784,19 @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
>       {
>           TCGv_i32 fpscr;
>           tmp = loadfn(s, opaque);
> -        /*
> -         * TODO: when we implement MVE, write the QC bit.
> -         * For non-MVE, QC is RES0.
> -         */
> +        if (dc_isar_feature(aa32_mve, s)) {
> +            /* QC is only present for MVE; otherwise RES0 */
> +            TCGv_i32 qc = tcg_temp_new_i32();
> +            TCGv_i32 zero;
> +            tcg_gen_andi_i32(qc, tmp, FPCR_QC);
> +            store_cpu_field(qc, vfp.qc[0]);
> +            zero = tcg_const_i32(0);
> +            store_cpu_field(zero, vfp.qc[1]);
> +            zero = tcg_const_i32(0);
> +            store_cpu_field(zero, vfp.qc[2]);
> +            zero = tcg_const_i32(0);
> +            store_cpu_field(zero, vfp.qc[3]);
> +        }

Ok I guess.  You could store the same i32 into all elements:

     tcg_gen_gvec_dup_i32(MO_32, offsetof(CPUARMState, vfp.qc),
                          16, 16, qc);

Either way,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 03/55] target/arm: Handle VPR semantics in existing code
  2021-06-07 16:57 ` [PATCH 03/55] target/arm: Handle VPR semantics in existing code Peter Maydell
@ 2021-06-07 21:19   ` Richard Henderson
  2021-06-10  9:28     ` Peter Maydell
  0 siblings, 1 reply; 130+ messages in thread
From: Richard Henderson @ 2021-06-07 21:19 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> @@ -410,16 +415,19 @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
>       env->v7m.fpccr[is_secure] &= ~R_V7M_FPCCR_LSPACT_MASK;
>   
>       if (ts) {
> -        /* Clear s0 to s31 and the FPSCR */
> +        /* Clear s0 to s31 and the FPSCR and VPR */
>           int i;
>   
>           for (i = 0; i < 32; i += 2) {
>               *aa32_vfp_dreg(env, i / 2) = 0;
>           }
>           vfp_set_fpscr(env, 0);
> +        if (cpu_isar_feature(aa32_mve, cpu)) {
> +            env->v7m.vpr = 0;
> +        }

If the vpr does not exist without mve, is it cleaner to simply set vpr 
unconditionally?

Either way it looks good.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 04/55] target/arm: Add handling for PSR.ECI/ICI
  2021-06-07 16:57 ` [PATCH 04/55] target/arm: Add handling for PSR.ECI/ICI Peter Maydell
@ 2021-06-07 23:33   ` Richard Henderson
  2021-06-10 10:17     ` Peter Maydell
  0 siblings, 1 reply; 130+ messages in thread
From: Richard Henderson @ 2021-06-07 23:33 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +void clear_eci_state(DisasContext *s)
> +{
> +    /*
> +     * Clear any ECI/ICI state: used when a load multiple/store
> +     * multiple insn executes.
> +     */
> +    if (s->eci) {
> +        TCGv_i32 tmp = tcg_temp_new_i32();
> +        tcg_gen_movi_i32(tmp, 0);

tcg_const_i32 or preferably tcg_constant_i32.


> +    /*
> +     * the CONDEXEC TB flags are CPSR bits [15:10][26:25]. On A-profile this
> +     * is always the IT bits. On M-profile, some of the reserved encodings
> +     * of IT are used instead to indicate either ICI or ECI, which
> +     * indicate partial progress of a restartable insn that was interrupted
> +     * partway through by an exception:
> +     *  * if CONDEXEC[3:0] != 0b0000 : CONDEXEC is IT bits
> +     *  * if CONDEXEC[3:0] == 0b0000 : CONDEXEC is ICI or ECI bits
> +     * In all cases CONDEXEC == 0 means "not in IT block or restartable
> +     * insn, behave normally".
> +     */
> +    if (condexec & 0xf) {
> +        dc->condexec_mask = (condexec & 0xf) << 1;
> +        dc->condexec_cond = condexec >> 4;
> +        dc->eci = 0;
> +    } else {
> +        dc->condexec_mask = 0;
> +        dc->condexec_cond = 0;
> +        if (arm_feature(env, ARM_FEATURE_M)) {
> +            dc->eci = condexec >> 4;
> +        }

This else leaves eci uninitialized.

>       dc->insn = insn;
>   
> +    if (dc->eci) {
> +        /*
> +         * For M-profile continuable instructions, ECI/ICI handling
> +         * falls into these cases:
> +         *  - interrupt-continuable instructions
> +         *     These are the various load/store multiple insns (both
> +         *     integer and fp). The ICI bits indicate the register
> +         *     where the load/store can resume. We make the IMPDEF
> +         *     choice to always do "instruction restart", ie ignore
> +         *     the ICI value and always execute the ldm/stm from the
> +         *     start. So all we need to do is zero PSR.ICI if the
> +         *     insn executes.
> +         *  - MVE instructions subject to beat-wise execution
> +         *     Here the ECI bits indicate which beats have already been
> +         *     executed, and we must honour this. Each insn of this
> +         *     type will handle it correctly. We will update PSR.ECI
> +         *     in the helper function for the insn (some ECI values
> +         *     mean that the following insn also has been partially
> +         *     executed).
> +         *  - Special cases which don't advance ECI
> +         *     The insns LE, LETP and BKPT leave the ECI/ICI state
> +         *     bits untouched.
> +         *  - all other insns (the common case)
> +         *     Non-zero ECI/ICI means an INVSTATE UsageFault.
> +         *     We place a rewind-marker here. Insns in the previous
> +         *     three categories will set a flag in the DisasContext.
> +         *     If the flag isn't set after we call disas_thumb_insn()
> +         *     or disas_thumb2_insn() then we know we have a "some other
> +         *     insn" case. We will rewind to the marker (ie throwing away
> +         *     all the generated code) and instead emit "take exception".
> +         */
> +        dc->eci_handled = false;

This should be done in arm_tr_init_disas_context, I think, unconditionally, 
next to eci.

> +        dc->insn_eci_rewind = tcg_last_op();

I believe that this is identical to dc->insn_start.  Certainly there does not 
seem to be any possibility of any opcodes emitted in between.

If you think we should use a different field, then initialize it to null next 
to eci/eci_handled.


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 05/55] target/arm: Let vfp_access_check() handle late NOCP checks
  2021-06-07 16:57 ` [PATCH 05/55] target/arm: Let vfp_access_check() handle late NOCP checks Peter Maydell
@ 2021-06-07 23:50   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-07 23:50 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> In commit a3494d4671797c we reworked the M-profile handling of its
> checks for when the NOCP exception should be raised because the FPU
> is disabled, so that (in line with the architecture) the NOCP check
> is done early over a large range of the encoding space, and takes
> precedence over UNDEF exceptions.  As part of this, we removed the
> code from full_vfp_access_check() which raised an exception there for
> M-profile with the FPU disabled, because it was no longer reachable.
> 
> For MVE, some instructions which are outside the "coprocessor space"
> region of the encoding space must nonetheless do "is the FPU enabled"
> checks and possibly raise a NOCP exception.  (In particular this
> covers the MVE-specific low-overhead branch insns LCTP, DLSTP and
> WLSTP.) To support these insns, reinstate the code in
> full_vfp_access_check(), so that their trans functions can call
> vfp_access_check() and get the correct behaviour.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/translate-vfp.c | 20 +++++++++++++++-----
>   1 file changed, 15 insertions(+), 5 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 06/55] target/arm: Implement MVE LCTP
  2021-06-07 16:57 ` [PATCH 06/55] target/arm: Implement MVE LCTP Peter Maydell
@ 2021-06-08  0:05   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08  0:05 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement the MVE LCTP instruction.
> 
> We put its decode and implementation with the other
> low-overhead-branch insns because although it is only present if MVE
> is implemented it is logically in the same group as the other LOB
> insns.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/t32.decode  |  2 ++
>   target/arm/translate.c | 24 ++++++++++++++++++++++++
>   2 files changed, 26 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 07/55] target/arm: Implement MVE WLSTP insn
  2021-06-07 16:57 ` [PATCH 07/55] target/arm: Implement MVE WLSTP insn Peter Maydell
@ 2021-06-08  1:42   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08  1:42 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +    WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm size=4
> +    {
> +      # This is WLSTP
> +      WLS        1111 0 0000 0 size:2 rn:4 1100 . .......... 1 imm=%lob_imm
> +      LE         1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
> +    }

I guess it doesn't matter, but I'd swap these two, as LE is the more specific 
encoding.

> @@ -8148,10 +8152,40 @@ static bool trans_WLS(DisasContext *s, arg_WLS *a)
>            */
>           return false;
>       }
> +    if (a->size != 4) {
> +        /* WLSTP */
> +        if (!dc_isar_feature(aa32_mve, s)) {
> +            return false;
> +        }
> +        /*
> +         * We need to check that the FPU is enabled here, but mustn't
> +         * call vfp_access_check() to do that because we don't want to
> +         * do the lazy state preservation in the "loop count is zero" case.
> +         * Do the check-and-raise-exception by hand.
> +         */
> +        if (s->fp_excp_el) {
> +            gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
> +                               syn_uncategorized(), s->fp_excp_el);
> +        }

Surely return true here...

> +    if (a->size != 4) {
> +        /*
> +         * WLSTP: set FPSCR.LTPSIZE. This requires that we do the
> +         * lazy state preservation, new FP context creation, etc,
> +         * that vfp_access_check() does. We know that the actual
> +         * access check will succeed (ie it won't generate code that
> +         * throws an exception) because we did that check by hand earlier.
> +         */
> +        bool ok = vfp_access_check(s);
> +        assert(ok);

... otherwise this assert will trigger.
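
I.e. the quoted hunk with the early return added (sketch):

    if (s->fp_excp_el) {
        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
                           syn_uncategorized(), s->fp_excp_el);
        return true;
    }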


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 08/55] target/arm: Implement MVE DLSTP
  2021-06-07 16:57 ` [PATCH 08/55] target/arm: Implement MVE DLSTP Peter Maydell
@ 2021-06-08  2:56   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08  2:56 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +    {
> +      # This is DLSTP
> +      DLS        1111 0 0000 0 size:2 rn:4 1110 0000 0000 0001
> +      LCTP       1111 0 0000 000     1111 1110 0000 0000 0001
> +    }

Same comment with LCTP being the more specific encoding.
Either way,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 09/55] target/arm: Implement MVE LETP insn
  2021-06-07 16:57 ` [PATCH 09/55] target/arm: Implement MVE LETP insn Peter Maydell
@ 2021-06-08  3:40   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08  3:40 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement the MVE LETP insn.  This is like the existing LE loop-end
> insn, but it must perform an FPU-enabled check, and on loop-exit it
> resets LTPSIZE to 4.
> 
> To accommodate the requirement to do something on loop-exit, we drop
> the use of condlabel and instead manage both the TB exits manually,
> in the same way we already do in trans_WLS().
> 
> The other MVE-specific change to the LE insn is that we must raise an
> INVSTATE UsageFault insn if LTPSIZE is not 4.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> This amounts to a complete rewrite of trans_LE()...
> ---
>   target/arm/t32.decode  |   2 +-
>   target/arm/translate.c | 104 +++++++++++++++++++++++++++++++++++++----
>   2 files changed, 97 insertions(+), 9 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 10/55] target/arm: Add framework for MVE decode
  2021-06-07 16:57 ` [PATCH 10/55] target/arm: Add framework for MVE decode Peter Maydell
@ 2021-06-08  3:59   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08  3:59 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Add the framework for decoding MVE insns, with the necessary new
> files and the meson.build rules, but no actual content yet.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/translate-a32.h |  1 +
>   target/arm/mve.decode      | 20 ++++++++++++++++++++
>   target/arm/translate-mve.c | 29 +++++++++++++++++++++++++++++
>   target/arm/translate.c     |  1 +
>   target/arm/meson.build     |  2 ++
>   5 files changed, 53 insertions(+)
>   create mode 100644 target/arm/mve.decode
>   create mode 100644 target/arm/translate-mve.c

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 31/55] include/qemu/int128.h: Add function to create Int128 from int64_t
  2021-06-07 16:57 ` [PATCH 31/55] include/qemu/int128.h: Add function to create Int128 from int64_t Peter Maydell
@ 2021-06-08  6:45   ` Philippe Mathieu-Daudé
  2021-06-09  0:51   ` Richard Henderson
  1 sibling, 0 replies; 130+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-08  6:45 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel; +Cc: Richard Henderson

On 6/7/21 6:57 PM, Peter Maydell wrote:
> int128_make64() creates an Int128 from an unsigned 64 bit value; add
> a function int128_makes64() creating an Int128 from a signed 64 bit
> value.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  include/qemu/int128.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/qemu/int128.h b/include/qemu/int128.h
> index 52fc2384211..64500385e37 100644
> --- a/include/qemu/int128.h
> +++ b/include/qemu/int128.h
> @@ -11,6 +11,11 @@ static inline Int128 int128_make64(uint64_t a)
>      return a;
>  }

> +static inline Int128 int128_makes64(int64_t a)
> +{
> +    return (Int128) { a, a >> 63 };

This file would be easier to review using explicit field names:

       return (Int128) { .lo = a, .hi = a >> 63 };

Also, maybe we could rename int128_makeX -> int128_make_uX
before introducing int128_make_sX.

Regardless:
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> +}
> +
>  static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
>  {
>      return (Int128) { lo, hi };
> 



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 15/55] bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
  2021-06-07 16:57 ` [PATCH 15/55] bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations Peter Maydell
@ 2021-06-08  6:53   ` Philippe Mathieu-Daudé
  2021-06-08 22:14   ` Richard Henderson
  1 sibling, 0 replies; 130+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-08  6:53 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel; +Cc: Richard Henderson

On 6/7/21 6:57 PM, Peter Maydell wrote:
> Currently the ARM SVE helper code defines locally some utility
> functions for swapping 16-bit halfwords within 32-bit or 64-bit
> values and for swapping 32-bit words within 64-bit values,
> parallel to the byte-swapping bswap16/32/64 functions.
> 
> We want these also for the ARM MVE code, and they're potentially
> generally useful for other targets, so move them to bitops.h.
> (We don't put them in bswap.h with the bswap* functions because
> they are implemented in terms of the rotate operations also
> defined in bitops.h, and including bitops.h from bswap.h seems
> better avoided.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  include/qemu/bitops.h   | 29 +++++++++++++++++++++++++++++
>  target/arm/sve_helper.c | 20 --------------------
>  2 files changed, 29 insertions(+), 20 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening forms)
  2021-06-07 16:57 ` [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening forms) Peter Maydell
@ 2021-06-08 21:33   ` Richard Henderson
  2021-06-08 21:43     ` Richard Henderson
                       ` (2 more replies)
  0 siblings, 3 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 21:33 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +static uint16_t mve_element_mask(CPUARMState *env)
> +{
> +    /*
> +     * Return the mask of which elements in the MVE vector should be
> +     * updated. This is a combination of multiple things:
> +     *  (1) by default, we update every lane in the vector
> +     *  (2) VPT predication stores its state in the VPR register;
> +     *  (3) low-overhead-branch tail predication will mask out part of
> +     *      the vector on the final iteration of the loop
> +     *  (4) if EPSR.ECI is set then we must execute only some beats
> +     *      of the insn
> +     * We combine all these into a 16-bit result with the same semantics
> +     * as VPR.P0: 0 to mask the lane, 1 if it is active.
> +     * 8-bit vector ops will look at all bits of the result;
> +     * 16-bit ops will look at bits 0, 2, 4, ...;
> +     * 32-bit ops will look at bits 0, 4, 8 and 12.
> +     * Compare pseudocode GetCurInstrBeat(), though that only returns
> +     * the 4-bit slice of the mask corresponding to a single beat.
> +     */
> +    uint16_t mask = extract32(env->v7m.vpr, R_V7M_VPR_P0_SHIFT,
> +                              R_V7M_VPR_P0_LENGTH);

Any reason you're not using FIELD_EX32 and FIELD_DP32 so far in this file?
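
E.g., with the usual registerfields.h accessors, the quoted extract32
would become:

    uint16_t mask = FIELD_EX32(env->v7m.vpr, V7M_VPR, P0);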

> +#define DO_VLDR(OP, ESIZE, LDTYPE, TYPE, H)                             \
> +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
> +    {                                                                   \
> +        TYPE *d = vd;                                                   \
> +        uint16_t mask = mve_element_mask(env);                          \
> +        unsigned b, e;                                                  \

esize is redundant with sizeof(type); perhaps just make it a local variable?
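
I.e. (sketch) a

    const unsigned esize = sizeof(TYPE);                            \

local at the top of the helper body, rather than threading ESIZE through
the macro arguments.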

> diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
> index c54d5cb7305..e8bb2372ad9 100644
> --- a/target/arm/translate-mve.c
> +++ b/target/arm/translate-mve.c
> @@ -1,6 +1,6 @@
>   /*
>    *  ARM translation: M-profile MVE instructions
> -
> + *
>    *  Copyright (c) 2021 Linaro, Ltd.

Is this just diff silliness?  I see that it has decided that helper-mve.h is a 
rename from translate-mve.c...

> +static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
> +{
> +    TCGv_i32 addr;
> +    uint32_t offset;
> +    TCGv_ptr qreg;
> +
> +    if (!dc_isar_feature(aa32_mve, s)) {
> +        return false;
> +    }
> +
> +    if (a->qd > 7 || !fn) {
> +        return false;
> +    }

It's a funny old decode,

   if D then UNDEFINED.
   d = D:Qd,

Is the spec forward looking to more than 7 Q registers?
It's tempting to just drop the D:Qd from the decode...

> +static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
> +{
> +    MVEGenLdStFn *ldfns[] = {

static MVEGenLdStFn * const ldfns

> +    MVEGenLdStFn *stfns[] = {

Likewise, though...

> +    return do_ldst(s, a, a->l ? ldfns[a->size] : stfns[a->size]);

... just put em together into a two-dimensional array, with a->l as the second 
index?
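
Sketch (assuming the non-widening helper names gen_helper_mve_vldrb/h/w
and gen_helper_mve_vstrb/h/w from earlier in the patch; untested):

    static MVEGenLdStFn * const ldstfns[4][2] = {
        { gen_helper_mve_vstrb, gen_helper_mve_vldrb },
        { gen_helper_mve_vstrh, gen_helper_mve_vldrh },
        { gen_helper_mve_vstrw, gen_helper_mve_vldrw },
        { NULL, NULL }
    };
    return do_ldst(s, a, ldstfns[a->size][a->l]);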


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening forms)
  2021-06-08 21:33   ` Richard Henderson
@ 2021-06-08 21:43     ` Richard Henderson
  2021-06-09 10:01     ` Peter Maydell
  2021-06-10 14:01     ` Peter Maydell
  2 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 21:43 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/8/21 2:33 PM, Richard Henderson wrote:
> 
>> +static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
>> +{
>> +    MVEGenLdStFn *ldfns[] = {
> 
> static MVEGenLdStFn * const ldfns
> 
>> +    MVEGenLdStFn *stfns[] = {
> 
> Likewise, though...
> 
>> +    return do_ldst(s, a, a->l ? ldfns[a->size] : stfns[a->size]);
> 
> ... just put em together into a two-dimensional array, with a->l as the second 
> index?

... or separate VLDR from VSTR.

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 12/55] target/arm: Implement widening/narrowing MVE VLDR/VSTR insns
  2021-06-07 16:57 ` [PATCH 12/55] target/arm: Implement widening/narrowing MVE VLDR/VSTR insns Peter Maydell
@ 2021-06-08 21:46   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 21:46 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST)                          \
> +    static bool trans_##OP(DisasContext *s, arg_VLDR_VSTR *a)           \
> +    {                                                                   \
> +        MVEGenLdStFn *ldfns[] = {                                       \
> +            gen_helper_mve_##SLD,                                       \
> +            gen_helper_mve_##ULD,                                       \
> +        };                                                              \
> +        MVEGenLdStFn *stfns[] = {                                       \
> +            gen_helper_mve_##ST,                                        \
> +            NULL,                                                       \
> +        };                                                              \
> +        return do_ldst(s, a, a->l ? ldfns[a->u] : stfns[a->u]);         \
> +    }

static const on the arrays, or array, as before.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 13/55] target/arm: Implement MVE VCLZ
  2021-06-07 16:57 ` [PATCH 13/55] target/arm: Implement MVE VCLZ Peter Maydell
@ 2021-06-08 22:10   ` Richard Henderson
  2021-06-10 12:40     ` Peter Maydell
  0 siblings, 1 reply; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 22:10 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement the MVE VCLZ insn (and the necessary machinery
> for MVE 1-input vector ops).
> 
> Note that for non-load instructions predication is always performed
> at a byte level granularity regardless of element size (R_ZLSJ),
> and so the masking logic here differs from that used in the VLDR
> and VSTR helpers.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    |  4 ++++
>   target/arm/mve.decode      |  8 +++++++
>   target/arm/mve_helper.c    | 48 ++++++++++++++++++++++++++++++++++++++
>   target/arm/translate-mve.c | 43 ++++++++++++++++++++++++++++++++++
>   4 files changed, 103 insertions(+)
> 
> diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
> index e47d4164ae7..c5c1315b161 100644
> --- a/target/arm/helper-mve.h
> +++ b/target/arm/helper-mve.h
> @@ -32,3 +32,7 @@ DEF_HELPER_FLAGS_3(mve_vldrh_uw, TCG_CALL_NO_WG, void, env, ptr, i32)
>   DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
>   DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
>   DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
> +
> +DEF_HELPER_FLAGS_3(mve_vclzb, TCG_CALL_NO_WG, void, env, ptr, ptr)
> +DEF_HELPER_FLAGS_3(mve_vclzh, TCG_CALL_NO_WG, void, env, ptr, ptr)
> +DEF_HELPER_FLAGS_3(mve_vclzw, TCG_CALL_NO_WG, void, env, ptr, ptr)
> diff --git a/target/arm/mve.decode b/target/arm/mve.decode
> index 3bc5f034531..24999bf703e 100644
> --- a/target/arm/mve.decode
> +++ b/target/arm/mve.decode
> @@ -20,13 +20,17 @@
>   #
>   
>   %qd 22:1 13:3
> +%qm 5:1 1:3
>   
>   &vldr_vstr rn qd imm p a w size l u
> +&1op qd qm size
>   
>   @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
>   # Note that both Rn and Qd are 3 bits only (no D bit)
>   @vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
>   
> +@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
> +
>   # Vector loads and stores
>   
>   # Widening loads and narrowing stores:
> @@ -61,3 +65,7 @@ VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111101 .......   @vldr_vstr \
>                    size=1 p=1
>   VLDR_VSTR        1110110 1 a:1 . w:1 . .... ... 111110 .......   @vldr_vstr \
>                    size=2 p=1
> +
> +# Vector miscellaneous
> +
> +VCLZ             1111 1111 1 . 11 .. 00 ... 0 0100 11 . 0 ... 0 @1op
> diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
> index 6a2fc1c37cd..b7c44f57c09 100644
> --- a/target/arm/mve_helper.c
> +++ b/target/arm/mve_helper.c
> @@ -196,3 +196,51 @@ DO_VSTR(vstrh_w, 4, stw, int32_t, H4)
>   
>   #undef DO_VLDR
>   #undef DO_VSTR
> +
> +/*
> + * Take the bottom bits of mask (which is 1 bit per lane) and
> + * convert to a mask which has 1s in each byte which is predicated.
> + */
> +static uint8_t mask_to_bytemask1(uint16_t mask)
> +{
> +    return (mask & 1) ? 0xff : 0;
> +}
> +
> +static uint16_t mask_to_bytemask2(uint16_t mask)
> +{
> +    static const uint16_t masks[] = { 0x0000, 0x00ff, 0xff00, 0xffff };
> +    return masks[mask & 3];
> +}
> +
> +static uint32_t mask_to_bytemask4(uint16_t mask)
> +{
> +    static const uint32_t masks[] = {
> +        0x00000000, 0x000000ff, 0x0000ff00, 0x0000ffff,
> +        0x00ff0000, 0x00ff00ff, 0x00ffff00, 0x00ffffff,
> +        0xff000000, 0xff0000ff, 0xff00ff00, 0xff00ffff,
> +        0xffff0000, 0xffff00ff, 0xffffff00, 0xffffffff,
> +    };

I'll note that

(1) the values for the mask_to_bytemask2 array overlap the first 4 values of 
the mask_to_bytemask4 array, and

(2) both of these overlap with the larger

static inline uint64_t expand_pred_b(uint8_t byte)

from SVE.  It'd be nice to share the storage, whatever the actual functional 
interface into the array.
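
E.g. (sketch only, assuming expand_pred_b() or its underlying table is
made visible outside sve_helper.c):

    static uint16_t mask_to_bytemask2(uint16_t mask)
    {
        /* bit i of the mask controls byte i of the result */
        return expand_pred_b(mask & 3);
    }

    static uint32_t mask_to_bytemask4(uint16_t mask)
    {
        return expand_pred_b(mask & 0xf);
    }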

> +#define DO_1OP(OP, ESIZE, TYPE, H, FN)                                  \
> +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
> +    {                                                                   \
> +        TYPE *d = vd, *m = vm;                                          \
> +        uint16_t mask = mve_element_mask(env);                          \
> +        unsigned e;                                                     \
> +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
> +            TYPE r = FN(m[H(e)]);                                       \
> +            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \

Why uint64_t and not TYPE?  Or uint32_t?

> +    if (!mve_eci_check(s)) {
> +        return true;
> +    }
> +
> +    if (!vfp_access_check(s)) {
> +        return true;
> +    }

Not the first instance, but is it worth saving 4 lines per insn and combining 
these into one IF?
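
I.e.

    if (!mve_eci_check(s) || !vfp_access_check(s)) {
        return true;
    }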

> +#define DO_1OP(INSN, FN)                                        \
> +    static bool trans_##INSN(DisasContext *s, arg_1op *a)       \
> +    {                                                           \
> +        MVEGenOneOpFn *fns[] = {                                \

static const.


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 14/55] target/arm: Implement MVE VCLS
  2021-06-07 16:57 ` [PATCH 14/55] target/arm: Implement MVE VCLS Peter Maydell
@ 2021-06-08 22:12   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 22:12 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement the MVE VCLS insn.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    | 4 ++++
>   target/arm/mve.decode      | 1 +
>   target/arm/mve_helper.c    | 7 +++++++
>   target/arm/translate-mve.c | 1 +
>   4 files changed, 13 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 15/55] bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
  2021-06-07 16:57 ` [PATCH 15/55] bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations Peter Maydell
  2021-06-08  6:53   ` Philippe Mathieu-Daudé
@ 2021-06-08 22:14   ` Richard Henderson
  1 sibling, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 22:14 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Currently the ARM SVE helper code defines locally some utility
> functions for swapping 16-bit halfwords within 32-bit or 64-bit
> values and for swapping 32-bit words within 64-bit values,
> parallel to the byte-swapping bswap16/32/64 functions.
> 
> We want these also for the ARM MVE code, and they're potentially
> generally useful for other targets, so move them to bitops.h.
> (We don't put them in bswap.h with the bswap* functions because
> they are implemented in terms of the rotate operations also
> defined in bitops.h, and including bitops.h from bswap.h seems
> better avoided.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   include/qemu/bitops.h   | 29 +++++++++++++++++++++++++++++
>   target/arm/sve_helper.c | 20 --------------------
>   2 files changed, 29 insertions(+), 20 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 16/55] target/arm: Implement MVE VREV16, VREV32, VREV64
  2021-06-07 16:57 ` [PATCH 16/55] target/arm: Implement MVE VREV16, VREV32, VREV64 Peter Maydell
@ 2021-06-08 22:23   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 22:23 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +static uint64_t mask_to_bytemask8(uint16_t mask)
> +{
> +    return mask_to_bytemask4(mask) |
> +        ((uint64_t)mask_to_bytemask4(mask >> 4) << 32);
> +}

Again, suggest to share the array from expand_pred_b.

> +DO_1OP(vrev16b, 2, uint16_t, H2, bswap16)
> +DO_1OP(vrev32b, 4, uint32_t, H4, bswap32)
> +DO_1OP(vrev32h, 4, uint32_t, H4, hswap32)
> +DO_1OP(vrev64b, 8, uint64_t, , bswap64)
> +DO_1OP(vrev64h, 8, uint64_t, , hswap64)
> +DO_1OP(vrev64w, 8, uint64_t, , wswap64)

I've started to wonder if we shouldn't add a no-op H8, just so we don't have 
the empty argument for checkpatch to complain about.

And in this particular case I suppose we could use H##ESIZE, which would then 
negate my earlier suggestion for using sizeof.
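
Sketch of the no-op macro (hypothetical, sitting alongside the existing
H1/H2/H4 definitions):

    #define H8(x)  (x)

    DO_1OP(vrev64b, 8, uint64_t, H8, bswap64)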

> +    MVEGenOneOpFn *fns[] = {

static const, etc.


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 17/55] target/arm: Implement MVE VMVN (register)
  2021-06-07 16:57 ` [PATCH 17/55] target/arm: Implement MVE VMVN (register) Peter Maydell
@ 2021-06-08 22:27   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 22:27 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +DO_1OP(vmvn, 1, uint8_t, H1, DO_NOT)

This is a logical operation; you might as well perform it in uint64_t.
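
E.g. (sketch, reusing the hypothetical no-op H8 from the VREV review):

    DO_1OP(vmvn, 8, uint64_t, H8, DO_NOT)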

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 18/55] target/arm: Implement MVE VABS
  2021-06-07 16:57 ` [PATCH 18/55] target/arm: Implement MVE VABS Peter Maydell
@ 2021-06-08 22:34   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 22:34 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +DO_1OP(vfabsh, 2, uint16_t, H2, DO_FABS)
> +DO_1OP(vfabss, 4, uint32_t, H4, DO_FABS)

Could just as plausibly be done on uint64_t.

#define DO_FABSH(N)  ((N) & dup_const(MO_16, 0x7fff))
#define DO_FABSS(N)  ((N) & dup_const(MO_32, 0x7fffffff))

> +    MVEGenOneOpFn *fns[] = {

static const


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 19/55] target/arm: Implement MVE VNEG
  2021-06-07 16:57 ` [PATCH 19/55] target/arm: Implement MVE VNEG Peter Maydell
@ 2021-06-08 22:40   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 22:40 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +#define DO_NEG(N)    (-(N))
> +#define DO_FNEG(N)    ((N) ^ ~((__typeof(N))-1 >> 1))
> +
> +DO_1OP(vnegb, 1, int8_t, H1, DO_NEG)
> +DO_1OP(vnegh, 2, int16_t, H2, DO_NEG)
> +DO_1OP(vnegw, 4, int32_t, H4, DO_NEG)
> +
> +DO_1OP(vfnegh, 2, uint16_t, H2, DO_FNEG)
> +DO_1OP(vfnegs, 4, uint32_t, H4, DO_FNEG)

Similar comments to abs.  Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 20/55] target/arm: Implement MVE VDUP
  2021-06-07 16:57 ` [PATCH 20/55] target/arm: Implement MVE VDUP Peter Maydell
@ 2021-06-08 23:17   ` Richard Henderson
  2021-06-09 10:06     ` Peter Maydell
  0 siblings, 1 reply; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 23:17 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +#define DO_VDUP(OP, ESIZE, TYPE, H)                                     \
> +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t val)     \
> +    {                                                                   \
> +        TYPE *d = vd;                                                   \
> +        uint16_t mask = mve_element_mask(env);                          \
> +        unsigned e;                                                     \
> +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
> +            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
> +            d[H(e)] &= ~bytemask;                                       \
> +            d[H(e)] |= (val & bytemask);                                \
> +        }                                                               \
> +        mve_advance_vpt(env);                                           \
> +    }
> +
> +DO_VDUP(vdupb, 1, uint8_t, H1)
> +DO_VDUP(vduph, 2, uint16_t, H2)
> +DO_VDUP(vdupw, 4, uint32_t, H4)

Hmm.  I think the masking should be done at either uint32_t or uint64_t.  Doing 
it byte-by-byte is wasteful.

Whether you want to do the replication in tcg (I can export gen_dup_i32 from 
tcg-op-gvec.c) and have one helper, or do the replication here e.g.

static void do_vdup(CPUARMState *env, void *vd, uint64_t val);
void helper(mve_vdupb)(CPUARMState *env, void *vd, uint32_t val)
{
     do_vdup(env, vd, dup_const(MO_8, val));
}


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 21/55] target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR
  2021-06-07 16:57 ` [PATCH 21/55] target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR Peter Maydell
@ 2021-06-08 23:23   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 23:23 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +DO_2OP(vand, 1, uint8_t, H1, DO_AND)
> +DO_2OP(vbic, 1, uint8_t, H1, DO_BIC)
> +DO_2OP(vorr, 1, uint8_t, H1, DO_ORR)
> +DO_2OP(vorn, 1, uint8_t, H1, DO_ORN)
> +DO_2OP(veor, 1, uint8_t, H1, DO_EOR)

Again, logicals should use uint64_t.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 22/55] target/arm: Implement MVE VADD, VSUB, VMUL
  2021-06-07 16:57 ` [PATCH 22/55] target/arm: Implement MVE VADD, VSUB, VMUL Peter Maydell
@ 2021-06-08 23:25   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 23:25 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +#define DO_2OP(INSN, FN) \
> +    static bool trans_##INSN(DisasContext *s, arg_2op *a)       \
> +    {                                                           \
> +        MVEGenTwoOpFn *fns[] = {                                \

static const, otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 23/55] target/arm: Implement MVE VMULH
  2021-06-07 16:57 ` [PATCH 23/55] target/arm: Implement MVE VMULH Peter Maydell
@ 2021-06-08 23:29   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 23:29 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement the MVE VMULH insn, which performs a vector
> multiply and returns the high half of the result.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    |  7 +++++++
>   target/arm/mve.decode      |  3 +++
>   target/arm/mve_helper.c    | 26 ++++++++++++++++++++++++++
>   target/arm/translate-mve.c |  2 ++
>   4 files changed, 38 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 24/55] target/arm: Implement MVE VRMULH
  2021-06-07 16:57 ` [PATCH 24/55] target/arm: Implement MVE VRMULH Peter Maydell
@ 2021-06-08 23:33   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 23:33 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement the MVE VRMULH insn, which performs a rounding multiply
> and then returns the high half.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    |  7 +++++++
>   target/arm/mve.decode      |  3 +++
>   target/arm/mve_helper.c    | 22 ++++++++++++++++++++++
>   target/arm/translate-mve.c |  2 ++
>   4 files changed, 34 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 25/55] target/arm: Implement MVE VMAX, VMIN
  2021-06-07 16:57 ` [PATCH 25/55] target/arm: Implement MVE VMAX, VMIN Peter Maydell
@ 2021-06-08 23:35   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 23:35 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement the MVE VMAX and VMIN insns.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    | 14 ++++++++++++++
>   target/arm/mve.decode      |  5 +++++
>   target/arm/mve_helper.c    | 14 ++++++++++++++
>   target/arm/translate-mve.c |  4 ++++
>   4 files changed, 37 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 26/55] target/arm: Implement MVE VABD
  2021-06-07 16:57 ` [PATCH 26/55] target/arm: Implement MVE VABD Peter Maydell
@ 2021-06-08 23:39   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 23:39 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement the MVE VABD insn.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    | 7 +++++++
>   target/arm/mve.decode      | 3 +++
>   target/arm/mve_helper.c    | 5 +++++
>   target/arm/translate-mve.c | 2 ++
>   4 files changed, 17 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 27/55] target/arm: Implement MVE VHADD, VHSUB
  2021-06-07 16:57 ` [PATCH 27/55] target/arm: Implement MVE VHADD, VHSUB Peter Maydell
@ 2021-06-08 23:43   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 23:43 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement MVE VHADD and VHSUB insns, which perform an addition
> or subtraction and then halve the result.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    | 14 ++++++++++++++
>   target/arm/mve.decode      |  5 +++++
>   target/arm/mve_helper.c    | 25 +++++++++++++++++++++++++
>   target/arm/translate-mve.c |  4 ++++
>   4 files changed, 48 insertions(+)
> 
> diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
> index bfe2057592f..7b22990c3ba 100644
> --- a/target/arm/helper-mve.h
> +++ b/target/arm/helper-mve.h
> @@ -118,3 +118,17 @@ DEF_HELPER_FLAGS_4(mve_vabdsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
>   DEF_HELPER_FLAGS_4(mve_vabdub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
>   DEF_HELPER_FLAGS_4(mve_vabduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
>   DEF_HELPER_FLAGS_4(mve_vabduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +
> +DEF_HELPER_FLAGS_4(mve_vhaddsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +DEF_HELPER_FLAGS_4(mve_vhaddsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +DEF_HELPER_FLAGS_4(mve_vhaddsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +DEF_HELPER_FLAGS_4(mve_vhaddub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +DEF_HELPER_FLAGS_4(mve_vhadduh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +DEF_HELPER_FLAGS_4(mve_vhadduw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +
> +DEF_HELPER_FLAGS_4(mve_vhsubsb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +DEF_HELPER_FLAGS_4(mve_vhsubsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +DEF_HELPER_FLAGS_4(mve_vhsubsw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +DEF_HELPER_FLAGS_4(mve_vhsubub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +DEF_HELPER_FLAGS_4(mve_vhsubuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> +DEF_HELPER_FLAGS_4(mve_vhsubuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
> diff --git a/target/arm/mve.decode b/target/arm/mve.decode
> index 087d3db2a31..241d1c44c19 100644
> --- a/target/arm/mve.decode
> +++ b/target/arm/mve.decode
> @@ -96,6 +96,11 @@ VMIN_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 1 ... 0 @2op
>   VABD_S           111 0 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
>   VABD_U           111 1 1111 0 . .. ... 0 ... 0 0111 . 1 . 0 ... 0 @2op
>   
> +VHADD_S          111 0 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
> +VHADD_U          111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
> +VHSUB_S          111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
> +VHSUB_U          111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
> +
>   # Vector miscellaneous
>   
>   VCLS             1111 1111 1 . 11 .. 00 ... 0 0100 01 . 0 ... 0 @1op
> diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
> index f026a9969d6..5982f6bf5eb 100644
> --- a/target/arm/mve_helper.c
> +++ b/target/arm/mve_helper.c
> @@ -415,3 +415,28 @@ DO_2OP_U(vminu, DO_MIN)
>   
>   DO_2OP_S(vabds, DO_ABD)
>   DO_2OP_U(vabdu, DO_ABD)
> +
> +static inline uint32_t do_vhadd_u(uint32_t n, uint32_t m)
> +{
> +    return ((uint64_t)n + m) >> 1;
> +}
> +
> +static inline int32_t do_vhadd_s(int32_t n, int32_t m)
> +{
> +    return ((int64_t)n + m) >> 1;
> +}
> +
> +static inline uint32_t do_vhsub_u(uint32_t n, uint32_t m)
> +{
> +    return ((uint64_t)n - m) >> 1;
> +}
> +
> +static inline int32_t do_vhsub_s(int32_t n, int32_t m)
> +{
> +    return ((int64_t)n - m) >> 1;
> +}

Use 64-bit inputs and you don't need to replicate these for signed/unsigned. 
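
E.g. (sketch; callers widen their 8/16/32-bit inputs to 64 bits with the
appropriate signedness, so one pair of helpers serves both):

    static inline int64_t do_vhadd(int64_t n, int64_t m)
    {
        return (n + m) >> 1;
    }

    static inline int64_t do_vhsub(int64_t n, int64_t m)
    {
        return (n - m) >> 1;
    }
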
But either way,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 28/55] target/arm: Implement MVE VMULL
  2021-06-07 16:57 ` [PATCH 28/55] target/arm: Implement MVE VMULL Peter Maydell
@ 2021-06-08 23:52   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-08 23:52 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement the MVE VMULL insn, which multiplies two single
> width integer elements to produce a double width result.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    | 14 ++++++++++++++
>   target/arm/mve.decode      |  5 +++++
>   target/arm/mve_helper.c    | 35 +++++++++++++++++++++++++++++++++++
>   target/arm/translate-mve.c |  4 ++++
>   4 files changed, 58 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 29/55] target/arm: Implement MVE VMLALDAV
  2021-06-07 16:57 ` [PATCH 29/55] target/arm: Implement MVE VMLALDAV Peter Maydell
@ 2021-06-09  0:46   ` Richard Henderson
  2021-06-09  0:46   ` Richard Henderson
  1 sibling, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09  0:46 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement the MVE VMLALDAV insn, which multiplies pairs of integer
> elements, accumulating them into a 64-bit result in a pair of
> general-purpose registers.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    |   8 +++
>   target/arm/translate.h     |  10 ++++
>   target/arm/mve.decode      |  15 ++++++
>   target/arm/mve_helper.c    |  32 ++++++++++++
>   target/arm/translate-mve.c | 100 +++++++++++++++++++++++++++++++++++++
>   5 files changed, 165 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 29/55] target/arm: Implement MVE VMLALDAV
  2021-06-07 16:57 ` [PATCH 29/55] target/arm: Implement MVE VMLALDAV Peter Maydell
  2021-06-09  0:46   ` Richard Henderson
@ 2021-06-09  0:46   ` Richard Henderson
  1 sibling, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09  0:46 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
> +{
> +    MVEGenDualAccOpFn *fns[4][2] = {

static const, otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 30/55] target/arm: Implement MVE VMLSLDAV
  2021-06-07 16:57 ` [PATCH 30/55] target/arm: Implement MVE VMLSLDAV Peter Maydell
@ 2021-06-09  0:47   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09  0:47 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
> +{
> +    MVEGenDualAccOpFn *fns[4][2] = {

static const, otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 31/55] include/qemu/int128.h: Add function to create Int128 from int64_t
  2021-06-07 16:57 ` [PATCH 31/55] include/qemu/int128.h: Add function to create Int128 from int64_t Peter Maydell
  2021-06-08  6:45   ` Philippe Mathieu-Daudé
@ 2021-06-09  0:51   ` Richard Henderson
  1 sibling, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09  0:51 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> int128_make64() creates an Int128 from an unsigned 64 bit value; add
> a function int128_makes64() creating an Int128 from a signed 64 bit
> value.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>   include/qemu/int128.h | 10 ++++++++++
>   1 file changed, 10 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 32/55] target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
  2021-06-07 16:57 ` [PATCH 32/55] target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH Peter Maydell
@ 2021-06-09  1:05   ` Richard Henderson
  2021-06-14 10:19     ` Peter Maydell
  0 siblings, 1 reply; 130+ messages in thread
From: Richard Henderson @ 2021-06-09  1:05 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> +#define DO_LDAVH(OP, ESIZE, TYPE, H, XCHG, EVENACC, ODDACC, TO128)      \
> +    uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
> +                                    void *vm, uint64_t a)               \
> +    {                                                                   \
> +        uint16_t mask = mve_element_mask(env);                          \
> +        unsigned e;                                                     \
> +        TYPE *n = vn, *m = vm;                                          \
> +        Int128 acc = TO128(a);                                          \

This seems to miss the << 8.

Which suggests that the whole thing can be done without Int128:

> +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
> +            if (mask & 1) {                                             \
> +                if (e & 1) {                                            \
> +                    acc = ODDACC(acc, TO128(n[H(e - 1 * XCHG)] * m[H(e)])); \

   tmp = n * m;
   tmp = (tmp >> 8) + ((tmp >> 7) & 1);
   acc ODDACC tmp;

> +static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
> +{
> +    MVEGenDualAccOpFn *fns[] = {

static const, etc.


r~


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening forms)
  2021-06-08 21:33   ` Richard Henderson
  2021-06-08 21:43     ` Richard Henderson
@ 2021-06-09 10:01     ` Peter Maydell
  2021-06-09 17:09       ` Richard Henderson
  2021-06-10 14:01     ` Peter Maydell
  2 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-09 10:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Tue, 8 Jun 2021 at 22:33, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/7/21 9:57 AM, Peter Maydell wrote:
> > +static uint16_t mve_element_mask(CPUARMState *env)
> > +{
> > +    /*
> > +     * Return the mask of which elements in the MVE vector should be
> > +     * updated. This is a combination of multiple things:
> > +     *  (1) by default, we update every lane in the vector
> > +     *  (2) VPT predication stores its state in the VPR register;
> > +     *  (3) low-overhead-branch tail predication will mask out part of
> > +     *      the vector on the final iteration of the loop
> > +     *  (4) if EPSR.ECI is set then we must execute only some beats
> > +     *      of the insn
> > +     * We combine all these into a 16-bit result with the same semantics
> > +     * as VPR.P0: 0 to mask the lane, 1 if it is active.
> > +     * 8-bit vector ops will look at all bits of the result;
> > +     * 16-bit ops will look at bits 0, 2, 4, ...;
> > +     * 32-bit ops will look at bits 0, 4, 8 and 12.
> > +     * Compare pseudocode GetCurInstrBeat(), though that only returns
> > +     * the 4-bit slice of the mask corresponding to a single beat.
> > +     */
> > +    uint16_t mask = extract32(env->v7m.vpr, R_V7M_VPR_P0_SHIFT,
> > +                              R_V7M_VPR_P0_LENGTH);
>
> Any reason you're not using FIELD_EX32 and FIELD_DP32 so far in this file?

Just habit, really, I think.

> > +#define DO_VLDR(OP, ESIZE, LDTYPE, TYPE, H)                             \
> > +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
> > +    {                                                                   \
> > +        TYPE *d = vd;                                                   \
> > +        uint16_t mask = mve_element_mask(env);                          \
> > +        unsigned b, e;                                                  \
>
> esize is redundant with sizeof(type); perhaps just make it a local variable?
>
> > diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
> > index c54d5cb7305..e8bb2372ad9 100644
> > --- a/target/arm/translate-mve.c
> > +++ b/target/arm/translate-mve.c
> > @@ -1,6 +1,6 @@
> >   /*
> >    *  ARM translation: M-profile MVE instructions
> > -
> > + *
> >    *  Copyright (c) 2021 Linaro, Ltd.
>
> Is this just diff silliness?  I see that it has decided that helper-mve.h is a
> rename from translate-mve.c...

Not sure. I fixed at least one similar issue before sending, I guess
I missed this one.

> > +static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
> > +{
> > +    TCGv_i32 addr;
> > +    uint32_t offset;
> > +    TCGv_ptr qreg;
> > +
> > +    if (!dc_isar_feature(aa32_mve, s)) {
> > +        return false;
> > +    }
> > +
> > +    if (a->qd > 7 || !fn) {
> > +        return false;
> > +    }
>
> It's a funny old decode,
>
>    if D then UNDEFINED.
>    d = D:Qd,
>
> Is the spec forward looking to more than 7 Q registers?
> It's tempting to just drop the D:Qd from the decode...

I don't know, but looking at the decode it certainly seems
like the door is being left open to Q8..Q15. Other signs of
this include the existence of the VFPSmallRegisterBank()
function and the way that VLLDM and VLSTM have T2 encodings
whose only difference from the T1 encodings is that you can
specify registers up to D31. Decoding D:Qd and then doing the
range check seemed more in line with the spirit of this, though
of course leaving the D=1 UNDEF to decodetree works too.
(Some insns really do only use 3 bit register fields without
the extra D bit, so if we left all the fields 3 bit and later needed
to handle Q8..Q15 we'd have to go through everything to work out
which type of insn it was.)

> > +static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
> > +{
> > +    MVEGenLdStFn *ldfns[] = {
>
> static MVEGenLdStFn * const ldfns
>
> > +    MVEGenLdStFn *stfns[] = {
>
> Likewise, though...
>
> > +    return do_ldst(s, a, a->l ? ldfns[a->size] : stfns[a->size]);
>
> ... just put em together into a two-dimensional array, with a->l as the second
> index?

Yeah. (I was being a bit lazy because I can never remember which
way round the initializers go in a 2D array :-)

-- PMM


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 20/55] target/arm: Implement MVE VDUP
  2021-06-08 23:17   ` Richard Henderson
@ 2021-06-09 10:06     ` Peter Maydell
  2021-06-09 17:16       ` Richard Henderson
  0 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-09 10:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Wed, 9 Jun 2021 at 00:17, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/7/21 9:57 AM, Peter Maydell wrote:
> > +#define DO_VDUP(OP, ESIZE, TYPE, H)                                     \
> > +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t val)     \
> > +    {                                                                   \
> > +        TYPE *d = vd;                                                   \
> > +        uint16_t mask = mve_element_mask(env);                          \
> > +        unsigned e;                                                     \
> > +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
> > +            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
> > +            d[H(e)] &= ~bytemask;                                       \
> > +            d[H(e)] |= (val & bytemask);                                \
> > +        }                                                               \
> > +        mve_advance_vpt(env);                                           \
> > +    }
> > +
> > +DO_VDUP(vdupb, 1, uint8_t, H1)
> > +DO_VDUP(vduph, 2, uint16_t, H2)
> > +DO_VDUP(vdupw, 4, uint32_t, H4)
>
> Hmm.  I think the masking should be done at either uint32_t or uint64_t.  Doing
> it byte-by-byte is wasteful.

Mmm. I think some of this structure is holdover from an initial
misinterpretation
of the spec that all these ops looked at the predicate bit for the LS byte
of the element to see if the entire element was acted upon, in which case
you do need to work element-by-element with the right size. (This is actually
true for some operations, but mostly the predicate bits do bytewise masking
and can give you a partial chunk of a result element, as here.)

-- PMM


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH 00/55] target/arm: First slice of MVE implementation
  2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
                   ` (54 preceding siblings ...)
  2021-06-07 16:58 ` [PATCH 55/55] target/arm: Make VMOV scalar <-> gpreg beatwise for MVE Peter Maydell
@ 2021-06-09 14:33 ` no-reply
  55 siblings, 0 replies; 130+ messages in thread
From: no-reply @ 2021-06-09 14:33 UTC (permalink / raw)
  To: peter.maydell; +Cc: qemu-arm, richard.henderson, qemu-devel

Patchew URL: https://patchew.org/QEMU/20210607165821.9892-1-peter.maydell@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210607165821.9892-1-peter.maydell@linaro.org
Subject: [PATCH 00/55] target/arm: First slice of MVE implementation

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
e6bde0c target/arm: Make VMOV scalar <-> gpreg beatwise for MVE
46ab47a target/arm: Implement MVE VADDV
8c6a715 target/arm: Implement MVE VHCADD
ee4a883 target/arm: Implement MVE VCADD
d6d031c target/arm: Implement MVE VADC, VSBC
f2abf78 target/arm: Implement MVE VRHADD
f59e727 target/arm: Implement MVE VQDMULL (vector)
b5a1e75 target/arm: Implement MVE VQDMLSDH and VQRDMLSDH
2a85a42 target/arm: Implement MVE VQDMLADH and VQRDMLADH
c631126 target/arm: Implement MVE VRSHL
96c9251 target/arm: Implement MVE VSHL insn
6617f9d target/arm: Implement MVE VQRSHL
ea1d028 target/arm: Implement MVE VQSHL (vector)
7f5704f target/arm: Implement MVE VQADD, VQSUB (vector)
fe87207 target/arm: Implement MVE VQDMULH, VQRDMULH (vector)
fae9d05 target/arm: Implement MVE VQDMULL scalar
3dde42e target/arm: Implement MVE VQDMULH and VQRDMULH (scalar)
ad46ffc target/arm: Implement MVE VQADD and VQSUB
eb22856 target/arm: Implement MVE VPST
110b41d target/arm: Implement MVE VBRSR
b92e1c2 target/arm: Implement MVE VHADD, VHSUB (scalar)
bd4a331 target/arm: Implement MVE VSUB, VMUL (scalar)
ddbfc63 target/arm: Implement MVE VADD (scalar)
1722cbe target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
53fa7e7 include/qemu/int128.h: Add function to create Int128 from int64_t
6ba3e6f target/arm: Implement MVE VMLSLDAV
34c471e target/arm: Implement MVE VMLALDAV
5faff7d target/arm: Implement MVE VMULL
7c4e6a2 target/arm: Implement MVE VHADD, VHSUB
fe67781 target/arm: Implement MVE VABD
1f8942c target/arm: Implement MVE VMAX, VMIN
0097a91 target/arm: Implement MVE VRMULH
8649950 target/arm: Implement MVE VMULH
7af4e69 target/arm: Implement MVE VADD, VSUB, VMUL
ac5934b target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR
ff0cfd3 target/arm: Implement MVE VDUP
2f66a74 target/arm: Implement MVE VNEG
0482da4 target/arm: Implement MVE VABS
60d8fd8 target/arm: Implement MVE VMVN (register)
1eaba2f target/arm: Implement MVE VREV16, VREV32, VREV64
ace7aae bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
d6bed53 target/arm: Implement MVE VCLS
22d128d target/arm: Implement MVE VCLZ
c1690b6 target/arm: Implement widening/narrowing MVE VLDR/VSTR insns
a08b6b5 target/arm: Implement MVE VLDR/VSTR (non-widening forms)
3679c9d target/arm: Add framework for MVE decode
22dd9b5 target/arm: Implement MVE LETP insn
b8d39cd target/arm: Implement MVE DLSTP
63e6f6d target/arm: Implement MVE WLSTP insn
8261501 target/arm: Implement MVE LCTP
f77f199 target/arm: Let vfp_access_check() handle late NOCP checks
192982d target/arm: Add handling for PSR.ECI/ICI
c578a8f target/arm: Handle VPR semantics in existing code
7620752 target/arm: Enable FPSCR.QC bit for MVE
df2cbf9 tcg: Introduce tcg_remove_ops_after

=== OUTPUT BEGIN ===
1/55 Checking commit df2cbf927c33 (tcg: Introduce tcg_remove_ops_after)
2/55 Checking commit 7620752e8687 (target/arm: Enable FPSCR.QC bit for MVE)
3/55 Checking commit c578a8fc03b3 (target/arm: Handle VPR semantics in existing code)
4/55 Checking commit 192982d41d0b (target/arm: Add handling for PSR.ECI/ICI)
5/55 Checking commit f77f199bec23 (target/arm: Let vfp_access_check() handle late NOCP checks)
6/55 Checking commit 8261501d3a71 (target/arm: Implement MVE LCTP)
7/55 Checking commit 63e6f6d469dd (target/arm: Implement MVE WLSTP insn)
8/55 Checking commit b8d39cd81fc0 (target/arm: Implement MVE DLSTP)
9/55 Checking commit 22dd9b506bd5 (target/arm: Implement MVE LETP insn)
10/55 Checking commit 3679c9dc0348 (target/arm: Add framework for MVE decode)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#41: 
new file mode 100644

total: 0 errors, 1 warnings, 77 lines checked

Patch 10/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
11/55 Checking commit a08b6b59eba3 (target/arm: Implement MVE VLDR/VSTR (non-widening forms))
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#29: 
new file mode 100644

WARNING: Block comments use a leading /* on a separate line
#285: FILE: target/arm/mve_helper.c:150:
+        /*                                                              \

total: 0 errors, 2 warnings, 396 lines checked

Patch 11/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
12/55 Checking commit c1690b678167 (target/arm: Implement widening/narrowing MVE VLDR/VSTR insns)
13/55 Checking commit 22d128d956e2 (target/arm: Implement MVE VCLZ)
ERROR: spaces required around that '*' (ctx:WxV)
#139: FILE: target/arm/translate-mve.c:172:
+static bool do_1op(DisasContext *s, arg_1op *a, MVEGenOneOpFn fn)
                                             ^

ERROR: spaces required around that '*' (ctx:WxV)
#168: FILE: target/arm/translate-mve.c:201:
+    static bool trans_##INSN(DisasContext *s, arg_1op *a)       \
                                                       ^

total: 2 errors, 0 warnings, 134 lines checked

Patch 13/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

14/55 Checking commit d6bed5354490 (target/arm: Implement MVE VCLS)
15/55 Checking commit ace7aae7c151 (bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations)
16/55 Checking commit 1eaba2f5eb33 (target/arm: Implement MVE VREV16, VREV32, VREV64)
ERROR: spaces required around that '*' (ctx:WxV)
#82: FILE: target/arm/translate-mve.c:215:
+static bool trans_VREV16(DisasContext *s, arg_1op *a)
                                                   ^

total: 1 errors, 0 warnings, 75 lines checked

Patch 16/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

17/55 Checking commit 60d8fd8cecf2 (target/arm: Implement MVE VMVN (register))
18/55 Checking commit 0482da461163 (target/arm: Implement MVE VABS)
ERROR: spaces required around that '-' (ctx:VxV)
#53: FILE: target/arm/mve_helper.c:273:
+#define DO_FABS(N)    (N & ((__typeof(N))-1 >> 1))
                                          ^

total: 1 errors, 0 warnings, 52 lines checked

Patch 18/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

19/55 Checking commit 2f66a740d4f3 (target/arm: Implement MVE VNEG)
ERROR: spaces required around that '-' (ctx:VxV)
#52: FILE: target/arm/mve_helper.c:283:
+#define DO_FNEG(N)    ((N) ^ ~((__typeof(N))-1 >> 1))
                                             ^

total: 1 errors, 0 warnings, 51 lines checked

Patch 19/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

20/55 Checking commit ff0cfd392e9a (target/arm: Implement MVE VDUP)
ERROR: spaces required around that '*' (ctx:WxV)
#97: FILE: target/arm/translate-mve.c:172:
+static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
                                                  ^

total: 1 errors, 0 warnings, 102 lines checked

Patch 20/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

21/55 Checking commit ac5934badd3b (target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR)
22/55 Checking commit 7af4e691edf5 (target/arm: Implement MVE VADD, VSUB, VMUL)
23/55 Checking commit 864995039f4b (target/arm: Implement MVE VMULH)
24/55 Checking commit 0097a91b701e (target/arm: Implement MVE VRMULH)
25/55 Checking commit 1f8942c64026 (target/arm: Implement MVE VMAX, VMIN)
26/55 Checking commit fe67781e01b8 (target/arm: Implement MVE VABD)
27/55 Checking commit 7c4e6a230416 (target/arm: Implement MVE VHADD, VHSUB)
28/55 Checking commit 5faff7d64e24 (target/arm: Implement MVE VMULL)
WARNING: line over 80 characters
#71: FILE: target/arm/mve_helper.c:344:
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \

total: 0 errors, 1 warnings, 82 lines checked

Patch 28/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
29/55 Checking commit 34c471ee2b2a (target/arm: Implement MVE VMLALDAV)
ERROR: space prohibited between function name and open parenthesis '('
#83: FILE: target/arm/mve_helper.c:493:
+                    a ODDACC (int64_t)n[H(e - 1 * XCHG)] * m[H(e)];     \

ERROR: space prohibited between function name and open parenthesis '('
#85: FILE: target/arm/mve_helper.c:495:
+                    a EVENACC (int64_t)n[H(e + 1 * XCHG)] * m[H(e)];    \

ERROR: spaces required around that '+=' (ctx:WxB)
#93: FILE: target/arm/mve_helper.c:503:
+DO_LDAV(vmlaldavsh, 2, int16_t, H2, false, +=, +=)
                                                ^

ERROR: spaces required around that '+=' (ctx:WxB)
#94: FILE: target/arm/mve_helper.c:504:
+DO_LDAV(vmlaldavxsh, 2, int16_t, H2, true, +=, +=)
                                                ^

ERROR: spaces required around that '+=' (ctx:WxB)
#95: FILE: target/arm/mve_helper.c:505:
+DO_LDAV(vmlaldavsw, 4, int32_t, H4, false, +=, +=)
                                                ^

ERROR: spaces required around that '+=' (ctx:WxB)
#96: FILE: target/arm/mve_helper.c:506:
+DO_LDAV(vmlaldavxsw, 4, int32_t, H4, true, +=, +=)
                                                ^

ERROR: spaces required around that '+=' (ctx:WxB)
#98: FILE: target/arm/mve_helper.c:508:
+DO_LDAV(vmlaldavuh, 2, uint16_t, H2, false, +=, +=)
                                                 ^

ERROR: spaces required around that '+=' (ctx:WxB)
#99: FILE: target/arm/mve_helper.c:509:
+DO_LDAV(vmlaldavuw, 4, uint32_t, H4, false, +=, +=)
                                                 ^

WARNING: line over 80 characters
#108: FILE: target/arm/translate-mve.c:34:
+typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);

ERROR: spaces required around that '*' (ctx:WxV)
#140: FILE: target/arm/translate-mve.c:418:
+static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
                                                            ^

total: 9 errors, 1 warnings, 201 lines checked

Patch 29/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

30/55 Checking commit 6ba3e6f73795 (target/arm: Implement MVE VMLSLDAV)
ERROR: spaces required around that '-=' (ctx:WxB)
#52: FILE: target/arm/mve_helper.c:511:
+DO_LDAV(vmlsldavsh, 2, int16_t, H2, false, +=, -=)
                                                ^

ERROR: spaces required around that '-=' (ctx:WxB)
#53: FILE: target/arm/mve_helper.c:512:
+DO_LDAV(vmlsldavxsh, 2, int16_t, H2, true, +=, -=)
                                                ^

ERROR: spaces required around that '-=' (ctx:WxB)
#54: FILE: target/arm/mve_helper.c:513:
+DO_LDAV(vmlsldavsw, 4, int32_t, H4, false, +=, -=)
                                                ^

ERROR: spaces required around that '-=' (ctx:WxB)
#55: FILE: target/arm/mve_helper.c:514:
+DO_LDAV(vmlsldavxsw, 4, int32_t, H4, true, +=, -=)
                                                ^

total: 4 errors, 0 warnings, 35 lines checked

Patch 30/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

31/55 Checking commit 53fa7e73f80e (include/qemu/int128.h: Add function to create Int128 from int64_t)
32/55 Checking commit 1722cbe910a5 (target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH)
WARNING: line over 80 characters
#99: FILE: target/arm/mve_helper.c:543:
+DO_LDAVH(vrmlaldavhsw, 4, int32_t, H4, false, int128_add, int128_add, int128_makes64)

WARNING: line over 80 characters
#100: FILE: target/arm/mve_helper.c:544:
+DO_LDAVH(vrmlaldavhxsw, 4, int32_t, H4, true, int128_add, int128_add, int128_makes64)

WARNING: line over 80 characters
#102: FILE: target/arm/mve_helper.c:546:
+DO_LDAVH(vrmlaldavhuw, 4, uint32_t, H4, false, int128_add, int128_add, int128_make64)

WARNING: line over 80 characters
#104: FILE: target/arm/mve_helper.c:548:
+DO_LDAVH(vrmlsldavhsw, 4, int32_t, H4, false, int128_add, int128_sub, int128_makes64)

WARNING: line over 80 characters
#105: FILE: target/arm/mve_helper.c:549:
+DO_LDAVH(vrmlsldavhxsw, 4, int32_t, H4, true, int128_add, int128_sub, int128_makes64)

total: 0 errors, 5 warnings, 96 lines checked

Patch 32/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
33/55 Checking commit ddbfc63b7af0 (target/arm: Implement MVE VADD (scalar))
ERROR: spaces required around that '*' (ctx:WxV)
#115: FILE: target/arm/translate-mve.c:419:
+static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
                                                        ^

ERROR: spaces required around that '*' (ctx:WxV)
#150: FILE: target/arm/translate-mve.c:454:
+    static bool trans_##INSN(DisasContext *s, arg_2scalar *a)   \
                                                           ^

total: 2 errors, 0 warnings, 124 lines checked

Patch 33/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

34/55 Checking commit bd4a331cf4ad (target/arm: Implement MVE VSUB, VMUL (scalar))
35/55 Checking commit b92e1c20de21 (target/arm: Implement MVE VHADD, VHSUB (scalar))
36/55 Checking commit 110b41d919be (target/arm: Implement MVE VBRSR)
37/55 Checking commit eb2285611568 (target/arm: Implement MVE VPST)
38/55 Checking commit ad46ffc66649 (target/arm: Implement MVE VQADD and VQSUB)
39/55 Checking commit 3dde42e4527d (target/arm: Implement MVE VQDMULH and VQRDMULH (scalar))
WARNING: line over 80 characters
#28: FILE: target/arm/helper-mve.h:194:
+DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)

WARNING: line over 80 characters
#29: FILE: target/arm/helper-mve.h:195:
+DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)

WARNING: line over 80 characters
#30: FILE: target/arm/helper-mve.h:196:
+DEF_HELPER_FLAGS_4(mve_vqdmulh_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)

WARNING: line over 80 characters
#32: FILE: target/arm/helper-mve.h:198:
+DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)

WARNING: line over 80 characters
#33: FILE: target/arm/helper-mve.h:199:
+DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)

WARNING: line over 80 characters
#34: FILE: target/arm/helper-mve.h:200:
+DEF_HELPER_FLAGS_4(mve_vqrdmulh_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)

total: 0 errors, 6 warnings, 68 lines checked

Patch 39/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
40/55 Checking commit fae9d0559093 (target/arm: Implement MVE VQDMULL scalar)
WARNING: line over 80 characters
#31: FILE: target/arm/helper-mve.h:206:
+DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)

WARNING: line over 80 characters
#32: FILE: target/arm/helper-mve.h:207:
+DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)

WARNING: line over 80 characters
#33: FILE: target/arm/helper-mve.h:208:
+DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)

WARNING: line over 80 characters
#34: FILE: target/arm/helper-mve.h:209:
+DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)

ERROR: spaces required around that '*' (ctx:WxV)
#176: FILE: target/arm/translate-mve.c:480:
+static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
                                                                ^

total: 1 errors, 4 warnings, 164 lines checked

Patch 40/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

41/55 Checking commit fe872077b91c (target/arm: Implement MVE VQDMULH, VQRDMULH (vector))
WARNING: line over 80 characters
#60: FILE: target/arm/mve_helper.c:361:
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \

total: 0 errors, 1 warnings, 70 lines checked

Patch 41/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
42/55 Checking commit 7f5704ffde05 (target/arm: Implement MVE VQADD, VQSUB (vector))
43/55 Checking commit ea1d0281b67d (target/arm: Implement MVE VQSHL (vector))
44/55 Checking commit 6617f9dafe1e (target/arm: Implement MVE VQRSHL)
45/55 Checking commit 96c9251bdef5 (target/arm: Implement MVE VSHL insn)
46/55 Checking commit c6311264aa0c (target/arm: Implement MVE VRSHL)
47/55 Checking commit 2a85a4276f80 (target/arm: Implement MVE VQDMLADH and VQRDMLADH)
ERROR: "foo * bar" should be "foo *bar"
#106: FILE: target/arm/mve_helper.c:868:
+    int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (round << 7);

ERROR: "foo * bar" should be "foo *bar"
#113: FILE: target/arm/mve_helper.c:875:
+    int64_t r = ((int64_t)a * b + (int64_t)c * d) * 2 + (round << 15);

ERROR: "foo * bar" should be "foo *bar"
#120: FILE: target/arm/mve_helper.c:882:
+    int64_t m1 = (int64_t)a * b;

total: 3 errors, 0 warnings, 136 lines checked

Patch 47/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

48/55 Checking commit b5a1e7581977 (target/arm: Implement MVE VQDMLSDH and VQRDMLSDH)
ERROR: "foo * bar" should be "foo *bar"
#74: FILE: target/arm/mve_helper.c:909:
+    int64_t r = ((int64_t)a * b - (int64_t)c * d) * 2 + (round << 7);

ERROR: "foo * bar" should be "foo *bar"
#81: FILE: target/arm/mve_helper.c:916:
+    int64_t r = ((int64_t)a * b - (int64_t)c * d) * 2 + (round << 15);

ERROR: "foo * bar" should be "foo *bar"
#88: FILE: target/arm/mve_helper.c:923:
+    int64_t m1 = (int64_t)a * b;

total: 3 errors, 0 warnings, 99 lines checked

Patch 48/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

49/55 Checking commit f59e7278e2a7 (target/arm: Implement MVE VQDMULL (vector))
50/55 Checking commit f2abf7866196 (target/arm: Implement MVE VRHADD)
51/55 Checking commit d6d031c60c1d (target/arm: Implement MVE VADC, VSBC)
WARNING: Block comments use a leading /* on a separate line
#83: FILE: target/arm/mve_helper.c:591:
+        /* If we do no additions at all the flags are preserved */      \

ERROR: space prohibited before that close parenthesis ')'
#102: FILE: target/arm/mve_helper.c:610:
+DO_VADC(vadc, )

WARNING: line over 80 characters
#116: FILE: target/arm/translate-mve.c:36:
+typedef void MVEGenADCFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);

total: 1 errors, 2 warnings, 147 lines checked

Patch 51/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

52/55 Checking commit ee4a8833b853 (target/arm: Implement MVE VCADD)
WARNING: line over 80 characters
#70: FILE: target/arm/mve_helper.c:614:
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \

WARNING: Block comments use a leading /* on a separate line
#76: FILE: target/arm/mve_helper.c:620:
+        /* Calculate all results first to avoid overwriting inputs */   \

total: 0 errors, 2 warnings, 77 lines checked

Patch 52/55 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
53/55 Checking commit 8c6a715db76c (target/arm: Implement MVE VHCADD)
54/55 Checking commit 46ab47ae5365 (target/arm: Implement MVE VADDV)
55/55 Checking commit e6bde0cf767c (target/arm: Make VMOV scalar <-> gpreg beatwise for MVE)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210607165821.9892-1-peter.maydell@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening forms)
  2021-06-09 10:01     ` Peter Maydell
@ 2021-06-09 17:09       ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 17:09 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 6/9/21 3:01 AM, Peter Maydell wrote:
>> Is the spec forward looking to more than 7 Q registers?
>> It's tempting to just drop the D:Qd from the decode...
> 
> I don't know, but looking at the decode it certainly seems
> like the door is being left open to Q8..Q15. Other signs of
> this include the existence of the VFPSmallRegisterBank()
> function and the way that VLLDM and VLSTM have T2 encodings
> whose only difference from the T1 encodings is that you can
> specify registers up to D31. Decoding D:Qd and then doing the
> range check seemed more in line with the spirit of this...

I agree.  We should leave the decode in place.

Do you think it's worthwhile adding a single hook for the register range check 
now?  E.g.

   if (!mve_check_qreg_bank(s, a->qd | a->qn | a->qm)) {
       return false;
   }

static bool mve_check_qreg_bank(DisasContext *s, int qmask)
{
     /*
      * See VFPSmallRegisterBank, always true for armv8.1-m.
      * So only Q0...Q7 are supported.
      */
     return qmask < 8;
}

And, as needed, another one for dregs.
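Perhaps something like this (a sketch only; the name and the D0...D15
bound are my assumption from VFPSmallRegisterBank, not anything in the
series):

static bool mve_check_dreg_bank(DisasContext *s, int dmask)
{
    /* VFPSmallRegisterBank is always true, so only D0...D15 exist */
    return dmask < 16;
}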


r~



* Re: [PATCH 20/55] target/arm: Implement MVE VDUP
  2021-06-09 10:06     ` Peter Maydell
@ 2021-06-09 17:16       ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 17:16 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 6/9/21 3:06 AM, Peter Maydell wrote:
> Mmm. I think some of this structure is a holdover from an initial
> misinterpretation of the spec: that all these ops looked at the predicate
> bit for the LS byte of the element to see if the entire element was acted
> upon, in which case you do need to work element-by-element with the right
> size. (This is actually true for some operations, but mostly the predicate
> bits do bytewise masking and can give you a partial chunk of a result
> element, as here.)

Even if the operation did look at specific predicate bits, that simply puts it 
in line with SVE, which is quite happy with expand_pred_[bhsd].


r~



* Re: [PATCH 33/55] target/arm: Implement MVE VADD (scalar)
  2021-06-07 16:57 ` [PATCH 33/55] target/arm: Implement MVE VADD (scalar) Peter Maydell
@ 2021-06-09 17:58   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 17:58 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:57 AM, Peter Maydell wrote:
> Implement the scalar form of the MVE VADD insn. This takes the
> scalar operand from a general purpose register.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    |  4 ++++
>   target/arm/mve.decode      |  7 ++++++
>   target/arm/mve_helper.c    | 25 +++++++++++++++++++
>   target/arm/translate-mve.c | 49 ++++++++++++++++++++++++++++++++++++++
>   4 files changed, 85 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

> +        MVEGenTwoOpScalarFn *fns[] = {                          \

static const, which I will quit mentioning.
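That is (sketch of the requested change only, array contents elided as in
the quote above):

    static MVEGenTwoOpScalarFn * const fns[] = { ... };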


r~



* Re: [PATCH 34/55] target/arm: Implement MVE VSUB, VMUL (scalar)
  2021-06-07 16:58 ` [PATCH 34/55] target/arm: Implement MVE VSUB, VMUL (scalar) Peter Maydell
@ 2021-06-09 18:00   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 18:00 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the scalar forms of the MVE VSUB and VMUL insns.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    | 8 ++++++++
>   target/arm/mve.decode      | 2 ++
>   target/arm/mve_helper.c    | 2 ++
>   target/arm/translate-mve.c | 2 ++
>   4 files changed, 14 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 35/55] target/arm: Implement MVE VHADD, VHSUB (scalar)
  2021-06-07 16:58 ` [PATCH 35/55] target/arm: Implement MVE VHADD, VHSUB (scalar) Peter Maydell
@ 2021-06-09 18:02   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 18:02 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the scalar variants of the MVE VHADD and VHSUB insns.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    | 16 ++++++++++++++++
>   target/arm/mve.decode      |  4 ++++
>   target/arm/mve_helper.c    |  8 ++++++++
>   target/arm/translate-mve.c |  4 ++++
>   4 files changed, 32 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 36/55] target/arm: Implement MVE VBRSR
  2021-06-07 16:58 ` [PATCH 36/55] target/arm: Implement MVE VBRSR Peter Maydell
@ 2021-06-09 18:08   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 18:08 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the MVE VBRSR insn, which reverses a specified
> number of bits in each element, setting the rest to zero.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    |  4 ++++
>   target/arm/mve.decode      |  1 +
>   target/arm/mve_helper.c    | 43 ++++++++++++++++++++++++++++++++++++++
>   target/arm/translate-mve.c |  1 +
>   4 files changed, 49 insertions(+)

What an interesting operation combination.  I wonder what DSP loop kernel it
goes with...
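For my own reference, the per-element operation is something like the
sketch below (my reading of the commit message, not the patch's helper;
I'm assuming the bit count comes from the low byte of the scalar, and
I'm ignoring the count > element-size corner case):

static uint32_t do_vbrsr32(uint32_t n, uint32_t m)
{
    int bits = m & 0xff;    /* assumed: only the low byte counts */
    uint32_t r = 0;
    int i;

    /* reverse the low 'bits' bits of n; all higher result bits are 0 */
    for (i = 0; i < bits && i < 32; i++) {
        r = (r << 1) | ((n >> i) & 1);
    }
    return r;
}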

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 37/55] target/arm: Implement MVE VPST
  2021-06-07 16:58 ` [PATCH 37/55] target/arm: Implement MVE VPST Peter Maydell
@ 2021-06-09 18:23   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 18:23 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the MVE VPST insn, which sets the predicate mask
> fields in the VPR to the immediate value encoded in the insn.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/mve.decode      |  4 +++
>   target/arm/translate-mve.c | 59 ++++++++++++++++++++++++++++++++++++++
>   2 files changed, 63 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 38/55] target/arm: Implement MVE VQADD and VQSUB
  2021-06-07 16:58 ` [PATCH 38/55] target/arm: Implement MVE VQADD and VQSUB Peter Maydell
@ 2021-06-09 18:46   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 18:46 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> +#define DO_2OP_SAT_SCALAR(OP, ESIZE, TYPE, H, FN)                       \
> +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
> +                                uint32_t rm)                            \
> +    {                                                                   \
> +        TYPE *d = vd, *n = vn;                                          \
> +        TYPE m = rm;                                                    \
> +        uint16_t mask = mve_element_mask(env);                          \
> +        unsigned e;                                                     \
> +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
> +            bool sat = false;                                           \
> +            TYPE r = FN(n[H(e)], m, &sat);                              \
> +            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
> +            d[H(e)] &= ~bytemask;                                       \
> +            d[H(e)] |= (r & bytemask);                                  \
> +            if (sat && (mask & 1)) {                                    \
> +                env->vfp.qc[0] = 1;                                     \
> +            }                                                           \
> +        }                                                               \
> +        mve_advance_vpt(env);                                           \
> +    }

Perhaps slightly better as

   bool qc = false;

     qc |= sat & mask & 1;

   if (qc) {
     env->vfp.qc[0] = qc;
   }

Maybe reverse the store into &sat (set false if no saturation), and init as

     bool sat = mask & 1;

Though if you choose not to exploit this kind of conditional store, perhaps it 
would be better to fully set *s within do_sat_bhw.  That is, do not rely on 
initialization to false outside the subroutine.
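Something like this, say (a sketch only; the real do_sat_bhw signature
may differ):

static int32_t do_sat_bhw(int64_t val, int64_t min, int64_t max, bool *s)
{
    if (val < min) {
        *s = true;
        return min;
    }
    if (val > max) {
        *s = true;
        return max;
    }
    *s = false;    /* always written, so callers need not pre-clear it */
    return val;
}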

Whichever you choose,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 39/55] target/arm: Implement MVE VQDMULH and VQRDMULH (scalar)
  2021-06-07 16:58 ` [PATCH 39/55] target/arm: Implement MVE VQDMULH and VQRDMULH (scalar) Peter Maydell
@ 2021-06-09 18:58   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 18:58 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the MVE VQDMULH and VQRDMULH scalar insns, which multiply
> elements by the scalar, double, possibly round, take the high half
> and saturate.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    |  8 ++++++++
>   target/arm/mve.decode      |  3 +++
>   target/arm/mve_helper.c    | 25 +++++++++++++++++++++++++
>   target/arm/translate-mve.c |  2 ++
>   4 files changed, 38 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 40/55] target/arm: Implement MVE VQDMULL scalar
  2021-06-07 16:58 ` [PATCH 40/55] target/arm: Implement MVE VQDMULL scalar Peter Maydell
@ 2021-06-09 19:11   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 19:11 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the MVE VQDMULL scalar insn. This multiplies the top or
> bottom half of each element by the scalar, doubles and saturates
> to a double-width result.
> 
> Note that this encoding overlaps with VQADD and VQSUB; it uses
> what in VQADD and VQSUB would be the 'size=0b11' encoding.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    |  5 +++
>   target/arm/mve.decode      | 23 +++++++++++---
>   target/arm/mve_helper.c    | 65 ++++++++++++++++++++++++++++++++++++++
>   target/arm/translate-mve.c | 30 ++++++++++++++++++
>   4 files changed, 119 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 41/55] target/arm: Implement MVE VQDMULH, VQRDMULH (vector)
  2021-06-07 16:58 ` [PATCH 41/55] target/arm: Implement MVE VQDMULH, VQRDMULH (vector) Peter Maydell
@ 2021-06-09 19:13   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 19:13 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the vector forms of the MVE VQDMULH and VQRDMULH insns.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    |  8 ++++++++
>   target/arm/mve.decode      |  3 +++
>   target/arm/mve_helper.c    | 27 +++++++++++++++++++++++++++
>   target/arm/translate-mve.c |  2 ++
>   4 files changed, 40 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 42/55] target/arm: Implement MVE VQADD, VQSUB (vector)
  2021-06-07 16:58 ` [PATCH 42/55] target/arm: Implement MVE VQADD, VQSUB (vector) Peter Maydell
@ 2021-06-09 19:15   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 19:15 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the vector forms of the MVE VQADD and VQSUB insns.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    | 16 ++++++++++++++++
>   target/arm/mve.decode      |  5 +++++
>   target/arm/mve_helper.c    | 14 ++++++++++++++
>   target/arm/translate-mve.c |  4 ++++
>   4 files changed, 39 insertions(+)
> 

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 43/55] target/arm: Implement MVE VQSHL (vector)
  2021-06-07 16:58 ` [PATCH 43/55] target/arm: Implement MVE VQSHL (vector) Peter Maydell
@ 2021-06-09 19:26   ` Richard Henderson
  2021-06-14 11:04     ` Peter Maydell
  0 siblings, 1 reply; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 19:26 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the MVE VQSHL insn (encoding T4, which is the
> vector-shift-by-vector version).
> 
> The DO_SQSHL_OP and DO_UQSHL_OP macros here are derived from
> the neon_helper.c code for qshl_u{8,16,32} and qshl_s{8,16,32}.

Ah, from before the sve2 merge, and associated cleanup.
There are now helper functions in vec_internal.h for this.
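For example, for the byte case (a sketch; I'm quoting the vec_internal.h
signatures from memory, so double-check them):

#define DO_SQSHL_OP(N, M, satp) \
    do_sqrshl_bhs(N, (int8_t)(M), 8, false, satp)
#define DO_UQSHL_OP(N, M, satp) \
    do_uqrshl_bhs(N, (int8_t)(M), 8, false, satp)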

The decode looks fine.


r~



* Re: [PATCH 44/55] target/arm: Implement MVE VQRSHL
  2021-06-07 16:58 ` [PATCH 44/55] target/arm: Implement MVE VQRSHL Peter Maydell
@ 2021-06-09 19:29   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 19:29 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the MVE VQRSHL (vector) insn.  Again, the code to perform
> the actual shifts is borrowed from neon_helper.c.

Again, there are helpers in vec_internal.h now.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 45/55] target/arm: Implement MVE VSHL insn
  2021-06-07 16:58 ` [PATCH 45/55] target/arm: Implement MVE VSHL insn Peter Maydell
@ 2021-06-09 19:40   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 19:40 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> +static inline uint32_t do_ushl(uint32_t n, int8_t shift, int esize)
> +{
> +    if (shift >= esize || shift <= -esize) {
> +        return 0;
> +    } else if (shift < 0) {
> +        return n >> -shift;
> +    } else {
> +        return n << shift;
> +    }
> +}

Current form uses the helpers.

#define NEON_FN(dest, src1, src2) \
     (dest = do_uqrshl_bhs(src1, (int8_t)src2, 16, false, NULL))
NEON_VOP(shl_u16, neon_u16, 2)
#undef NEON_FN

etc.  Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 46/55] target/arm: Implement MVE VRSHL
  2021-06-07 16:58 ` [PATCH 46/55] target/arm: Implement MVE VRSHL Peter Maydell
@ 2021-06-09 19:43   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 19:43 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the MVE VRSHL insn (vector form).
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    |  8 ++++++++
>   target/arm/mve.decode      |  3 +++
>   target/arm/mve_helper.c    | 36 ++++++++++++++++++++++++++++++++++++
>   target/arm/translate-mve.c |  2 ++
>   4 files changed, 49 insertions(+)

Similarly use vec_internal.h.  Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 47/55] target/arm: Implement MVE VQDMLADH and VQRDMLADH
  2021-06-07 16:58 ` [PATCH 47/55] target/arm: Implement MVE VQDMLADH and VQRDMLADH Peter Maydell
@ 2021-06-09 20:05   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 20:05 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> +static int32_t do_vqdmladh_w(int32_t a, int32_t b, int32_t c, int32_t d,
> +                             int round, bool *sat)
> +{
> +    int64_t m1 = (int64_t)a * b;
> +    int64_t m2 = (int64_t)c * d;
> +    int64_t r;
> +    /*
> +     * Architecturally we should do the entire add, double, round
> +     * and then check for saturation. We do three saturating adds,
> +     * but we need to be careful about the order. If the first
> +     * m1 + m2 saturates then it's impossible for the *2+rc to
> +     * bring it back into the non-saturated range. However, if
> +     * m1 + m2 is negative then it's possible that doing the doubling
> +     * would take the intermediate result below INT64_MAX and the
> +     * addition of the rounding constant then brings it back in range.
> +     * So we add half the rounding constant before doubling rather
> +     * than adding the rounding constant after the doubling.
> +     */
> +    if (sadd64_overflow(m1, m2, &r) ||
> +        sadd64_overflow(r, (round << 30), &r) ||
> +        sadd64_overflow(r, r, &r)) {

Ooh, ahh, an operation that doesn't even exist in SVE2.
Nice use of the new interface, btw.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 48/55] target/arm: Implement MVE VQDMLSDH and VQRDMLSDH
  2021-06-07 16:58 ` [PATCH 48/55] target/arm: Implement MVE VQDMLSDH and VQRDMLSDH Peter Maydell
@ 2021-06-09 20:08   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 20:08 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the MVE VQDMLSDH and VQRDMLSDH insns, which are
> like VQDMLADH and VQRDMLADH except that products are subtracted
> rather than added.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    | 16 ++++++++++++++
>   target/arm/mve.decode      |  5 +++++
>   target/arm/mve_helper.c    | 44 ++++++++++++++++++++++++++++++++++++++
>   target/arm/translate-mve.c |  4 ++++
>   4 files changed, 69 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 49/55] target/arm: Implement MVE VQDMULL (vector)
  2021-06-07 16:58 ` [PATCH 49/55] target/arm: Implement MVE VQDMULL (vector) Peter Maydell
@ 2021-06-09 20:20   ` Richard Henderson
  2021-06-10 19:08     ` Peter Maydell
  0 siblings, 1 reply; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 20:20 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> +++ b/target/arm/mve.decode
> @@ -39,6 +39,8 @@
>   @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
>   @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
>   @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
> +@2op_sz28 .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn \
> +     size=%size_28

Move this back to VQDMULL[BT]_scalar, I think.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 50/55] target/arm: Implement MVE VRHADD
  2021-06-07 16:58 ` [PATCH 50/55] target/arm: Implement MVE VRHADD Peter Maydell
@ 2021-06-09 20:24   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 20:24 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the MVE VRHADD insn, which performs a rounded halving
> addition.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    | 8 ++++++++
>   target/arm/mve.decode      | 3 +++
>   target/arm/mve_helper.c    | 6 ++++++
>   target/arm/translate-mve.c | 2 ++
>   4 files changed, 19 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 51/55] target/arm: Implement MVE VADC, VSBC
  2021-06-07 16:58 ` [PATCH 51/55] target/arm: Implement MVE VADC, VSBC Peter Maydell
@ 2021-06-09 21:06   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 21:06 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> +#define DO_VADC(OP, INV)                                                \
> +    uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,         \
> +                                    void *vn, void *vm, uint32_t nzcv)  \
> +    {                                                                   \
> +        uint32_t *d = vd, *n = vn, *m = vm;                             \
> +        uint16_t mask = mve_element_mask(env);                          \
> +        unsigned e;                                                     \
> +        int carry = (nzcv & FPCR_C) ? 1 : 0;                            \
> +        /* If we do no additions at all the flags are preserved */      \
> +        bool updates_flags = (mask & 0x1111) != 0;                      \
> +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
> +            uint64_t r = (uint64_t)n[H4(e)] + INV(m[H4(e)]) + carry;    \
> +            if (mask & 1) {                                             \
> +                carry = r >> 32;                                        \
> +            }                                                           \
> +            uint64_t bytemask = mask_to_bytemask4(mask);                \
> +            d[H4(e)] &= ~bytemask;                                      \
> +            d[H4(e)] |= (r & bytemask);                                 \
> +        }                                                               \
> +        mve_advance_vpt(env);                                           \
> +        if (updates_flags) {                                            \
> +            nzcv = carry ? FPCR_C : 0;                                  \
> +        }                                                               \
> +        return nzcv;                                                    \
> +    }
...
> +    /*
> +     * This insn is subject to beat-wise execution.  Partial execution
> +     * of an I=1 (initial carry input fixed) insn which does not
> +     * execute the first beat must start with the current FPSCR.NZCV
> +     * value, not the fixed constant input.
> +     */
> +    if (a->i && !mve_skip_first_beat(s)) {
> +        /* Carry input is 0 (VADCI) or 1 (VSBCI), NZV zeroed */
> +        nzcv = tcg_const_i32(fixed_carry);
> +    } else {
> +        /* Carry input from existing NZCV flag values */
> +        nzcv = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
> +        tcg_gen_andi_i32(nzcv, nzcv, FPCR_NZCV_MASK);
> +    }
> +    qd = mve_qreg_ptr(a->qd);
> +    qn = mve_qreg_ptr(a->qn);
> +    qm = mve_qreg_ptr(a->qm);
> +    fn(nzcv, cpu_env, qd, qn, qm, nzcv);
> +    fpscr = load_cpu_field(vfp.xregs[ARM_VFP_FPSCR]);
> +    tcg_gen_andi_i32(fpscr, fpscr, ~FPCR_NZCV_MASK);
> +    tcg_gen_or_i32(fpscr, fpscr, nzcv);
> +    store_cpu_field(fpscr, vfp.xregs[ARM_VFP_FPSCR]);

Hmm.  It seems like you're having to work extra hard in tcg to extract and 
store nzcv.

How about four helper functions instead of two?  E.g.

static void do_vadc(CPUARMState *env, uint32_t *d,
                     uint32_t *n, uint32_t *m,
                     uint32_t inv, uint32_t carry_in,
                     bool update_flags)
{
     uint16_t mask = mve_element_mask(env);
     unsigned e;

     /* If any additions trigger, we will update flags. */
     if (mask & 0x1111) {
         update_flags = true;
     }

     for (e = 0; e < 16 / 4; e++, mask >>= 4) {
         uint32_t bmask = mask_to_bytemask4(mask);
         uint64_t r = carry_in;
         r += n[H4(e)];
         r += m[H4(e)] ^ inv;
         if (mask & 1) {
             carry_in = r >> 32;
         }
         d[H4(e)] = (d[H4(e)] & ~bmask) | ((uint32_t)r & bmask);
     }

     if (update_flags) {
         /* Store C, clear NZV. */
         env->vfp.xregs[ARM_VFP_FPSCR] &= ~FPCR_NZCV_MASK;
         env->vfp.xregs[ARM_VFP_FPSCR] |= carry_in * FPCR_C;
     }
     mve_advance_vpt(env);
}

void HELPER(mve_vadc)(CPUARMState *env, void *vd,
                       void *vn, void *vm)
{
     bool carry_in = env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_C;
     do_vadc(env, vd, vn, vm, 0, carry_in, false);
}

void HELPER(mve_vsbc)(CPUARMState *env, void *vd,
                       void *vn, void *vm)
{
     bool carry_in = env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_C;
     do_vadc(env, vd, vn, vm, -1, carry_in, false);
}

void HELPER(mve_vadci)(CPUARMState *env, void *vd,
                        void *vn, void *vm)
{
     do_vadc(env, vd, vn, vm, 0, 0, true);
}

void HELPER(mve_vsbci)(CPUARMState *env, void *vd,
                       void *vn, void *vm)
{
     do_vadc(env, vd, vn, vm, -1, 1, true);
}


r~



* Re: [PATCH 52/55] target/arm: Implement MVE VCADD
  2021-06-07 16:58 ` [PATCH 52/55] target/arm: Implement MVE VCADD Peter Maydell
@ 2021-06-09 21:16   ` Richard Henderson
  2021-06-10 19:16     ` Peter Maydell
  0 siblings, 1 reply; 130+ messages in thread
From: Richard Henderson @ 2021-06-09 21:16 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> +#define DO_VCADD(OP, ESIZE, TYPE, H, FN0, FN1)                          \
> +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, void *vm) \
> +    {                                                                   \
> +        TYPE *d = vd, *n = vn, *m = vm;                                 \
> +        uint16_t mask = mve_element_mask(env);                          \
> +        unsigned e;                                                     \
> +        TYPE r[16 / ESIZE];                                             \
> +        /* Calculate all results first to avoid overwriting inputs */   \
> +        for (e = 0; e < 16 / ESIZE; e++) {                              \
> +            if (!(e & 1)) {                                             \
> +                r[e] = FN0(n[H(e)], m[H(e + 1)]);                       \
> +            } else {                                                    \
> +                r[e] = FN1(n[H(e)], m[H(e - 1)]);                       \
> +            }                                                           \
> +        }                                                               \
> +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
> +            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
> +            d[H(e)] &= ~bytemask;                                       \
> +            d[H(e)] |= (r[e] & bytemask);                               \
> +        }                                                               \
> +        mve_advance_vpt(env);                                           \
> +    }

I guess this is ok. You could unroll the loop once, so that you compute only 
even+odd results before writeback.
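That is, something like this (sketch of just the unrolled loop body,
replacing both loops in the quoted macro):

for (e = 0; e < 16 / ESIZE; e += 2, mask >>= 2 * ESIZE) {
    /* both results of the pair are computed before either writeback */
    TYPE r0 = FN0(n[H(e)], m[H(e + 1)]);
    TYPE r1 = FN1(n[H(e + 1)], m[H(e)]);
    uint64_t bm0 = mask_to_bytemask##ESIZE(mask);
    uint64_t bm1 = mask_to_bytemask##ESIZE(mask >> ESIZE);
    d[H(e)] = (d[H(e)] & ~bm0) | (r0 & bm0);
    d[H(e + 1)] = (d[H(e + 1)] & ~bm1) | (r1 & bm1);
}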

> +/*
> + * VCADD Qd == Qm at size MO_32 is UNPREDICTABLE; we choose not to diagnose
> + * so we can reuse the DO_2OP macro. (Our implementation calculates the
> + * "expected" results in this case.)
> + */
You've done this elsewhere, though.

Either way,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 53/55] target/arm: Implement MVE VHCADD
  2021-06-07 16:58 ` [PATCH 53/55] target/arm: Implement MVE VHCADD Peter Maydell
@ 2021-06-10  3:50   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-10  3:50 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> +#define DO_HADD(N, M) (((int64_t)(N) + (int64_t)(M)) >> 1)
> +#define DO_HSUB(N, M) (((int64_t)(N) - (int64_t)(M)) >> 1)

You've already got do_vhadd_[us] defined from vadd[su]...


r~





* Re: [PATCH 03/55] target/arm: Handle VPR semantics in existing code
  2021-06-07 21:19   ` Richard Henderson
@ 2021-06-10  9:28     ` Peter Maydell
  0 siblings, 0 replies; 130+ messages in thread
From: Peter Maydell @ 2021-06-10  9:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Mon, 7 Jun 2021 at 22:19, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/7/21 9:57 AM, Peter Maydell wrote:
> > @@ -410,16 +415,19 @@ void HELPER(v7m_preserve_fp_state)(CPUARMState *env)
> >       env->v7m.fpccr[is_secure] &= ~R_V7M_FPCCR_LSPACT_MASK;
> >
> >       if (ts) {
> > -        /* Clear s0 to s31 and the FPSCR */
> > +        /* Clear s0 to s31 and the FPSCR and VPR */
> >           int i;
> >
> >           for (i = 0; i < 32; i += 2) {
> >               *aa32_vfp_dreg(env, i / 2) = 0;
> >           }
> >           vfp_set_fpscr(env, 0);
> > +        if (cpu_isar_feature(aa32_mve, cpu)) {
> > +            env->v7m.vpr = 0;
> > +        }
>
> If the VPR does not exist without MVE, is it cleaner to simply set it
> unconditionally?

I thought about that, but in the end went for the condition, just
to preserve the parallelism with the places where we do need
the condition. There didn't seem to me to be much in it.

-- PMM



* Re: [PATCH 04/55] target/arm: Add handling for PSR.ECI/ICI
  2021-06-07 23:33   ` Richard Henderson
@ 2021-06-10 10:17     ` Peter Maydell
  2021-06-10 13:39       ` Richard Henderson
  0 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-10 10:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Tue, 8 Jun 2021 at 00:33, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/7/21 9:57 AM, Peter Maydell wrote:
> > +void clear_eci_state(DisasContext *s)
> > +{
> > +    /*
> > +     * Clear any ECI/ICI state: used when a load multiple/store
> > +     * multiple insn executes.
> > +     */
> > +    if (s->eci) {
> > +        TCGv_i32 tmp = tcg_temp_new_i32();
> > +        tcg_gen_movi_i32(tmp, 0);
>
> tcg_const_i32 or preferably tcg_constant_i32.

I'll use tcg_const_i32(), yep. (I think I copied this absent-mindedly
from some of the existing code in translate.c that uses tcg_gen_movi_i32().)
Can't use tcg_constant_i32() because store_cpu_field() wants to
tcg_temp_free_i32() its argument.
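That is, the pattern ends up as (sketch):

    TCGv_i32 tmp = tcg_const_i32(0);
    store_cpu_field(tmp, condexec_bits);   /* frees tmp internally */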

> > +    if (condexec & 0xf) {
> > +        dc->condexec_mask = (condexec & 0xf) << 1;
> > +        dc->condexec_cond = condexec >> 4;
> > +        dc->eci = 0;
> > +    } else {
> > +        dc->condexec_mask = 0;
> > +        dc->condexec_cond = 0;
> > +        if (arm_feature(env, ARM_FEATURE_M)) {
> > +            dc->eci = condexec >> 4;
> > +        }
>
> This else leaves eci uninitialized.

Strictly speaking it doesn't, because gen_intermediate_code
zero-initializes the DisasContext with a "{ }" struct initializer.
But it's clearer to explicitly initialize here I guess. Fixed.

> >       dc->insn = insn;
> >
> > +    if (dc->eci) {
> > +        /*
> > +         * For M-profile continuable instructions, ECI/ICI handling
> > +         * falls into these cases:
> > +         *  - interrupt-continuable instructions
> > +         *     These are the various load/store multiple insns (both
> > +         *     integer and fp). The ICI bits indicate the register
> > +         *     where the load/store can resume. We make the IMPDEF
> > +         *     choice to always do "instruction restart", ie ignore
> > +         *     the ICI value and always execute the ldm/stm from the
> > +         *     start. So all we need to do is zero PSR.ICI if the
> > +         *     insn executes.
> > +         *  - MVE instructions subject to beat-wise execution
> > +         *     Here the ECI bits indicate which beats have already been
> > +         *     executed, and we must honour this. Each insn of this
> > +         *     type will handle it correctly. We will update PSR.ECI
> > +         *     in the helper function for the insn (some ECI values
> > +         *     mean that the following insn also has been partially
> > +         *     executed).
> > +         *  - Special cases which don't advance ECI
> > +         *     The insns LE, LETP and BKPT leave the ECI/ICI state
> > +         *     bits untouched.
> > +         *  - all other insns (the common case)
> > +         *     Non-zero ECI/ICI means an INVSTATE UsageFault.
> > +         *     We place a rewind-marker here. Insns in the previous
> > +         *     three categories will set a flag in the DisasContext.
> > +         *     If the flag isn't set after we call disas_thumb_insn()
> > +         *     or disas_thumb2_insn() then we know we have a "some other
> > +         *     insn" case. We will rewind to the marker (ie throwing away
> > +         *     all the generated code) and instead emit "take exception".
> > +         */
> > +        dc->eci_handled = false;
>
> This should be done in arm_tr_init_disas_context, I think, unconditionally,
> next to eci.
>
> > +        dc->insn_eci_rewind = tcg_last_op();
>
> I believe that this is identical to dc->insn_start.  Certainly there does not
> seem to be any possibility of any opcodes emitted in between.

There's quite a wide separation between where we set insn_start and here
(we set insn_start in arm_tr_insn_start, then there's whatever the accel/tcg
framework chooses to do between the insn_start callback and the translate_insn
callback, then the arm_pre_translate_insn() code). So I felt that a separate
pointer was easier to reason about.

In fact, looking again at the accel/tcg code, if we rewind to insn_start
that will delete any code emitted by the breakpoint_check hook,
anything emitted by plugin_gen_insn_start(), and anything emitted by
gen_io_start() if this is a CF_LAST_IO insn. I think we want to keep
all of those.

> If you think we should use a different field, then initialize it to null next
> to eci/eci_handled.

Done.

-- PMM



* Re: [PATCH 13/55] target/arm: Implement MVE VCLZ
  2021-06-08 22:10   ` Richard Henderson
@ 2021-06-10 12:40     ` Peter Maydell
  2021-06-10 14:03       ` Richard Henderson
  0 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-10 12:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Tue, 8 Jun 2021 at 23:10, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/7/21 9:57 AM, Peter Maydell wrote:
> > Implement the MVE VCLZ insn (and the necessary machinery
> > for MVE 1-input vector ops).
> >
> > Note that for non-load instructions predication is always performed
> > at a byte level granularity regardless of element size (R_ZLSJ),
> > and so the masking logic here differs from that used in the VLDR
> > and VSTR helpers.
> >
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

> > +
> > +/*
> > + * Take the bottom bits of mask (which is 1 bit per lane) and
> > + * convert to a mask which has 1s in each byte which is predicated.
> > + */
> > +static uint8_t mask_to_bytemask1(uint16_t mask)
> > +{
> > +    return (mask & 1) ? 0xff : 0;
> > +}
> > +
> > +static uint16_t mask_to_bytemask2(uint16_t mask)
> > +{
> > +    static const uint16_t masks[] = { 0x0000, 0x00ff, 0xff00, 0xffff };
> > +    return masks[mask & 3];
> > +}
> > +
> > +static uint32_t mask_to_bytemask4(uint16_t mask)
> > +{
> > +    static const uint32_t masks[] = {
> > +        0x00000000, 0x000000ff, 0x0000ff00, 0x0000ffff,
> > +        0x00ff0000, 0x00ff00ff, 0x00ffff00, 0x00ffffff,
> > +        0xff000000, 0xff0000ff, 0xff00ff00, 0xff00ffff,
> > +        0xffff0000, 0xffff00ff, 0xffffff00, 0xffffffff,
> > +    };
>
> I'll note that
>
> (1) the values for the mask_to_bytemask2 array overlap the first 4 values of
> the mask_to_bytemask4 array, and
>
> (2) both of these overlap with the larger
>
> static inline uint64_t expand_pred_b(uint8_t byte)
>
> from SVE.  It'd be nice to share the storage, whatever the actual functional
> interface into the array.

Yeah, I guess so. I didn't really feel like trying to
abstract that out...
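For what it's worth, the sharing could be fairly small (sketch, assuming
SVE's expand_pred_b table were exported under some name, say
expand_pred_b_data[]):

static uint16_t mask_to_bytemask2(uint16_t mask)
{
    /* entries 0..3 are exactly the 2-lane expansions, truncated */
    return expand_pred_b_data[mask & 3];
}

static uint32_t mask_to_bytemask4(uint16_t mask)
{
    /* entries 0..15 are exactly the 4-lane expansions, truncated */
    return expand_pred_b_data[mask & 0xf];
}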

> > +#define DO_1OP(OP, ESIZE, TYPE, H, FN)                                  \
> > +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
> > +    {                                                                   \
> > +        TYPE *d = vd, *m = vm;                                          \
> > +        uint16_t mask = mve_element_mask(env);                          \
> > +        unsigned e;                                                     \
> > +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
> > +            TYPE r = FN(m[H(e)]);                                       \
> > +            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
>
> Why uint64_t and not TYPE?  Or uint32_t?

A later patch adds mask_to_bytemask8(), so I wanted
a type that was definitely unsigned (so TYPE isn't any good)
and which was definitely big enough for 64 bits.

> > +    if (!mve_eci_check(s)) {
> > +        return true;
> > +    }
> > +
> > +    if (!vfp_access_check(s)) {
> > +        return true;
> > +    }
>
> Not the first instance, but is it worth saving 4 lines per and combining these
> into one IF?

Yes, I think so.

-- PMM



* Re: [PATCH 04/55] target/arm: Add handling for PSR.ECI/ICI
  2021-06-10 10:17     ` Peter Maydell
@ 2021-06-10 13:39       ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-10 13:39 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 6/10/21 3:17 AM, Peter Maydell wrote:
> Can't use tcg_constant_i32() because store_cpu_field() wants to
> tcg_temp_free_i32() its argument.

Yes you can.  I thought I documented somewhere that constant is silently 
ignored by free.  Oh dear, now I see that I have conflicting docs -- will fix.

> In fact, looking again at the accel/tcg code, if we rewind to insn_start
> that will delete any code emitted by the breakpoint_check hook,
> anything emitted by plugin_gen_insn_start(), and anything emitted by
> gen_io_start() if this is a CF_LAST_IO insn. I think we want to keep
> all of those.

Hmm.  I guess BP_CPU does say DISAS_TOO_MANY for "execute only one more
insn", and there's the plugin stuff too.  Good point.


r~



* Re: [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening forms)
  2021-06-08 21:33   ` Richard Henderson
  2021-06-08 21:43     ` Richard Henderson
  2021-06-09 10:01     ` Peter Maydell
@ 2021-06-10 14:01     ` Peter Maydell
  2 siblings, 0 replies; 130+ messages in thread
From: Peter Maydell @ 2021-06-10 14:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Tue, 8 Jun 2021 at 22:33, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/7/21 9:57 AM, Peter Maydell wrote:
> > +#define DO_VLDR(OP, ESIZE, LDTYPE, TYPE, H)                             \
> > +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)    \
> > +    {                                                                   \
> > +        TYPE *d = vd;                                                   \
> > +        uint16_t mask = mve_element_mask(env);                          \
> > +        unsigned b, e;                                                  \
>
> esize is redundant with sizeof(type); perhaps just make it a local variable?

That's OK here, but not for most of the other macros, where we need
ESIZE as a macro argument so we can do mask_to_bytemask##ESIZE.

-- PMM



* Re: [PATCH 13/55] target/arm: Implement MVE VCLZ
  2021-06-10 12:40     ` Peter Maydell
@ 2021-06-10 14:03       ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-10 14:03 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 6/10/21 5:40 AM, Peter Maydell wrote:
>>> +#define DO_1OP(OP, ESIZE, TYPE, H, FN)                                  \
>>> +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
>>> +    {                                                                   \
>>> +        TYPE *d = vd, *m = vm;                                          \
>>> +        uint16_t mask = mve_element_mask(env);                          \
>>> +        unsigned e;                                                     \
>>> +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
>>> +            TYPE r = FN(m[H(e)]);                                       \
>>> +            uint64_t bytemask = mask_to_bytemask##ESIZE(mask);          \
>>
>> Why uint64_t and not TYPE?  Or uint32_t?
> 
> A later patch adds the mask_to_bytemask8(), so I wanted
> a type that was definitely unsigned (so TYPE isn't any good)
> and which was definitely big enough for 64 bits.

Hmm.  I was just concerned about the unnecessary type extension involved.

What about changing the interface?  Not to return a mask as you do here, but
to perform the entire merge operation.  E.g.

static uint8_t mergemask1(uint8_t d, uint8_t r, uint16_t mask)
{
     return mask & 1 ? r : d;
}

static uint16_t mergemask2(uint16_t d, uint16_t r, uint16_t mask)
{
     uint16_t bmask = array_whotsit[mask & 3];
     return (d & ~bmask) | (r & bmask);
}

etc.

Or maybe with a pointer argument for D, so that the load+store is done there as 
well.  In which case you could use QEMU_GENERIC to select the function invoked, 
instead of using token pasting everywhere.  E.g.

static void mergemask_ub(uint8_t *d, uint8_t r, uint16_t mask)
{
    if (mask & 1) {
        *d = r;
    }
}

static void mergemask_sb(int8_t *d, int8_t r, uint16_t mask)
{
    mergemask_ub((uint8_t *)d, r, mask);
}

static void mergemask_uh(uint16_t *d, uint16_t r, uint16_t mask)
{
    uint16_t bmask = array_whotsit[mask & 3];
    *d = (*d & ~bmask) | (r & bmask);
}

...

#define mergemask(D, R, M) \
     QEMU_GENERIC(D, (uint8_t *, mergemask_ub), \
                     (int8_t *,  mergemask_sb), \
                     ... )

BTW, now that our minimum compiler is gcc 7, I think we can shift to
-std=gnu11 so that we can drop QEMU_GENERIC and just use _Generic, which is
much easier to read than the above, and will give better error messages for
missing cases (there's a sketch of that form after the boilerplate below).
Anyway...

Which takes your boilerplate down to

> +#define DO_1OP(OP, ESIZE, TYPE, H, FN)                                  \
> +    void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm)         \
> +    {                                                                   \
> +        TYPE *d = vd, *m = vm;                                          \
> +        uint16_t mask = mve_element_mask(env);                          \
> +        for (unsigned e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {     \
> +            mergemask(&d[H(e)], FN(m[H(e)]), mask);                     \
> +        }                                                               \
> +        mve_advance_vpt(env);                                           \
> +    }

which looks pretty tidy to me.
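
For concreteness, the _Generic form I mentioned might look something like
this (just a sketch, reusing the mergemask_* helpers above):

#define mergemask(D, R, M)                         \
    _Generic((D),                                  \
             uint8_t *:  mergemask_ub,             \
             int8_t *:   mergemask_sb,             \
             uint16_t *: mergemask_uh              \
             /* ... and so on for the other sizes */)((D), (R), (M))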


r~



* Re: [PATCH 54/55] target/arm: Implement MVE VADDV
  2021-06-07 16:58 ` [PATCH 54/55] target/arm: Implement MVE VADDV Peter Maydell
@ 2021-06-10 14:06   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-10 14:06 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> Implement the MVE VADDV insn, which performs an addition
> across vector lanes.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/helper-mve.h    |  7 ++++++
>   target/arm/mve.decode      |  2 ++
>   target/arm/mve_helper.c    | 24 +++++++++++++++++++
>   target/arm/translate-mve.c | 48 ++++++++++++++++++++++++++++++++++++++
>   4 files changed, 81 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH 55/55] target/arm: Make VMOV scalar <-> gpreg beatwise for MVE
  2021-06-07 16:58 ` [PATCH 55/55] target/arm: Make VMOV scalar <-> gpreg beatwise for MVE Peter Maydell
@ 2021-06-10 14:14   ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-10 14:14 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 6/7/21 9:58 AM, Peter Maydell wrote:
> +    if (dc_isar_feature(aa32_mve, s)) {
> +        TCGv_i32 eci;
> +
> +        mve_update_eci(s);
> +        eci = tcg_const_i32(s->eci << 4);
> +        store_cpu_field(eci, condexec_bits);
> +    }

I think it would be handy to package this up into an mve_update_and_store_eci
function.  There are 3 copies, including the earlier one in VPST.
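
Something like this, say (sketch only; whether any aa32_mve check lives in
the helper or stays at the call sites is a detail for the patch):

static void mve_update_and_store_eci(DisasContext *s)
{
    TCGv_i32 eci;

    /* Same operations as the quoted hunk, factored out. */
    mve_update_eci(s);
    eci = tcg_const_i32(s->eci << 4);
    store_cpu_field(eci, condexec_bits);
}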

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [PATCH 49/55] target/arm: Implement MVE VQDMULL (vector)
  2021-06-09 20:20   ` Richard Henderson
@ 2021-06-10 19:08     ` Peter Maydell
  2021-06-10 19:34       ` Richard Henderson
  0 siblings, 1 reply; 130+ messages in thread
From: Peter Maydell @ 2021-06-10 19:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Wed, 9 Jun 2021 at 21:20, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/7/21 9:58 AM, Peter Maydell wrote:
> > +++ b/target/arm/mve.decode
> > @@ -39,6 +39,8 @@
> >   @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
> >   @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
> >   @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
> > +@2op_sz28 .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn \
> > +     size=%size_28
>
> Move this back to VQDMULL[BT]_scalar, I think.

Why? VQDMULL[BT]_scalar uses an entirely different format
(as a scalar it uses the &2scalar arg struct with an rm field
for the gp register).

-- PMM



* Re: [PATCH 52/55] target/arm: Implement MVE VCADD
  2021-06-09 21:16   ` Richard Henderson
@ 2021-06-10 19:16     ` Peter Maydell
  0 siblings, 0 replies; 130+ messages in thread
From: Peter Maydell @ 2021-06-10 19:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Wed, 9 Jun 2021 at 22:16, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/7/21 9:58 AM, Peter Maydell wrote:
> > +/*
> > + * VCADD Qd == Qm at size MO_32 is UNPREDICTABLE; we choose not to diagnose
> > + * so we can reuse the DO_2OP macro. (Our implementation calculates the
> > + * "expected" results in this case.)
> > + */
> You've done this elsewhere, though.

Yeah, because in those cases the op had to have its own hand-written
trans_ function for other reasons, so the check was easy to add. Hence
the comment explaining why this particular case doesn't do that.

> Either way,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

thanks
-- PMM



* Re: [PATCH 49/55] target/arm: Implement MVE VQDMULL (vector)
  2021-06-10 19:08     ` Peter Maydell
@ 2021-06-10 19:34       ` Richard Henderson
  0 siblings, 0 replies; 130+ messages in thread
From: Richard Henderson @ 2021-06-10 19:34 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 6/10/21 12:08 PM, Peter Maydell wrote:
> On Wed, 9 Jun 2021 at 21:20, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 6/7/21 9:58 AM, Peter Maydell wrote:
>>> +++ b/target/arm/mve.decode
>>> @@ -39,6 +39,8 @@
>>>    @1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
>>>    @2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
>>>    @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
>>> +@2op_sz28 .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn \
>>> +     size=%size_28
>>
>> Move this back to VQDMULL[BT]_scalar, I think.
> 
> Why? VQDMULL[BT]_scalar uses an entirely different format
> (as a scalar it uses the &2scalar arg struct with an rm field
> for the gp register).

Oops, mis-read the @2op vs @2scalar.  Sorry.

r~




* Re: [PATCH 32/55] target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
  2021-06-09  1:05   ` Richard Henderson
@ 2021-06-14 10:19     ` Peter Maydell
  0 siblings, 0 replies; 130+ messages in thread
From: Peter Maydell @ 2021-06-14 10:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Wed, 9 Jun 2021 at 02:05, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/7/21 9:57 AM, Peter Maydell wrote:
> > +#define DO_LDAVH(OP, ESIZE, TYPE, H, XCHG, EVENACC, ODDACC, TO128)      \
> > +    uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
> > +                                    void *vm, uint64_t a)               \
> > +    {                                                                   \
> > +        uint16_t mask = mve_element_mask(env);                          \
> > +        unsigned e;                                                     \
> > +        TYPE *n = vn, *m = vm;                                          \
> > +        Int128 acc = TO128(a);                                          \
>
> This seems to miss the << 8.

Oops, yes it does.

> Which suggests that the whole thing can be done without Int128:
>
> > +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
> > +            if (mask & 1) {                                             \
> > +                if (e & 1) {                                            \
> > +                    acc = ODDACC(acc, TO128(n[H(e - 1 * XCHG)] * m[H(e)])); \
>
>    tmp = n * m;
>    tmp = (tmp >> 8) + ((tmp >> 7) & 1);
>    acc ODDACC tmp;

I'm not sure about this suggestion though. It throws away the bottom
7 bits of each product, but because we iterate through this 4 times
and add (potentially) four of these products together, the discarded
bits can sum to something large enough to affect the final result.
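
A standalone demonstration with made-up product values:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* Four products, each with bit 6 set, just below the rounding bit. */
    int64_t p[4] = { 0x140, 0x140, 0x140, 0x140 };
    int64_t rounded_each = 0, full = 0;

    for (int i = 0; i < 4; i++) {
        /* Round each product before accumulating (the suggestion). */
        rounded_each += (p[i] >> 8) + ((p[i] >> 7) & 1);
        /* Accumulate at full precision, round once at the end
         * (what the Int128 version does). */
        full += p[i];
    }
    full = (full >> 8) + ((full >> 7) & 1);

    /* Prints "4 vs 5": the four discarded 0x40s added up to matter. */
    printf("%" PRId64 " vs %" PRId64 "\n", rounded_each, full);
    return 0;
}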

-- PMM



* Re: [PATCH 43/55] target/arm: Implement MVE VQSHL (vector)
  2021-06-09 19:26   ` Richard Henderson
@ 2021-06-14 11:04     ` Peter Maydell
  0 siblings, 0 replies; 130+ messages in thread
From: Peter Maydell @ 2021-06-14 11:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Wed, 9 Jun 2021 at 20:26, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/7/21 9:58 AM, Peter Maydell wrote:
> > Implement the MVE VQSHL insn (encoding T4, which is the
> > vector-shift-by-vector version).
> >
> > The DO_SQSHL_OP and DO_UQSHL_OP macros here are derived from
> > the neon_helper.c code for qshl_u{8,16,32} and qshl_s{8,16,32}.
>
> Ah, from before the sve2 merge, and associated cleanup.
> There are now helper functions in vec_internal.h for this.

Ah, that's helpful. Annoyingly, the helper wants to take a
uint32_t* for the "write to this when saturating" argument,
and I have a bool*...
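
Probably a wrapper that folds the uint32_t back into the bool will do,
something like (sketch; I'm assuming the helper's argument order here):

#define WRAP_QRSHL_HELPER(FN, N, M, ROUND, satp)                \
    ({                                                          \
        uint32_t su32 = 0;                                      \
        /* FN stands for a do_sqrshl_bhs-style helper. */       \
        typeof(N) qrshl_ret = FN(N, (int8_t)(M), sizeof(N) * 8, \
                                 ROUND, &su32);                 \
        if (su32) {                                             \
            *satp = true;                                       \
        }                                                       \
        qrshl_ret;                                              \
    })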

-- PMM



end of thread

Thread overview: 130+ messages
2021-06-07 16:57 [PATCH 00/55] target/arm: First slice of MVE implementation Peter Maydell
2021-06-07 16:57 ` [PATCH 01/55] tcg: Introduce tcg_remove_ops_after Peter Maydell
2021-06-07 16:57 ` [PATCH 02/55] target/arm: Enable FPSCR.QC bit for MVE Peter Maydell
2021-06-07 19:02   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 03/55] target/arm: Handle VPR semantics in existing code Peter Maydell
2021-06-07 21:19   ` Richard Henderson
2021-06-10  9:28     ` Peter Maydell
2021-06-07 16:57 ` [PATCH 04/55] target/arm: Add handling for PSR.ECI/ICI Peter Maydell
2021-06-07 23:33   ` Richard Henderson
2021-06-10 10:17     ` Peter Maydell
2021-06-10 13:39       ` Richard Henderson
2021-06-07 16:57 ` [PATCH 05/55] target/arm: Let vfp_access_check() handle late NOCP checks Peter Maydell
2021-06-07 23:50   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 06/55] target/arm: Implement MVE LCTP Peter Maydell
2021-06-08  0:05   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 07/55] target/arm: Implement MVE WLSTP insn Peter Maydell
2021-06-08  1:42   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 08/55] target/arm: Implement MVE DLSTP Peter Maydell
2021-06-08  2:56   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 09/55] target/arm: Implement MVE LETP insn Peter Maydell
2021-06-08  3:40   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 10/55] target/arm: Add framework for MVE decode Peter Maydell
2021-06-08  3:59   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 11/55] target/arm: Implement MVE VLDR/VSTR (non-widening forms) Peter Maydell
2021-06-08 21:33   ` Richard Henderson
2021-06-08 21:43     ` Richard Henderson
2021-06-09 10:01     ` Peter Maydell
2021-06-09 17:09       ` Richard Henderson
2021-06-10 14:01     ` Peter Maydell
2021-06-07 16:57 ` [PATCH 12/55] target/arm: Implement widening/narrowing MVE VLDR/VSTR insns Peter Maydell
2021-06-08 21:46   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 13/55] target/arm: Implement MVE VCLZ Peter Maydell
2021-06-08 22:10   ` Richard Henderson
2021-06-10 12:40     ` Peter Maydell
2021-06-10 14:03       ` Richard Henderson
2021-06-07 16:57 ` [PATCH 14/55] target/arm: Implement MVE VCLS Peter Maydell
2021-06-08 22:12   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 15/55] bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations Peter Maydell
2021-06-08  6:53   ` Philippe Mathieu-Daudé
2021-06-08 22:14   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 16/55] target/arm: Implement MVE VREV16, VREV32, VREV64 Peter Maydell
2021-06-08 22:23   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 17/55] target/arm: Implement MVE VMVN (register) Peter Maydell
2021-06-08 22:27   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 18/55] target/arm: Implement MVE VABS Peter Maydell
2021-06-08 22:34   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 19/55] target/arm: Implement MVE VNEG Peter Maydell
2021-06-08 22:40   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 20/55] target/arm: Implement MVE VDUP Peter Maydell
2021-06-08 23:17   ` Richard Henderson
2021-06-09 10:06     ` Peter Maydell
2021-06-09 17:16       ` Richard Henderson
2021-06-07 16:57 ` [PATCH 21/55] target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR Peter Maydell
2021-06-08 23:23   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 22/55] target/arm: Implement MVE VADD, VSUB, VMUL Peter Maydell
2021-06-08 23:25   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 23/55] target/arm: Implement MVE VMULH Peter Maydell
2021-06-08 23:29   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 24/55] target/arm: Implement MVE VRMULH Peter Maydell
2021-06-08 23:33   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 25/55] target/arm: Implement MVE VMAX, VMIN Peter Maydell
2021-06-08 23:35   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 26/55] target/arm: Implement MVE VABD Peter Maydell
2021-06-08 23:39   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 27/55] target/arm: Implement MVE VHADD, VHSUB Peter Maydell
2021-06-08 23:43   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 28/55] target/arm: Implement MVE VMULL Peter Maydell
2021-06-08 23:52   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 29/55] target/arm: Implement MVE VMLALDAV Peter Maydell
2021-06-09  0:46   ` Richard Henderson
2021-06-09  0:46   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 30/55] target/arm: Implement MVE VMLSLDAV Peter Maydell
2021-06-09  0:47   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 31/55] include/qemu/int128.h: Add function to create Int128 from int64_t Peter Maydell
2021-06-08  6:45   ` Philippe Mathieu-Daudé
2021-06-09  0:51   ` Richard Henderson
2021-06-07 16:57 ` [PATCH 32/55] target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH Peter Maydell
2021-06-09  1:05   ` Richard Henderson
2021-06-14 10:19     ` Peter Maydell
2021-06-07 16:57 ` [PATCH 33/55] target/arm: Implement MVE VADD (scalar) Peter Maydell
2021-06-09 17:58   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 34/55] target/arm: Implement MVE VSUB, VMUL (scalar) Peter Maydell
2021-06-09 18:00   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 35/55] target/arm: Implement MVE VHADD, VHSUB (scalar) Peter Maydell
2021-06-09 18:02   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 36/55] target/arm: Implement MVE VBRSR Peter Maydell
2021-06-09 18:08   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 37/55] target/arm: Implement MVE VPST Peter Maydell
2021-06-09 18:23   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 38/55] target/arm: Implement MVE VQADD and VQSUB Peter Maydell
2021-06-09 18:46   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 39/55] target/arm: Implement MVE VQDMULH and VQRDMULH (scalar) Peter Maydell
2021-06-09 18:58   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 40/55] target/arm: Implement MVE VQDMULL scalar Peter Maydell
2021-06-09 19:11   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 41/55] target/arm: Implement MVE VQDMULH, VQRDMULH (vector) Peter Maydell
2021-06-09 19:13   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 42/55] target/arm: Implement MVE VQADD, VQSUB (vector) Peter Maydell
2021-06-09 19:15   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 43/55] target/arm: Implement MVE VQSHL (vector) Peter Maydell
2021-06-09 19:26   ` Richard Henderson
2021-06-14 11:04     ` Peter Maydell
2021-06-07 16:58 ` [PATCH 44/55] target/arm: Implement MVE VQRSHL Peter Maydell
2021-06-09 19:29   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 45/55] target/arm: Implement MVE VSHL insn Peter Maydell
2021-06-09 19:40   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 46/55] target/arm: Implement MVE VRSHL Peter Maydell
2021-06-09 19:43   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 47/55] target/arm: Implement MVE VQDMLADH and VQRDMLADH Peter Maydell
2021-06-09 20:05   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 48/55] target/arm: Implement MVE VQDMLSDH and VQRDMLSDH Peter Maydell
2021-06-09 20:08   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 49/55] target/arm: Implement MVE VQDMULL (vector) Peter Maydell
2021-06-09 20:20   ` Richard Henderson
2021-06-10 19:08     ` Peter Maydell
2021-06-10 19:34       ` Richard Henderson
2021-06-07 16:58 ` [PATCH 50/55] target/arm: Implement MVE VRHADD Peter Maydell
2021-06-09 20:24   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 51/55] target/arm: Implement MVE VADC, VSBC Peter Maydell
2021-06-09 21:06   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 52/55] target/arm: Implement MVE VCADD Peter Maydell
2021-06-09 21:16   ` Richard Henderson
2021-06-10 19:16     ` Peter Maydell
2021-06-07 16:58 ` [PATCH 53/55] target/arm: Implement MVE VHCADD Peter Maydell
2021-06-10  3:50   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 54/55] target/arm: Implement MVE VADDV Peter Maydell
2021-06-10 14:06   ` Richard Henderson
2021-06-07 16:58 ` [PATCH 55/55] target/arm: Make VMOV scalar <-> gpreg beatwise for MVE Peter Maydell
2021-06-10 14:14   ` Richard Henderson
2021-06-09 14:33 ` [PATCH 00/55] target/arm: First slice of MVE implementation no-reply
