* [PATCH v2 00/18] target/arm: MVE slice 4
@ 2021-08-26 13:17 Peter Maydell
2021-08-26 13:17 ` [PATCH v2 01/18] target/arm: Implement MVE VADD (floating-point) Peter Maydell
` (17 more replies)
0 siblings, 18 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
This patchseries is the fourth slice of the MVE implementation,
including the "turn it on" patch. These patches have all been
on-list before and have been reviewed; changes since v1 are
mostly just the ones requested by Richard.
v1->v2:
* use float* types in macro arguments, not uint*_t
(in most patches)
* make do_vcvt_sh/do_vcvt_hs functions, not macros
* new DO_2OP_FP_ALL, DO_2OP_FP_SCALAR_ALL macros that invoke
DO_2OP_FP/DO_2OP_FP_SCALAR once each for float16, float32
* pass a CHS bool to DO_VFMA rather than a function name
thanks
-- PMM
Peter Maydell (18):
target/arm: Implement MVE VADD (floating-point)
target/arm: Implement MVE VSUB, VMUL, VABD, VMAXNM, VMINNM
target/arm: Implement MVE VCADD
target/arm: Implement MVE VFMA and VFMS
target/arm: Implement MVE VCMUL and VCMLA
target/arm: Implement MVE VMAXNMA and VMINNMA
target/arm: Implement MVE scalar fp insns
target/arm: Implement MVE fp-with-scalar VFMA, VFMAS
softfloat: Remove assertion preventing silencing of NaN in default-NaN
mode
target/arm: Implement MVE FP max/min across vector
target/arm: Implement MVE fp vector comparisons
target/arm: Implement MVE fp scalar comparisons
target/arm: Implement MVE VCVT between floating and fixed point
target/arm: Implement MVE VCVT between fp and integer
target/arm: Implement MVE VCVT with specified rounding mode
target/arm: Implement MVE VCVT between single and half precision
target/arm: Implement MVE VRINT insns
target/arm: Enable MVE in Cortex-M55
target/arm/helper-mve.h | 142 +++++++
target/arm/translate.h | 6 +
target/arm/mve.decode | 277 ++++++++++++--
target/arm/cpu_tcg.c | 7 +-
target/arm/mve_helper.c | 650 +++++++++++++++++++++++++++++++++
target/arm/translate-mve.c | 277 +++++++++++++-
target/arm/translate-neon.c | 6 -
fpu/softfloat-specialize.c.inc | 1 -
8 files changed, 1318 insertions(+), 48 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 01/18] target/arm: Implement MVE VADD (floating-point)
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 02/18] target/arm: Implement MVE VSUB, VMUL, VABD, VMAXNM, VMINNM Peter Maydell
` (16 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE VADD (floating-point) insn. Handling of this is
similar to the 2-operand integer insns, except that we must take care
to only update the floating point exception status if the least
significant bit of the predicate mask for each element is active.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
v1->v2: use float16/float32 type; add DO_2OP_FP_ALL macro
to invoke DO_2OP_FP for both float16 and float32
---
target/arm/helper-mve.h | 3 +++
target/arm/translate.h | 6 ++++++
target/arm/mve.decode | 10 ++++++++++
target/arm/mve_helper.c | 40 +++++++++++++++++++++++++++++++++++++
target/arm/translate-mve.c | 17 ++++++++++++++++
target/arm/translate-neon.c | 6 ------
6 files changed, 76 insertions(+), 6 deletions(-)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 3db9b15f121..32fd2e1f9be 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -410,6 +410,9 @@ DEF_HELPER_FLAGS_4(mve_vhcadd270b, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vhcadd270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vhcadd270w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfadds, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 241596c5bda..8636c20c3b4 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -181,6 +181,12 @@ static inline int rsub_8(DisasContext *s, int x)
return 8 - x;
}
+static inline int neon_3same_fp_size(DisasContext *s, int x)
+{
+ /* Convert 0==fp32, 1==fp16 into a MO_* value */
+ return MO_32 - x;
+}
+
static inline int arm_dc_feature(DisasContext *dc, int feature)
{
return (dc->features & (1ULL << feature)) != 0;
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 87446816293..e211cb016c6 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -26,6 +26,10 @@
# VQDMULL has size in bit 28: 0 for 16 bit, 1 for 32 bit
%size_28 28:1 !function=plus_1
+# 2 operand fp insns have size in bit 20: 1 for 16 bit, 0 for 32 bit,
+# like Neon FP insns.
+%2op_fp_size 20:1 !function=neon_3same_fp_size
+
# 1imm format immediate
%imm_28_16_0 28:1 16:3 0:4
@@ -118,6 +122,9 @@
@vmaxv .... .... .... size:2 .. rda:4 .... .... .... &vmaxv qm=%qm
+@2op_fp .... .... .... .... .... .... .... .... &2op \
+ qd=%qd qn=%qn qm=%qm size=%2op_fp_size
+
# Vector loads and stores
# Widening loads and narrowing stores:
@@ -615,3 +622,6 @@ VCMPGE_scalar 1111 1110 0 . .. ... 1 ... 1 1111 0 1 0 0 .... @vcmp_scalar
VCMPLT_scalar 1111 1110 0 . .. ... 1 ... 1 1111 1 1 0 0 .... @vcmp_scalar
VCMPGT_scalar 1111 1110 0 . .. ... 1 ... 1 1111 0 1 1 0 .... @vcmp_scalar
VCMPLE_scalar 1111 1110 0 . .. ... 1 ... 1 1111 1 1 1 0 .... @vcmp_scalar
+
+# 2-operand FP
+VADD_fp 1110 1111 0 . 0 . ... 0 ... 0 1101 . 1 . 0 ... 0 @2op_fp
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index c2826eb5f9f..abca7c0b2ab 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -25,6 +25,7 @@
#include "exec/cpu_ldst.h"
#include "exec/exec-all.h"
#include "tcg/tcg.h"
+#include "fpu/softfloat.h"
static uint16_t mve_eci_mask(CPUARMState *env)
{
@@ -2798,3 +2799,42 @@ DO_VMAXMINA(vmaxaw, 4, int32_t, uint32_t, DO_MAX)
DO_VMAXMINA(vminab, 1, int8_t, uint8_t, DO_MIN)
DO_VMAXMINA(vminah, 2, int16_t, uint16_t, DO_MIN)
DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
+
+/*
+ * 2-operand floating point. Note that if an element is partially
+ * predicated we must do the FP operation to update the non-predicated
+ * bytes, but we must be careful to avoid updating the FP exception
+ * state unless byte 0 of the element was unpredicated.
+ */
+#define DO_2OP_FP(OP, ESIZE, TYPE, FN) \
+ void HELPER(glue(mve_, OP))(CPUARMState *env, \
+ void *vd, void *vn, void *vm) \
+ { \
+ TYPE *d = vd, *n = vn, *m = vm; \
+ TYPE r; \
+ uint16_t mask = mve_element_mask(env); \
+ unsigned e; \
+ float_status *fpst; \
+ float_status scratch_fpst; \
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
+ if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) { \
+ continue; \
+ } \
+ fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ if (!(mask & 1)) { \
+ /* We need the result but without updating flags */ \
+ scratch_fpst = *fpst; \
+ fpst = &scratch_fpst; \
+ } \
+ r = FN(n[H##ESIZE(e)], m[H##ESIZE(e)], fpst); \
+ mergemask(&d[H##ESIZE(e)], r, mask); \
+ } \
+ mve_advance_vpt(env); \
+ }
+
+#define DO_2OP_FP_ALL(OP, FN) \
+ DO_2OP_FP(OP##h, 2, float16, float16_##FN) \
+ DO_2OP_FP(OP##s, 4, float32, float32_##FN)
+
+DO_2OP_FP_ALL(vfadd, add)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 78229c44c68..d2c40ede564 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -831,6 +831,23 @@ static bool trans_VSBCI(DisasContext *s, arg_2op *a)
return do_2op(s, a, gen_helper_mve_vsbci);
}
+#define DO_2OP_FP(INSN, FN) \
+ static bool trans_##INSN(DisasContext *s, arg_2op *a) \
+ { \
+ static MVEGenTwoOpFn * const fns[] = { \
+ NULL, \
+ gen_helper_mve_##FN##h, \
+ gen_helper_mve_##FN##s, \
+ NULL, \
+ }; \
+ if (!dc_isar_feature(aa32_mve_fp, s)) { \
+ return false; \
+ } \
+ return do_2op(s, a, fns[a->size]); \
+ }
+
+DO_2OP_FP(VADD_fp, vfadd)
+
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
MVEGenTwoOpScalarFn fn)
{
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index c53ab20fa48..dd43de558e4 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -28,12 +28,6 @@
#include "translate.h"
#include "translate-a32.h"
-static inline int neon_3same_fp_size(DisasContext *s, int x)
-{
- /* Convert 0==fp32, 1==fp16 into a MO_* value */
- return MO_32 - x;
-}
-
/* Include the generated Neon decoder */
#include "decode-neon-dp.c.inc"
#include "decode-neon-ls.c.inc"
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 02/18] target/arm: Implement MVE VSUB, VMUL, VABD, VMAXNM, VMINNM
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
2021-08-26 13:17 ` [PATCH v2 01/18] target/arm: Implement MVE VADD (floating-point) Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 03/18] target/arm: Implement MVE VCADD Peter Maydell
` (15 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement more simple 2-operand floating point MVE insns.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
v1->v2: use DO_2OP_FP_ALL
---
target/arm/helper-mve.h | 15 +++++++++++++++
target/arm/mve.decode | 6 ++++++
target/arm/mve_helper.c | 16 ++++++++++++++++
target/arm/translate-mve.c | 5 +++++
4 files changed, 42 insertions(+)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 32fd2e1f9be..370876d7934 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -413,6 +413,21 @@ DEF_HELPER_FLAGS_4(mve_vhcadd270w, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vfaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vfadds, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfsubh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfsubs, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vfmulh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfmuls, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vfabdh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfabds, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vmaxnmh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxnms, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vminnmh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminnms, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index e211cb016c6..cdbfaa4245b 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -625,3 +625,9 @@ VCMPLE_scalar 1111 1110 0 . .. ... 1 ... 1 1111 1 1 1 0 .... @vcmp_scalar
# 2-operand FP
VADD_fp 1110 1111 0 . 0 . ... 0 ... 0 1101 . 1 . 0 ... 0 @2op_fp
+VSUB_fp 1110 1111 0 . 1 . ... 0 ... 0 1101 . 1 . 0 ... 0 @2op_fp
+VMUL_fp 1111 1111 0 . 0 . ... 0 ... 0 1101 . 1 . 1 ... 0 @2op_fp
+VABD_fp 1111 1111 0 . 1 . ... 0 ... 0 1101 . 1 . 0 ... 0 @2op_fp
+
+VMAXNM 1111 1111 0 . 0 . ... 0 ... 0 1111 . 1 . 1 ... 0 @2op_fp
+VMINNM 1111 1111 0 . 1 . ... 0 ... 0 1111 . 1 . 1 ... 0 @2op_fp
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index abca7c0b2ab..d6bc686c985 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -2838,3 +2838,19 @@ DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
DO_2OP_FP(OP##s, 4, float32, float32_##FN)
DO_2OP_FP_ALL(vfadd, add)
+DO_2OP_FP_ALL(vfsub, sub)
+DO_2OP_FP_ALL(vfmul, mul)
+
+static inline float16 float16_abd(float16 a, float16 b, float_status *s)
+{
+ return float16_abs(float16_sub(a, b, s));
+}
+
+static inline float32 float32_abd(float32 a, float32 b, float_status *s)
+{
+ return float32_abs(float32_sub(a, b, s));
+}
+
+DO_2OP_FP_ALL(vfabd, abd)
+DO_2OP_FP_ALL(vmaxnm, maxnum)
+DO_2OP_FP_ALL(vminnm, minnum)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index d2c40ede564..98282335820 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -847,6 +847,11 @@ static bool trans_VSBCI(DisasContext *s, arg_2op *a)
}
DO_2OP_FP(VADD_fp, vfadd)
+DO_2OP_FP(VSUB_fp, vfsub)
+DO_2OP_FP(VMUL_fp, vfmul)
+DO_2OP_FP(VABD_fp, vfabd)
+DO_2OP_FP(VMAXNM, vmaxnm)
+DO_2OP_FP(VMINNM, vminnm)
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
MVEGenTwoOpScalarFn fn)
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 03/18] target/arm: Implement MVE VCADD
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
2021-08-26 13:17 ` [PATCH v2 01/18] target/arm: Implement MVE VADD (floating-point) Peter Maydell
2021-08-26 13:17 ` [PATCH v2 02/18] target/arm: Implement MVE VSUB, VMUL, VABD, VMAXNM, VMINNM Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 04/18] target/arm: Implement MVE VFMA and VFMS Peter Maydell
` (14 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE VCADD insn. Note that here the size bit is the
opposite sense to the other 2-operand fp insns.
We don't check for the sz == 1 && Qd == Qm UNPREDICTABLE case,
because that would mean we can't use the DO_2OP_FP macro in
translate-mve.c.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
v1->v2: use float16/float32
---
target/arm/helper-mve.h | 6 ++++++
target/arm/mve.decode | 8 ++++++++
target/arm/mve_helper.c | 40 ++++++++++++++++++++++++++++++++++++++
target/arm/translate-mve.c | 4 +++-
4 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 370876d7934..42eba8ea96d 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -428,6 +428,12 @@ DEF_HELPER_FLAGS_4(mve_vmaxnms, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vminnmh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vminnms, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfcadd90h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfcadd90s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vfcadd270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfcadd270s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index cdbfaa4245b..c728c7089ac 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -29,6 +29,8 @@
# 2 operand fp insns have size in bit 20: 1 for 16 bit, 0 for 32 bit,
# like Neon FP insns.
%2op_fp_size 20:1 !function=neon_3same_fp_size
+# VCADD is an exception, where bit 20 is 0 for 16 bit and 1 for 32 bit
+%2op_fp_size_rev 20:1 !function=plus_1
# 1imm format immediate
%imm_28_16_0 28:1 16:3 0:4
@@ -125,6 +127,9 @@
@2op_fp .... .... .... .... .... .... .... .... &2op \
qd=%qd qn=%qn qm=%qm size=%2op_fp_size
+@2op_fp_size_rev .... .... .... .... .... .... .... .... &2op \
+ qd=%qd qn=%qn qm=%qm size=%2op_fp_size_rev
+
# Vector loads and stores
# Widening loads and narrowing stores:
@@ -631,3 +636,6 @@ VABD_fp 1111 1111 0 . 1 . ... 0 ... 0 1101 . 1 . 0 ... 0 @2op_fp
VMAXNM 1111 1111 0 . 0 . ... 0 ... 0 1111 . 1 . 1 ... 0 @2op_fp
VMINNM 1111 1111 0 . 1 . ... 0 ... 0 1111 . 1 . 1 ... 0 @2op_fp
+
+VCADD90_fp 1111 1100 1 . 0 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_rev
+VCADD270_fp 1111 1101 1 . 0 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_rev
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index d6bc686c985..2cc8b3e11b7 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -2854,3 +2854,43 @@ static inline float32 float32_abd(float32 a, float32 b, float_status *s)
DO_2OP_FP_ALL(vfabd, abd)
DO_2OP_FP_ALL(vmaxnm, maxnum)
DO_2OP_FP_ALL(vminnm, minnum)
+
+#define DO_VCADD_FP(OP, ESIZE, TYPE, FN0, FN1) \
+ void HELPER(glue(mve_, OP))(CPUARMState *env, \
+ void *vd, void *vn, void *vm) \
+ { \
+ TYPE *d = vd, *n = vn, *m = vm; \
+ TYPE r[16 / ESIZE]; \
+ uint16_t tm, mask = mve_element_mask(env); \
+ unsigned e; \
+ float_status *fpst; \
+ float_status scratch_fpst; \
+ /* Calculate all results first to avoid overwriting inputs */ \
+ for (e = 0, tm = mask; e < 16 / ESIZE; e++, tm >>= ESIZE) { \
+ if ((tm & MAKE_64BIT_MASK(0, ESIZE)) == 0) { \
+ r[e] = 0; \
+ continue; \
+ } \
+ fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ if (!(tm & 1)) { \
+ /* We need the result but without updating flags */ \
+ scratch_fpst = *fpst; \
+ fpst = &scratch_fpst; \
+ } \
+ if (!(e & 1)) { \
+ r[e] = FN0(n[H##ESIZE(e)], m[H##ESIZE(e + 1)], fpst); \
+ } else { \
+ r[e] = FN1(n[H##ESIZE(e)], m[H##ESIZE(e - 1)], fpst); \
+ } \
+ } \
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
+ mergemask(&d[H##ESIZE(e)], r[e], mask); \
+ } \
+ mve_advance_vpt(env); \
+ }
+
+DO_VCADD_FP(vfcadd90h, 2, float16, float16_sub, float16_add)
+DO_VCADD_FP(vfcadd90s, 4, float32, float32_sub, float32_add)
+DO_VCADD_FP(vfcadd270h, 2, float16, float16_add, float16_sub)
+DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 98282335820..6203e3ff916 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -852,6 +852,8 @@ DO_2OP_FP(VMUL_fp, vfmul)
DO_2OP_FP(VABD_fp, vfabd)
DO_2OP_FP(VMAXNM, vmaxnm)
DO_2OP_FP(VMINNM, vminnm)
+DO_2OP_FP(VCADD90_fp, vfcadd90)
+DO_2OP_FP(VCADD270_fp, vfcadd270)
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
MVEGenTwoOpScalarFn fn)
@@ -883,7 +885,7 @@ static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
return true;
}
-#define DO_2OP_SCALAR(INSN, FN) \
+#define DO_2OP_SCALAR(INSN, FN) \
static bool trans_##INSN(DisasContext *s, arg_2scalar *a) \
{ \
static MVEGenTwoOpScalarFn * const fns[] = { \
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 04/18] target/arm: Implement MVE VFMA and VFMS
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (2 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 03/18] target/arm: Implement MVE VCADD Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 05/18] target/arm: Implement MVE VCMUL and VCMLA Peter Maydell
` (13 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE VFMA and VFMS insns.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
v1->v2: use float16/float32 types; pass CHS bool to
DO_VFMA rather than fn macro, as suggested by rth
---
target/arm/helper-mve.h | 6 ++++++
target/arm/mve.decode | 3 +++
target/arm/mve_helper.c | 37 +++++++++++++++++++++++++++++++++++++
target/arm/translate-mve.c | 2 ++
4 files changed, 48 insertions(+)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 42eba8ea96d..c230610d25c 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -434,6 +434,12 @@ DEF_HELPER_FLAGS_4(mve_vfcadd90s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vfcadd270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vfcadd270s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfmah, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfmas, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vfmsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vfmss, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index c728c7089ac..3a2056f6b34 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -639,3 +639,6 @@ VMINNM 1111 1111 0 . 1 . ... 0 ... 0 1111 . 1 . 1 ... 0 @2op_fp
VCADD90_fp 1111 1100 1 . 0 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_rev
VCADD270_fp 1111 1101 1 . 0 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_rev
+
+VFMA 1110 1111 0 . 0 . ... 0 ... 0 1100 . 1 . 1 ... 0 @2op_fp
+VFMS 1110 1111 0 . 1 . ... 0 ... 0 1100 . 1 . 1 ... 0 @2op_fp
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 2cc8b3e11b7..d7f250a4455 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -2894,3 +2894,40 @@ DO_VCADD_FP(vfcadd90h, 2, float16, float16_sub, float16_add)
DO_VCADD_FP(vfcadd90s, 4, float32, float32_sub, float32_add)
DO_VCADD_FP(vfcadd270h, 2, float16, float16_add, float16_sub)
DO_VCADD_FP(vfcadd270s, 4, float32, float32_add, float32_sub)
+
+#define DO_VFMA(OP, ESIZE, TYPE, CHS) \
+ void HELPER(glue(mve_, OP))(CPUARMState *env, \
+ void *vd, void *vn, void *vm) \
+ { \
+ TYPE *d = vd, *n = vn, *m = vm; \
+ TYPE r; \
+ uint16_t mask = mve_element_mask(env); \
+ unsigned e; \
+ float_status *fpst; \
+ float_status scratch_fpst; \
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
+ if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) { \
+ continue; \
+ } \
+ fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ if (!(mask & 1)) { \
+ /* We need the result but without updating flags */ \
+ scratch_fpst = *fpst; \
+ fpst = &scratch_fpst; \
+ } \
+ r = n[H##ESIZE(e)]; \
+ if (CHS) { \
+ r = TYPE##_chs(r); \
+ } \
+ r = TYPE##_muladd(r, m[H##ESIZE(e)], d[H##ESIZE(e)], \
+ 0, fpst); \
+ mergemask(&d[H##ESIZE(e)], r, mask); \
+ } \
+ mve_advance_vpt(env); \
+ }
+
+DO_VFMA(vfmah, 2, float16, false)
+DO_VFMA(vfmas, 4, float32, false)
+DO_VFMA(vfmsh, 2, float16, true)
+DO_VFMA(vfmss, 4, float32, true)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 6203e3ff916..d61abc6d46f 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -854,6 +854,8 @@ DO_2OP_FP(VMAXNM, vmaxnm)
DO_2OP_FP(VMINNM, vminnm)
DO_2OP_FP(VCADD90_fp, vfcadd90)
DO_2OP_FP(VCADD270_fp, vfcadd270)
+DO_2OP_FP(VFMA, vfma)
+DO_2OP_FP(VFMS, vfms)
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
MVEGenTwoOpScalarFn fn)
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 05/18] target/arm: Implement MVE VCMUL and VCMLA
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (3 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 04/18] target/arm: Implement MVE VFMA and VFMS Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 06/18] target/arm: Implement MVE VMAXNMA and VMINNMA Peter Maydell
` (12 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE VCMUL and VCMLA insns.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
v1->v2: use float* types, avoid passing in CHS function
---
target/arm/helper-mve.h | 18 ++++++++
target/arm/mve.decode | 35 ++++++++++++----
target/arm/mve_helper.c | 86 ++++++++++++++++++++++++++++++++++++++
target/arm/translate-mve.c | 8 ++++
4 files changed, 139 insertions(+), 8 deletions(-)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index c230610d25c..73950403bc3 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -440,6 +440,24 @@ DEF_HELPER_FLAGS_4(mve_vfmas, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vfmsh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vfmss, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmul0h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmul0s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmul90h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmul90s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmul180h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmul180s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmul270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmul270s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vcmla0h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmla0s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmla90h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmla90s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmla180h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmla180s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmla270h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcmla270s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
DEF_HELPER_FLAGS_4(mve_vadd_scalarb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vadd_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 3a2056f6b34..403381eef61 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -286,15 +286,29 @@ VQSHL_U 111 1 1111 0 . .. ... 0 ... 0 0100 . 1 . 1 ... 0 @2op_rev
VQRSHL_S 111 0 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
VQRSHL_U 111 1 1111 0 . .. ... 0 ... 0 0101 . 1 . 1 ... 0 @2op_rev
-VQDMLADH 1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 0 @2op
-VQDMLADHX 1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
-VQRDMLADH 1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
-VQRDMLADHX 1110 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
+{
+ VCMUL0 111 . 1110 0 . 11 ... 0 ... 0 1110 . 0 . 0 ... 0 @2op_sz28
+ VQDMLADH 1110 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 0 @2op
+ VQDMLSDH 1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 0 @2op
+}
-VQDMLSDH 1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 0 @2op
-VQDMLSDHX 1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
-VQRDMLSDH 1111 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
-VQRDMLSDHX 1111 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
+{
+ VCMUL180 111 . 1110 0 . 11 ... 0 ... 1 1110 . 0 . 0 ... 0 @2op_sz28
+ VQDMLADHX 111 0 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
+ VQDMLSDHX 111 1 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 0 @2op
+}
+
+{
+ VCMUL90 111 . 1110 0 . 11 ... 0 ... 0 1110 . 0 . 0 ... 1 @2op_sz28
+ VQRDMLADH 111 0 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
+ VQRDMLSDH 111 1 1110 0 . .. ... 0 ... 0 1110 . 0 . 0 ... 1 @2op
+}
+
+{
+ VCMUL270 111 . 1110 0 . 11 ... 0 ... 1 1110 . 0 . 0 ... 1 @2op_sz28
+ VQRDMLADHX 111 0 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
+ VQRDMLSDHX 111 1 1110 0 . .. ... 0 ... 1 1110 . 0 . 0 ... 1 @2op
+}
VQDMULLB 111 . 1110 0 . 11 ... 0 ... 0 1111 . 0 . 0 ... 1 @2op_sz28
VQDMULLT 111 . 1110 0 . 11 ... 0 ... 1 1111 . 0 . 0 ... 1 @2op_sz28
@@ -642,3 +656,8 @@ VCADD270_fp 1111 1101 1 . 0 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_
VFMA 1110 1111 0 . 0 . ... 0 ... 0 1100 . 1 . 1 ... 0 @2op_fp
VFMS 1110 1111 0 . 1 . ... 0 ... 0 1100 . 1 . 1 ... 0 @2op_fp
+
+VCMLA0 1111 110 00 . 1 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_rev
+VCMLA90 1111 110 01 . 1 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_rev
+VCMLA180 1111 110 10 . 1 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_rev
+VCMLA270 1111 110 11 . 1 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_rev
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index d7f250a4455..e478408fddd 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -2931,3 +2931,89 @@ DO_VFMA(vfmah, 2, float16, false)
DO_VFMA(vfmas, 4, float32, false)
DO_VFMA(vfmsh, 2, float16, true)
DO_VFMA(vfmss, 4, float32, true)
+
+#define DO_VCMLA(OP, ESIZE, TYPE, ROT, FN) \
+ void HELPER(glue(mve_, OP))(CPUARMState *env, \
+ void *vd, void *vn, void *vm) \
+ { \
+ TYPE *d = vd, *n = vn, *m = vm; \
+ TYPE r0, r1, e1, e2, e3, e4; \
+ uint16_t mask = mve_element_mask(env); \
+ unsigned e; \
+ float_status *fpst0, *fpst1; \
+ float_status scratch_fpst; \
+ /* We loop through pairs of elements at a time */ \
+ for (e = 0; e < 16 / ESIZE; e += 2, mask >>= ESIZE * 2) { \
+ if ((mask & MAKE_64BIT_MASK(0, ESIZE * 2)) == 0) { \
+ continue; \
+ } \
+ fpst0 = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ fpst1 = fpst0; \
+ if (!(mask & 1)) { \
+ scratch_fpst = *fpst0; \
+ fpst0 = &scratch_fpst; \
+ } \
+ if (!(mask & (1 << ESIZE))) { \
+ scratch_fpst = *fpst1; \
+ fpst1 = &scratch_fpst; \
+ } \
+ switch (ROT) { \
+ case 0: \
+ e1 = m[H##ESIZE(e)]; \
+ e2 = n[H##ESIZE(e)]; \
+ e3 = m[H##ESIZE(e + 1)]; \
+ e4 = n[H##ESIZE(e)]; \
+ break; \
+ case 1: \
+ e1 = TYPE##_chs(m[H##ESIZE(e + 1)]); \
+ e2 = n[H##ESIZE(e + 1)]; \
+ e3 = m[H##ESIZE(e)]; \
+ e4 = n[H##ESIZE(e + 1)]; \
+ break; \
+ case 2: \
+ e1 = TYPE##_chs(m[H##ESIZE(e)]); \
+ e2 = n[H##ESIZE(e)]; \
+ e3 = TYPE##_chs(m[H##ESIZE(e + 1)]); \
+ e4 = n[H##ESIZE(e)]; \
+ break; \
+ case 3: \
+ e1 = m[H##ESIZE(e + 1)]; \
+ e2 = n[H##ESIZE(e + 1)]; \
+ e3 = TYPE##_chs(m[H##ESIZE(e)]); \
+ e4 = n[H##ESIZE(e + 1)]; \
+ break; \
+ default: \
+ g_assert_not_reached(); \
+ } \
+ r0 = FN(e2, e1, d[H##ESIZE(e)], fpst0); \
+ r1 = FN(e4, e3, d[H##ESIZE(e + 1)], fpst1); \
+ mergemask(&d[H##ESIZE(e)], r0, mask); \
+ mergemask(&d[H##ESIZE(e + 1)], r1, mask >> ESIZE); \
+ } \
+ mve_advance_vpt(env); \
+ }
+
+#define DO_VCMULH(N, M, D, S) float16_mul(N, M, S)
+#define DO_VCMULS(N, M, D, S) float32_mul(N, M, S)
+
+#define DO_VCMLAH(N, M, D, S) float16_muladd(N, M, D, 0, S)
+#define DO_VCMLAS(N, M, D, S) float32_muladd(N, M, D, 0, S)
+
+DO_VCMLA(vcmul0h, 2, float16, 0, DO_VCMULH)
+DO_VCMLA(vcmul0s, 4, float32, 0, DO_VCMULS)
+DO_VCMLA(vcmul90h, 2, float16, 1, DO_VCMULH)
+DO_VCMLA(vcmul90s, 4, float32, 1, DO_VCMULS)
+DO_VCMLA(vcmul180h, 2, float16, 2, DO_VCMULH)
+DO_VCMLA(vcmul180s, 4, float32, 2, DO_VCMULS)
+DO_VCMLA(vcmul270h, 2, float16, 3, DO_VCMULH)
+DO_VCMLA(vcmul270s, 4, float32, 3, DO_VCMULS)
+
+DO_VCMLA(vcmla0h, 2, float16, 0, DO_VCMLAH)
+DO_VCMLA(vcmla0s, 4, float32, 0, DO_VCMLAS)
+DO_VCMLA(vcmla90h, 2, float16, 1, DO_VCMLAH)
+DO_VCMLA(vcmla90s, 4, float32, 1, DO_VCMLAS)
+DO_VCMLA(vcmla180h, 2, float16, 2, DO_VCMLAH)
+DO_VCMLA(vcmla180s, 4, float32, 2, DO_VCMLAS)
+DO_VCMLA(vcmla270h, 2, float16, 3, DO_VCMLAH)
+DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index d61abc6d46f..d62ed1fc295 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -856,6 +856,14 @@ DO_2OP_FP(VCADD90_fp, vfcadd90)
DO_2OP_FP(VCADD270_fp, vfcadd270)
DO_2OP_FP(VFMA, vfma)
DO_2OP_FP(VFMS, vfms)
+DO_2OP_FP(VCMUL0, vcmul0)
+DO_2OP_FP(VCMUL90, vcmul90)
+DO_2OP_FP(VCMUL180, vcmul180)
+DO_2OP_FP(VCMUL270, vcmul270)
+DO_2OP_FP(VCMLA0, vcmla0)
+DO_2OP_FP(VCMLA90, vcmla90)
+DO_2OP_FP(VCMLA180, vcmla180)
+DO_2OP_FP(VCMLA270, vcmla270)
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
MVEGenTwoOpScalarFn fn)
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 06/18] target/arm: Implement MVE VMAXNMA and VMINNMA
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (4 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 05/18] target/arm: Implement MVE VCMUL and VCMLA Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 07/18] target/arm: Implement MVE scalar fp insns Peter Maydell
` (11 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE VMAXNMA and VMINNMA insns; these are 2-operand, but
the destination register must be the same as one of the source
registers.
We defer the decode of the size in bit 28 to the individual insn
patterns rather than doing it in the format, because otherwise we
would have a single insn pattern that overlapped with two groups (eg
VMAXNMA with the VMULH_S and VMULH_U groups). Having two insn
patterns per insn seems clearer than a complex multilevel nesting
of overlapping and non-overlapping groups.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
v1->v2: use DO_2OP_FP_ALL macro
---
target/arm/helper-mve.h | 6 ++++++
target/arm/mve.decode | 11 +++++++++++
target/arm/mve_helper.c | 23 +++++++++++++++++++++++
target/arm/translate-mve.c | 2 ++
4 files changed, 42 insertions(+)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 73950403bc3..57ab3f7b59f 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -428,6 +428,12 @@ DEF_HELPER_FLAGS_4(mve_vmaxnms, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vminnmh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vminnms, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxnmah, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmaxnmas, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
+DEF_HELPER_FLAGS_4(mve_vminnmah, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vminnmas, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
DEF_HELPER_FLAGS_4(mve_vfcadd90h, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(mve_vfcadd90s, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 403381eef61..b0622e1f62c 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -130,6 +130,11 @@
@2op_fp_size_rev .... .... .... .... .... .... .... .... &2op \
qd=%qd qn=%qn qm=%qm size=%2op_fp_size_rev
+# 2-operand, but Qd and Qn share a field. Size is in bit 28, but we
+# don't decode it in this format
+@vmaxnma .... .... .... .... .... .... .... .... &2op \
+ qd=%qd qn=%qd qm=%qm
+
# Vector loads and stores
# Widening loads and narrowing stores:
@@ -199,6 +204,8 @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
# The VSHLL T2 encoding is not a @2op pattern, but is here because it
# overlaps what would be size=0b11 VMULH/VRMULH
{
+ VMAXNMA 111 0 1110 0 . 11 1111 ... 0 1110 1 0 . 0 ... 1 @vmaxnma size=2
+
VSHLL_BS 111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
VSHLL_BS 111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
@@ -211,6 +218,8 @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
}
{
+ VMAXNMA 111 1 1110 0 . 11 1111 ... 0 1110 1 0 . 0 ... 1 @vmaxnma size=1
+
VSHLL_BU 111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
VSHLL_BU 111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
@@ -221,6 +230,7 @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
}
{
+ VMINNMA 111 0 1110 0 . 11 1111 ... 1 1110 1 0 . 0 ... 1 @vmaxnma size=2
VSHLL_TS 111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
VSHLL_TS 111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
@@ -233,6 +243,7 @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
}
{
+ VMINNMA 111 1 1110 0 . 11 1111 ... 1 1110 1 0 . 0 ... 1 @vmaxnma size=1
VSHLL_TU 111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
VSHLL_TU 111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index e478408fddd..a6ad894414a 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -2855,6 +2855,29 @@ DO_2OP_FP_ALL(vfabd, abd)
DO_2OP_FP_ALL(vmaxnm, maxnum)
DO_2OP_FP_ALL(vminnm, minnum)
+static inline float16 float16_maxnuma(float16 a, float16 b, float_status *s)
+{
+ return float16_maxnum(float16_abs(a), float16_abs(b), s);
+}
+
+static inline float32 float32_maxnuma(float32 a, float32 b, float_status *s)
+{
+ return float32_maxnum(float32_abs(a), float32_abs(b), s);
+}
+
+static inline float16 float16_minnuma(float16 a, float16 b, float_status *s)
+{
+ return float16_minnum(float16_abs(a), float16_abs(b), s);
+}
+
+static inline float32 float32_minnuma(float32 a, float32 b, float_status *s)
+{
+ return float32_minnum(float32_abs(a), float32_abs(b), s);
+}
+
+DO_2OP_FP_ALL(vmaxnma, maxnuma)
+DO_2OP_FP_ALL(vminnma, minnuma)
+
#define DO_VCADD_FP(OP, ESIZE, TYPE, FN0, FN1) \
void HELPER(glue(mve_, OP))(CPUARMState *env, \
void *vd, void *vn, void *vm) \
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index d62ed1fc295..4d702da808d 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -864,6 +864,8 @@ DO_2OP_FP(VCMLA0, vcmla0)
DO_2OP_FP(VCMLA90, vcmla90)
DO_2OP_FP(VCMLA180, vcmla180)
DO_2OP_FP(VCMLA270, vcmla270)
+DO_2OP_FP(VMAXNMA, vmaxnma)
+DO_2OP_FP(VMINNMA, vminnma)
static bool do_2op_scalar(DisasContext *s, arg_2scalar *a,
MVEGenTwoOpScalarFn fn)
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 07/18] target/arm: Implement MVE scalar fp insns
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (5 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 06/18] target/arm: Implement MVE VMAXNMA and VMINNMA Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 08/18] target/arm: Implement MVE fp-with-scalar VFMA, VFMAS Peter Maydell
` (10 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE scalar floating point insns VADD, VSUB and VMUL.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
v1->v2: add and use DO_2OP_FP_SCALAR_ALL macro; use float* types
---
target/arm/helper-mve.h | 9 +++++++++
target/arm/mve.decode | 27 +++++++++++++++++++++------
target/arm/mve_helper.c | 35 +++++++++++++++++++++++++++++++++++
target/arm/translate-mve.c | 20 ++++++++++++++++++++
4 files changed, 85 insertions(+), 6 deletions(-)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 57ab3f7b59f..091ec4b4270 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -800,3 +800,12 @@ DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
DEF_HELPER_FLAGS_3(mve_vcmple_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
DEF_HELPER_FLAGS_3(mve_vcmple_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
DEF_HELPER_FLAGS_3(mve_vcmple_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vfadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vfadd_scalars, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vfsub_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vfsub_scalars, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vfmul_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vfmul_scalars, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index b0622e1f62c..5ba8b6deeaa 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -31,6 +31,8 @@
%2op_fp_size 20:1 !function=neon_3same_fp_size
# VCADD is an exception, where bit 20 is 0 for 16 bit and 1 for 32 bit
%2op_fp_size_rev 20:1 !function=plus_1
+# FP scalars have size in bit 28, 1 for 16 bit, 0 for 32 bit
+%2op_fp_scalar_size 28:1 !function=neon_3same_fp_size
# 1imm format immediate
%imm_28_16_0 28:1 16:3 0:4
@@ -135,6 +137,9 @@
@vmaxnma .... .... .... .... .... .... .... .... &2op \
qd=%qd qn=%qd qm=%qm
+@2op_fp_scalar .... .... .... .... .... .... .... rm:4 &2scalar \
+ qd=%qd qn=%qn size=%2op_fp_scalar_size
+
# Vector loads and stores
# Widening loads and narrowing stores:
@@ -471,10 +476,17 @@ VSUB_scalar 1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
}
-VHADD_S_scalar 1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
-VHADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
-VHSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
-VHSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
+{
+ VADD_fp_scalar 111 . 1110 0 . 11 ... 0 ... 0 1111 . 100 .... @2op_fp_scalar
+ VHADD_S_scalar 1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
+ VHADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
+}
+
+{
+ VSUB_fp_scalar 111 . 1110 0 . 11 ... 0 ... 1 1111 . 100 .... @2op_fp_scalar
+ VHSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
+ VHSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
+}
{
VQADD_S_scalar 1110 1110 0 . .. ... 0 ... 0 1111 . 110 .... @2scalar
@@ -490,8 +502,11 @@ VHSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
size=%size_28
}
-VQDMULH_scalar 1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
-VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
+{
+ VMUL_fp_scalar 111 . 1110 0 . 11 ... 1 ... 0 1110 . 110 .... @2op_fp_scalar
+ VQDMULH_scalar 1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
+ VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
+}
# The U bit (28) is don't-care because it does not affect the result
VMLA 111- 1110 0 . .. ... 1 ... 0 1110 . 100 .... @2scalar
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index a6ad894414a..b49975fdc01 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -3040,3 +3040,38 @@ DO_VCMLA(vcmla180h, 2, float16, 2, DO_VCMLAH)
DO_VCMLA(vcmla180s, 4, float32, 2, DO_VCMLAS)
DO_VCMLA(vcmla270h, 2, float16, 3, DO_VCMLAH)
DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
+
+#define DO_2OP_FP_SCALAR(OP, ESIZE, TYPE, FN) \
+ void HELPER(glue(mve_, OP))(CPUARMState *env, \
+ void *vd, void *vn, uint32_t rm) \
+ { \
+ TYPE *d = vd, *n = vn; \
+ TYPE r, m = rm; \
+ uint16_t mask = mve_element_mask(env); \
+ unsigned e; \
+ float_status *fpst; \
+ float_status scratch_fpst; \
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
+ if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) { \
+ continue; \
+ } \
+ fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ if (!(mask & 1)) { \
+ /* We need the result but without updating flags */ \
+ scratch_fpst = *fpst; \
+ fpst = &scratch_fpst; \
+ } \
+ r = FN(n[H##ESIZE(e)], m, fpst); \
+ mergemask(&d[H##ESIZE(e)], r, mask); \
+ } \
+ mve_advance_vpt(env); \
+ }
+
+#define DO_2OP_FP_SCALAR_ALL(OP, FN) \
+ DO_2OP_FP_SCALAR(OP##h, 2, float16, float16_##FN) \
+ DO_2OP_FP_SCALAR(OP##s, 4, float32, float32_##FN)
+
+DO_2OP_FP_SCALAR_ALL(vfadd_scalar, add)
+DO_2OP_FP_SCALAR_ALL(vfsub_scalar, sub)
+DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 4d702da808d..bc4b3f840a0 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -960,6 +960,26 @@ static bool trans_VQDMULLT_scalar(DisasContext *s, arg_2scalar *a)
return do_2op_scalar(s, a, fns[a->size]);
}
+
+#define DO_2OP_FP_SCALAR(INSN, FN) \
+ static bool trans_##INSN(DisasContext *s, arg_2scalar *a) \
+ { \
+ static MVEGenTwoOpScalarFn * const fns[] = { \
+ NULL, \
+ gen_helper_mve_##FN##h, \
+ gen_helper_mve_##FN##s, \
+ NULL, \
+ }; \
+ if (!dc_isar_feature(aa32_mve_fp, s)) { \
+ return false; \
+ } \
+ return do_2op_scalar(s, a, fns[a->size]); \
+ }
+
+DO_2OP_FP_SCALAR(VADD_fp_scalar, vfadd_scalar)
+DO_2OP_FP_SCALAR(VSUB_fp_scalar, vfsub_scalar)
+DO_2OP_FP_SCALAR(VMUL_fp_scalar, vfmul_scalar)
+
static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
MVEGenLongDualAccOpFn *fn)
{
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 08/18] target/arm: Implement MVE fp-with-scalar VFMA, VFMAS
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (6 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 07/18] target/arm: Implement MVE scalar fp insns Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 09/18] softfloat: Remove assertion preventing silencing of NaN in default-NaN mode Peter Maydell
` (9 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE fp-with-scalar VFMA and VFMAS insns.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
v1->v2: use float* types
---
target/arm/helper-mve.h | 6 ++++++
target/arm/mve.decode | 14 +++++++++++---
target/arm/mve_helper.c | 37 +++++++++++++++++++++++++++++++++++++
target/arm/translate-mve.c | 2 ++
4 files changed, 56 insertions(+), 3 deletions(-)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 091ec4b4270..cb7b6423239 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -809,3 +809,9 @@ DEF_HELPER_FLAGS_4(mve_vfsub_scalars, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vfmul_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vfmul_scalars, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vfma_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vfma_scalars, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vfmas_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vfmas_scalars, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 5ba8b6deeaa..d2bd6815bc3 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -508,9 +508,17 @@ VSUB_scalar 1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
}
-# The U bit (28) is don't-care because it does not affect the result
-VMLA 111- 1110 0 . .. ... 1 ... 0 1110 . 100 .... @2scalar
-VMLAS 111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
+{
+ VFMA_scalar 111 . 1110 0 . 11 ... 1 ... 0 1110 . 100 .... @2op_fp_scalar
+ # The U bit (28) is don't-care because it does not affect the result
+ VMLA 111 - 1110 0 . .. ... 1 ... 0 1110 . 100 .... @2scalar
+}
+
+{
+ VFMAS_scalar 111 . 1110 0 . 11 ... 1 ... 1 1110 . 100 .... @2op_fp_scalar
+ # The U bit (28) is don't-care because it does not affect the result
+ VMLAS 111 - 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
+}
VQRDMLAH 1110 1110 0 . .. ... 0 ... 0 1110 . 100 .... @2scalar
VQRDMLASH 1110 1110 0 . .. ... 0 ... 1 1110 . 100 .... @2scalar
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index b49975fdc01..36f0910b856 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -3075,3 +3075,40 @@ DO_VCMLA(vcmla270s, 4, float32, 3, DO_VCMLAS)
DO_2OP_FP_SCALAR_ALL(vfadd_scalar, add)
DO_2OP_FP_SCALAR_ALL(vfsub_scalar, sub)
DO_2OP_FP_SCALAR_ALL(vfmul_scalar, mul)
+
+#define DO_2OP_FP_ACC_SCALAR(OP, ESIZE, TYPE, FN) \
+ void HELPER(glue(mve_, OP))(CPUARMState *env, \
+ void *vd, void *vn, uint32_t rm) \
+ { \
+ TYPE *d = vd, *n = vn; \
+ TYPE r, m = rm; \
+ uint16_t mask = mve_element_mask(env); \
+ unsigned e; \
+ float_status *fpst; \
+ float_status scratch_fpst; \
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
+ if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) { \
+ continue; \
+ } \
+ fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ if (!(mask & 1)) { \
+ /* We need the result but without updating flags */ \
+ scratch_fpst = *fpst; \
+ fpst = &scratch_fpst; \
+ } \
+ r = FN(n[H##ESIZE(e)], m, d[H##ESIZE(e)], 0, fpst); \
+ mergemask(&d[H##ESIZE(e)], r, mask); \
+ } \
+ mve_advance_vpt(env); \
+ }
+
+/* VFMAS is vector * vector + scalar, so swap op2 and op3 */
+#define DO_VFMAS_SCALARH(N, M, D, F, S) float16_muladd(N, D, M, F, S)
+#define DO_VFMAS_SCALARS(N, M, D, F, S) float32_muladd(N, D, M, F, S)
+
+/* VFMA is vector * scalar + vector */
+DO_2OP_FP_ACC_SCALAR(vfma_scalarh, 2, float16, float16_muladd)
+DO_2OP_FP_ACC_SCALAR(vfma_scalars, 4, float32, float32_muladd)
+DO_2OP_FP_ACC_SCALAR(vfmas_scalarh, 2, float16, DO_VFMAS_SCALARH)
+DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index bc4b3f840a0..3627ba227f2 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -979,6 +979,8 @@ static bool trans_VQDMULLT_scalar(DisasContext *s, arg_2scalar *a)
DO_2OP_FP_SCALAR(VADD_fp_scalar, vfadd_scalar)
DO_2OP_FP_SCALAR(VSUB_fp_scalar, vfsub_scalar)
DO_2OP_FP_SCALAR(VMUL_fp_scalar, vfmul_scalar)
+DO_2OP_FP_SCALAR(VFMA_scalar, vfma_scalar)
+DO_2OP_FP_SCALAR(VFMAS_scalar, vfmas_scalar)
static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
MVEGenLongDualAccOpFn *fn)
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 09/18] softfloat: Remove assertion preventing silencing of NaN in default-NaN mode
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (7 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 08/18] target/arm: Implement MVE fp-with-scalar VFMA, VFMAS Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 10/18] target/arm: Implement MVE FP max/min across vector Peter Maydell
` (8 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
In commit a777d6033447a we added an assertion to parts_silence_nan() that
prohibits calling float*_silence_nan() when in default-NaN mode.
This ties together a property of the output ("do we generate a default
NaN when the result is a NaN?") with an operation on an input ("silence
this input NaN").
It's true that most of the time when in default-NaN mode you won't
need to silence an input NaN, because you can just produce the
default NaN as the result instead. But some functions like
float*_maxnum() are defined to be able to work with quiet NaNs, so
silencing an input SNaN is still reasonable. In particular, the
upcoming implementation of MVE VMAXNMV would fall over this assertion
if we didn't delete it.
Delete the assertion.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
fpu/softfloat-specialize.c.inc | 1 -
1 file changed, 1 deletion(-)
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 12467bb9bba..f2ad0f335e6 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -198,7 +198,6 @@ static void parts128_default_nan(FloatParts128 *p, float_status *status)
static uint64_t parts_silence_nan_frac(uint64_t frac, float_status *status)
{
g_assert(!no_signaling_nans(status));
- g_assert(!status->default_nan_mode);
/* The only snan_bit_is_one target without default_nan_mode is HPPA. */
if (snan_bit_is_one(status)) {
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 10/18] target/arm: Implement MVE FP max/min across vector
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (8 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 09/18] softfloat: Remove assertion preventing silencing of NaN in default-NaN mode Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-30 9:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 11/18] target/arm: Implement MVE fp vector comparisons Peter Maydell
` (7 subsequent siblings)
17 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE VMAXNMV, VMINNMV, VMAXNMAV, VMINNMAV insns. These
calculate the maximum or minimum of floating point elements across a
vector, starting with a value in a general purpose register and
returning the result there.
The pseudocode silences a possible SNaN in the accumulating result
on every iteration (by calling FPConvertNaN), but we do it only
on the input ra, because if none of the inputs to float*_maxnum
or float*_minnum are SNaNs then the result can't be an SNaN.
Note that we can't use the float*_maxnuma() etc functions we defined
earlier for VMAXNMA and VMINNMA, because we mustn't take the absolute
value of the starting general-purpose register value, which could be
negative.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
v1->v2: use float* types only
---
target/arm/helper-mve.h | 12 +++++++++++
target/arm/mve.decode | 12 +++++++++++
target/arm/mve_helper.c | 44 ++++++++++++++++++++++++++++++++++++++
target/arm/translate-mve.c | 20 +++++++++++++++++
4 files changed, 88 insertions(+)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index cb7b6423239..47fd18dddbf 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -614,6 +614,18 @@ DEF_HELPER_FLAGS_3(mve_vminavb, TCG_CALL_NO_WG, i32, env, ptr, i32)
DEF_HELPER_FLAGS_3(mve_vminavh, TCG_CALL_NO_WG, i32, env, ptr, i32)
DEF_HELPER_FLAGS_3(mve_vminavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vmaxnmvh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vmaxnmvs, TCG_CALL_NO_WG, i32, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vminnmvh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vminnmvs, TCG_CALL_NO_WG, i32, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vmaxnmavh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vmaxnmavs, TCG_CALL_NO_WG, i32, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vminnmavh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vminnmavs, TCG_CALL_NO_WG, i32, env, ptr, i32)
+
DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index d2bd6815bc3..1a18c3b8eeb 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -137,6 +137,10 @@
@vmaxnma .... .... .... .... .... .... .... .... &2op \
qd=%qd qn=%qd qm=%qm
+# Here also we don't decode the bit 28 size in the format to avoid
+# awkward nested overlap groups
+@vmaxnmv .... .... .... .... rda:4 .... .... .... &vmaxv qm=%qm
+
@2op_fp_scalar .... .... .... .... .... .... .... rm:4 &2scalar \
qd=%qd qn=%qn size=%2op_fp_scalar_size
@@ -440,6 +444,10 @@ VMLADAV_S 1110 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
{
+ VMAXNMAV 1110 1110 1110 11 00 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=2
+ VMINNMAV 1110 1110 1110 11 00 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=2
+ VMAXNMV 1110 1110 1110 11 10 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=2
+ VMINNMV 1110 1110 1110 11 10 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=2
VMAXV_S 1110 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
VMINV_S 1110 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
VMAXAV 1110 1110 1110 .. 00 .... 1111 0 0 . 0 ... 0 @vmaxv
@@ -449,6 +457,10 @@ VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
}
{
+ VMAXNMAV 1111 1110 1110 11 00 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=1
+ VMINNMAV 1111 1110 1110 11 00 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=1
+ VMAXNMV 1111 1110 1110 11 10 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=1
+ VMINNMV 1111 1110 1110 11 10 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=1
VMAXV_U 1111 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
VMINV_U 1111 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 36f0910b856..52e5a8f2a8b 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -3112,3 +3112,47 @@ DO_2OP_FP_ACC_SCALAR(vfma_scalarh, 2, float16, float16_muladd)
DO_2OP_FP_ACC_SCALAR(vfma_scalars, 4, float32, float32_muladd)
DO_2OP_FP_ACC_SCALAR(vfmas_scalarh, 2, float16, DO_VFMAS_SCALARH)
DO_2OP_FP_ACC_SCALAR(vfmas_scalars, 4, float32, DO_VFMAS_SCALARS)
+
+/* Floating point max/min across vector. */
+#define DO_FP_VMAXMINV(OP, ESIZE, TYPE, ABS, FN) \
+ uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
+ uint32_t ra_in) \
+ { \
+ uint16_t mask = mve_element_mask(env); \
+ unsigned e; \
+ TYPE *m = vm; \
+ TYPE ra = (TYPE)ra_in; \
+ float_status *fpst = (ESIZE == 2) ? \
+ &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
+ if (mask & 1) { \
+ TYPE v = m[H##ESIZE(e)]; \
+ if (TYPE##_is_signaling_nan(ra, fpst)) { \
+ ra = TYPE##_silence_nan(ra, fpst); \
+ float_raise(float_flag_invalid, fpst); \
+ } \
+ if (TYPE##_is_signaling_nan(v, fpst)) { \
+ v = TYPE##_silence_nan(v, fpst); \
+ float_raise(float_flag_invalid, fpst); \
+ } \
+ if (ABS) { \
+ v = TYPE##_abs(v); \
+ } \
+ ra = FN(ra, v, fpst); \
+ } \
+ } \
+ mve_advance_vpt(env); \
+ return ra; \
+ } \
+
+#define NOP(X) (X)
+
+DO_FP_VMAXMINV(vmaxnmvh, 2, float16, false, float16_maxnum)
+DO_FP_VMAXMINV(vmaxnmvs, 4, float32, false, float32_maxnum)
+DO_FP_VMAXMINV(vminnmvh, 2, float16, false, float16_minnum)
+DO_FP_VMAXMINV(vminnmvs, 4, float32, false, float32_minnum)
+DO_FP_VMAXMINV(vmaxnmavh, 2, float16, true, float16_maxnum)
+DO_FP_VMAXMINV(vmaxnmavs, 4, float32, true, float32_maxnum)
+DO_FP_VMAXMINV(vminnmavh, 2, float16, true, float16_minnum)
+DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 3627ba227f2..4e2aa2cae2d 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -1806,6 +1806,26 @@ DO_VMAXV(VMINV_S, vminvs)
DO_VMAXV(VMINV_U, vminvu)
DO_VMAXV(VMINAV, vminav)
+#define DO_VMAXV_FP(INSN, FN) \
+ static bool trans_##INSN(DisasContext *s, arg_vmaxv *a) \
+ { \
+ static MVEGenVADDVFn * const fns[] = { \
+ NULL, \
+ gen_helper_mve_##FN##h, \
+ gen_helper_mve_##FN##s, \
+ NULL, \
+ }; \
+ if (!dc_isar_feature(aa32_mve_fp, s)) { \
+ return false; \
+ } \
+ return do_vmaxv(s, a, fns[a->size]); \
+ }
+
+DO_VMAXV_FP(VMAXNMV, vmaxnmv)
+DO_VMAXV_FP(VMINNMV, vminnmv)
+DO_VMAXV_FP(VMAXNMAV, vmaxnmav)
+DO_VMAXV_FP(VMINNMAV, vminnmav)
+
static bool do_vabav(DisasContext *s, arg_vabav *a, MVEGenVABAVFn *fn)
{
/* Absolute difference accumulated across vector */
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 11/18] target/arm: Implement MVE fp vector comparisons
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (9 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 10/18] target/arm: Implement MVE FP max/min across vector Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 12/18] target/arm: Implement MVE fp scalar comparisons Peter Maydell
` (6 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE fp vector comparisons VCMP and VPT.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
v1->v2: use float* types
---
target/arm/helper-mve.h | 18 +++++++++++
target/arm/mve.decode | 39 +++++++++++++++++++----
target/arm/mve_helper.c | 64 ++++++++++++++++++++++++++++++++++++++
target/arm/translate-mve.c | 22 +++++++++++++
4 files changed, 137 insertions(+), 6 deletions(-)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 47fd18dddbf..0c15c531641 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -813,6 +813,24 @@ DEF_HELPER_FLAGS_3(mve_vcmple_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
DEF_HELPER_FLAGS_3(mve_vcmple_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
DEF_HELPER_FLAGS_3(mve_vcmple_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vfcmpeqh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vfcmpeqs, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vfcmpneh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vfcmpnes, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vfcmpgeh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vfcmpges, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vfcmplth, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vfcmplts, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vfcmpgth, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vfcmpgts, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vfcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vfcmples, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
DEF_HELPER_FLAGS_4(mve_vfadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vfadd_scalars, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 1a18c3b8eeb..7767ecae2ac 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -124,6 +124,9 @@
@vcmp_scalar .... .... .. size:2 qn:3 . .... .... .... rm:4 &vcmp_scalar \
mask=%mask_22_13
+@vcmp_fp .... .... .... qn:3 . .... .... .... .... &vcmp \
+ qm=%qm size=%2op_fp_scalar_size mask=%mask_22_13
+
@vmaxv .... .... .... size:2 .. rda:4 .... .... .... &vmaxv qm=%qm
@2op_fp .... .... .... .... .... .... .... .... &2op \
@@ -663,17 +666,41 @@ VSHLC 111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
# Comparisons. We expand out the conditions which are split across
# encodings T1, T2, T3 and the fc bits. These include VPT, which is
# effectively "VCMP then VPST". A plain "VCMP" has a mask field of zero.
-VCMPEQ 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 0 @vcmp
-VCMPNE 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 0 @vcmp
+{
+ VCMPEQ_fp 111 . 1110 0 . 11 ... 1 ... 0 1111 0 0 . 0 ... 0 @vcmp_fp
+ VCMPEQ 111 1 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 0 @vcmp
+}
+
+{
+ VCMPNE_fp 111 . 1110 0 . 11 ... 1 ... 0 1111 1 0 . 0 ... 0 @vcmp_fp
+ VCMPNE 111 1 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 0 @vcmp
+}
+
+{
+ VCMPGE_fp 111 . 1110 0 . 11 ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp_fp
+ VCMPGE 111 1 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
+}
+
+{
+ VCMPLT_fp 111 . 1110 0 . 11 ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp_fp
+ VCMPLT 111 1 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
+}
+
+{
+ VCMPGT_fp 111 . 1110 0 . 11 ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp_fp
+ VCMPGT 111 1 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
+}
+
+{
+ VCMPLE_fp 111 . 1110 0 . 11 ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp_fp
+ VCMPLE 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
+}
+
{
VPSEL 1111 1110 0 . 11 ... 1 ... 0 1111 . 0 . 0 ... 1 @2op_nosz
VCMPCS 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
VCMPHI 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
}
-VCMPGE 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
-VCMPLT 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
-VCMPGT 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
-VCMPLE 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
{
VPNOT 1111 1110 0 0 11 000 1 000 0 1111 0100 1101
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 52e5a8f2a8b..07a1ab88814 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -3156,3 +3156,67 @@ DO_FP_VMAXMINV(vmaxnmavh, 2, float16, true, float16_maxnum)
DO_FP_VMAXMINV(vmaxnmavs, 4, float32, true, float32_maxnum)
DO_FP_VMAXMINV(vminnmavh, 2, float16, true, float16_minnum)
DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
+
+/* FP compares; note that all comparisons signal InvalidOp for QNaNs */
+#define DO_VCMP_FP(OP, ESIZE, TYPE, FN) \
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, void *vm) \
+ { \
+ TYPE *n = vn, *m = vm; \
+ uint16_t mask = mve_element_mask(env); \
+ uint16_t eci_mask = mve_eci_mask(env); \
+ uint16_t beatpred = 0; \
+ uint16_t emask = MAKE_64BIT_MASK(0, ESIZE); \
+ unsigned e; \
+ float_status *fpst; \
+ float_status scratch_fpst; \
+ bool r; \
+ for (e = 0; e < 16 / ESIZE; e++, emask <<= ESIZE) { \
+ if ((mask & emask) == 0) { \
+ continue; \
+ } \
+ fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ if (!(mask & (1 << (e * ESIZE)))) { \
+ /* We need the result but without updating flags */ \
+ scratch_fpst = *fpst; \
+ fpst = &scratch_fpst; \
+ } \
+ r = FN(n[H##ESIZE(e)], m[H##ESIZE(e)], fpst); \
+ /* Comparison sets 0/1 bits for each byte in the element */ \
+ beatpred |= r * emask; \
+ } \
+ beatpred &= mask; \
+ env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | \
+ (beatpred & eci_mask); \
+ mve_advance_vpt(env); \
+ }
+
+/*
+ * Some care is needed here to get the correct result for the unordered case.
+ * Architecturally EQ, GE and GT are defined to be false for unordered, but
+ * the NE, LT and LE comparisons are defined as simple logical inverses of
+ * EQ, GE and GT and so they must return true for unordered. The softfloat
+ * comparison functions float*_{eq,le,lt} all return false for unordered.
+ */
+#define DO_GE16(X, Y, S) float16_le(Y, X, S)
+#define DO_GE32(X, Y, S) float32_le(Y, X, S)
+#define DO_GT16(X, Y, S) float16_lt(Y, X, S)
+#define DO_GT32(X, Y, S) float32_lt(Y, X, S)
+
+DO_VCMP_FP(vfcmpeqh, 2, float16, float16_eq)
+DO_VCMP_FP(vfcmpeqs, 4, float32, float32_eq)
+
+DO_VCMP_FP(vfcmpneh, 2, float16, !float16_eq)
+DO_VCMP_FP(vfcmpnes, 4, float32, !float32_eq)
+
+DO_VCMP_FP(vfcmpgeh, 2, float16, DO_GE16)
+DO_VCMP_FP(vfcmpges, 4, float32, DO_GE32)
+
+DO_VCMP_FP(vfcmplth, 2, float16, !DO_GE16)
+DO_VCMP_FP(vfcmplts, 4, float32, !DO_GE32)
+
+DO_VCMP_FP(vfcmpgth, 2, float16, DO_GT16)
+DO_VCMP_FP(vfcmpgts, 4, float32, DO_GT32)
+
+DO_VCMP_FP(vfcmpleh, 2, float16, !DO_GT16)
+DO_VCMP_FP(vfcmples, 4, float32, !DO_GT32)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 4e2aa2cae2d..da14a6f790e 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -1758,6 +1758,28 @@ DO_VCMP(VCMPLT, vcmplt)
DO_VCMP(VCMPGT, vcmpgt)
DO_VCMP(VCMPLE, vcmple)
+#define DO_VCMP_FP(INSN, FN) \
+ static bool trans_##INSN(DisasContext *s, arg_vcmp *a) \
+ { \
+ static MVEGenCmpFn * const fns[] = { \
+ NULL, \
+ gen_helper_mve_##FN##h, \
+ gen_helper_mve_##FN##s, \
+ NULL, \
+ }; \
+ if (!dc_isar_feature(aa32_mve_fp, s)) { \
+ return false; \
+ } \
+ return do_vcmp(s, a, fns[a->size]); \
+ }
+
+DO_VCMP_FP(VCMPEQ_fp, vfcmpeq)
+DO_VCMP_FP(VCMPNE_fp, vfcmpne)
+DO_VCMP_FP(VCMPGE_fp, vfcmpge)
+DO_VCMP_FP(VCMPLT_fp, vfcmplt)
+DO_VCMP_FP(VCMPGT_fp, vfcmpgt)
+DO_VCMP_FP(VCMPLE_fp, vfcmple)
+
static bool do_vmaxv(DisasContext *s, arg_vmaxv *a, MVEGenVADDVFn fn)
{
/*
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 12/18] target/arm: Implement MVE fp scalar comparisons
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (10 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 11/18] target/arm: Implement MVE fp vector comparisons Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 13/18] target/arm: Implement MVE VCVT between floating and fixed point Peter Maydell
` (5 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE fp scalar comparisons VCMP and VPT.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
v1->v2: Use float* types
---
target/arm/helper-mve.h | 18 +++++++++++
target/arm/mve.decode | 61 +++++++++++++++++++++++++++++--------
target/arm/mve_helper.c | 62 ++++++++++++++++++++++++++++++--------
target/arm/translate-mve.c | 14 +++++++++
4 files changed, 131 insertions(+), 24 deletions(-)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 0c15c531641..9ee841cdf01 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -831,6 +831,24 @@ DEF_HELPER_FLAGS_3(mve_vfcmpgts, TCG_CALL_NO_WG, void, env, ptr, ptr)
DEF_HELPER_FLAGS_3(mve_vfcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
DEF_HELPER_FLAGS_3(mve_vfcmples, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vfcmpeq_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vfcmpeq_scalars, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vfcmpne_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vfcmpne_scalars, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vfcmpge_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vfcmpge_scalars, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vfcmplt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vfcmplt_scalars, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vfcmpgt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vfcmpgt_scalars, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vfcmple_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vfcmple_scalars, TCG_CALL_NO_WG, void, env, ptr, i32)
+
DEF_HELPER_FLAGS_4(mve_vfadd_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vfadd_scalars, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 7767ecae2ac..aa113279dc5 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -127,6 +127,11 @@
@vcmp_fp .... .... .... qn:3 . .... .... .... .... &vcmp \
qm=%qm size=%2op_fp_scalar_size mask=%mask_22_13
+# Bit 28 is a 2op_fp_scalar_size bit, but we do not decode it in this
+# format to avoid complicated overlapping-instruction-groups
+@vcmp_fp_scalar .... .... .... qn:3 . .... .... .... rm:4 &vcmp_scalar \
+ mask=%mask_22_13
+
@vmaxv .... .... .... size:2 .. rda:4 .... .... .... &vmaxv qm=%qm
@2op_fp .... .... .... .... .... .... .... .... &2op \
@@ -400,8 +405,10 @@ VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
VIWDUP 1110 1110 0 . .. ... 1 ... 0 1111 . 110 ... . @viwdup
}
{
- VDDUP 1110 1110 0 . .. ... 1 ... 1 1111 . 110 111 . @vidup
- VDWDUP 1110 1110 0 . .. ... 1 ... 1 1111 . 110 ... . @viwdup
+ VCMPGT_fp_scalar 1110 1110 0 . 11 ... 1 ... 1 1111 0110 .... @vcmp_fp_scalar size=2
+ VCMPLE_fp_scalar 1110 1110 0 . 11 ... 1 ... 1 1111 1110 .... @vcmp_fp_scalar size=2
+ VDDUP 1110 1110 0 . .. ... 1 ... 1 1111 . 110 111 . @vidup
+ VDWDUP 1110 1110 0 . .. ... 1 ... 1 1111 . 110 ... . @viwdup
}
# multiply-add long dual accumulate
@@ -472,8 +479,17 @@ VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
# Scalar operations
-VADD_scalar 1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
-VSUB_scalar 1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
+{
+ VCMPEQ_fp_scalar 1110 1110 0 . 11 ... 1 ... 0 1111 0100 .... @vcmp_fp_scalar size=2
+ VCMPNE_fp_scalar 1110 1110 0 . 11 ... 1 ... 0 1111 1100 .... @vcmp_fp_scalar size=2
+ VADD_scalar 1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
+}
+
+{
+ VCMPLT_fp_scalar 1110 1110 0 . 11 ... 1 ... 1 1111 1100 .... @vcmp_fp_scalar size=2
+ VCMPGE_fp_scalar 1110 1110 0 . 11 ... 1 ... 1 1111 0100 .... @vcmp_fp_scalar size=2
+ VSUB_scalar 1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
+}
{
VSHL_S_scalar 1110 1110 0 . 11 .. 01 ... 1 1110 0110 .... @shl_scalar
@@ -703,17 +719,38 @@ VSHLC 111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
}
{
- VPNOT 1111 1110 0 0 11 000 1 000 0 1111 0100 1101
- VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
- VCMPEQ_scalar 1111 1110 0 . .. ... 1 ... 0 1111 0 1 0 0 .... @vcmp_scalar
+ VPNOT 1111 1110 0 0 11 000 1 000 0 1111 0100 1101
+ VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
+ VCMPEQ_fp_scalar 1111 1110 0 . 11 ... 1 ... 0 1111 0100 .... @vcmp_fp_scalar size=1
+ VCMPEQ_scalar 1111 1110 0 . .. ... 1 ... 0 1111 0100 .... @vcmp_scalar
}
-VCMPNE_scalar 1111 1110 0 . .. ... 1 ... 0 1111 1 1 0 0 .... @vcmp_scalar
+
+{
+ VCMPNE_fp_scalar 1111 1110 0 . 11 ... 1 ... 0 1111 1100 .... @vcmp_fp_scalar size=1
+ VCMPNE_scalar 1111 1110 0 . .. ... 1 ... 0 1111 1100 .... @vcmp_scalar
+}
+
+{
+ VCMPGT_fp_scalar 1111 1110 0 . 11 ... 1 ... 1 1111 0110 .... @vcmp_fp_scalar size=1
+ VCMPGT_scalar 1111 1110 0 . .. ... 1 ... 1 1111 0110 .... @vcmp_scalar
+}
+
+{
+ VCMPLE_fp_scalar 1111 1110 0 . 11 ... 1 ... 1 1111 1110 .... @vcmp_fp_scalar size=1
+ VCMPLE_scalar 1111 1110 0 . .. ... 1 ... 1 1111 1110 .... @vcmp_scalar
+}
+
+{
+ VCMPGE_fp_scalar 1111 1110 0 . 11 ... 1 ... 1 1111 0100 .... @vcmp_fp_scalar size=1
+ VCMPGE_scalar 1111 1110 0 . .. ... 1 ... 1 1111 0100 .... @vcmp_scalar
+}
+{
+ VCMPLT_fp_scalar 1111 1110 0 . 11 ... 1 ... 1 1111 1100 .... @vcmp_fp_scalar size=1
+ VCMPLT_scalar 1111 1110 0 . .. ... 1 ... 1 1111 1100 .... @vcmp_scalar
+}
+
VCMPCS_scalar 1111 1110 0 . .. ... 1 ... 0 1111 0 1 1 0 .... @vcmp_scalar
VCMPHI_scalar 1111 1110 0 . .. ... 1 ... 0 1111 1 1 1 0 .... @vcmp_scalar
-VCMPGE_scalar 1111 1110 0 . .. ... 1 ... 1 1111 0 1 0 0 .... @vcmp_scalar
-VCMPLT_scalar 1111 1110 0 . .. ... 1 ... 1 1111 1 1 0 0 .... @vcmp_scalar
-VCMPGT_scalar 1111 1110 0 . .. ... 1 ... 1 1111 0 1 1 0 .... @vcmp_scalar
-VCMPLE_scalar 1111 1110 0 . .. ... 1 ... 1 1111 1 1 1 0 .... @vcmp_scalar
# 2-operand FP
VADD_fp 1110 1111 0 . 0 . ... 0 ... 0 1101 . 1 . 0 ... 0 @2op_fp
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 07a1ab88814..891926c124d 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -3191,6 +3191,44 @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
mve_advance_vpt(env); \
}
+#define DO_VCMP_FP_SCALAR(OP, ESIZE, TYPE, FN) \
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
+ uint32_t rm) \
+ { \
+ TYPE *n = vn; \
+ uint16_t mask = mve_element_mask(env); \
+ uint16_t eci_mask = mve_eci_mask(env); \
+ uint16_t beatpred = 0; \
+ uint16_t emask = MAKE_64BIT_MASK(0, ESIZE); \
+ unsigned e; \
+ float_status *fpst; \
+ float_status scratch_fpst; \
+ bool r; \
+ for (e = 0; e < 16 / ESIZE; e++, emask <<= ESIZE) { \
+ if ((mask & emask) == 0) { \
+ continue; \
+ } \
+ fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ if (!(mask & (1 << (e * ESIZE)))) { \
+ /* We need the result but without updating flags */ \
+ scratch_fpst = *fpst; \
+ fpst = &scratch_fpst; \
+ } \
+ r = FN(n[H##ESIZE(e)], (TYPE)rm, fpst); \
+ /* Comparison sets 0/1 bits for each byte in the element */ \
+ beatpred |= r * emask; \
+ } \
+ beatpred &= mask; \
+ env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | \
+ (beatpred & eci_mask); \
+ mve_advance_vpt(env); \
+ }
+
+#define DO_VCMP_FP_BOTH(VOP, SOP, ESIZE, TYPE, FN) \
+ DO_VCMP_FP(VOP, ESIZE, TYPE, FN) \
+ DO_VCMP_FP_SCALAR(SOP, ESIZE, TYPE, FN)
+
/*
* Some care is needed here to get the correct result for the unordered case.
* Architecturally EQ, GE and GT are defined to be false for unordered, but
@@ -3203,20 +3241,20 @@ DO_FP_VMAXMINV(vminnmavs, 4, float32, true, float32_minnum)
#define DO_GT16(X, Y, S) float16_lt(Y, X, S)
#define DO_GT32(X, Y, S) float32_lt(Y, X, S)
-DO_VCMP_FP(vfcmpeqh, 2, float16, float16_eq)
-DO_VCMP_FP(vfcmpeqs, 4, float32, float32_eq)
+DO_VCMP_FP_BOTH(vfcmpeqh, vfcmpeq_scalarh, 2, float16, float16_eq)
+DO_VCMP_FP_BOTH(vfcmpeqs, vfcmpeq_scalars, 4, float32, float32_eq)
-DO_VCMP_FP(vfcmpneh, 2, float16, !float16_eq)
-DO_VCMP_FP(vfcmpnes, 4, float32, !float32_eq)
+DO_VCMP_FP_BOTH(vfcmpneh, vfcmpne_scalarh, 2, float16, !float16_eq)
+DO_VCMP_FP_BOTH(vfcmpnes, vfcmpne_scalars, 4, float32, !float32_eq)
-DO_VCMP_FP(vfcmpgeh, 2, float16, DO_GE16)
-DO_VCMP_FP(vfcmpges, 4, float32, DO_GE32)
+DO_VCMP_FP_BOTH(vfcmpgeh, vfcmpge_scalarh, 2, float16, DO_GE16)
+DO_VCMP_FP_BOTH(vfcmpges, vfcmpge_scalars, 4, float32, DO_GE32)
-DO_VCMP_FP(vfcmplth, 2, float16, !DO_GE16)
-DO_VCMP_FP(vfcmplts, 4, float32, !DO_GE32)
+DO_VCMP_FP_BOTH(vfcmplth, vfcmplt_scalarh, 2, float16, !DO_GE16)
+DO_VCMP_FP_BOTH(vfcmplts, vfcmplt_scalars, 4, float32, !DO_GE32)
-DO_VCMP_FP(vfcmpgth, 2, float16, DO_GT16)
-DO_VCMP_FP(vfcmpgts, 4, float32, DO_GT32)
+DO_VCMP_FP_BOTH(vfcmpgth, vfcmpgt_scalarh, 2, float16, DO_GT16)
+DO_VCMP_FP_BOTH(vfcmpgts, vfcmpgt_scalars, 4, float32, DO_GT32)
-DO_VCMP_FP(vfcmpleh, 2, float16, !DO_GT16)
-DO_VCMP_FP(vfcmples, 4, float32, !DO_GT32)
+DO_VCMP_FP_BOTH(vfcmpleh, vfcmple_scalarh, 2, float16, !DO_GT16)
+DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index da14a6f790e..e8a3dec6683 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -1771,6 +1771,20 @@ DO_VCMP(VCMPLE, vcmple)
return false; \
} \
return do_vcmp(s, a, fns[a->size]); \
+ } \
+ static bool trans_##INSN##_scalar(DisasContext *s, \
+ arg_vcmp_scalar *a) \
+ { \
+ static MVEGenScalarCmpFn * const fns[] = { \
+ NULL, \
+ gen_helper_mve_##FN##_scalarh, \
+ gen_helper_mve_##FN##_scalars, \
+ NULL, \
+ }; \
+ if (!dc_isar_feature(aa32_mve_fp, s)) { \
+ return false; \
+ } \
+ return do_vcmp_scalar(s, a, fns[a->size]); \
}
DO_VCMP_FP(VCMPEQ_fp, vfcmpeq)
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 13/18] target/arm: Implement MVE VCVT between floating and fixed point
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (11 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 12/18] target/arm: Implement MVE fp scalar comparisons Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 14/18] target/arm: Implement MVE VCVT between fp and integer Peter Maydell
` (4 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE VCVT insns which convert between floating and fixed
point. As with the Neon equivalents, these use essentially the same
constant encoding as right-shift-by-immediate.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper-mve.h | 9 +++++++++
target/arm/mve.decode | 19 +++++++++++++++++++
target/arm/mve_helper.c | 36 ++++++++++++++++++++++++++++++++++++
target/arm/translate-mve.c | 18 ++++++++++++++++++
4 files changed, 82 insertions(+)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 9ee841cdf01..f3c2b43bf43 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -863,3 +863,12 @@ DEF_HELPER_FLAGS_4(mve_vfma_scalars, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vfmas_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vfmas_scalars, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vcvt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vcvt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vcvt_hs, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vcvt_hu, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vcvt_sf, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vcvt_uf, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vcvt_fs, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vcvt_fu, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index aa113279dc5..d9fcc42d36d 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -771,3 +771,22 @@ VCMLA0 1111 110 00 . 1 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_
VCMLA90 1111 110 01 . 1 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_rev
VCMLA180 1111 110 10 . 1 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_rev
VCMLA270 1111 110 11 . 1 . ... 0 ... 0 1000 . 1 . 0 ... 0 @2op_fp_size_rev
+
+# floating-point <-> fixed-point conversions. Naming convention:
+# VCVT_<from><to>, S = signed int, U = unsigned int, H = halfprec, F = singleprec
+@vcvt .... .... .. 1 ..... .... .. 1 . .... .... &2shift \
+ qd=%qd qm=%qm shift=%rshift_i5 size=2
+@vcvt_f16 .... .... .. 11 .... .... .. 0 . .... .... &2shift \
+ qd=%qd qm=%qm shift=%rshift_i4 size=1
+
+VCVT_SH_fixed 1110 1111 1 . ...... ... 0 11 . 0 01 . 1 ... 0 @vcvt_f16
+VCVT_UH_fixed 1111 1111 1 . ...... ... 0 11 . 0 01 . 1 ... 0 @vcvt_f16
+
+VCVT_HS_fixed 1110 1111 1 . ...... ... 0 11 . 1 01 . 1 ... 0 @vcvt_f16
+VCVT_HU_fixed 1111 1111 1 . ...... ... 0 11 . 1 01 . 1 ... 0 @vcvt_f16
+
+VCVT_SF_fixed 1110 1111 1 . ...... ... 0 11 . 0 01 . 1 ... 0 @vcvt
+VCVT_UF_fixed 1111 1111 1 . ...... ... 0 11 . 0 01 . 1 ... 0 @vcvt
+
+VCVT_FS_fixed 1110 1111 1 . ...... ... 0 11 . 1 01 . 1 ... 0 @vcvt
+VCVT_FU_fixed 1111 1111 1 . ...... ... 0 11 . 1 01 . 1 ... 0 @vcvt
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 891926c124d..d829ffe12d6 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -3258,3 +3258,39 @@ DO_VCMP_FP_BOTH(vfcmpgts, vfcmpgt_scalars, 4, float32, DO_GT32)
DO_VCMP_FP_BOTH(vfcmpleh, vfcmple_scalarh, 2, float16, !DO_GT16)
DO_VCMP_FP_BOTH(vfcmples, vfcmple_scalars, 4, float32, !DO_GT32)
+
+#define DO_VCVT_FIXED(OP, ESIZE, TYPE, FN) \
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vm, \
+ uint32_t shift) \
+ { \
+ TYPE *d = vd, *m = vm; \
+ TYPE r; \
+ uint16_t mask = mve_element_mask(env); \
+ unsigned e; \
+ float_status *fpst; \
+ float_status scratch_fpst; \
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
+ if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) { \
+ continue; \
+ } \
+ fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ if (!(mask & 1)) { \
+ /* We need the result but without updating flags */ \
+ scratch_fpst = *fpst; \
+ fpst = &scratch_fpst; \
+ } \
+ r = FN(m[H##ESIZE(e)], shift, fpst); \
+ mergemask(&d[H##ESIZE(e)], r, mask); \
+ } \
+ mve_advance_vpt(env); \
+ }
+
+DO_VCVT_FIXED(vcvt_sh, 2, int16_t, helper_vfp_shtoh)
+DO_VCVT_FIXED(vcvt_uh, 2, uint16_t, helper_vfp_uhtoh)
+DO_VCVT_FIXED(vcvt_hs, 2, int16_t, helper_vfp_toshh_round_to_zero)
+DO_VCVT_FIXED(vcvt_hu, 2, uint16_t, helper_vfp_touhh_round_to_zero)
+DO_VCVT_FIXED(vcvt_sf, 4, int32_t, helper_vfp_sltos)
+DO_VCVT_FIXED(vcvt_uf, 4, uint32_t, helper_vfp_ultos)
+DO_VCVT_FIXED(vcvt_fs, 4, int32_t, helper_vfp_tosls_round_to_zero)
+DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index e8a3dec6683..9269dbc3324 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -1439,6 +1439,24 @@ DO_2SHIFT(VRSHRI_U, vrshli_u, true)
DO_2SHIFT(VSRI, vsri, false)
DO_2SHIFT(VSLI, vsli, false)
+#define DO_2SHIFT_FP(INSN, FN) \
+ static bool trans_##INSN(DisasContext *s, arg_2shift *a) \
+ { \
+ if (!dc_isar_feature(aa32_mve_fp, s)) { \
+ return false; \
+ } \
+ return do_2shift(s, a, gen_helper_mve_##FN, false); \
+ }
+
+DO_2SHIFT_FP(VCVT_SH_fixed, vcvt_sh)
+DO_2SHIFT_FP(VCVT_UH_fixed, vcvt_uh)
+DO_2SHIFT_FP(VCVT_HS_fixed, vcvt_hs)
+DO_2SHIFT_FP(VCVT_HU_fixed, vcvt_hu)
+DO_2SHIFT_FP(VCVT_SF_fixed, vcvt_sf)
+DO_2SHIFT_FP(VCVT_UF_fixed, vcvt_uf)
+DO_2SHIFT_FP(VCVT_FS_fixed, vcvt_fs)
+DO_2SHIFT_FP(VCVT_FU_fixed, vcvt_fu)
+
static bool do_2shift_scalar(DisasContext *s, arg_shl_scalar *a,
MVEGenTwoOpShiftFn *fn)
{
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 14/18] target/arm: Implement MVE VCVT between fp and integer
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (12 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 13/18] target/arm: Implement MVE VCVT between floating and fixed point Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 15/18] target/arm: Implement MVE VCVT with specified rounding mode Peter Maydell
` (3 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE "VCVT (between floating-point and integer)" insn.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/mve.decode | 7 +++++++
target/arm/translate-mve.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 39 insertions(+)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index d9fcc42d36d..9a40ff9f43c 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -790,3 +790,10 @@ VCVT_UF_fixed 1111 1111 1 . ...... ... 0 11 . 0 01 . 1 ... 0 @vcvt
VCVT_FS_fixed 1110 1111 1 . ...... ... 0 11 . 1 01 . 1 ... 0 @vcvt
VCVT_FU_fixed 1111 1111 1 . ...... ... 0 11 . 1 01 . 1 ... 0 @vcvt
+
+# VCVT between floating point and integer (halfprec and single);
+# VCVT_<from><to>, S = signed int, U = unsigned int, F = float
+VCVT_SF 1111 1111 1 . 11 .. 11 ... 0 011 00 1 . 0 ... 0 @1op
+VCVT_UF 1111 1111 1 . 11 .. 11 ... 0 011 01 1 . 0 ... 0 @1op
+VCVT_FS 1111 1111 1 . 11 .. 11 ... 0 011 10 1 . 0 ... 0 @1op
+VCVT_FU 1111 1111 1 . 11 .. 11 ... 0 011 11 1 . 0 ... 0 @1op
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 9269dbc3324..351033af1ec 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -543,6 +543,38 @@ DO_1OP(VQNEG, vqneg)
DO_1OP(VMAXA, vmaxa)
DO_1OP(VMINA, vmina)
+/*
+ * For simple float/int conversions we use the fixed-point
+ * conversion helpers with a zero shift count
+ */
+#define DO_VCVT(INSN, HFN, SFN) \
+ static void gen_##INSN##h(TCGv_ptr env, TCGv_ptr qd, TCGv_ptr qm) \
+ { \
+ gen_helper_mve_##HFN(env, qd, qm, tcg_constant_i32(0)); \
+ } \
+ static void gen_##INSN##s(TCGv_ptr env, TCGv_ptr qd, TCGv_ptr qm) \
+ { \
+ gen_helper_mve_##SFN(env, qd, qm, tcg_constant_i32(0)); \
+ } \
+ static bool trans_##INSN(DisasContext *s, arg_1op *a) \
+ { \
+ static MVEGenOneOpFn * const fns[] = { \
+ NULL, \
+ gen_##INSN##h, \
+ gen_##INSN##s, \
+ NULL, \
+ }; \
+ if (!dc_isar_feature(aa32_mve_fp, s)) { \
+ return false; \
+ } \
+ return do_1op(s, a, fns[a->size]); \
+ }
+
+DO_VCVT(VCVT_SF, vcvt_sh, vcvt_sf)
+DO_VCVT(VCVT_UF, vcvt_uh, vcvt_uf)
+DO_VCVT(VCVT_FS, vcvt_hs, vcvt_fs)
+DO_VCVT(VCVT_FU, vcvt_hu, vcvt_fu)
+
/* Narrowing moves: only size 0 and 1 are valid */
#define DO_VMOVN(INSN, FN) \
static bool trans_##INSN(DisasContext *s, arg_1op *a) \
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 15/18] target/arm: Implement MVE VCVT with specified rounding mode
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (13 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 14/18] target/arm: Implement MVE VCVT between fp and integer Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 16/18] target/arm: Implement MVE VCVT between single and half precision Peter Maydell
` (2 subsequent siblings)
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE VCVT which converts from floating-point to integer
using a rounding mode specified by the instruction. We implement
this similarly to the Neon equivalents, by passing the required
rounding mode as an extra integer parameter to the helper functions.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper-mve.h | 5 ++++
target/arm/mve.decode | 10 ++++++++
target/arm/mve_helper.c | 38 ++++++++++++++++++++++++++++
target/arm/translate-mve.c | 52 ++++++++++++++++++++++++++++++++++++++
4 files changed, 105 insertions(+)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index f3c2b43bf43..6d4052a5269 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -177,6 +177,11 @@ DEF_HELPER_FLAGS_3(mve_vminab, TCG_CALL_NO_WG, void, env, ptr, ptr)
DEF_HELPER_FLAGS_3(mve_vminah, TCG_CALL_NO_WG, void, env, ptr, ptr)
DEF_HELPER_FLAGS_3(mve_vminaw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vcvt_rm_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vcvt_rm_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vcvt_rm_ss, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vcvt_rm_us, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 9a40ff9f43c..410ea746fcf 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -797,3 +797,13 @@ VCVT_SF 1111 1111 1 . 11 .. 11 ... 0 011 00 1 . 0 ... 0 @1op
VCVT_UF 1111 1111 1 . 11 .. 11 ... 0 011 01 1 . 0 ... 0 @1op
VCVT_FS 1111 1111 1 . 11 .. 11 ... 0 011 10 1 . 0 ... 0 @1op
VCVT_FU 1111 1111 1 . 11 .. 11 ... 0 011 11 1 . 0 ... 0 @1op
+
+# VCVT from floating point to integer with specified rounding mode
+VCVTAS 1111 1111 1 . 11 .. 11 ... 000 00 0 1 . 0 ... 0 @1op
+VCVTAU 1111 1111 1 . 11 .. 11 ... 000 00 1 1 . 0 ... 0 @1op
+VCVTNS 1111 1111 1 . 11 .. 11 ... 000 01 0 1 . 0 ... 0 @1op
+VCVTNU 1111 1111 1 . 11 .. 11 ... 000 01 1 1 . 0 ... 0 @1op
+VCVTPS 1111 1111 1 . 11 .. 11 ... 000 10 0 1 . 0 ... 0 @1op
+VCVTPU 1111 1111 1 . 11 .. 11 ... 000 10 1 1 . 0 ... 0 @1op
+VCVTMS 1111 1111 1 . 11 .. 11 ... 000 11 0 1 . 0 ... 0 @1op
+VCVTMU 1111 1111 1 . 11 .. 11 ... 000 11 1 1 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index d829ffe12d6..a793199fbee 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -3294,3 +3294,41 @@ DO_VCVT_FIXED(vcvt_sf, 4, int32_t, helper_vfp_sltos)
DO_VCVT_FIXED(vcvt_uf, 4, uint32_t, helper_vfp_ultos)
DO_VCVT_FIXED(vcvt_fs, 4, int32_t, helper_vfp_tosls_round_to_zero)
DO_VCVT_FIXED(vcvt_fu, 4, uint32_t, helper_vfp_touls_round_to_zero)
+
+/* VCVT with specified rmode */
+#define DO_VCVT_RMODE(OP, ESIZE, TYPE, FN) \
+ void HELPER(glue(mve_, OP))(CPUARMState *env, \
+ void *vd, void *vm, uint32_t rmode) \
+ { \
+ TYPE *d = vd, *m = vm; \
+ TYPE r; \
+ uint16_t mask = mve_element_mask(env); \
+ unsigned e; \
+ float_status *fpst; \
+ float_status scratch_fpst; \
+ float_status *base_fpst = (ESIZE == 2) ? \
+ &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ uint32_t prev_rmode = get_float_rounding_mode(base_fpst); \
+ set_float_rounding_mode(rmode, base_fpst); \
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
+ if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) { \
+ continue; \
+ } \
+ fpst = base_fpst; \
+ if (!(mask & 1)) { \
+ /* We need the result but without updating flags */ \
+ scratch_fpst = *fpst; \
+ fpst = &scratch_fpst; \
+ } \
+ r = FN(m[H##ESIZE(e)], 0, fpst); \
+ mergemask(&d[H##ESIZE(e)], r, mask); \
+ } \
+ set_float_rounding_mode(prev_rmode, base_fpst); \
+ mve_advance_vpt(env); \
+ }
+
+DO_VCVT_RMODE(vcvt_rm_sh, 2, uint16_t, helper_vfp_toshh)
+DO_VCVT_RMODE(vcvt_rm_uh, 2, uint16_t, helper_vfp_touhh)
+DO_VCVT_RMODE(vcvt_rm_ss, 4, uint32_t, helper_vfp_tosls)
+DO_VCVT_RMODE(vcvt_rm_us, 4, uint32_t, helper_vfp_touls)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 351033af1ec..e80a55eb62e 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -49,6 +49,7 @@ typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
typedef void MVEGenVABAVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
typedef void MVEGenDualAccOpFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void MVEGenVCVTRmodeFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
static inline long mve_qreg_offset(unsigned reg)
@@ -575,6 +576,57 @@ DO_VCVT(VCVT_UF, vcvt_uh, vcvt_uf)
DO_VCVT(VCVT_FS, vcvt_hs, vcvt_fs)
DO_VCVT(VCVT_FU, vcvt_hu, vcvt_fu)
+static bool do_vcvt_rmode(DisasContext *s, arg_1op *a,
+ enum arm_fprounding rmode, bool u)
+{
+ /*
+ * Handle VCVT fp to int with specified rounding mode.
+ * This is a 1op fn but we must pass the rounding mode as
+ * an immediate to the helper.
+ */
+ TCGv_ptr qd, qm;
+ static MVEGenVCVTRmodeFn * const fns[4][2] = {
+ { NULL, NULL },
+ { gen_helper_mve_vcvt_rm_sh, gen_helper_mve_vcvt_rm_uh },
+ { gen_helper_mve_vcvt_rm_ss, gen_helper_mve_vcvt_rm_us },
+ { NULL, NULL },
+ };
+ MVEGenVCVTRmodeFn *fn = fns[a->size][u];
+
+ if (!dc_isar_feature(aa32_mve_fp, s) ||
+ !mve_check_qreg_bank(s, a->qd | a->qm) ||
+ !fn) {
+ return false;
+ }
+
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
+ return true;
+ }
+
+ qd = mve_qreg_ptr(a->qd);
+ qm = mve_qreg_ptr(a->qm);
+ fn(cpu_env, qd, qm, tcg_constant_i32(arm_rmode_to_sf(rmode)));
+ tcg_temp_free_ptr(qd);
+ tcg_temp_free_ptr(qm);
+ mve_update_eci(s);
+ return true;
+}
+
+#define DO_VCVT_RMODE(INSN, RMODE, U) \
+ static bool trans_##INSN(DisasContext *s, arg_1op *a) \
+ { \
+ return do_vcvt_rmode(s, a, RMODE, U); \
+ } \
+
+DO_VCVT_RMODE(VCVTAS, FPROUNDING_TIEAWAY, false)
+DO_VCVT_RMODE(VCVTAU, FPROUNDING_TIEAWAY, true)
+DO_VCVT_RMODE(VCVTNS, FPROUNDING_TIEEVEN, false)
+DO_VCVT_RMODE(VCVTNU, FPROUNDING_TIEEVEN, true)
+DO_VCVT_RMODE(VCVTPS, FPROUNDING_POSINF, false)
+DO_VCVT_RMODE(VCVTPU, FPROUNDING_POSINF, true)
+DO_VCVT_RMODE(VCVTMS, FPROUNDING_NEGINF, false)
+DO_VCVT_RMODE(VCVTMU, FPROUNDING_NEGINF, true)
+
/* Narrowing moves: only size 0 and 1 are valid */
#define DO_VMOVN(INSN, FN) \
static bool trans_##INSN(DisasContext *s, arg_1op *a) \
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 16/18] target/arm: Implement MVE VCVT between single and half precision
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (14 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 15/18] target/arm: Implement MVE VCVT with specified rounding mode Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 17/18] target/arm: Implement MVE VRINT insns Peter Maydell
2021-08-26 13:17 ` [PATCH v2 18/18] target/arm: Enable MVE in Cortex-M55 Peter Maydell
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE VCVT instruction which converts between single
and half precision floating point.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
v1->v2: make do_vcvt_sh/do_vcvt_hs functions, not macros
---
target/arm/helper-mve.h | 5 +++
target/arm/mve.decode | 8 ++++
target/arm/mve_helper.c | 81 ++++++++++++++++++++++++++++++++++++++
target/arm/translate-mve.c | 14 +++++++
4 files changed, 108 insertions(+)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 6d4052a5269..f6345c7abbe 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -182,6 +182,11 @@ DEF_HELPER_FLAGS_4(mve_vcvt_rm_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vcvt_rm_ss, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vcvt_rm_us, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcvtb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcvtt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcvtb_hs, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcvtt_hs, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 410ea746fcf..32de4af3170 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -221,6 +221,8 @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
# The VSHLL T2 encoding is not a @2op pattern, but is here because it
# overlaps what would be size=0b11 VMULH/VRMULH
{
+ VCVTB_SH 111 0 1110 0 . 11 1111 ... 0 1110 0 0 . 0 ... 1 @1op_nosz
+
VMAXNMA 111 0 1110 0 . 11 1111 ... 0 1110 1 0 . 0 ... 1 @vmaxnma size=2
VSHLL_BS 111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
@@ -235,6 +237,8 @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
}
{
+ VCVTB_HS 111 1 1110 0 . 11 1111 ... 0 1110 0 0 . 0 ... 1 @1op_nosz
+
VMAXNMA 111 1 1110 0 . 11 1111 ... 0 1110 1 0 . 0 ... 1 @vmaxnma size=1
VSHLL_BU 111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
@@ -247,6 +251,8 @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
}
{
+ VCVTT_SH 111 0 1110 0 . 11 1111 ... 1 1110 0 0 . 0 ... 1 @1op_nosz
+
VMINNMA 111 0 1110 0 . 11 1111 ... 1 1110 1 0 . 0 ... 1 @vmaxnma size=2
VSHLL_TS 111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
VSHLL_TS 111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
@@ -260,6 +266,8 @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
}
{
+ VCVTT_HS 111 1 1110 0 . 11 1111 ... 1 1110 0 0 . 0 ... 1 @1op_nosz
+
VMINNMA 111 1 1110 0 . 11 1111 ... 1 1110 1 0 . 0 ... 1 @vmaxnma size=1
VSHLL_TU 111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
VSHLL_TU 111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index a793199fbee..1ed76ac5ed8 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -3332,3 +3332,84 @@ DO_VCVT_RMODE(vcvt_rm_sh, 2, uint16_t, helper_vfp_toshh)
DO_VCVT_RMODE(vcvt_rm_uh, 2, uint16_t, helper_vfp_touhh)
DO_VCVT_RMODE(vcvt_rm_ss, 4, uint32_t, helper_vfp_tosls)
DO_VCVT_RMODE(vcvt_rm_us, 4, uint32_t, helper_vfp_touls)
+
+/*
+ * VCVT between halfprec and singleprec. As usual for halfprec
+ * conversions, FZ16 is ignored and AHP is observed.
+ */
+static void do_vcvt_sh(CPUARMState *env, void *vd, void *vm, int top)
+{
+ uint16_t *d = vd;
+ uint32_t *m = vm;
+ uint16_t r;
+ uint16_t mask = mve_element_mask(env);
+ bool ieee = !(env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_AHP);
+ unsigned e;
+ float_status *fpst;
+ float_status scratch_fpst;
+ float_status *base_fpst = &env->vfp.standard_fp_status;
+ bool old_fz = get_flush_to_zero(base_fpst);
+ set_flush_to_zero(false, base_fpst);
+ for (e = 0; e < 16 / 4; e++, mask >>= 4) {
+ if ((mask & MAKE_64BIT_MASK(0, 4)) == 0) {
+ continue;
+ }
+ fpst = base_fpst;
+ if (!(mask & 1)) {
+ /* We need the result but without updating flags */
+ scratch_fpst = *fpst;
+ fpst = &scratch_fpst;
+ }
+ r = float32_to_float16(m[H4(e)], ieee, fpst);
+ mergemask(&d[H2(e * 2 + top)], r, mask >> (top * 2));
+ }
+ set_flush_to_zero(old_fz, base_fpst);
+ mve_advance_vpt(env);
+}
+
+static void do_vcvt_hs(CPUARMState *env, void *vd, void *vm, int top)
+{
+ uint32_t *d = vd;
+ uint16_t *m = vm;
+ uint32_t r;
+ uint16_t mask = mve_element_mask(env);
+ bool ieee = !(env->vfp.xregs[ARM_VFP_FPSCR] & FPCR_AHP);
+ unsigned e;
+ float_status *fpst;
+ float_status scratch_fpst;
+ float_status *base_fpst = &env->vfp.standard_fp_status;
+ bool old_fiz = get_flush_inputs_to_zero(base_fpst);
+ set_flush_inputs_to_zero(false, base_fpst);
+ for (e = 0; e < 16 / 4; e++, mask >>= 4) {
+ if ((mask & MAKE_64BIT_MASK(0, 4)) == 0) {
+ continue;
+ }
+ fpst = base_fpst;
+ if (!(mask & (1 << (top * 2)))) {
+ /* We need the result but without updating flags */
+ scratch_fpst = *fpst;
+ fpst = &scratch_fpst;
+ }
+ r = float16_to_float32(m[H2(e * 2 + top)], ieee, fpst);
+ mergemask(&d[H4(e)], r, mask);
+ }
+ set_flush_inputs_to_zero(old_fiz, base_fpst);
+ mve_advance_vpt(env);
+}
+
+void HELPER(mve_vcvtb_sh)(CPUARMState *env, void *vd, void *vm)
+{
+ do_vcvt_sh(env, vd, vm, 0);
+}
+void HELPER(mve_vcvtt_sh)(CPUARMState *env, void *vd, void *vm)
+{
+ do_vcvt_sh(env, vd, vm, 1);
+}
+void HELPER(mve_vcvtb_hs)(CPUARMState *env, void *vd, void *vm)
+{
+ do_vcvt_hs(env, vd, vm, 0);
+}
+void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
+{
+ do_vcvt_hs(env, vd, vm, 1);
+}
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index e80a55eb62e..194ef99cc74 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -627,6 +627,20 @@ DO_VCVT_RMODE(VCVTPU, FPROUNDING_POSINF, true)
DO_VCVT_RMODE(VCVTMS, FPROUNDING_NEGINF, false)
DO_VCVT_RMODE(VCVTMU, FPROUNDING_NEGINF, true)
+#define DO_VCVT_SH(INSN, FN) \
+ static bool trans_##INSN(DisasContext *s, arg_1op *a) \
+ { \
+ if (!dc_isar_feature(aa32_mve_fp, s)) { \
+ return false; \
+ } \
+ return do_1op(s, a, gen_helper_mve_##FN); \
+ } \
+
+DO_VCVT_SH(VCVTB_SH, vcvtb_sh)
+DO_VCVT_SH(VCVTT_SH, vcvtt_sh)
+DO_VCVT_SH(VCVTB_HS, vcvtb_hs)
+DO_VCVT_SH(VCVTT_HS, vcvtt_hs)
+
/* Narrowing moves: only size 0 and 1 are valid */
#define DO_VMOVN(INSN, FN) \
static bool trans_##INSN(DisasContext *s, arg_1op *a) \
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 17/18] target/arm: Implement MVE VRINT insns
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (15 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 16/18] target/arm: Implement MVE VCVT between single and half precision Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 18/18] target/arm: Enable MVE in Cortex-M55 Peter Maydell
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
Implement the MVE VRINT insns, which round floating point inputs
to integer values, leaving them in floating point format.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
v1->v2: use float* types
---
target/arm/helper-mve.h | 6 +++++
target/arm/mve.decode | 7 ++++++
target/arm/mve_helper.c | 35 +++++++++++++++++++++++++++++
target/arm/translate-mve.c | 45 ++++++++++++++++++++++++++++++++++++++
4 files changed, 93 insertions(+)
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index f6345c7abbe..76bd25006d8 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -882,3 +882,9 @@ DEF_HELPER_FLAGS_4(mve_vcvt_sf, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vcvt_uf, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vcvt_fs, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(mve_vcvt_fu, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vrint_rm_h, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vrint_rm_s, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vrintx_h, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vrintx_s, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 32de4af3170..72b93bfcaa3 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -815,3 +815,10 @@ VCVTPS 1111 1111 1 . 11 .. 11 ... 000 10 0 1 . 0 ... 0 @1op
VCVTPU 1111 1111 1 . 11 .. 11 ... 000 10 1 1 . 0 ... 0 @1op
VCVTMS 1111 1111 1 . 11 .. 11 ... 000 11 0 1 . 0 ... 0 @1op
VCVTMU 1111 1111 1 . 11 .. 11 ... 000 11 1 1 . 0 ... 0 @1op
+
+VRINTN 1111 1111 1 . 11 .. 10 ... 001 000 1 . 0 ... 0 @1op
+VRINTX 1111 1111 1 . 11 .. 10 ... 001 001 1 . 0 ... 0 @1op
+VRINTA 1111 1111 1 . 11 .. 10 ... 001 010 1 . 0 ... 0 @1op
+VRINTZ 1111 1111 1 . 11 .. 10 ... 001 011 1 . 0 ... 0 @1op
+VRINTM 1111 1111 1 . 11 .. 10 ... 001 101 1 . 0 ... 0 @1op
+VRINTP 1111 1111 1 . 11 .. 10 ... 001 111 1 . 0 ... 0 @1op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 1ed76ac5ed8..846962bf4c5 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -3333,6 +3333,12 @@ DO_VCVT_RMODE(vcvt_rm_uh, 2, uint16_t, helper_vfp_touhh)
DO_VCVT_RMODE(vcvt_rm_ss, 4, uint32_t, helper_vfp_tosls)
DO_VCVT_RMODE(vcvt_rm_us, 4, uint32_t, helper_vfp_touls)
+#define DO_VRINT_RM_H(M, F, S) helper_rinth(M, S)
+#define DO_VRINT_RM_S(M, F, S) helper_rints(M, S)
+
+DO_VCVT_RMODE(vrint_rm_h, 2, uint16_t, DO_VRINT_RM_H)
+DO_VCVT_RMODE(vrint_rm_s, 4, uint32_t, DO_VRINT_RM_S)
+
/*
* VCVT between halfprec and singleprec. As usual for halfprec
* conversions, FZ16 is ignored and AHP is observed.
@@ -3413,3 +3419,32 @@ void HELPER(mve_vcvtt_hs)(CPUARMState *env, void *vd, void *vm)
{
do_vcvt_hs(env, vd, vm, 1);
}
+
+#define DO_1OP_FP(OP, ESIZE, TYPE, FN) \
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vm) \
+ { \
+ TYPE *d = vd, *m = vm; \
+ TYPE r; \
+ uint16_t mask = mve_element_mask(env); \
+ unsigned e; \
+ float_status *fpst; \
+ float_status scratch_fpst; \
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
+ if ((mask & MAKE_64BIT_MASK(0, ESIZE)) == 0) { \
+ continue; \
+ } \
+ fpst = (ESIZE == 2) ? &env->vfp.standard_fp_status_f16 : \
+ &env->vfp.standard_fp_status; \
+ if (!(mask & 1)) { \
+ /* We need the result but without updating flags */ \
+ scratch_fpst = *fpst; \
+ fpst = &scratch_fpst; \
+ } \
+ r = FN(m[H##ESIZE(e)], fpst); \
+ mergemask(&d[H##ESIZE(e)], r, mask); \
+ } \
+ mve_advance_vpt(env); \
+ }
+
+DO_1OP_FP(vrintx_h, 2, float16, float16_round_to_int)
+DO_1OP_FP(vrintx_s, 4, float32, float32_round_to_int)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 194ef99cc74..2ed91577ec8 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -641,6 +641,51 @@ DO_VCVT_SH(VCVTT_SH, vcvtt_sh)
DO_VCVT_SH(VCVTB_HS, vcvtb_hs)
DO_VCVT_SH(VCVTT_HS, vcvtt_hs)
+#define DO_VRINT(INSN, RMODE) \
+ static void gen_##INSN##h(TCGv_ptr env, TCGv_ptr qd, TCGv_ptr qm) \
+ { \
+ gen_helper_mve_vrint_rm_h(env, qd, qm, \
+ tcg_constant_i32(arm_rmode_to_sf(RMODE))); \
+ } \
+ static void gen_##INSN##s(TCGv_ptr env, TCGv_ptr qd, TCGv_ptr qm) \
+ { \
+ gen_helper_mve_vrint_rm_s(env, qd, qm, \
+ tcg_constant_i32(arm_rmode_to_sf(RMODE))); \
+ } \
+ static bool trans_##INSN(DisasContext *s, arg_1op *a) \
+ { \
+ static MVEGenOneOpFn * const fns[] = { \
+ NULL, \
+ gen_##INSN##h, \
+ gen_##INSN##s, \
+ NULL, \
+ }; \
+ if (!dc_isar_feature(aa32_mve_fp, s)) { \
+ return false; \
+ } \
+ return do_1op(s, a, fns[a->size]); \
+ }
+
+DO_VRINT(VRINTN, FPROUNDING_TIEEVEN)
+DO_VRINT(VRINTA, FPROUNDING_TIEAWAY)
+DO_VRINT(VRINTZ, FPROUNDING_ZERO)
+DO_VRINT(VRINTM, FPROUNDING_NEGINF)
+DO_VRINT(VRINTP, FPROUNDING_POSINF)
+
+static bool trans_VRINTX(DisasContext *s, arg_1op *a)
+{
+ static MVEGenOneOpFn * const fns[] = {
+ NULL,
+ gen_helper_mve_vrintx_h,
+ gen_helper_mve_vrintx_s,
+ NULL,
+ };
+ if (!dc_isar_feature(aa32_mve_fp, s)) {
+ return false;
+ }
+ return do_1op(s, a, fns[a->size]);
+}
+
/* Narrowing moves: only size 0 and 1 are valid */
#define DO_VMOVN(INSN, FN) \
static bool trans_##INSN(DisasContext *s, arg_1op *a) \
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 18/18] target/arm: Enable MVE in Cortex-M55
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
` (16 preceding siblings ...)
2021-08-26 13:17 ` [PATCH v2 17/18] target/arm: Implement MVE VRINT insns Peter Maydell
@ 2021-08-26 13:17 ` Peter Maydell
17 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-26 13:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel
We now have a complete MVE emulation, so we can enable it in our
Cortex-M55 model by setting the ID registers to match those of a
Cortex-M55 with full MVE support.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu_tcg.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
index ed444bf436a..33cc75af57d 100644
--- a/target/arm/cpu_tcg.c
+++ b/target/arm/cpu_tcg.c
@@ -654,12 +654,9 @@ static void cortex_m55_initfn(Object *obj)
cpu->revidr = 0;
cpu->pmsav7_dregion = 16;
cpu->sau_sregion = 8;
- /*
- * These are the MVFR* values for the FPU, no MVE configuration;
- * we will update them later when we implement MVE
- */
+ /* These are the MVFR* values for the FPU + full MVE configuration */
cpu->isar.mvfr0 = 0x10110221;
- cpu->isar.mvfr1 = 0x12100011;
+ cpu->isar.mvfr1 = 0x12100211;
cpu->isar.mvfr2 = 0x00000040;
cpu->isar.id_pfr0 = 0x20000030;
cpu->isar.id_pfr1 = 0x00000230;
--
2.20.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v2 10/18] target/arm: Implement MVE FP max/min across vector
2021-08-26 13:17 ` [PATCH v2 10/18] target/arm: Implement MVE FP max/min across vector Peter Maydell
@ 2021-08-30 9:17 ` Peter Maydell
0 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-08-30 9:17 UTC (permalink / raw)
To: qemu-arm, QEMU Developers
On Thu, 26 Aug 2021 at 14:17, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> Implement the MVE VMAXNMV, VMINNMV, VMAXNMAV, VMINNMAV insns. These
> calculate the maximum or minimum of floating point elements across a
> vector, starting with a value in a general purpose register and
> returning the result there.
>
> @@ -440,6 +444,10 @@ VMLADAV_S 1110 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
> VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
>
> {
> + VMAXNMAV 1110 1110 1110 11 00 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=2
> + VMINNMAV 1110 1110 1110 11 00 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=2
> + VMAXNMV 1110 1110 1110 11 10 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=2
> + VMINNMV 1110 1110 1110 11 10 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=2
> VMAXV_S 1110 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
> VMINV_S 1110 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
> VMAXAV 1110 1110 1110 .. 00 .... 1111 0 0 . 0 ... 0 @vmaxv
> @@ -449,6 +457,10 @@ VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
> }
>
> {
> + VMAXNMAV 1111 1110 1110 11 00 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=1
> + VMINNMAV 1111 1110 1110 11 00 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=1
> + VMAXNMV 1111 1110 1110 11 10 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=1
> + VMINNMV 1111 1110 1110 11 10 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=1
> VMAXV_U 1111 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
> VMINV_U 1111 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
> VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
I just realized that I forgot to do part of Richard's review
comments on the first version of this patch, about adding some
extra [] blocks here.
https://patchew.org/QEMU/20210729111512.16541-1-peter.maydell@linaro.org/20210729111512.16541-46-peter.maydell@linaro.org/
Diff to squash in:
index 1a18c3b8eeb..a46372f8c77 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -444,25 +444,33 @@ VMLADAV_S 1110 1110 1111 ... 0 ... .
1111 . 0 . 0 ... 1 @vmladav_nosz
VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
{
- VMAXNMAV 1110 1110 1110 11 00 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=2
- VMINNMAV 1110 1110 1110 11 00 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=2
- VMAXNMV 1110 1110 1110 11 10 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=2
- VMINNMV 1110 1110 1110 11 10 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=2
- VMAXV_S 1110 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
- VMINV_S 1110 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
- VMAXAV 1110 1110 1110 .. 00 .... 1111 0 0 . 0 ... 0 @vmaxv
- VMINAV 1110 1110 1110 .. 00 .... 1111 1 0 . 0 ... 0 @vmaxv
+ [
+ VMAXNMAV 1110 1110 1110 11 00 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=2
+ VMINNMAV 1110 1110 1110 11 00 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=2
+ VMAXNMV 1110 1110 1110 11 10 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=2
+ VMINNMV 1110 1110 1110 11 10 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=2
+ ]
+ [
+ VMAXV_S 1110 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
+ VMINV_S 1110 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
+ VMAXAV 1110 1110 1110 .. 00 .... 1111 0 0 . 0 ... 0 @vmaxv
+ VMINAV 1110 1110 1110 .. 00 .... 1111 1 0 . 0 ... 0 @vmaxv
+ ]
VMLADAV_S 1110 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
VRMLALDAVH_S 1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
}
{
- VMAXNMAV 1111 1110 1110 11 00 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=1
- VMINNMAV 1111 1110 1110 11 00 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=1
- VMAXNMV 1111 1110 1110 11 10 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=1
- VMINNMV 1111 1110 1110 11 10 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=1
- VMAXV_U 1111 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
- VMINV_U 1111 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
+ [
+ VMAXNMAV 1111 1110 1110 11 00 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=1
+ VMINNMAV 1111 1110 1110 11 00 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=1
+ VMAXNMV 1111 1110 1110 11 10 .... 1111 0 0 . 0 ... 0 @vmaxnmv size=1
+ VMINNMV 1111 1110 1110 11 10 .... 1111 1 0 . 0 ... 0 @vmaxnmv size=1
+ ]
+ [
+ VMAXV_U 1111 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
+ VMINV_U 1111 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
+ ]
VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
VRMLALDAVH_U 1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
}
-- PMM
^ permalink raw reply related [flat|nested] 20+ messages in thread
end of thread, other threads:[~2021-08-30 9:19 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-26 13:17 [PATCH v2 00/18] target/arm: MVE slice 4 Peter Maydell
2021-08-26 13:17 ` [PATCH v2 01/18] target/arm: Implement MVE VADD (floating-point) Peter Maydell
2021-08-26 13:17 ` [PATCH v2 02/18] target/arm: Implement MVE VSUB, VMUL, VABD, VMAXNM, VMINNM Peter Maydell
2021-08-26 13:17 ` [PATCH v2 03/18] target/arm: Implement MVE VCADD Peter Maydell
2021-08-26 13:17 ` [PATCH v2 04/18] target/arm: Implement MVE VFMA and VFMS Peter Maydell
2021-08-26 13:17 ` [PATCH v2 05/18] target/arm: Implement MVE VCMUL and VCMLA Peter Maydell
2021-08-26 13:17 ` [PATCH v2 06/18] target/arm: Implement MVE VMAXNMA and VMINNMA Peter Maydell
2021-08-26 13:17 ` [PATCH v2 07/18] target/arm: Implement MVE scalar fp insns Peter Maydell
2021-08-26 13:17 ` [PATCH v2 08/18] target/arm: Implement MVE fp-with-scalar VFMA, VFMAS Peter Maydell
2021-08-26 13:17 ` [PATCH v2 09/18] softfloat: Remove assertion preventing silencing of NaN in default-NaN mode Peter Maydell
2021-08-26 13:17 ` [PATCH v2 10/18] target/arm: Implement MVE FP max/min across vector Peter Maydell
2021-08-30 9:17 ` Peter Maydell
2021-08-26 13:17 ` [PATCH v2 11/18] target/arm: Implement MVE fp vector comparisons Peter Maydell
2021-08-26 13:17 ` [PATCH v2 12/18] target/arm: Implement MVE fp scalar comparisons Peter Maydell
2021-08-26 13:17 ` [PATCH v2 13/18] target/arm: Implement MVE VCVT between floating and fixed point Peter Maydell
2021-08-26 13:17 ` [PATCH v2 14/18] target/arm: Implement MVE VCVT between fp and integer Peter Maydell
2021-08-26 13:17 ` [PATCH v2 15/18] target/arm: Implement MVE VCVT with specified rounding mode Peter Maydell
2021-08-26 13:17 ` [PATCH v2 16/18] target/arm: Implement MVE VCVT between single and half precision Peter Maydell
2021-08-26 13:17 ` [PATCH v2 17/18] target/arm: Implement MVE VRINT insns Peter Maydell
2021-08-26 13:17 ` [PATCH v2 18/18] target/arm: Enable MVE in Cortex-M55 Peter Maydell
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.