* [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns
@ 2017-12-18 17:24 Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 01/11] target/arm: Add ARM_FEATURE_V8_1_SIMD Richard Henderson
` (10 more replies)
0 siblings, 11 replies; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC
To: qemu-devel; +Cc: peter.maydell, qemu-arm
I've incorporated Alex's feedback from v1, and have now rebased
upon Alex's work so that the complex insns now support fp16.
r~
Richard Henderson (11):
target/arm: Add ARM_FEATURE_V8_1_SIMD
target/arm: Decode aa64 armv8.1 scalar three same extra
target/arm: Decode aa64 armv8.1 three same extra
target/arm: Decode aa64 armv8.1 scalar/vector x indexed element
target/arm: Decode aa32 armv8.1 three same
target/arm: Decode aa32 armv8.1 two reg and a scalar
target/arm: Add ARM_FEATURE_V8_FCMA
target/arm: Decode aa64 armv8.3 fcadd
target/arm: Decode aa64 armv8.3 fcmla
target/arm: Decode aa32 armv8.3 3-same
target/arm: Decode aa32 armv8.3 2-reg-index
target/arm/cpu.h | 2 +
target/arm/helper.h | 31 ++++
linux-user/elfload.c | 10 ++
target/arm/advsimd_helper.c | 420 ++++++++++++++++++++++++++++++++++++++++++++
target/arm/cpu.c | 2 +
target/arm/cpu64.c | 2 +
target/arm/translate-a64.c | 382 ++++++++++++++++++++++++++++++++++++----
target/arm/translate.c | 239 ++++++++++++++++++++++---
target/arm/Makefile.objs | 2 +-
9 files changed, 1032 insertions(+), 58 deletions(-)
create mode 100644 target/arm/advsimd_helper.c
--
2.14.3
* [Qemu-devel] [PATCH v2 01/11] target/arm: Add ARM_FEATURE_V8_1_SIMD
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
@ 2017-12-18 17:24 ` Richard Henderson
2018-01-15 17:21 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 02/11] target/arm: Decode aa64 armv8.1 scalar three same extra Richard Henderson
` (9 subsequent siblings)
10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Enable it for the "any" CPU used by *-linux-user.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 1 +
linux-user/elfload.c | 9 +++++++++
target/arm/cpu.c | 1 +
target/arm/cpu64.c | 1 +
4 files changed, 12 insertions(+)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 715ec6a476..e047756b80 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1351,6 +1351,7 @@ enum arm_features {
ARM_FEATURE_VBAR, /* has cp15 VBAR */
ARM_FEATURE_M_SECURITY, /* M profile Security Extension */
ARM_FEATURE_JAZELLE, /* has (trivial) Jazelle implementation */
+ ARM_FEATURE_V8_1_SIMD, /* has ARMv8.1-SIMD */
ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
};
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 20f3d8c2c3..95f550518e 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -512,6 +512,14 @@ enum {
ARM_HWCAP_A64_SHA1 = 1 << 5,
ARM_HWCAP_A64_SHA2 = 1 << 6,
ARM_HWCAP_A64_CRC32 = 1 << 7,
+ ARM_HWCAP_A64_ATOMICS = 1 << 8,
+ ARM_HWCAP_A64_FPHP = 1 << 9,
+ ARM_HWCAP_A64_ASIMDHP = 1 << 10,
+ ARM_HWCAP_A64_CPUID = 1 << 11,
+ ARM_HWCAP_A64_ASIMDRDM = 1 << 12,
+ ARM_HWCAP_A64_JSCVT = 1 << 13,
+ ARM_HWCAP_A64_FCMA = 1 << 14,
+ ARM_HWCAP_A64_LRCPC = 1 << 15,
};
#define ELF_HWCAP get_elf_hwcap()
@@ -532,6 +540,7 @@ static uint32_t get_elf_hwcap(void)
GET_FEATURE(ARM_FEATURE_V8_SHA1, ARM_HWCAP_A64_SHA1);
GET_FEATURE(ARM_FEATURE_V8_SHA256, ARM_HWCAP_A64_SHA2);
GET_FEATURE(ARM_FEATURE_CRC, ARM_HWCAP_A64_CRC32);
+ GET_FEATURE(ARM_FEATURE_V8_1_SIMD, ARM_HWCAP_A64_ASIMDRDM);
#undef GET_FEATURE
return hwcaps;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 7f7a3d1e32..afe84645af 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1628,6 +1628,7 @@ static void arm_any_initfn(Object *obj)
set_feature(&cpu->env, ARM_FEATURE_V8_SHA256);
set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
set_feature(&cpu->env, ARM_FEATURE_CRC);
+ set_feature(&cpu->env, ARM_FEATURE_V8_1_SIMD);
cpu->midr = 0xffffffff;
}
#endif
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 0dc4debd9c..67a01bf7ce 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -226,6 +226,7 @@ static void aarch64_any_initfn(Object *obj)
set_feature(&cpu->env, ARM_FEATURE_V8_SHA256);
set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
set_feature(&cpu->env, ARM_FEATURE_CRC);
+ set_feature(&cpu->env, ARM_FEATURE_V8_1_SIMD);
set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
cpu->dcz_blocksize = 7; /* 512 bytes */
--
2.14.3
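Patch 01 maps ARM_FEATURE_V8_1_SIMD to the ARM_HWCAP_A64_ASIMDRDM bit (1 << 12), which linux-user reports in AT_HWCAP. As a rough illustration, a guest program could probe that bit as shown below. The sketch rests on three assumptions. The SKETCH_* and sketch_* names are hypothetical. The bit position is copied from the enum in this patch. getauxval() requires a Linux C library that provides <sys/auxv.h>.

```c
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (Linux-specific) */

/* Hypothetical name; the value mirrors ARM_HWCAP_A64_ASIMDRDM in this patch. */
#define SKETCH_HWCAP_A64_ASIMDRDM (1UL << 12)

/* Hypothetical helper: true if every bit of 'want' is set in 'hwcaps'. */
static bool sketch_hwcap_has(unsigned long hwcaps, unsigned long want)
{
    return (hwcaps & want) == want;
}

/* Hypothetical helper: query this process's AT_HWCAP word.
 * The result is only meaningful when running as an AArch64 guest. */
bool sketch_cpu_has_asimdrdm(void)
{
    return sketch_hwcap_has(getauxval(AT_HWCAP), SKETCH_HWCAP_A64_ASIMDRDM);
}
```

On a host that is not AArch64, AT_HWCAP uses a different bit layout, so only the pure bit test is portable.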
* [Qemu-devel] [PATCH v2 02/11] target/arm: Decode aa64 armv8.1 scalar three same extra
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 01/11] target/arm: Add ARM_FEATURE_V8_1_SIMD Richard Henderson
@ 2017-12-18 17:24 ` Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 03/11] target/arm: Decode aa64 armv8.1 " Richard Henderson
` (8 subsequent siblings)
10 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 4 ++
target/arm/advsimd_helper.c | 109 ++++++++++++++++++++++++++++++++++++++++++++
target/arm/translate-a64.c | 90 ++++++++++++++++++++++++++++++++++++
target/arm/Makefile.objs | 2 +-
4 files changed, 204 insertions(+), 1 deletion(-)
create mode 100644 target/arm/advsimd_helper.c
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 2485fc322d..d103f3d8bf 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -366,8 +366,12 @@ DEF_HELPER_FLAGS_1(neon_rbit_u8, TCG_CALL_NO_RWG_SE, i32, i32)
DEF_HELPER_3(neon_qdmulh_s16, i32, env, i32, i32)
DEF_HELPER_3(neon_qrdmulh_s16, i32, env, i32, i32)
+DEF_HELPER_4(neon_qrdmlah_s16, i32, env, i32, i32, i32)
+DEF_HELPER_4(neon_qrdmlsh_s16, i32, env, i32, i32, i32)
DEF_HELPER_3(neon_qdmulh_s32, i32, env, i32, i32)
DEF_HELPER_3(neon_qrdmulh_s32, i32, env, i32, i32)
+DEF_HELPER_4(neon_qrdmlah_s32, i32, env, s32, s32, s32)
+DEF_HELPER_4(neon_qrdmlsh_s32, i32, env, s32, s32, s32)
DEF_HELPER_1(neon_narrow_u8, i32, i64)
DEF_HELPER_1(neon_narrow_u16, i32, i64)
diff --git a/target/arm/advsimd_helper.c b/target/arm/advsimd_helper.c
new file mode 100644
index 0000000000..b91d181741
--- /dev/null
+++ b/target/arm/advsimd_helper.c
@@ -0,0 +1,109 @@
+/*
+ * ARM AdvSIMD Vector Operations
+ *
+ * Copyright (c) 2017 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
+
+
+#define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q
+
+/* Signed saturating rounding doubling multiply-accumulate high half, 16-bit */
+static uint16_t inl_qrdmlah_s16(CPUARMState *env, int16_t src1,
+ int16_t src2, int16_t src3)
+{
+ /* Simplify:
+ * = ((a3 << 16) + ((e1 * e2) << 1) + (1 << 15)) >> 16
+ * = ((a3 << 15) + (e1 * e2) + (1 << 14)) >> 15
+ */
+ int32_t ret = (int32_t)src1 * src2;
+ ret = ((int32_t)src3 << 15) + ret + (1 << 14);
+ ret >>= 15;
+ if (ret != (int16_t)ret) {
+ SET_QC();
+ ret = (ret < 0 ? -0x8000 : 0x7fff);
+ }
+ return ret;
+}
+
+uint32_t HELPER(neon_qrdmlah_s16)(CPUARMState *env, uint32_t src1,
+ uint32_t src2, uint32_t src3)
+{
+ uint16_t e1 = inl_qrdmlah_s16(env, src1, src2, src3);
+ uint16_t e2 = inl_qrdmlah_s16(env, src1 >> 16, src2 >> 16, src3 >> 16);
+ return deposit32(e1, 16, 16, e2);
+}
+
+/* Signed saturating rounding doubling multiply-subtract high half, 16-bit */
+static uint16_t inl_qrdmlsh_s16(CPUARMState *env, int16_t src1,
+ int16_t src2, int16_t src3)
+{
+ /* Similarly, using subtraction:
+ * = ((a3 << 16) - ((e1 * e2) << 1) + (1 << 15)) >> 16
+ * = ((a3 << 15) - (e1 * e2) + (1 << 14)) >> 15
+ */
+ int32_t ret = (int32_t)src1 * src2;
+ ret = ((int32_t)src3 << 15) - ret + (1 << 14);
+ ret >>= 15;
+ if (ret != (int16_t)ret) {
+ SET_QC();
+ ret = (ret < 0 ? -0x8000 : 0x7fff);
+ }
+ return ret;
+}
+
+uint32_t HELPER(neon_qrdmlsh_s16)(CPUARMState *env, uint32_t src1,
+ uint32_t src2, uint32_t src3)
+{
+ uint16_t e1 = inl_qrdmlsh_s16(env, src1, src2, src3);
+ uint16_t e2 = inl_qrdmlsh_s16(env, src1 >> 16, src2 >> 16, src3 >> 16);
+ return deposit32(e1, 16, 16, e2);
+}
+
+/* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */
+uint32_t HELPER(neon_qrdmlah_s32)(CPUARMState *env, int32_t src1,
+ int32_t src2, int32_t src3)
+{
+ /* Simplify similarly to inl_qrdmlah_s16 above. */
+ int64_t ret = (int64_t)src1 * src2;
+ ret = ((int64_t)src3 << 31) + ret + (1 << 30);
+ ret >>= 31;
+ if (ret != (int32_t)ret) {
+ SET_QC();
+ ret = (ret < 0 ? INT32_MIN : INT32_MAX);
+ }
+ return ret;
+}
+
+/* Signed saturating rounding doubling multiply-subtract high half, 32-bit */
+uint32_t HELPER(neon_qrdmlsh_s32)(CPUARMState *env, int32_t src1,
+ int32_t src2, int32_t src3)
+{
+ /* Simplify similarly to inl_qrdmlsh_s16 above. */
+ int64_t ret = (int64_t)src1 * src2;
+ ret = ((int64_t)src3 << 31) - ret + (1 << 30);
+ ret >>= 31;
+ if (ret != (int32_t)ret) {
+ SET_QC();
+ ret = (ret < 0 ? INT32_MIN : INT32_MAX);
+ }
+ return ret;
+}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index d8702b10f5..0b090fe086 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -7822,6 +7822,95 @@ static void disas_simd_scalar_three_reg_same_fp16(DisasContext *s, uint32_t insn
tcg_temp_free_ptr(fpst);
}
+/* AdvSIMD scalar three same extra
+ * 31 30 29 28 24 23 22 21 20 16 15 14 11 10 9 5 4 0
+ * +-----+---+-----------+------+---+------+---+--------+---+----+----+
+ * | 0 1 | U | 1 1 1 1 0 | size | 0 | Rm | 1 | opcode | 1 | Rn | Rd |
+ * +-----+---+-----------+------+---+------+---+--------+---+----+----+
+ */
+static void disas_simd_scalar_three_reg_same_extra(DisasContext *s,
+ uint32_t insn)
+{
+ int rd = extract32(insn, 0, 5);
+ int rn = extract32(insn, 5, 5);
+ int opcode = extract32(insn, 11, 4);
+ int rm = extract32(insn, 16, 5);
+ int size = extract32(insn, 22, 2);
+ bool u = extract32(insn, 29, 1);
+ TCGv_i32 ele1, ele2, ele3;
+ TCGv_i64 res;
+ int feature;
+
+ if (!u) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ switch (opcode) {
+ case 0x0: /* SQRDMLAH (vector) */
+ case 0x1: /* SQRDMLSH (vector) */
+ if (size != 1 && size != 2) {
+ unallocated_encoding(s);
+ return;
+ }
+ feature = ARM_FEATURE_V8_1_SIMD;
+ break;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
+
+ if (!arm_dc_feature(s, feature)) {
+ unallocated_encoding(s);
+ return;
+ }
+ if (!fp_access_check(s)) {
+ return;
+ }
+
+ /* Do a single operation on the lowest element in the vector.
+ * We use the standard Neon helpers and rely on 0 OP 0 == 0
+ * with no side effects for all these operations.
+ * OPTME: special-purpose helpers would avoid doing some
+ * unnecessary work in the helper for the 16 bit cases.
+ */
+ ele1 = tcg_temp_new_i32();
+ ele2 = tcg_temp_new_i32();
+ ele3 = tcg_temp_new_i32();
+
+ read_vec_element_i32(s, ele1, rn, 0, size);
+ read_vec_element_i32(s, ele2, rm, 0, size);
+ read_vec_element_i32(s, ele3, rd, 0, size);
+
+ switch (opcode) {
+ case 0x0: /* SQRDMLAH */
+ if (size == 1) {
+ gen_helper_neon_qrdmlah_s16(ele3, cpu_env, ele1, ele2, ele3);
+ } else {
+ gen_helper_neon_qrdmlah_s32(ele3, cpu_env, ele1, ele2, ele3);
+ }
+ break;
+ case 0x1: /* SQRDMLSH */
+ if (size == 1) {
+ gen_helper_neon_qrdmlsh_s16(ele3, cpu_env, ele1, ele2, ele3);
+ } else {
+ gen_helper_neon_qrdmlsh_s32(ele3, cpu_env, ele1, ele2, ele3);
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ tcg_temp_free_i32(ele1);
+ tcg_temp_free_i32(ele2);
+
+ res = tcg_temp_new_i64();
+ tcg_gen_extu_i32_i64(res, ele3);
+ tcg_temp_free_i32(ele3);
+
+ write_fp_dreg(s, rd, res);
+ tcg_temp_free_i64(res);
+}
+
static void handle_2misc_64(DisasContext *s, int opcode, bool u,
TCGv_i64 tcg_rd, TCGv_i64 tcg_rn,
TCGv_i32 tcg_rmode, TCGv_ptr tcg_fpstatus)
@@ -12344,6 +12433,7 @@ static const AArch64DecodeTable data_proc_simd[] = {
{ 0x0e000800, 0xbf208c00, disas_simd_zip_trn },
{ 0x2e000000, 0xbf208400, disas_simd_ext },
{ 0x5e200400, 0xdf200400, disas_simd_scalar_three_reg_same },
+ { 0x5e008400, 0xdf208400, disas_simd_scalar_three_reg_same_extra },
{ 0x5e200000, 0xdf200c00, disas_simd_scalar_three_reg_diff },
{ 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
{ 0x5e300800, 0xdf3e0c00, disas_simd_scalar_pairwise },
diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index 847fb52ee0..c2d32988f9 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -5,7 +5,7 @@ obj-$(call land,$(CONFIG_KVM),$(call lnot,$(TARGET_AARCH64))) += kvm32.o
obj-$(call land,$(CONFIG_KVM),$(TARGET_AARCH64)) += kvm64.o
obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
obj-y += translate.o op_helper.o helper.o cpu.o
-obj-y += neon_helper.o iwmmxt_helper.o
+obj-y += neon_helper.o iwmmxt_helper.o advsimd_helper.o
obj-y += gdbstub.o
obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
obj-y += crypto_helper.o
--
2.14.3
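The comment in inl_qrdmlah_s16 claims that the architectural formula can be halved, so that the intermediate fits in 32 bits. A small host-side check of that claim might look like the sketch below. The function names are hypothetical. int64_t is used throughout to sidestep signed-overflow questions. Like QEMU, the sketch assumes that >> on a negative value is an arithmetic shift, which holds for GCC and Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: clamp to int16_t and set *qc on saturation (models FPSCR.QC). */
static int16_t sketch_sat16(int64_t v, bool *qc)
{
    if (v > INT16_MAX) {
        *qc = true;
        return INT16_MAX;
    }
    if (v < INT16_MIN) {
        *qc = true;
        return INT16_MIN;
    }
    return (int16_t)v;
}

/* Hypothetical: architectural form ((acc << 16) + ((n * m) << 1) + (1 << 15)) >> 16 */
int16_t sketch_sqrdmlah16_arch(int16_t n, int16_t m, int16_t acc, bool *qc)
{
    int64_t r = (int64_t)acc * 65536 + 2 * ((int64_t)n * m) + (1 << 15);
    return sketch_sat16(r >> 16, qc);
}

/* Hypothetical: halved form, as in inl_qrdmlah_s16:
 * ((acc << 15) + n * m + (1 << 14)) >> 15 */
int16_t sketch_sqrdmlah16_halved(int16_t n, int16_t m, int16_t acc, bool *qc)
{
    int64_t r = (int64_t)acc * 32768 + (int64_t)n * m + (1 << 14);
    return sketch_sat16(r >> 15, qc);
}
```

The two forms agree because the architectural numerator is exactly twice the halved one, so both floor divisions yield the same quotient. As a worked case, 0x4000 * 0x4000 (0.5 * 0.5 in Q15) with a zero accumulator gives 0x2000.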
* [Qemu-devel] [PATCH v2 03/11] target/arm: Decode aa64 armv8.1 three same extra
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 01/11] target/arm: Add ARM_FEATURE_V8_1_SIMD Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 02/11] target/arm: Decode aa64 armv8.1 scalar three same extra Richard Henderson
@ 2017-12-18 17:24 ` Richard Henderson
2018-01-15 17:21 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 04/11] target/arm: Decode aa64 armv8.1 scalar/vector x indexed element Richard Henderson
` (7 subsequent siblings)
10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 9 +++++
target/arm/advsimd_helper.c | 74 ++++++++++++++++++++++++++++++++++++++++
target/arm/translate-a64.c | 83 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 166 insertions(+)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index d103f3d8bf..06ca458614 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -558,6 +558,15 @@ DEF_HELPER_2(dc_zva, void, env, i64)
DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+DEF_HELPER_FLAGS_5(gvec_qrdmlah_s16, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s16, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_qrdmlah_s32, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s32, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
#ifdef TARGET_AARCH64
#include "helper-a64.h"
#endif
diff --git a/target/arm/advsimd_helper.c b/target/arm/advsimd_helper.c
index b91d181741..d5185165a5 100644
--- a/target/arm/advsimd_helper.c
+++ b/target/arm/advsimd_helper.c
@@ -26,6 +26,16 @@
#define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q
+static void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
+{
+ uint64_t *d = vd + opr_sz;
+ uintptr_t i;
+
+ for (i = opr_sz; i < max_sz; i += 8) {
+ *d++ = 0;
+ }
+}
+
/* Signed saturating rounding doubling multiply-accumulate high half, 16-bit */
static uint16_t inl_qrdmlah_s16(CPUARMState *env, int16_t src1,
int16_t src2, int16_t src3)
@@ -52,6 +62,22 @@ uint32_t HELPER(neon_qrdmlah_s16)(CPUARMState *env, uint32_t src1,
return deposit32(e1, 16, 16, e2);
}
+void HELPER(gvec_qrdmlah_s16)(void *vd, void *vn, void *vm,
+ void *ve, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ int16_t *d = vd;
+ int16_t *n = vn;
+ int16_t *m = vm;
+ CPUARMState *env = ve;
+ uintptr_t i;
+
+ for (i = 0; i < opr_sz / 2; ++i) {
+ d[i] = inl_qrdmlah_s16(env, n[i], m[i], d[i]);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
/* Signed saturating rounding doubling multiply-subtract high half, 16-bit */
static uint16_t inl_qrdmlsh_s16(CPUARMState *env, int16_t src1,
int16_t src2, int16_t src3)
@@ -78,6 +104,22 @@ uint32_t HELPER(neon_qrdmlsh_s16)(CPUARMState *env, uint32_t src1,
return deposit32(e1, 16, 16, e2);
}
+void HELPER(gvec_qrdmlsh_s16)(void *vd, void *vn, void *vm,
+ void *ve, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ int16_t *d = vd;
+ int16_t *n = vn;
+ int16_t *m = vm;
+ CPUARMState *env = ve;
+ uintptr_t i;
+
+ for (i = 0; i < opr_sz / 2; ++i) {
+ d[i] = inl_qrdmlsh_s16(env, n[i], m[i], d[i]);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
/* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */
uint32_t HELPER(neon_qrdmlah_s32)(CPUARMState *env, int32_t src1,
int32_t src2, int32_t src3)
@@ -93,6 +135,22 @@ uint32_t HELPER(neon_qrdmlah_s32)(CPUARMState *env, int32_t src1,
return ret;
}
+void HELPER(gvec_qrdmlah_s32)(void *vd, void *vn, void *vm,
+ void *ve, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ int32_t *d = vd;
+ int32_t *n = vn;
+ int32_t *m = vm;
+ CPUARMState *env = ve;
+ uintptr_t i;
+
+ for (i = 0; i < opr_sz / 4; ++i) {
+ d[i] = helper_neon_qrdmlah_s32(env, n[i], m[i], d[i]);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
/* Signed saturating rounding doubling multiply-subtract high half, 32-bit */
uint32_t HELPER(neon_qrdmlsh_s32)(CPUARMState *env, int32_t src1,
int32_t src2, int32_t src3)
@@ -107,3 +165,19 @@ uint32_t HELPER(neon_qrdmlsh_s32)(CPUARMState *env, int32_t src1,
}
return ret;
}
+
+void HELPER(gvec_qrdmlsh_s32)(void *vd, void *vn, void *vm,
+ void *ve, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ int32_t *d = vd;
+ int32_t *n = vn;
+ int32_t *m = vm;
+ CPUARMState *env = ve;
+ uintptr_t i;
+
+ for (i = 0; i < opr_sz / 4; ++i) {
+ d[i] = helper_neon_qrdmlsh_s32(env, n[i], m[i], d[i]);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 0b090fe086..3836e94135 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -10678,7 +10678,89 @@ static void disas_simd_three_reg_same_fp16(DisasContext *s, uint32_t insn)
/* non-quad vector op */
clear_vec_high(s, rd);
}
+}
+
+/* AdvSIMD three same extra
+ * 31 30 29 28 24 23 22 21 20 16 15 14 11 10 9 5 4 0
+ * +---+---+---+-----------+------+---+------+---+--------+---+----+----+
+ * | 0 | Q | U | 0 1 1 1 0 | size | 0 | Rm | 1 | opcode | 1 | Rn | Rd |
+ * +---+---+---+-----------+------+---+------+---+--------+---+----+----+
+ */
+static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
+{
+ void (*fn_gvec_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+ int rd = extract32(insn, 0, 5);
+ int rn = extract32(insn, 5, 5);
+ int opcode = extract32(insn, 11, 4);
+ int rm = extract32(insn, 16, 5);
+ int size = extract32(insn, 22, 2);
+ bool u = extract32(insn, 29, 1);
+ bool is_q = extract32(insn, 30, 1);
+ int feature;
+
+ if (!u) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ switch (opcode) {
+ case 0x0: /* SQRDMLAH (vector) */
+ case 0x1: /* SQRDMLSH (vector) */
+ if (size != 1 && size != 2) {
+ unallocated_encoding(s);
+ return;
+ }
+ feature = ARM_FEATURE_V8_1_SIMD;
+ break;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
+
+ if (!arm_dc_feature(s, feature)) {
+ unallocated_encoding(s);
+ return;
+ }
+ if (!fp_access_check(s)) {
+ return;
+ }
+
+ switch (opcode) {
+ case 0x0: /* SQRDMLAH (vector) */
+ switch (size) {
+ case 1:
+ fn_gvec_ptr = gen_helper_gvec_qrdmlah_s16;
+ break;
+ case 2:
+ fn_gvec_ptr = gen_helper_gvec_qrdmlah_s32;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ goto do_env;
+
+ case 0x1: /* SQRDMLSH (vector) */
+ switch (size) {
+ case 1:
+ fn_gvec_ptr = gen_helper_gvec_qrdmlsh_s16;
+ break;
+ case 2:
+ fn_gvec_ptr = gen_helper_gvec_qrdmlsh_s32;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ do_env:
+ tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
+ vec_full_reg_offset(s, rn),
+ vec_full_reg_offset(s, rm), cpu_env,
+ is_q ? 16 : 8, vec_full_reg_size(s),
+ 0, fn_gvec_ptr);
+ break;
+ default:
+ g_assert_not_reached();
+ }
}
static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q,
@@ -12421,6 +12503,7 @@ static void disas_crypto_two_reg_sha(DisasContext *s, uint32_t insn)
static const AArch64DecodeTable data_proc_simd[] = {
/* pattern , mask , fn */
{ 0x0e200400, 0x9f200400, disas_simd_three_reg_same },
+ { 0x0e008400, 0x9f208400, disas_simd_three_reg_same_extra },
{ 0x0e200000, 0x9f200c00, disas_simd_three_reg_diff },
{ 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc },
{ 0x0e300800, 0x9f3e0c00, disas_simd_across_lanes },
--
2.14.3
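The gvec helpers added here all follow one shape:

1. Process simd_oprsz(desc) bytes element by element.
2. Zero the destination up to simd_maxsz(desc).

For a 64-bit (Q=0) operation, disas_simd_three_reg_same_extra passes an operation size of 8 together with the full register size, so the upper half of Vd is cleared. The sketch below shows that shape on plain host buffers. The sketch_* names are hypothetical. memset stands in for clear_tail's 8-byte store loop. The example element op is a simple wrapping multiply-accumulate, not SQRDMLAH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: zero bytes [opr_sz, max_sz) of the destination, like clear_tail(). */
static void sketch_clear_tail(void *vd, size_t opr_sz, size_t max_sz)
{
    assert(opr_sz % 8 == 0 && max_sz % 8 == 0 && opr_sz <= max_sz);
    memset((uint8_t *)vd + opr_sz, 0, max_sz - opr_sz);
}

/* Hypothetical per-element op: new destination value from (n, m, old d). */
typedef int16_t sketch_op3_s16(int16_t n, int16_t m, int16_t d);

/* Hypothetical: d[i] = op(n[i], m[i], d[i]) over opr_sz bytes, then clear the tail. */
void sketch_gvec_3_s16(void *vd, const void *vn, const void *vm,
                       size_t opr_sz, size_t max_sz, sketch_op3_s16 *op)
{
    int16_t *d = vd;
    const int16_t *n = vn;
    const int16_t *m = vm;

    for (size_t i = 0; i < opr_sz / 2; ++i) {
        d[i] = op(n[i], m[i], d[i]);
    }
    sketch_clear_tail(vd, opr_sz, max_sz);
}

/* Hypothetical example op for illustration: wrapping multiply-accumulate. */
int16_t sketch_mla_s16(int16_t n, int16_t m, int16_t d)
{
    return (int16_t)(d + n * m);
}
```

Calling sketch_gvec_3_s16(d, n, m, 8, 16, sketch_mla_s16) models the Q=0 case. Four lanes are updated and the remaining 8 bytes of the 16-byte register are zeroed.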
* [Qemu-devel] [PATCH v2 04/11] target/arm: Decode aa64 armv8.1 scalar/vector x indexed element
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
` (2 preceding siblings ...)
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 03/11] target/arm: Decode aa64 armv8.1 " Richard Henderson
@ 2017-12-18 17:24 ` Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 05/11] target/arm: Decode aa32 armv8.1 three same Richard Henderson
` (6 subsequent siblings)
10 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-a64.c | 46 ++++++++++++++++++++++++++++++++++++++++------
1 file changed, 40 insertions(+), 6 deletions(-)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 3836e94135..85fc7af491 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -11857,12 +11857,23 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
is_long = true;
/* fall through */
case 0xc: /* SQDMULH */
- case 0xd: /* SQRDMULH */
if (u) {
unallocated_encoding(s);
return;
}
break;
+ case 0xd: /* SQRDMULH / SQRDMLAH */
+ if (u && !arm_dc_feature(s, ARM_FEATURE_V8_1_SIMD)) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+ case 0xf: /* SQRDMLSH */
+ if (!u || !arm_dc_feature(s, ARM_FEATURE_V8_1_SIMD)) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
case 0x8: /* MUL */
if (u || is_scalar) {
unallocated_encoding(s);
@@ -12100,13 +12111,36 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
tcg_op, tcg_idx);
}
break;
- case 0xd: /* SQRDMULH */
+ case 0xd: /* SQRDMULH / SQRDMLAH */
+ if (u) { /* SQRDMLAH */
+ read_vec_element_i32(s, tcg_res, rd, pass,
+ is_scalar ? size : MO_32);
+ if (size == 1) {
+ gen_helper_neon_qrdmlah_s16(tcg_res, cpu_env,
+ tcg_op, tcg_idx, tcg_res);
+ } else {
+ gen_helper_neon_qrdmlah_s32(tcg_res, cpu_env,
+ tcg_op, tcg_idx, tcg_res);
+ }
+ } else { /* SQRDMULH */
+ if (size == 1) {
+ gen_helper_neon_qrdmulh_s16(tcg_res, cpu_env,
+ tcg_op, tcg_idx);
+ } else {
+ gen_helper_neon_qrdmulh_s32(tcg_res, cpu_env,
+ tcg_op, tcg_idx);
+ }
+ }
+ break;
+ case 0xf: /* SQRDMLSH */
+ read_vec_element_i32(s, tcg_res, rd, pass,
+ is_scalar ? size : MO_32);
if (size == 1) {
- gen_helper_neon_qrdmulh_s16(tcg_res, cpu_env,
- tcg_op, tcg_idx);
+ gen_helper_neon_qrdmlsh_s16(tcg_res, cpu_env,
+ tcg_op, tcg_idx, tcg_res);
} else {
- gen_helper_neon_qrdmulh_s32(tcg_res, cpu_env,
- tcg_op, tcg_idx);
+ gen_helper_neon_qrdmlsh_s32(tcg_res, cpu_env,
+ tcg_op, tcg_idx, tcg_res);
}
break;
default:
--
2.14.3
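Patch 04 reuses indexed-element opcodes 0xd and 0xf. U distinguishes the old and new forms, and the new forms are gated on ARM_FEATURE_V8_1_SIMD. The accept/reject rules in the first hunk can be summarised as a small pure function. The enum and function names are hypothetical, and only the SQRDMULH/SQRDMLAH/SQRDMLSH cases are modelled.

```c
#include <stdbool.h>

/* Hypothetical classification of the indexed-element opcodes touched here. */
enum sketch_idx_op {
    SKETCH_UNALLOCATED,
    SKETCH_SQRDMULH,
    SKETCH_SQRDMLAH,
    SKETCH_SQRDMLSH,
    SKETCH_NOT_MODELLED,
};

/* Hypothetical: mirrors the opcode 0xd / 0xf checks in disas_simd_indexed. */
enum sketch_idx_op sketch_classify_indexed(int opcode, bool u, bool has_v8_1_simd)
{
    switch (opcode) {
    case 0xd:
        if (!u) {
            return SKETCH_SQRDMULH;           /* pre-existing, no feature check */
        }
        return has_v8_1_simd ? SKETCH_SQRDMLAH : SKETCH_UNALLOCATED;
    case 0xf:
        if (u && has_v8_1_simd) {
            return SKETCH_SQRDMLSH;
        }
        return SKETCH_UNALLOCATED;            /* U=0, or no ARMv8.1-SIMD */
    default:
        return SKETCH_NOT_MODELLED;           /* other opcodes are out of scope */
    }
}
```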
* [Qemu-devel] [PATCH v2 05/11] target/arm: Decode aa32 armv8.1 three same
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
` (3 preceding siblings ...)
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 04/11] target/arm: Decode aa64 armv8.1 scalar/vector x indexed element Richard Henderson
@ 2017-12-18 17:24 ` Richard Henderson
2018-01-15 17:37 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 06/11] target/arm: Decode aa32 armv8.1 two reg and a scalar Richard Henderson
` (5 subsequent siblings)
10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 85 +++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 66 insertions(+), 19 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index c690658493..a9587ae242 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -25,6 +25,7 @@
#include "disas/disas.h"
#include "exec/exec-all.h"
#include "tcg-op.h"
+#include "tcg-op-gvec.h"
#include "qemu/log.h"
#include "qemu/bitops.h"
#include "arm_ldst.h"
@@ -5364,9 +5365,9 @@ static void gen_neon_narrow_op(int op, int u, int size,
#define NEON_3R_VPMAX 20
#define NEON_3R_VPMIN 21
#define NEON_3R_VQDMULH_VQRDMULH 22
-#define NEON_3R_VPADD 23
+#define NEON_3R_VPADD_VQRDMLAH 23
#define NEON_3R_SHA 24 /* SHA1C,SHA1P,SHA1M,SHA1SU0,SHA256H{2},SHA256SU1 */
-#define NEON_3R_VFM 25 /* VFMA, VFMS : float fused multiply-add */
+#define NEON_3R_VFM_VQRDMLSH 25 /* VFMA, VFMS, VQRDMLSH */
#define NEON_3R_FLOAT_ARITH 26 /* float VADD, VSUB, VPADD, VABD */
#define NEON_3R_FLOAT_MULTIPLY 27 /* float VMLA, VMLS, VMUL */
#define NEON_3R_FLOAT_CMP 28 /* float VCEQ, VCGE, VCGT */
@@ -5398,9 +5399,9 @@ static const uint8_t neon_3r_sizes[] = {
[NEON_3R_VPMAX] = 0x7,
[NEON_3R_VPMIN] = 0x7,
[NEON_3R_VQDMULH_VQRDMULH] = 0x6,
- [NEON_3R_VPADD] = 0x7,
+ [NEON_3R_VPADD_VQRDMLAH] = 0x7,
[NEON_3R_SHA] = 0xf, /* size field encodes op type */
- [NEON_3R_VFM] = 0x5, /* size bit 1 encodes op */
+ [NEON_3R_VFM_VQRDMLSH] = 0x7, /* For VFM, size bit 1 encodes op */
[NEON_3R_FLOAT_ARITH] = 0x5, /* size bit 1 encodes op */
[NEON_3R_FLOAT_MULTIPLY] = 0x5, /* size bit 1 encodes op */
[NEON_3R_FLOAT_CMP] = 0x5, /* size bit 1 encodes op */
@@ -5579,6 +5580,22 @@ static const uint8_t neon_2rm_sizes[] = {
[NEON_2RM_VCVT_UF] = 0x4,
};
+
+/* Expand v8.1 simd helper. */
+static int do_v81_helper(DisasContext *s, gen_helper_gvec_3_ptr *fn,
+ int q, int rd, int rn, int rm)
+{
+ if (arm_dc_feature(s, ARM_FEATURE_V8_1_SIMD)) {
+ int opr_sz = (1 + q) * 8;
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
+ vfp_reg_offset(1, rn),
+ vfp_reg_offset(1, rm), cpu_env,
+ opr_sz, opr_sz, 0, fn);
+ return 0;
+ }
+ return 1;
+}
+
/* Translate a NEON data processing instruction. Return nonzero if the
instruction is invalid.
We process data in a mixture of 32-bit and 64-bit chunks.
@@ -5630,12 +5647,12 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
if (q && ((rd | rn | rm) & 1)) {
return 1;
}
- /*
- * The SHA-1/SHA-256 3-register instructions require special treatment
- * here, as their size field is overloaded as an op type selector, and
- * they all consume their input in a single pass.
- */
- if (op == NEON_3R_SHA) {
+ switch (op) {
+ case NEON_3R_SHA:
+ /* The SHA-1/SHA-256 3-register instructions require special
+ * treatment here, as their size field is overloaded as an
+ * op type selector, and they all consume their input in a
+ * single pass. */
if (!q) {
return 1;
}
@@ -5672,6 +5689,40 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
tcg_temp_free_i32(tmp2);
tcg_temp_free_i32(tmp3);
return 0;
+
+ case NEON_3R_VPADD_VQRDMLAH:
+ if (!u) {
+ break; /* VPADD */
+ }
+ /* VQRDMLAH */
+ switch (size) {
+ case 1:
+ return do_v81_helper(s, gen_helper_gvec_qrdmlah_s16,
+ q, rd, rn, rm);
+ case 2:
+ return do_v81_helper(s, gen_helper_gvec_qrdmlah_s32,
+ q, rd, rn, rm);
+ }
+ return 1;
+
+ case NEON_3R_VFM_VQRDMLSH:
+ if (!u) {
+ /* VFM, VFMS */
+ if ((5 & (1 << size)) == 0) {
+ return 1;
+ }
+ break;
+ }
+ /* VQRDMLSH */
+ switch (size) {
+ case 1:
+ return do_v81_helper(s, gen_helper_gvec_qrdmlsh_s16,
+ q, rd, rn, rm);
+ case 2:
+ return do_v81_helper(s, gen_helper_gvec_qrdmlsh_s32,
+ q, rd, rn, rm);
+ }
+ return 1;
}
if (size == 3 && op != NEON_3R_LOGIC) {
/* 64-bit element instructions. */
@@ -5757,11 +5808,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
rm = rtmp;
}
break;
- case NEON_3R_VPADD:
- if (u) {
- return 1;
- }
- /* Fall through */
+ case NEON_3R_VPADD_VQRDMLAH:
case NEON_3R_VPMAX:
case NEON_3R_VPMIN:
pairwise = 1;
@@ -5795,8 +5842,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
return 1;
}
break;
- case NEON_3R_VFM:
- if (!arm_dc_feature(s, ARM_FEATURE_VFP4) || u) {
+ case NEON_3R_VFM_VQRDMLSH:
+ if (!arm_dc_feature(s, ARM_FEATURE_VFP4)) {
return 1;
}
break;
@@ -5993,7 +6040,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
}
}
break;
- case NEON_3R_VPADD:
+ case NEON_3R_VPADD_VQRDMLAH:
switch (size) {
case 0: gen_helper_neon_padd_u8(tmp, tmp, tmp2); break;
case 1: gen_helper_neon_padd_u16(tmp, tmp, tmp2); break;
@@ -6092,7 +6139,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
}
}
break;
- case NEON_3R_VFM:
+ case NEON_3R_VFM_VQRDMLSH:
{
/* VFMA, VFMS: fused multiply-add */
TCGv_ptr fpstatus = get_fpstatus_ptr(1);
--
2.14.3
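After this patch, NEON 3-reg-same ops 23 and 25 are shared, with U choosing the form:

- Op 23: VPADD when U=0, VQRDMLAH when U=1.
- Op 25: VFMA/VFMS when U=0, VQRDMLSH when U=1.

The combined effect of the neon_3r_sizes[] mask (now 0x7 for op 25) and the new switch can be sketched as follows. The sketch makes several assumptions:

- The enum and function names are hypothetical.
- The ARM_FEATURE_VFP4 check on VFMA/VFMS is omitted.
- sketch_v81_opr_sz mirrors the (1 + q) * 8 computed in do_v81_helper.

```c
#include <stdbool.h>

/* Hypothetical outcome of decoding NEON 3-reg-same ops 23 and 25. */
enum sketch_a32_op {
    SKETCH_A32_UNDEF,
    SKETCH_A32_VPADD,
    SKETCH_A32_VQRDMLAH,
    SKETCH_A32_VFM,
    SKETCH_A32_VQRDMLSH,
    SKETCH_A32_NOT_MODELLED,
};

/* Hypothetical: v8.1 forms accept sizes 1 (16-bit) and 2 (32-bit) only,
 * and are UNDEF without ARMv8.1-SIMD (do_v81_helper returns 1). */
static enum sketch_a32_op sketch_v81(enum sketch_a32_op op, int size, bool has_v81)
{
    return ((size == 1 || size == 2) && has_v81) ? op : SKETCH_A32_UNDEF;
}

/* Hypothetical: 'op' is the NEON 3-reg-same op field, 'size' is 0..3. */
enum sketch_a32_op sketch_classify_3r(int op, bool u, int size, bool has_v81)
{
    /* neon_3r_sizes[] is 0x7 for both ops, so size 3 is rejected up front. */
    if ((0x7u & (1u << size)) == 0) {
        return SKETCH_A32_UNDEF;
    }
    switch (op) {
    case 23:
        return u ? sketch_v81(SKETCH_A32_VQRDMLAH, size, has_v81)
                 : SKETCH_A32_VPADD;
        /* not reached */
    case 25:
        if (u) {
            return sketch_v81(SKETCH_A32_VQRDMLSH, size, has_v81);
        }
        /* VFMA/VFMS: size bit 1 selects the op, so only sizes 0 and 2 are valid. */
        return (5u & (1u << size)) ? SKETCH_A32_VFM : SKETCH_A32_UNDEF;
    default:
        return SKETCH_A32_NOT_MODELLED;
    }
}

/* Hypothetical: operation size in bytes passed to tcg_gen_gvec_3_ptr. */
int sketch_v81_opr_sz(bool q)
{
    return (1 + q) * 8;
}
```

Widening the mask for op 25 from 0x5 to 0x7 is what lets size 1 reach VQRDMLSH. The explicit `5 & (1 << size)` test then restores the old restriction for VFMA/VFMS.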
* [Qemu-devel] [PATCH v2 06/11] target/arm: Decode aa32 armv8.1 two reg and a scalar
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
` (4 preceding siblings ...)
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 05/11] target/arm: Decode aa32 armv8.1 three same Richard Henderson
@ 2017-12-18 17:24 ` Richard Henderson
2018-01-15 17:47 ` Peter Maydell
2018-01-26 13:41 ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 07/11] target/arm: Add ARM_FEATURE_V8_FCMA Richard Henderson
` (4 subsequent siblings)
10 siblings, 2 replies; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 38 +++++++++++++++++++++++++++++++++++---
1 file changed, 35 insertions(+), 3 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index a9587ae242..1a0b0eaced 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6973,11 +6973,43 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
}
neon_store_reg64(cpu_V0, rd + pass);
}
+ break;
+ case 14: /* VQRDMLAH scalar */
+ case 15: /* VQRDMLSH scalar */
+ if (!arm_dc_feature(s, ARM_FEATURE_V8_1_SIMD)) {
+ return 1;
+ }
+ if (u && ((rd | rn) & 1)) {
+ return 1;
+ }
+ tmp2 = neon_get_scalar(size, rm);
+ for (pass = 0; pass < (u ? 4 : 2); pass++) {
+ void (*fn)(TCGv_i32, TCGv_env, TCGv_i32,
+ TCGv_i32, TCGv_i32);
-
+ tmp = neon_load_reg(rn, pass);
+ tmp3 = neon_load_reg(rd, pass);
+ if (op == 14) {
+ if (size == 1) {
+ fn = gen_helper_neon_qrdmlah_s16;
+ } else {
+ fn = gen_helper_neon_qrdmlah_s32;
+ }
+ } else {
+ if (size == 1) {
+ fn = gen_helper_neon_qrdmlsh_s16;
+ } else {
+ fn = gen_helper_neon_qrdmlsh_s32;
+ }
+ }
+ fn(tmp, cpu_env, tmp, tmp2, tmp3);
+ tcg_temp_free_i32(tmp3);
+ neon_store_reg(rd, pass, tmp);
+ }
+ tcg_temp_free_i32(tmp2);
break;
- default: /* 14 and 15 are RESERVED */
- return 1;
+ default:
+ g_assert_not_reached();
}
}
} else { /* size == 3 */
--
2.14.3
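To check the per-lane arithmetic behind gen_helper_neon_qrdmlah_s16 and gen_helper_neon_qrdmlsh_s16, here is a minimal plain-C sketch of one 16-bit lane. It assumes the usual ARM definition: the accumulator is scaled up by the element size, the doubled product is added or subtracted, a rounding constant is added, and the high half is saturated. The sketch_* names are hypothetical and are not part of QEMU. The loop mirrors how the patch reuses one neon_get_scalar() value on every pass.

```c
#include <stdbool.h>
#include <stdint.h>

/* Saturate a 64-bit intermediate to int16_t, recording saturation in *qc
 * (standing in for the FPSCR.QC update done by the real helper). */
static int16_t sketch_sat16(int64_t v, bool *qc)
{
    if (v > INT16_MAX) {
        *qc = true;
        return INT16_MAX;
    }
    if (v < INT16_MIN) {
        *qc = true;
        return INT16_MIN;
    }
    return (int16_t)v;
}

/* One 16-bit lane of VQRDMLAH (sub == false) or VQRDMLSH (sub == true):
 *   sat16(((acc << 16) +/- 2*n*m + (1 << 15)) >> 16)
 * The shift-up is written as a multiply to avoid left-shifting a negative
 * value; >> on a negative int64_t is assumed to be arithmetic, as it is
 * on mainstream compilers. */
static int16_t sketch_qrdml_s16(int16_t acc, int16_t n, int16_t m,
                                bool sub, bool *qc)
{
    int64_t prod = 2 * (int64_t)n * m;
    int64_t ret = (int64_t)acc * 65536 + (sub ? -prod : prod) + (1 << 15);

    return sketch_sat16(ret >> 16, qc);
}

/* Scalar form: the same scalar multiplies every lane of n. */
static void sketch_vqrdmlah_scalar_s16(int16_t *d, const int16_t *n,
                                       int16_t scalar, int lanes,
                                       bool sub, bool *qc)
{
    for (int i = 0; i < lanes; i++) {
        d[i] = sketch_qrdml_s16(d[i], n[i], scalar, sub, qc);
    }
}
```

With n = m = -32768 and a zero accumulator the doubled product overflows the Q15 range, so the result saturates to 32767 and qc is set.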
* [Qemu-devel] [PATCH v2 07/11] target/arm: Add ARM_FEATURE_V8_FCMA
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
` (5 preceding siblings ...)
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 06/11] target/arm: Decode aa32 armv8.1 two reg and a scalar Richard Henderson
@ 2017-12-18 17:24 ` Richard Henderson
2018-01-15 17:53 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 08/11] target/arm: Decode aa64 armv8.3 fcadd Richard Henderson
` (3 subsequent siblings)
10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Enable it for the "any" CPU used by *-linux-user.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 1 +
linux-user/elfload.c | 1 +
target/arm/cpu.c | 1 +
target/arm/cpu64.c | 1 +
4 files changed, 4 insertions(+)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index e047756b80..7a705a09a1 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1353,6 +1353,7 @@ enum arm_features {
ARM_FEATURE_JAZELLE, /* has (trivial) Jazelle implementation */
ARM_FEATURE_V8_1_SIMD, /* has ARMv8.1-SIMD */
ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
+ ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions. */
};
static inline int arm_feature(CPUARMState *env, int feature)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 95f550518e..e07184902f 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -541,6 +541,7 @@ static uint32_t get_elf_hwcap(void)
GET_FEATURE(ARM_FEATURE_V8_SHA256, ARM_HWCAP_A64_SHA2);
GET_FEATURE(ARM_FEATURE_CRC, ARM_HWCAP_A64_CRC32);
GET_FEATURE(ARM_FEATURE_V8_1_SIMD, ARM_HWCAP_A64_ASIMDRDM);
+ GET_FEATURE(ARM_FEATURE_V8_FCMA, ARM_HWCAP_A64_FCMA);
#undef GET_FEATURE
return hwcaps;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index afe84645af..6cd8ae1459 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1629,6 +1629,7 @@ static void arm_any_initfn(Object *obj)
set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
set_feature(&cpu->env, ARM_FEATURE_CRC);
set_feature(&cpu->env, ARM_FEATURE_V8_1_SIMD);
+ set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
cpu->midr = 0xffffffff;
}
#endif
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 67a01bf7ce..43b42f95fd 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -228,6 +228,7 @@ static void aarch64_any_initfn(Object *obj)
set_feature(&cpu->env, ARM_FEATURE_CRC);
set_feature(&cpu->env, ARM_FEATURE_V8_1_SIMD);
set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
+ set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
cpu->dcz_blocksize = 7; /* 512 bytes */
}
--
2.14.3
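Because get_elf_hwcap() now advertises ARM_HWCAP_A64_FCMA, a guest program running under qemu-aarch64 with the "any" CPU should be able to test for the feature through AT_HWCAP. A minimal sketch follows. The SKETCH_HWCAP_* bit values are an assumption, taken to mirror HWCAP_ASIMDRDM and HWCAP_FCMA in the Linux arm64 <asm/hwcap.h>, and the helper names are hypothetical.

```c
#include <stdbool.h>
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#endif

/* Assumed to match the Linux arm64 uapi <asm/hwcap.h> bit positions. */
#define SKETCH_HWCAP_ASIMDRDM (1UL << 12)
#define SKETCH_HWCAP_FCMA     (1UL << 14)

/* True if every bit in `bit` is set in `hwcap`. */
static bool sketch_hwcap_has(unsigned long hwcap, unsigned long bit)
{
    return (hwcap & bit) == bit;
}

/* AT_HWCAP on an AArch64 Linux host (or under qemu-aarch64);
 * 0 elsewhere, so the sketch still compiles on other hosts. */
static unsigned long sketch_read_hwcap(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return getauxval(AT_HWCAP);
#else
    return 0;
#endif
}
```

A caller would then write something like `if (sketch_hwcap_has(sketch_read_hwcap(), SKETCH_HWCAP_FCMA))` before selecting an FCMLA/FCADD code path.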
* [Qemu-devel] [PATCH v2 08/11] target/arm: Decode aa64 armv8.3 fcadd
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
` (6 preceding siblings ...)
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 07/11] target/arm: Add ARM_FEATURE_V8_FCMA Richard Henderson
@ 2017-12-18 17:24 ` Richard Henderson
2018-01-15 18:11 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 09/11] target/arm: Decode aa64 armv8.3 fcmla Richard Henderson
` (2 subsequent siblings)
10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 7 ++++
target/arm/advsimd_helper.c | 93 +++++++++++++++++++++++++++++++++++++++++++++
target/arm/translate-a64.c | 36 +++++++++++++++++-
3 files changed, 135 insertions(+), 1 deletion(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 06ca458614..0f0fc942b0 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -567,6 +567,13 @@ DEF_HELPER_FLAGS_5(gvec_qrdmlah_s32, TCG_CALL_NO_RWG,
DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s32, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcaddd, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
#ifdef TARGET_AARCH64
#include "helper-a64.h"
#endif
diff --git a/target/arm/advsimd_helper.c b/target/arm/advsimd_helper.c
index d5185165a5..afc2bb1142 100644
--- a/target/arm/advsimd_helper.c
+++ b/target/arm/advsimd_helper.c
@@ -24,6 +24,18 @@
#include "tcg/tcg-gvec-desc.h"
+/* Note that vector data is stored in host-endian 64-bit chunks,
+ so addressing units smaller than that needs a host-endian fixup. */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x) ((x) ^ 7)
+#define H2(x) ((x) ^ 3)
+#define H4(x) ((x) ^ 1)
+#else
+#define H1(x) (x)
+#define H2(x) (x)
+#define H4(x) (x)
+#endif
+
#define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q
static void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
@@ -181,3 +193,84 @@ void HELPER(gvec_qrdmlsh_s32)(void *vd, void *vn, void *vm,
}
clear_tail(d, opr_sz, simd_maxsz(desc));
}
+
+void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
+ void *vfpst, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ float16 *d = vd;
+ float16 *n = vn;
+ float16 *m = vm;
+ float_status *fpst = vfpst;
+ uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
+ uint32_t neg_imag = neg_real ^ 1;
+ uintptr_t i;
+
+ neg_real <<= 15;
+ neg_imag <<= 15;
+
+ for (i = 0; i < opr_sz / 2; i += 2) {
+ float16 e0 = n[H2(i)];
+ float16 e1 = m[H2(i + 1)] ^ neg_imag;
+ float16 e2 = n[H2(i + 1)];
+ float16 e3 = m[H2(i)] ^ neg_real;
+
+ d[H2(i)] = float16_add(e0, e1, fpst);
+ d[H2(i + 1)] = float16_add(e2, e3, fpst);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_fcadds)(void *vd, void *vn, void *vm,
+ void *vfpst, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ float32 *d = vd;
+ float32 *n = vn;
+ float32 *m = vm;
+ float_status *fpst = vfpst;
+ uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
+ uint32_t neg_imag = neg_real ^ 1;
+ uintptr_t i;
+
+ neg_real <<= 31;
+ neg_imag <<= 31;
+
+ for (i = 0; i < opr_sz / 4; i += 2) {
+ float32 e0 = n[H4(i)];
+ float32 e1 = m[H4(i + 1)] ^ neg_imag;
+ float32 e2 = n[H4(i + 1)];
+ float32 e3 = m[H4(i)] ^ neg_real;
+
+ d[H4(i)] = float32_add(e0, e1, fpst);
+ d[H4(i + 1)] = float32_add(e2, e3, fpst);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
+ void *vfpst, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ float64 *d = vd;
+ float64 *n = vn;
+ float64 *m = vm;
+ float_status *fpst = vfpst;
+ uint64_t neg_real = extract64(desc, SIMD_DATA_SHIFT, 1);
+ uint64_t neg_imag = neg_real ^ 1;
+ uintptr_t i;
+
+ neg_real <<= 63;
+ neg_imag <<= 63;
+
+ for (i = 0; i < opr_sz / 8; i += 2) {
+ float64 e0 = n[i];
+ float64 e1 = m[i + 1] ^ neg_imag;
+ float64 e2 = n[i + 1];
+ float64 e3 = m[i] ^ neg_real;
+
+ d[i] = float64_add(e0, e1, fpst);
+ d[i + 1] = float64_add(e2, e3, fpst);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 85fc7af491..89a0616894 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -10696,7 +10696,8 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
int size = extract32(insn, 22, 2);
bool u = extract32(insn, 29, 1);
bool is_q = extract32(insn, 30, 1);
- int feature;
+ int feature, data;
+ TCGv_ptr fpst;
if (!u) {
unallocated_encoding(s);
@@ -10712,6 +10713,14 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
feature = ARM_FEATURE_V8_1_SIMD;
break;
+ case 0xc: /* FCADD, #90 */
+ case 0xe: /* FCADD, #270 */
+ if (size == 0 || (size == 3 && !is_q)) {
+ unallocated_encoding(s);
+ return;
+ }
+ feature = ARM_FEATURE_V8_FCMA;
+ break;
default:
unallocated_encoding(s);
return;
@@ -10758,6 +10767,31 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
0, fn_gvec_ptr);
break;
+ case 0xc: /* FCADD, #90 */
+ case 0xe: /* FCADD, #270 */
+ switch (size) {
+ case 1:
+ fn_gvec_ptr = gen_helper_gvec_fcaddh;
+ break;
+ case 2:
+ fn_gvec_ptr = gen_helper_gvec_fcadds;
+ break;
+ case 3:
+ fn_gvec_ptr = gen_helper_gvec_fcaddd;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ data = extract32(opcode, 1, 1);
+ fpst = get_fpstatus_ptr(size == 1);
+ tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
+ vec_full_reg_offset(s, rn),
+ vec_full_reg_offset(s, rm), fpst,
+ is_q ? 16 : 8, vec_full_reg_size(s),
+ data, fn_gvec_ptr);
+ tcg_temp_free_ptr(fpst);
+ break;
+
default:
g_assert_not_reached();
}
--
2.14.3
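As a cross-check of the sign handling in gvec_fcadds: with data = 0 (opcode 0xc, #90) the helper computes d = n + i*m, and with data = 1 (opcode 0xe, #270) it computes d = n - i*m, on interleaved (real, imaginary) pairs. The sketch below restates that on plain C floats. It is a hypothetical reference only: it negates with unary minus rather than XORing the sign bit, and it uses host float arithmetic instead of a float_status.

```c
#include <stdbool.h>
#include <stddef.h>

/* Plain-C model of FCADD on interleaved (re, im) float pairs.
 * rot270 == false: #90,  d = n + i*m -> (n.re - m.im, n.im + m.re)
 * rot270 == true:  #270, d = n - i*m -> (n.re + m.im, n.im - m.re) */
static void sketch_fcadd_f32(float *d, const float *n, const float *m,
                             size_t elems, bool rot270)
{
    for (size_t i = 0; i + 1 < elems; i += 2) {
        float m_re = m[i];
        float m_im = m[i + 1];

        d[i]     = n[i]     + (rot270 ?  m_im : -m_im);
        d[i + 1] = n[i + 1] + (rot270 ? -m_re :  m_re);
    }
}
```

For n = 1 + 2i and m = 3 + 4i, #90 yields -3 + 5i and #270 yields 5 - 1i.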
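The H2()/H4() macros added in this patch exist because vector registers are kept as host-endian 64-bit chunks: on a big-endian host, 32-bit element 0 (the low half of chunk 0) sits at uint32_t index 1, so the index is XORed with 1. The sketch below is a hypothetical runtime analogue of H4(), with an endianness probe standing in for HOST_WORDS_BIGENDIAN (it assumes no mixed-endian hosts). It uses memcpy rather than pointer casts to stay within C aliasing rules.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Runtime stand-in for HOST_WORDS_BIGENDIAN. */
static bool sketch_host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Runtime analogue of H4(x): (x) ^ 1 on big-endian hosts, (x) otherwise. */
static size_t sketch_h4(size_t i)
{
    return sketch_host_is_big_endian() ? (i ^ 1) : i;
}

/* Read architectural 32-bit element i (0..3) of a 128-bit vector stored
 * as two host-endian uint64_t chunks, element 0 being the low half of
 * chunk 0. */
static uint32_t sketch_get_elem32(const uint64_t chunks[2], size_t i)
{
    uint32_t words[4];

    memcpy(words, chunks, sizeof(words));
    return words[sketch_h4(i)];
}
```

The same XOR works across chunk boundaries because each 64-bit chunk holds exactly two 32-bit elements, which is why the macros can be applied to a flat element index.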
* [Qemu-devel] [PATCH v2 09/11] target/arm: Decode aa64 armv8.3 fcmla
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
` (7 preceding siblings ...)
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 08/11] target/arm: Decode aa64 armv8.3 fcadd Richard Henderson
@ 2017-12-18 17:24 ` Richard Henderson
2018-01-15 18:18 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 10/11] target/arm: Decode aa32 armv8.3 3-same Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 11/11] target/arm: Decode aa32 armv8.3 2-reg-index Richard Henderson
10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 11 ++++
target/arm/advsimd_helper.c | 144 ++++++++++++++++++++++++++++++++++++++++++
target/arm/translate-a64.c | 149 ++++++++++++++++++++++++++++++++------------
3 files changed, 265 insertions(+), 39 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 0f0fc942b0..5b6333347d 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -574,6 +574,17 @@ DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG,
DEF_HELPER_FLAGS_5(gvec_fcaddd, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcmlah, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcmlah_idx, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcmlas, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcmlas_idx, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcmlad, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
#ifdef TARGET_AARCH64
#include "helper-a64.h"
#endif
diff --git a/target/arm/advsimd_helper.c b/target/arm/advsimd_helper.c
index afc2bb1142..6a2a53e111 100644
--- a/target/arm/advsimd_helper.c
+++ b/target/arm/advsimd_helper.c
@@ -274,3 +274,147 @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
}
clear_tail(d, opr_sz, simd_maxsz(desc));
}
+
+void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm,
+ void *vfpst, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ float16 *d = vd;
+ float16 *n = vn;
+ float16 *m = vm;
+ float_status *fpst = vfpst;
+ intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+ uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+ uint32_t neg_real = flip ^ neg_imag;
+ uintptr_t i;
+
+ neg_real <<= 15;
+ neg_imag <<= 15;
+
+ for (i = 0; i < opr_sz / 2; i += 2) {
+ float16 e0 = n[H2(i + flip)];
+ float16 e1 = m[H2(i + flip)] ^ neg_real;
+ float16 e2 = e0;
+ float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
+
+ d[H2(i)] = float16_muladd(e0, e1, d[H2(i)], 0, fpst);
+ d[H2(i + 1)] = float16_muladd(e2, e3, d[H2(i + 1)], 0, fpst);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm,
+ void *vfpst, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ float16 *d = vd;
+ float16 *n = vn;
+ float16 *m = vm;
+ float_status *fpst = vfpst;
+ intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+ uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+ uint32_t neg_real = flip ^ neg_imag;
+ uintptr_t i;
+ float16 e1 = m[H2(flip)];
+ float16 e3 = m[H2(1 - flip)];
+
+ neg_real <<= 15;
+ neg_imag <<= 15;
+ e1 ^= neg_real;
+ e3 ^= neg_imag;
+
+ for (i = 0; i < opr_sz / 2; i += 2) {
+ float16 e0 = n[H2(i + flip)];
+ float16 e2 = e0;
+
+ d[H2(i)] = float16_muladd(e0, e1, d[H2(i)], 0, fpst);
+ d[H2(i + 1)] = float16_muladd(e2, e3, d[H2(i + 1)], 0, fpst);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_fcmlas)(void *vd, void *vn, void *vm,
+ void *vfpst, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ float32 *d = vd;
+ float32 *n = vn;
+ float32 *m = vm;
+ float_status *fpst = vfpst;
+ intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+ uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+ uint32_t neg_real = flip ^ neg_imag;
+ uintptr_t i;
+
+ neg_real <<= 31;
+ neg_imag <<= 31;
+
+ for (i = 0; i < opr_sz / 4; i += 2) {
+ float32 e0 = n[H4(i + flip)];
+ float32 e1 = m[H4(i + flip)] ^ neg_real;
+ float32 e2 = e0;
+ float32 e3 = m[H4(i + 1 - flip)] ^ neg_imag;
+
+ d[H4(i)] = float32_muladd(e0, e1, d[H4(i)], 0, fpst);
+ d[H4(i + 1)] = float32_muladd(e2, e3, d[H4(i + 1)], 0, fpst);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm,
+ void *vfpst, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ float32 *d = vd;
+ float32 *n = vn;
+ float32 *m = vm;
+ float_status *fpst = vfpst;
+ intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+ uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+ uint32_t neg_real = flip ^ neg_imag;
+ uintptr_t i;
+ float32 e1 = m[H4(flip)];
+ float32 e3 = m[H4(1 - flip)];
+
+ neg_real <<= 31;
+ neg_imag <<= 31;
+ e1 ^= neg_real;
+ e3 ^= neg_imag;
+
+ for (i = 0; i < opr_sz / 4; i += 2) {
+ float32 e0 = n[H4(i + flip)];
+ float32 e2 = e0;
+
+ d[H4(i)] = float32_muladd(e0, e1, d[H4(i)], 0, fpst);
+ d[H4(i + 1)] = float32_muladd(e2, e3, d[H4(i + 1)], 0, fpst);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm,
+ void *vfpst, uint32_t desc)
+{
+ uintptr_t opr_sz = simd_oprsz(desc);
+ float64 *d = vd;
+ float64 *n = vn;
+ float64 *m = vm;
+ float_status *fpst = vfpst;
+ intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
+ uint64_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+ uint64_t neg_real = flip ^ neg_imag;
+ uintptr_t i;
+
+ neg_real <<= 63;
+ neg_imag <<= 63;
+
+ for (i = 0; i < opr_sz / 8; i += 2) {
+ float64 e0 = n[i + flip];
+ float64 e1 = m[i + flip] ^ neg_real;
+ float64 e2 = e0;
+ float64 e3 = m[i + 1 - flip] ^ neg_imag;
+
+ d[i] = float64_muladd(e0, e1, d[i], 0, fpst);
+ d[i + 1] = float64_muladd(e2, e3, d[i + 1], 0, fpst);
+ }
+ clear_tail(d, opr_sz, simd_maxsz(desc));
+}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 89a0616894..79fede35c1 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -10713,6 +10713,10 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
feature = ARM_FEATURE_V8_1_SIMD;
break;
+ case 0x8: /* FCMLA, #0 */
+ case 0x9: /* FCMLA, #90 */
+ case 0xa: /* FCMLA, #180 */
+ case 0xb: /* FCMLA, #270 */
case 0xc: /* FCADD, #90 */
case 0xe: /* FCADD, #270 */
if (size == 0 || (size == 3 && !is_q)) {
@@ -10767,6 +10771,26 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
0, fn_gvec_ptr);
break;
+ case 0x8: /* FCMLA, #0 */
+ case 0x9: /* FCMLA, #90 */
+ case 0xa: /* FCMLA, #180 */
+ case 0xb: /* FCMLA, #270 */
+ switch (size) {
+ case 1:
+ fn_gvec_ptr = gen_helper_gvec_fcmlah;
+ break;
+ case 2:
+ fn_gvec_ptr = gen_helper_gvec_fcmlas;
+ break;
+ case 3:
+ fn_gvec_ptr = gen_helper_gvec_fcmlad;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ data = extract32(opcode, 0, 2);
+ goto do_fpst;
+
case 0xc: /* FCADD, #90 */
case 0xe: /* FCADD, #270 */
switch (size) {
@@ -10783,6 +10807,7 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
g_assert_not_reached();
}
data = extract32(opcode, 1, 1);
+ do_fpst:
fpst = get_fpstatus_ptr(size == 1);
tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
vec_full_reg_offset(s, rn),
@@ -11864,80 +11889,80 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
bool is_long = false;
- bool is_fp = false;
+ int is_fp = 0;
+ bool is_fp16 = false;
int index;
TCGv_ptr fpst;
- switch (opcode) {
- case 0x0: /* MLA */
- case 0x4: /* MLS */
- if (!u || is_scalar) {
+ switch (16 * u + opcode) {
+ case 0x00: /* MLA */
+ case 0x04: /* MLS */
+ case 0x08: /* MUL */
+ if (is_scalar) {
unallocated_encoding(s);
return;
}
break;
- case 0x2: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
- case 0x6: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
- case 0xa: /* SMULL, SMULL2, UMULL, UMULL2 */
+ case 0x02: /* SMLAL, SMLAL2 */
+ case 0x12: /* UMLAL, UMLAL2 */
+ case 0x06: /* SMLSL, SMLSL2 */
+ case 0x16: /* UMLSL, UMLSL2 */
+ case 0x0a: /* SMULL, SMULL2 */
+ case 0x1a: /* UMULL, UMULL2 */
if (is_scalar) {
unallocated_encoding(s);
return;
}
is_long = true;
break;
- case 0x3: /* SQDMLAL, SQDMLAL2 */
- case 0x7: /* SQDMLSL, SQDMLSL2 */
- case 0xb: /* SQDMULL, SQDMULL2 */
+ case 0x03: /* SQDMLAL, SQDMLAL2 */
+ case 0x07: /* SQDMLSL, SQDMLSL2 */
+ case 0x0b: /* SQDMULL, SQDMULL2 */
is_long = true;
- /* fall through */
- case 0xc: /* SQDMULH */
- if (u) {
- unallocated_encoding(s);
- return;
- }
break;
- case 0xd: /* SQRDMULH / SQRDMLAH */
- if (u && !arm_dc_feature(s, ARM_FEATURE_V8_1_SIMD)) {
- unallocated_encoding(s);
- return;
- }
+ case 0x0c: /* SQDMULH */
+ case 0x0d: /* SQRDMULH */
break;
- case 0xf: /* SQRDMLSH */
- if (!u || !arm_dc_feature(s, ARM_FEATURE_V8_1_SIMD)) {
+ case 0x1d: /* SQRDMLAH */
+ case 0x1f: /* SQRDMLSH */
+ if (!arm_dc_feature(s, ARM_FEATURE_V8_1_SIMD)) {
unallocated_encoding(s);
return;
}
break;
- case 0x8: /* MUL */
- if (u || is_scalar) {
+ case 0x11: /* FCMLA #0 */
+ case 0x13: /* FCMLA #90 */
+ case 0x15: /* FCMLA #180 */
+ case 0x17: /* FCMLA #270 */
+ if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)) {
unallocated_encoding(s);
return;
}
+ is_fp = 2;
break;
- case 0x1: /* FMLA */
- case 0x5: /* FMLS */
- if (u) {
- unallocated_encoding(s);
- return;
- }
- /* fall through */
- case 0x9: /* FMUL, FMULX */
- if (size == 1 || (size < 2 && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) {
+ case 0x01: /* FMLA */
+ case 0x05: /* FMLS */
+ case 0x09: /* FMUL */
+ case 0x19: /* FMULX */
+ if (size == 1
+ || (size < 2 && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) {
unallocated_encoding(s);
return;
}
- is_fp = true;
+ is_fp = 1;
break;
default:
unallocated_encoding(s);
return;
}
- if (is_fp) {
+ switch (is_fp) {
+ case 1: /* normal fp */
/* convert insn encoded size to TCGMemOp size */
switch (size) {
case 0: /* half-precision */
size = MO_16;
+ is_fp16 = true;
index = h << 2 | l << 1 | m;
break;
case 2: /* single precision */
@@ -11958,7 +11983,36 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
g_assert_not_reached();
break;
}
- } else {
+ break;
+
+ case 2: /* complex fp */
+ switch (size) {
+ case 1:
+ size = MO_32;
+ is_fp16 = true;
+ if (h && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+ index = h << 1 | l;
+ rm |= (m << 4);
+ break;
+ case 2:
+ size = MO_64;
+ if (l || !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+ index = h;
+ rm |= (m << 4);
+ break;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+
+ default: /* integer */
switch (size) {
case 1:
index = h << 2 | l << 1 | m;
@@ -11978,11 +12032,28 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
}
if (is_fp) {
- fpst = get_fpstatus_ptr(false);
+ fpst = get_fpstatus_ptr(is_fp16);
} else {
fpst = NULL;
}
+ switch (16 * u + opcode) {
+ case 0x11: /* FCMLA #0 */
+ case 0x13: /* FCMLA #90 */
+ case 0x15: /* FCMLA #180 */
+ case 0x17: /* FCMLA #270 */
+ tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
+ vec_full_reg_offset(s, rn),
+ vec_reg_offset(s, rm, index, size), fpst,
+ is_q ? 16 : 8, vec_full_reg_size(s),
+ extract32(insn, 13, 2), /* rot */
+ size == MO_64
+ ? gen_helper_gvec_fcmlas_idx
+ : gen_helper_gvec_fcmlah_idx);
+ tcg_temp_free_ptr(fpst);
+ return;
+ }
+
if (size == 3) {
TCGv_i64 tcg_idx = tcg_temp_new_i64();
int pass;
--
2.14.3
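In the FCMLA helpers the 2-bit rotation is split into flip (bit 0), neg_imag (bit 1) and neg_real = flip ^ neg_imag. Each rotation accumulates only half of a complex product, so issuing #0 and then #90 on the same destination yields the full multiply-accumulate d += n * m, and #180 followed by #270 yields d -= n * m. The sketch below is a hypothetical plain-C model of gvec_fcmlas that makes this easy to check. Unlike float32_muladd it uses an unfused multiply and add, and it ignores float_status.

```c
#include <stdbool.h>
#include <stddef.h>

/* Plain-C model of FCMLA on interleaved (re, im) float pairs.
 * rot is the 2-bit field: 0, 1, 2, 3 for #0, #90, #180, #270. */
static void sketch_fcmla_f32(float *d, const float *n, const float *m,
                             size_t elems, unsigned rot)
{
    unsigned flip = rot & 1;
    bool neg_imag = (rot >> 1) & 1;
    bool neg_real = flip ^ neg_imag;

    for (size_t i = 0; i + 1 < elems; i += 2) {
        /* flip selects n.im (and swaps m's parts) for #90 / #270. */
        float e0 = n[i + flip];
        float e1 = neg_real ? -m[i + flip] : m[i + flip];
        float e3 = neg_imag ? -m[i + 1 - flip] : m[i + 1 - flip];

        d[i]     = e0 * e1 + d[i];
        d[i + 1] = e0 * e3 + d[i + 1];
    }
}
```

For n = 1 + 2i and m = 3 + 4i, #0 contributes (3, 4) and #90 contributes (-8, 6), giving n * m = -5 + 10i.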
* [Qemu-devel] [PATCH v2 10/11] target/arm: Decode aa32 armv8.3 3-same
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
` (8 preceding siblings ...)
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 09/11] target/arm: Decode aa64 armv8.3 fcmla Richard Henderson
@ 2017-12-18 17:24 ` Richard Henderson
2018-01-15 18:46 ` Peter Maydell
2018-01-15 18:49 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 11/11] target/arm: Decode aa32 armv8.3 2-reg-index Richard Henderson
10 siblings, 2 replies; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 1a0b0eaced..e57844c019 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7662,6 +7662,65 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
return 0;
}
+/* ARMv8.3 reclaims a portion of the LDC2/STC2 coprocessor 8 space. */
+
+static int disas_neon_insn_cp8_3same(DisasContext *s, uint32_t insn)
+{
+ void (*fn_gvec_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+ int rd, rn, rm, rot, size, opr_sz;
+ TCGv_ptr fpst;
+ bool q;
+
+ /* FIXME: this access check should not take precedence over UNDEF
+ * for invalid encodings; we will generate incorrect syndrome information
+ * for attempts to execute invalid vfp/neon encodings with FP disabled.
+ */
+ if (s->fp_excp_el) {
+ gen_exception_insn(s, 4, EXCP_UDEF,
+ syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
+ return 0;
+ }
+ if (!s->vfp_enabled) {
+ return 1;
+ }
+ if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)) {
+ return 1;
+ }
+
+ q = extract32(insn, 6, 1);
+ size = extract32(insn, 20, 1);
+ VFP_DREG_D(rd, insn);
+ VFP_DREG_N(rn, insn);
+ VFP_DREG_M(rm, insn);
+ if ((rd | rn | rm) & q) {
+ return 1;
+ }
+
+ if (extract32(insn, 21, 1)) {
+ /* VCMLA */
+ rot = extract32(insn, 23, 2);
+ fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
+ } else if (extract32(insn, 23, 1)) {
+ /* VCADD */
+ rot = extract32(insn, 24, 1);
+ fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
+ } else {
+ /* Assuming the register fields remain, only bit 24 remains undecoded:
+ * 1111_110x_0d0s_nnnn_dddd_1000_nqm0_mmmm
+ */
+ return 1;
+ }
+
+ opr_sz = (1 + q) * 8;
+ fpst = get_fpstatus_ptr(1);
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
+ vfp_reg_offset(1, rn),
+ vfp_reg_offset(1, rm), fpst,
+ opr_sz, opr_sz, rot, fn_gvec_ptr);
+ tcg_temp_free_ptr(fpst);
+ return 0;
+}
+
static int disas_coproc_insn(DisasContext *s, uint32_t insn)
{
int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
@@ -8406,6 +8465,12 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
}
}
}
+ } else if ((insn & 0x0e000f10) == 0x0c000800) {
+ /* ARMv8.3 neon ldc2/stc2 coprocessor 8 extension. */
+ if (disas_neon_insn_cp8_3same(s, insn)) {
+ goto illegal_op;
+ }
+ return;
} else if ((insn & 0x0fe00000) == 0x0c400000) {
/* Coprocessor double register transfer. */
ARCH(5TE);
--
2.14.3
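The field split used by disas_neon_insn_cp8_3same() can be restated as a small pure function. This is a hypothetical sketch: it reimplements extract32() locally, ignores the register fields, the cond field, and the FP-enable and feature checks, and only mirrors the pattern match (insn & 0x0e000f10) == 0x0c000800 plus the VCMLA/VCADD selection on bits 21 and 23.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for QEMU's extract32() from qemu/bitops.h. */
static uint32_t sketch_extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0U >> (32 - length));
}

enum sketch_op { SKETCH_UNDEF, SKETCH_VCMLA, SKETCH_VCADD };

struct sketch_decode {
    enum sketch_op op;
    int rot;      /* bits 24:23 for VCMLA, bit 24 for VCADD */
    bool is_f32;  /* bit 20: false = fp16, true = fp32 */
    bool q;       /* bit 6: 128-bit rather than 64-bit operation */
};

static struct sketch_decode sketch_decode_cp8_3same(uint32_t insn)
{
    struct sketch_decode r = { SKETCH_UNDEF, 0, false, false };

    if ((insn & 0x0e000f10) != 0x0c000800) {
        return r;                       /* not in the reclaimed space */
    }
    r.q = sketch_extract32(insn, 6, 1);
    r.is_f32 = sketch_extract32(insn, 20, 1);
    if (sketch_extract32(insn, 21, 1)) {
        r.op = SKETCH_VCMLA;
        r.rot = sketch_extract32(insn, 23, 2);
    } else if (sketch_extract32(insn, 23, 1)) {
        r.op = SKETCH_VCADD;
        r.rot = sketch_extract32(insn, 24, 1);
    }
    return r;                           /* otherwise stays UNDEF */
}
```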
* [Qemu-devel] [PATCH v2 11/11] target/arm: Decode aa32 armv8.3 2-reg-index
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
` (9 preceding siblings ...)
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 10/11] target/arm: Decode aa32 armv8.3 3-same Richard Henderson
@ 2017-12-18 17:24 ` Richard Henderson
2018-01-15 18:51 ` Peter Maydell
10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2017-12-18 17:24 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index e57844c019..490e120b0b 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7721,6 +7721,51 @@ static int disas_neon_insn_cp8_3same(DisasContext *s, uint32_t insn)
return 0;
}
+/* ARMv8.3 reclaims a portion of the CDP2 coprocessor 8 space. */
+
+static int disas_neon_insn_cp8_index(DisasContext *s, uint32_t insn)
+{
+ int rd, rn, rm, rot, size, opr_sz;
+ TCGv_ptr fpst;
+ bool q;
+
+ /* FIXME: this access check should not take precedence over UNDEF
+ * for invalid encodings; we will generate incorrect syndrome information
+ * for attempts to execute invalid vfp/neon encodings with FP disabled.
+ */
+ if (s->fp_excp_el) {
+ gen_exception_insn(s, 4, EXCP_UDEF,
+ syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
+ return 0;
+ }
+ if (!s->vfp_enabled || !arm_dc_feature(s, ARM_FEATURE_V8_FCMA)) {
+ return 1;
+ }
+
+ q = extract32(insn, 6, 1);
+ size = extract32(insn, 23, 1);
+
+ VFP_DREG_D(rd, insn);
+ VFP_DREG_N(rn, insn);
+ VFP_DREG_M(rm, insn);
+ if ((rd | rn) & q) {
+ return 1;
+ }
+
+ /* This entire space is VCMLA (indexed). */
+ rot = extract32(insn, 20, 2);
+ opr_sz = (1 + q) * 8;
+ fpst = get_fpstatus_ptr(1);
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
+ vfp_reg_offset(1, rn),
+ vfp_reg_offset(1, rm), fpst,
+ opr_sz, opr_sz, rot,
+ size ? gen_helper_gvec_fcmlas_idx
+ : gen_helper_gvec_fcmlah_idx);
+ tcg_temp_free_ptr(fpst);
+ return 0;
+}
+
static int disas_coproc_insn(DisasContext *s, uint32_t insn)
{
int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
@@ -8471,6 +8516,12 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
goto illegal_op;
}
return;
+ } else if ((insn & 0x0f000f10) == 0x0e000800) {
+ /* ARMv8.3 neon cdp2 coprocessor 8 extension. */
+ if (disas_neon_insn_cp8_index(s, insn)) {
+ goto illegal_op;
+ }
+ return;
} else if ((insn & 0x0fe00000) == 0x0c400000) {
/* Coprocessor double register transfer. */
ARCH(5TE);
--
2.14.3
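This patch routes the whole CDP2 space to the *_idx helpers from patch 09. Those helpers differ from the vector forms in one respect: they load a single complex element from m before the loop and apply it to every (real, imaginary) pair of n. The sketch below is a hypothetical plain-C model of gvec_fcmlas_idx showing that broadcast. It uses an unfused multiply and add, and the caller is assumed to have already pointed m_elem at the selected element.

```c
#include <stddef.h>

/* Plain-C model of indexed FCMLA: one complex element m_elem[0..1]
 * multiplies every (re, im) pair of n. rot is 0..3 for #0..#270. */
static void sketch_fcmla_idx_f32(float *d, const float *n,
                                 const float m_elem[2], size_t elems,
                                 unsigned rot)
{
    unsigned flip = rot & 1;
    unsigned neg_imag = (rot >> 1) & 1;
    unsigned neg_real = flip ^ neg_imag;
    /* Loaded once, outside the loop, as in the helper. */
    float e1 = neg_real ? -m_elem[flip] : m_elem[flip];
    float e3 = neg_imag ? -m_elem[1 - flip] : m_elem[1 - flip];

    for (size_t i = 0; i + 1 < elems; i += 2) {
        float e0 = n[i + flip];

        d[i]     = e0 * e1 + d[i];
        d[i + 1] = e0 * e3 + d[i + 1];
    }
}
```

With m_elem = i, issuing #0 and then #90 rotates each pair of n by 90 degrees: (1 + 2i, 3 + 4i) becomes (-2 + 1i, -4 + 3i).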
* Re: [Qemu-devel] [PATCH v2 01/11] target/arm: Add ARM_FEATURE_V8_1_SIMD
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 01/11] target/arm: Add ARM_FEATURE_V8_1_SIMD Richard Henderson
@ 2018-01-15 17:21 ` Peter Maydell
0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2018-01-15 17:21 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Enable it for the "any" CPU used by *-linux-user.
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 1 +
> linux-user/elfload.c | 9 +++++++++
> target/arm/cpu.c | 1 +
> target/arm/cpu64.c | 1 +
> 4 files changed, 12 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 715ec6a476..e047756b80 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1351,6 +1351,7 @@ enum arm_features {
> ARM_FEATURE_VBAR, /* has cp15 VBAR */
> ARM_FEATURE_M_SECURITY, /* M profile Security Extension */
> ARM_FEATURE_JAZELLE, /* has (trivial) Jazelle implementation */
> + ARM_FEATURE_V8_1_SIMD, /* has ARMv8.1-SIMD */
> ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
> };
I kind of prefer the kernel's choice of ASIMDRDM rather than
V8_1_SIMD, but the latter is the official architectural feature
name, so oh well.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v2 03/11] target/arm: Decode aa64 armv8.1 three same extra
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 03/11] target/arm: Decode aa64 armv8.1 " Richard Henderson
@ 2018-01-15 17:21 ` Peter Maydell
0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2018-01-15 17:21 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper.h | 9 +++++
> target/arm/advsimd_helper.c | 74 ++++++++++++++++++++++++++++++++++++++++
> target/arm/translate-a64.c | 83 +++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 166 insertions(+)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v2 05/11] target/arm: Decode aa32 armv8.1 three same
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 05/11] target/arm: Decode aa32 armv8.1 three same Richard Henderson
@ 2018-01-15 17:37 ` Peter Maydell
0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2018-01-15 17:37 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate.c | 85 +++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 66 insertions(+), 19 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index c690658493..a9587ae242 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -25,6 +25,7 @@
> #include "disas/disas.h"
> #include "exec/exec-all.h"
> #include "tcg-op.h"
> +#include "tcg-op-gvec.h"
> #include "qemu/log.h"
> #include "qemu/bitops.h"
> #include "arm_ldst.h"
> @@ -5364,9 +5365,9 @@ static void gen_neon_narrow_op(int op, int u, int size,
> #define NEON_3R_VPMAX 20
> #define NEON_3R_VPMIN 21
> #define NEON_3R_VQDMULH_VQRDMULH 22
> -#define NEON_3R_VPADD 23
> +#define NEON_3R_VPADD_VQRDMLAH 23
> #define NEON_3R_SHA 24 /* SHA1C,SHA1P,SHA1M,SHA1SU0,SHA256H{2},SHA256SU1 */
> -#define NEON_3R_VFM 25 /* VFMA, VFMS : float fused multiply-add */
> +#define NEON_3R_VFM_VQRDMLSH 25 /* VFMA, VFMS : float fused multiply-add */
If this case includes VQRDMLSH as well now, then the comment needs updating.
I would suggest just /* VFMA, VFMS, VQRDMLSH */
> #define NEON_3R_FLOAT_ARITH 26 /* float VADD, VSUB, VPADD, VABD */
> #define NEON_3R_FLOAT_MULTIPLY 27 /* float VMLA, VMLS, VMUL */
> #define NEON_3R_FLOAT_CMP 28 /* float VCEQ, VCGE, VCGT */
> @@ -5630,12 +5647,12 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
> if (q && ((rd | rn | rm) & 1)) {
> return 1;
> }
> - /*
> - * The SHA-1/SHA-256 3-register instructions require special treatment
> - * here, as their size field is overloaded as an op type selector, and
> - * they all consume their input in a single pass.
> - */
> - if (op == NEON_3R_SHA) {
> + switch (op) {
> + case NEON_3R_SHA:
> + /* The SHA-1/SHA-256 3-register instructions require special
> + * treatment here, as their size field is overloaded as an
> + * op type selector, and they all consume their input in a
> + * single pass. */
You've lost the newline before the '*/' here.
> if (!q) {
> return 1;
> }
> @@ -5672,6 +5689,40 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
> tcg_temp_free_i32(tmp2);
> tcg_temp_free_i32(tmp3);
> return 0;
> +
> + case NEON_3R_VPADD_VQRDMLAH:
> + if (!u) {
> + break; /* VPADD */
> + }
> + /* VQRDMLAH */
> + switch (size) {
> + case 1:
> + return do_v81_helper(s, gen_helper_gvec_qrdmlah_s16,
> + q, rd, rn, rm);
> + case 2:
> + return do_v81_helper(s, gen_helper_gvec_qrdmlah_s32,
> + q, rd, rn, rm);
> + }
> + return 1;
> +
> + case NEON_3R_VFM_VQRDMLSH:
> + if (!u) {
> + /* VFM, VFMS */
> + if ((5 & (1 << size)) == 0) {
You could write this as 'if (size == 1)' (since the neon_3r_sizes[]
check has already ruled out bit 3 being set)...
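A quick standalone check of why the two forms agree (not QEMU code; the function name is made up). It assumes, as the comment above says, that the table check has already rejected size == 3. Under that assumption, 0x5 has bits 0 and 2 set, so the mask test rejects exactly size == 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the patch's VFMA/VFMS size test. */
static bool vfm_size_rejected(int size)
{
    assert(size >= 0 && size <= 2);   /* size 3 already UNDEFed by the table */
    return (5 & (1 << size)) == 0;    /* 0x5: sizes 0 and 2 accepted */
}
```

For every size that can reach this point, the result matches `size == 1`.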
Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v2 06/11] target/arm: Decode aa32 armv8.1 two reg and a scalar
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 06/11] target/arm: Decode aa32 armv8.1 two reg and a scalar Richard Henderson
@ 2018-01-15 17:47 ` Peter Maydell
2018-01-26 7:18 ` Richard Henderson
2018-01-26 13:41 ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
1 sibling, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2018-01-15 17:47 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate.c | 38 +++++++++++++++++++++++++++++++++++---
> 1 file changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index a9587ae242..1a0b0eaced 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -6973,11 +6973,43 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
> }
> neon_store_reg64(cpu_V0, rd + pass);
> }
> + break;
> + case 14: /* VQRDMLAH scalar */
> + case 15: /* VQRDMLSH scalar */
> + if (!arm_dc_feature(s, ARM_FEATURE_V8_1_SIMD)) {
> + return 1;
> + }
> + if (u && ((rd | rn) & 1)) {
> + return 1;
> + }
The pseudocode also has UNDEF if Q==1 && Vm<0> == 1 -- have we
already done that test earlier? I can't see it, but our neon decode
code is quite hard to read...
> + tmp2 = neon_get_scalar(size, rm);
> + for (pass = 0; pass < (u ? 4 : 2); pass++) {
> + void (*fn)(TCGv_i32, TCGv_env, TCGv_i32,
> + TCGv_i32, TCGv_i32);
Can we define a typedef for this, please?
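For illustration only, a standalone sketch of what a function-type typedef buys here. All names are hypothetical: `ThreeOpEnvFn` is an invented typedef, `Env` stands in for TCGv_env, and `int32_t *` replaces the TCGv_i32 output temp so the stub helpers can report which one was picked. With a typedef, the nested if/else selection can become a small table indexed by op and size.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for TCGv_env; only ever used through a pointer. */
typedef struct Env Env;

/* Hypothetical function type for the five-argument helpers:
 * (dest, env, n, m, accumulator). */
typedef void ThreeOpEnvFn(int32_t *dest, Env *env,
                          int32_t n, int32_t m, int32_t acc);

/* Stub helpers: each writes a tag so the caller can see which ran. */
#define STUB_HELPER(name, tag)                                  \
    static void name(int32_t *dest, Env *env,                   \
                     int32_t n, int32_t m, int32_t acc)         \
    {                                                           \
        (void)env; (void)n; (void)m; (void)acc;                 \
        *dest = (tag);                                          \
    }

STUB_HELPER(qrdmlah_s16, 1)
STUB_HELPER(qrdmlah_s32, 2)
STUB_HELPER(qrdmlsh_s16, 3)
STUB_HELPER(qrdmlsh_s32, 4)

/* op 14 = VQRDMLAH, op 15 = VQRDMLSH; size 1 = 16-bit, size 2 = 32-bit. */
static ThreeOpEnvFn *select_qrdml_fn(int op, int size)
{
    static ThreeOpEnvFn * const fns[2][2] = {
        { qrdmlah_s16, qrdmlah_s32 },
        { qrdmlsh_s16, qrdmlsh_s32 },
    };
    assert((op == 14 || op == 15) && (size == 1 || size == 2));
    return fns[op - 14][size - 1];
}
```

The table also keeps the selection out of the per-pass loop, since op and size do not change between passes.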
>
> -
> + tmp = neon_load_reg(rn, pass);
> + tmp3 = neon_load_reg(rd, pass);
> + if (op == 14) {
> + if (size == 1) {
> + fn = gen_helper_neon_qrdmlah_s16;
> + } else {
> + fn = gen_helper_neon_qrdmlah_s32;
> + }
> + } else {
> + if (size == 1) {
> + fn = gen_helper_neon_qrdmlsh_s16;
> + } else {
> + fn = gen_helper_neon_qrdmlsh_s32;
> + }
> + }
> + fn(tmp, cpu_env, tmp, tmp2, tmp3);
> + tcg_temp_free_i32(tmp3);
> + neon_store_reg(rd, pass, tmp);
> + }
> + tcg_temp_free_i32(tmp2);
> break;
> - default: /* 14 and 15 are RESERVED */
> - return 1;
> + default:
> + g_assert_not_reached();
> }
> }
> } else { /* size == 3 */
> --
> 2.14.3
Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v2 07/11] target/arm: Add ARM_FEATURE_V8_FCMA
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 07/11] target/arm: Add ARM_FEATURE_V8_FCMA Richard Henderson
@ 2018-01-15 17:53 ` Peter Maydell
2018-01-15 18:03 ` Richard Henderson
0 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2018-01-15 17:53 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Enable it for the "any" CPU used by *-linux-user.
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 1 +
> linux-user/elfload.c | 1 +
> target/arm/cpu.c | 1 +
> target/arm/cpu64.c | 1 +
> 4 files changed, 4 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index e047756b80..7a705a09a1 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1353,6 +1353,7 @@ enum arm_features {
> ARM_FEATURE_JAZELLE, /* has (trivial) Jazelle implementation */
> ARM_FEATURE_V8_1_SIMD, /* has ARMv8.1-SIMD */
> ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
> + ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions. */
This is a bit inconsistent. If we're going with the architectural
feature names this is "ARMv8.3-CompNum", so ARM_FEATURE_V8_3_COMPNUM.
If we're going with the ID_AA64ISARn_EL1 field names then
ARM_FEATURE_V8_1_SIMD should be ARM_FEATURE_V8_RDM...
> };
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v2 07/11] target/arm: Add ARM_FEATURE_V8_FCMA
2018-01-15 17:53 ` Peter Maydell
@ 2018-01-15 18:03 ` Richard Henderson
0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-01-15 18:03 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, qemu-arm
On 01/15/2018 09:53 AM, Peter Maydell wrote:
> On 18 December 2017 at 17:24, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Enable it for the "any" CPU used by *-linux-user.
>>
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/arm/cpu.h | 1 +
>> linux-user/elfload.c | 1 +
>> target/arm/cpu.c | 1 +
>> target/arm/cpu64.c | 1 +
>> 4 files changed, 4 insertions(+)
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index e047756b80..7a705a09a1 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -1353,6 +1353,7 @@ enum arm_features {
>> ARM_FEATURE_JAZELLE, /* has (trivial) Jazelle implementation */
>> ARM_FEATURE_V8_1_SIMD, /* has ARMv8.1-SIMD */
>> ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
>> + ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions. */
>
> This is a bit inconsistent. If we're going with the architectural
> feature names this is "ARMv8.3-CompNum", so ARM_FEATURE_V8_3_COMPNUM.
> If we're going with the ID_AA64ISARn_EL1 field names then
> ARM_FEATURE_V8_1_SIMD should be ARM_FEATURE_V8_RDM...
You preferred ASIMDRDM earlier. Does this mean you prefer V8_RDM?
I hadn't noticed I was being inconsistent as to whence I drew my names. Gimme
your preferred names and I'll use 'em.
r~
* Re: [Qemu-devel] [PATCH v2 08/11] target/arm: Decode aa64 armv8.3 fcadd
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 08/11] target/arm: Decode aa64 armv8.3 fcadd Richard Henderson
@ 2018-01-15 18:11 ` Peter Maydell
0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2018-01-15 18:11 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper.h | 7 ++++
> target/arm/advsimd_helper.c | 93 +++++++++++++++++++++++++++++++++++++++++++++
> target/arm/translate-a64.c | 36 +++++++++++++++++-
> 3 files changed, 135 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/helper.h b/target/arm/helper.h
> index 06ca458614..0f0fc942b0 100644
> --- a/target/arm/helper.h
> +++ b/target/arm/helper.h
> @@ -567,6 +567,13 @@ DEF_HELPER_FLAGS_5(gvec_qrdmlah_s32, TCG_CALL_NO_RWG,
> DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s32, TCG_CALL_NO_RWG,
> void, ptr, ptr, ptr, ptr, i32)
>
> +DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG,
> + void, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG,
> + void, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_5(gvec_fcaddd, TCG_CALL_NO_RWG,
> + void, ptr, ptr, ptr, ptr, i32)
> +
> #ifdef TARGET_AARCH64
> #include "helper-a64.h"
> #endif
> diff --git a/target/arm/advsimd_helper.c b/target/arm/advsimd_helper.c
> index d5185165a5..afc2bb1142 100644
> --- a/target/arm/advsimd_helper.c
> +++ b/target/arm/advsimd_helper.c
> @@ -24,6 +24,18 @@
> #include "tcg/tcg-gvec-desc.h"
>
>
> +/* Note that vector data is stored in host-endian 64-bit chunks,
> + so addressing units smaller than that needs a host-endian fixup. */
> +#ifdef HOST_WORDS_BIGENDIAN
> +#define H1(x) ((x) ^ 7)
> +#define H2(x) ((x) ^ 3)
> +#define H4(x) ((x) ^ 1)
> +#else
> +#define H1(x) (x)
> +#define H2(x) (x)
> +#define H4(x) (x)
> +#endif
> +
> #define SET_QC() env->vfp.xregs[ARM_VFP_FPSCR] |= CPSR_Q
>
> static void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
> @@ -181,3 +193,84 @@ void HELPER(gvec_qrdmlsh_s32)(void *vd, void *vn, void *vm,
> }
> clear_tail(d, opr_sz, simd_maxsz(desc));
> }
> +
> +void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm,
> + void *vfpst, uint32_t desc)
> +{
> + uintptr_t opr_sz = simd_oprsz(desc);
> + float16 *d = vd;
> + float16 *n = vn;
> + float16 *m = vm;
> + float_status *fpst = vfpst;
> + uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
> + uint32_t neg_imag = neg_real ^ 1;
> + uintptr_t i;
> +
> + neg_real <<= 15;
> + neg_imag <<= 15;
This took me a little while to figure out, given that it's
somewhat different from the pseudocode. (I think I was also
thrown by the <<= 15 being done on its own line rather than
in the assignment.) A comment might help?
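For reference, a standalone sketch of what the helper is doing for one complex pair. It uses host `float` in place of softfloat's float32, which is an assumption for illustration that ignores rounding modes and exception flags. `rot` is the one-bit immediate the translator passes in: 0 for #90 and 1 for #270. XORing bit 31 of an IEEE single flips its sign, so exactly one of the two m halves is negated before the adds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flip the IEEE-754 sign bit of f when neg_mask is 1u << 31. */
static float xor_sign(float f, uint32_t neg_mask)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    bits ^= neg_mask;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* One complex pair, n = {re, im}, m = {re, im}:
 *   rot == 0 (#90):  d = {n.re - m.im, n.im + m.re}
 *   rot == 1 (#270): d = {n.re + m.im, n.im - m.re} */
static void fcadd_pair(float d[2], const float n[2], const float m[2],
                       uint32_t rot)
{
    uint32_t neg_real = (rot & 1) << 31;         /* negate m.re for #270 */
    uint32_t neg_imag = ((rot & 1) ^ 1) << 31;   /* negate m.im for #90  */

    d[0] = n[0] + xor_sign(m[1], neg_imag);
    d[1] = n[1] + xor_sign(m[0], neg_real);
}
```

A two-line comment along the lines of the one on `fcadd_pair` would probably be enough in the helper itself.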
> +
> + for (i = 0; i < opr_sz / 2; i += 2) {
> + float16 e0 = n[H2(i)];
> + float16 e1 = m[H2(i + 1)] ^ neg_imag;
> + float16 e2 = n[H2(i + 1)];
> + float16 e3 = m[H2(i)] ^ neg_real;
> +
> + d[H2(i)] = float16_add(e0, e1, fpst);
> + d[H2(i + 1)] = float16_add(e2, e3, fpst);
> + }
> + clear_tail(d, opr_sz, simd_maxsz(desc));
> +}
> +
> +void HELPER(gvec_fcadds)(void *vd, void *vn, void *vm,
> + void *vfpst, uint32_t desc)
> +{
> + uintptr_t opr_sz = simd_oprsz(desc);
> + float32 *d = vd;
> + float32 *n = vn;
> + float32 *m = vm;
> + float_status *fpst = vfpst;
> + uint32_t neg_real = extract32(desc, SIMD_DATA_SHIFT, 1);
> + uint32_t neg_imag = neg_real ^ 1;
> + uintptr_t i;
> +
> + neg_real <<= 31;
> + neg_imag <<= 31;
> +
> + for (i = 0; i < opr_sz / 4; i += 2) {
> + float32 e0 = n[H4(i)];
> + float32 e1 = m[H4(i + 1)] ^ neg_imag;
> + float32 e2 = n[H4(i + 1)];
> + float32 e3 = m[H4(i)] ^ neg_real;
> +
> + d[H4(i)] = float32_add(e0, e1, fpst);
> + d[H4(i + 1)] = float32_add(e2, e3, fpst);
> + }
> + clear_tail(d, opr_sz, simd_maxsz(desc));
> +}
> +
> +void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
> + void *vfpst, uint32_t desc)
> +{
> + uintptr_t opr_sz = simd_oprsz(desc);
> + float64 *d = vd;
> + float64 *n = vn;
> + float64 *m = vm;
> + float_status *fpst = vfpst;
> + uint64_t neg_real = extract64(desc, SIMD_DATA_SHIFT, 1);
> + uint64_t neg_imag = neg_real ^ 1;
> + uintptr_t i;
> +
> + neg_real <<= 63;
> + neg_imag <<= 63;
> +
> + for (i = 0; i < opr_sz / 8; i += 2) {
> + float64 e0 = n[i];
> + float64 e1 = m[i + 1] ^ neg_imag;
> + float64 e2 = n[i + 1];
> + float64 e3 = m[i] ^ neg_real;
> +
> + d[i] = float64_add(e0, e1, fpst);
> + d[i + 1] = float64_add(e2, e3, fpst);
> + }
> + clear_tail(d, opr_sz, simd_maxsz(desc));
> +}
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 85fc7af491..89a0616894 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -10696,7 +10696,8 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
> int size = extract32(insn, 22, 2);
> bool u = extract32(insn, 29, 1);
> bool is_q = extract32(insn, 30, 1);
> - int feature;
> + int feature, data;
> + TCGv_ptr fpst;
>
> if (!u) {
> unallocated_encoding(s);
> @@ -10712,6 +10713,14 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
> }
> feature = ARM_FEATURE_V8_1_SIMD;
> break;
> + case 0xc: /* FCADD, #90 */
> + case 0xe: /* FCADD, #270 */
> + if (size == 0 || (size == 3 && !is_q)) {
> + unallocated_encoding(s);
> + return;
> + }
The pseudocode says you need to also check whether the
FP16 extension is supported, or else UNDEF on size == 1.
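A sketch of the complete size check as a standalone predicate, with the feature test passed in as a boolean. The function and parameter names are hypothetical. In the translator, `have_fp16` would be an `arm_dc_feature(s, ARM_FEATURE_V8_FP16)` test guarding `unallocated_encoding()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: is this size/Q combination allocated for FCADD?
 *   size 0: always unallocated
 *   size 1: half precision, only with the FP16 extension
 *   size 3: double precision, needs Q == 1 (a pair fills 128 bits) */
static bool fcadd_size_allocated(int size, bool is_q, bool have_fp16)
{
    assert(size >= 0 && size <= 3);
    if (size == 0 || (size == 3 && !is_q)) {
        return false;
    }
    if (size == 1 && !have_fp16) {
        return false;
    }
    return true;
}
```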
> + feature = ARM_FEATURE_V8_FCMA;
> + break;
> default:
> unallocated_encoding(s);
> return;
> @@ -10758,6 +10767,31 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
> 0, fn_gvec_ptr);
> break;
>
> + case 0xc: /* FCADD, #90 */
> + case 0xe: /* FCADD, #270 */
> + switch (size) {
> + case 1:
> + fn_gvec_ptr = gen_helper_gvec_fcaddh;
> + break;
> + case 2:
> + fn_gvec_ptr = gen_helper_gvec_fcadds;
> + break;
> + case 3:
> + fn_gvec_ptr = gen_helper_gvec_fcaddd;
> + break;
> + default:
> + g_assert_not_reached();
> + }
> + data = extract32(opcode, 1, 1);
The pseudocode calls this field "rot", which is a bit more
descriptive than "data".
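For reference, the immediate is just a bit-field of the 4-bit opcode. This standalone sketch carries a local copy of the extract with the same (value, start, length) argument order as QEMU's `extract32()`. The name `rot_from_opcode` is made up. The field is called `rot`, as suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of a bit-field extract: same argument order as extract32(). */
static uint32_t extract32_local(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Hypothetical: rotation immediate from the opcode.
 *   FCADD (0xc, 0xe):  bit 1     -> 0 = #90, 1 = #270
 *   FCMLA (0x8..0xb):  bits 1:0  -> 0..3 = #0, #90, #180, #270 */
static uint32_t rot_from_opcode(uint32_t opcode, bool is_fcadd)
{
    return is_fcadd ? extract32_local(opcode, 1, 1)
                    : extract32_local(opcode, 0, 2);
}
```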
> + fpst = get_fpstatus_ptr(size == 1);
> + tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
> + vec_full_reg_offset(s, rn),
> + vec_full_reg_offset(s, rm), fpst,
> + is_q ? 16 : 8, vec_full_reg_size(s),
> + data, fn_gvec_ptr);
> + tcg_temp_free_ptr(fpst);
> + break;
> +
> default:
> g_assert_not_reached();
> }
> --
> 2.14.3
Otherwise I think this looks OK.
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v2 09/11] target/arm: Decode aa64 armv8.3 fcmla
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 09/11] target/arm: Decode aa64 armv8.3 fcmla Richard Henderson
@ 2018-01-15 18:18 ` Peter Maydell
2018-01-26 7:29 ` Richard Henderson
0 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2018-01-15 18:18 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper.h | 11 ++++
> target/arm/advsimd_helper.c | 144 ++++++++++++++++++++++++++++++++++++++++++
> target/arm/translate-a64.c | 149 ++++++++++++++++++++++++++++++++------------
> 3 files changed, 265 insertions(+), 39 deletions(-)
>
> diff --git a/target/arm/helper.h b/target/arm/helper.h
> index 0f0fc942b0..5b6333347d 100644
> --- a/target/arm/helper.h
> +++ b/target/arm/helper.h
> @@ -574,6 +574,17 @@ DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG,
> DEF_HELPER_FLAGS_5(gvec_fcaddd, TCG_CALL_NO_RWG,
> void, ptr, ptr, ptr, ptr, i32)
>
> +DEF_HELPER_FLAGS_5(gvec_fcmlah, TCG_CALL_NO_RWG,
> + void, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_5(gvec_fcmlah_idx, TCG_CALL_NO_RWG,
> + void, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_5(gvec_fcmlas, TCG_CALL_NO_RWG,
> + void, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_5(gvec_fcmlas_idx, TCG_CALL_NO_RWG,
> + void, ptr, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_5(gvec_fcmlad, TCG_CALL_NO_RWG,
> + void, ptr, ptr, ptr, ptr, i32)
> +
> #ifdef TARGET_AARCH64
> #include "helper-a64.h"
> #endif
> diff --git a/target/arm/advsimd_helper.c b/target/arm/advsimd_helper.c
> index afc2bb1142..6a2a53e111 100644
> --- a/target/arm/advsimd_helper.c
> +++ b/target/arm/advsimd_helper.c
> @@ -274,3 +274,147 @@ void HELPER(gvec_fcaddd)(void *vd, void *vn, void *vm,
> }
> clear_tail(d, opr_sz, simd_maxsz(desc));
> }
> +
> +void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm,
> + void *vfpst, uint32_t desc)
> +{
> + uintptr_t opr_sz = simd_oprsz(desc);
> + float16 *d = vd;
> + float16 *n = vn;
> + float16 *m = vm;
> + float_status *fpst = vfpst;
> + intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
> + uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
> + uint32_t neg_real = flip ^ neg_imag;
> + uintptr_t i;
> +
> + neg_real <<= 15;
> + neg_imag <<= 15;
> +
> + for (i = 0; i < opr_sz / 2; i += 2) {
> + float16 e0 = n[H2(i + flip)];
> + float16 e1 = m[H2(i + flip)] ^ neg_real;
> + float16 e2 = e0;
> + float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
This is again rather confusing to compare against the pseudocode.
What order are your e0/e1/e2/e3 compared to the pseudocode's
element1/element2/element3/element4?
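To make that comparison easier, here is a host-`float` sketch of one complex pair. It uses the same `flip`/`neg_imag`/`neg_real` derivation as the helper. Plain negation stands in for the sign-bit XOR. A separate multiply and add stands in for the fused `float*_muladd`, so rounding can differ from the real instruction. The function name is hypothetical, and `rot` is 0..3 for #0, #90, #180 and #270.

```c
#include <assert.h>

/* d += rot(n) * m for one complex pair, following the helper's scheme:
 *   flip     = rot bit 0  (take n.im instead of n.re, swap the m halves)
 *   neg_imag = rot bit 1
 *   neg_real = flip ^ neg_imag */
static void fcmla_pair(float d[2], const float n[2], const float m[2], int rot)
{
    assert(rot >= 0 && rot <= 3);
    int flip = rot & 1;
    int neg_imag = (rot >> 1) & 1;
    int neg_real = flip ^ neg_imag;

    float e0 = n[flip];
    float e1 = neg_real ? -m[flip] : m[flip];
    float e2 = e0;
    float e3 = neg_imag ? -m[1 - flip] : m[1 - flip];

    d[0] += e0 * e1;   /* the helper uses a fused multiply-add here */
    d[1] += e2 * e3;
}
```

Issuing #0 and then #90 into the same accumulator gives the full complex product, e.g. (1+2i)(3+4i) = -5+10i.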
> +
> + d[H2(i)] = float16_muladd(e0, e1, d[H2(i)], 0, fpst);
> + d[H2(i + 1)] = float16_muladd(e2, e3, d[H2(i + 1)], 0, fpst);
> + }
> + clear_tail(d, opr_sz, simd_maxsz(desc));
> +}
> +
> +void HELPER(gvec_fcmlah_idx)(void *vd, void *vn, void *vm,
> + void *vfpst, uint32_t desc)
> +{
> + uintptr_t opr_sz = simd_oprsz(desc);
> + float16 *d = vd;
> + float16 *n = vn;
> + float16 *m = vm;
> + float_status *fpst = vfpst;
> + intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
> + uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
> + uint32_t neg_real = flip ^ neg_imag;
> + uintptr_t i;
> + float16 e1 = m[H2(flip)];
> + float16 e3 = m[H2(1 - flip)];
> +
> + neg_real <<= 15;
> + neg_imag <<= 15;
> + e1 ^= neg_real;
> + e3 ^= neg_imag;
> +
> + for (i = 0; i < opr_sz / 2; i += 2) {
> + float16 e0 = n[H2(i + flip)];
> + float16 e2 = e0;
> +
> + d[H2(i)] = float16_muladd(e0, e1, d[H2(i)], 0, fpst);
> + d[H2(i + 1)] = float16_muladd(e2, e3, d[H2(i + 1)], 0, fpst);
> + }
> + clear_tail(d, opr_sz, simd_maxsz(desc));
> +}
> +
> +void HELPER(gvec_fcmlas)(void *vd, void *vn, void *vm,
> + void *vfpst, uint32_t desc)
> +{
> + uintptr_t opr_sz = simd_oprsz(desc);
> + float32 *d = vd;
> + float32 *n = vn;
> + float32 *m = vm;
> + float_status *fpst = vfpst;
> + intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
> + uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
> + uint32_t neg_real = flip ^ neg_imag;
> + uintptr_t i;
> +
> + neg_real <<= 31;
> + neg_imag <<= 31;
> +
> + for (i = 0; i < opr_sz / 4; i += 2) {
> + float32 e0 = n[H4(i + flip)];
> + float32 e1 = m[H4(i + flip)] ^ neg_real;
> + float32 e2 = e0;
> + float32 e3 = m[H4(i + 1 - flip)] ^ neg_imag;
> +
> + d[H4(i)] = float32_muladd(e0, e1, d[H4(i)], 0, fpst);
> + d[H4(i + 1)] = float32_muladd(e2, e3, d[H4(i + 1)], 0, fpst);
> + }
> + clear_tail(d, opr_sz, simd_maxsz(desc));
> +}
> +
> +void HELPER(gvec_fcmlas_idx)(void *vd, void *vn, void *vm,
> + void *vfpst, uint32_t desc)
> +{
> + uintptr_t opr_sz = simd_oprsz(desc);
> + float32 *d = vd;
> + float32 *n = vn;
> + float32 *m = vm;
> + float_status *fpst = vfpst;
> + intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
> + uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
> + uint32_t neg_real = flip ^ neg_imag;
> + uintptr_t i;
> + float32 e1 = m[H4(flip)];
> + float32 e3 = m[H4(1 - flip)];
> +
> + neg_real <<= 31;
> + neg_imag <<= 31;
> + e1 ^= neg_real;
> + e3 ^= neg_imag;
> +
> + for (i = 0; i < opr_sz / 4; i += 2) {
> + float32 e0 = n[H4(i + flip)];
> + float32 e2 = e0;
> +
> + d[H4(i)] = float32_muladd(e0, e1, d[H4(i)], 0, fpst);
> + d[H4(i + 1)] = float32_muladd(e2, e3, d[H4(i + 1)], 0, fpst);
> + }
> + clear_tail(d, opr_sz, simd_maxsz(desc));
> +}
> +
> +void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm,
> + void *vfpst, uint32_t desc)
> +{
> + uintptr_t opr_sz = simd_oprsz(desc);
> + float64 *d = vd;
> + float64 *n = vn;
> + float64 *m = vm;
> + float_status *fpst = vfpst;
> + intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
> + uint64_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
> + uint64_t neg_real = flip ^ neg_imag;
> + uintptr_t i;
> +
> + neg_real <<= 63;
> + neg_imag <<= 63;
> +
> + for (i = 0; i < opr_sz / 8; i += 2) {
> + float64 e0 = n[i + flip];
> + float64 e1 = m[i + flip] ^ neg_real;
> + float64 e2 = e0;
> + float64 e3 = m[i + 1 - flip] ^ neg_imag;
> +
> + d[i] = float64_muladd(e0, e1, d[i], 0, fpst);
> + d[i + 1] = float64_muladd(e2, e3, d[i + 1], 0, fpst);
> + }
> + clear_tail(d, opr_sz, simd_maxsz(desc));
> +}
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 89a0616894..79fede35c1 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -10713,6 +10713,10 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
> }
> feature = ARM_FEATURE_V8_1_SIMD;
> break;
> + case 0x8: /* FCMLA, #0 */
> + case 0x9: /* FCMLA, #90 */
> + case 0xa: /* FCMLA, #180 */
> + case 0xb: /* FCMLA, #270 */
> case 0xc: /* FCADD, #90 */
> case 0xe: /* FCADD, #270 */
> if (size == 0 || (size == 3 && !is_q)) {
> @@ -10767,6 +10771,26 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
> 0, fn_gvec_ptr);
> break;
>
> + case 0x8: /* FCMLA, #0 */
> + case 0x9: /* FCMLA, #90 */
> + case 0xa: /* FCMLA, #180 */
> + case 0xb: /* FCMLA, #270 */
> + switch (size) {
> + case 1:
> + fn_gvec_ptr = gen_helper_gvec_fcmlah;
> + break;
> + case 2:
> + fn_gvec_ptr = gen_helper_gvec_fcmlas;
> + break;
> + case 3:
> + fn_gvec_ptr = gen_helper_gvec_fcmlad;
> + break;
> + default:
> + g_assert_not_reached();
> + }
> + data = extract32(opcode, 0, 2);
> + goto do_fpst;
These need the "size 0b01 is UNDEF unless FP16 extn present" check too.
> +
> case 0xc: /* FCADD, #90 */
> case 0xe: /* FCADD, #270 */
> switch (size) {
> @@ -10783,6 +10807,7 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
> g_assert_not_reached();
> }
> data = extract32(opcode, 1, 1);
> + do_fpst:
> fpst = get_fpstatus_ptr(size == 1);
> tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
> vec_full_reg_offset(s, rn),
> @@ -11864,80 +11889,80 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
> int rn = extract32(insn, 5, 5);
> int rd = extract32(insn, 0, 5);
> bool is_long = false;
> - bool is_fp = false;
> + int is_fp = 0;
> + bool is_fp16 = false;
> int index;
> TCGv_ptr fpst;
>
> - switch (opcode) {
> - case 0x0: /* MLA */
> - case 0x4: /* MLS */
> - if (!u || is_scalar) {
> + switch (16 * u + opcode) {
> + case 0x00: /* MLA */
> + case 0x04: /* MLS */
> + case 0x08: /* MUL */
> + if (is_scalar) {
> unallocated_encoding(s);
> return;
> }
This would all be easier to read if "refactor to switch on u:opcode"
was a separate patch from adding the new insns.
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v2 10/11] target/arm: Decode aa32 armv8.3 3-same
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 10/11] target/arm: Decode aa32 armv8.3 3-same Richard Henderson
@ 2018-01-15 18:46 ` Peter Maydell
2018-01-15 19:10 ` Richard Henderson
2018-01-15 18:49 ` Peter Maydell
1 sibling, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2018-01-15 18:46 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 65 insertions(+)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 1a0b0eaced..e57844c019 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -7662,6 +7662,65 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
> return 0;
> }
>
> +/* ARMv8.3 reclaims a portion of the LDC2/STC2 coprocessor 8 space. */
> +
> +static int disas_neon_insn_cp8_3same(DisasContext *s, uint32_t insn)
> +{
> + void (*fn_gvec_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
> + int rd, rn, rm, rot, size, opr_sz;
> + TCGv_ptr fpst;
> + bool q;
> +
> + /* FIXME: this access check should not take precedence over UNDEF
> + * for invalid encodings; we will generate incorrect syndrome information
> + * for attempts to execute invalid vfp/neon encodings with FP disabled.
> + */
> + if (s->fp_excp_el) {
> + gen_exception_insn(s, 4, EXCP_UDEF,
> + syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
> + return 0;
> + }
> + if (!s->vfp_enabled) {
> + return 1;
> + }
> + if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)) {
> + return 1;
> + }
> +
> + q = extract32(insn, 6, 1);
> + size = extract32(insn, 20, 1);
> + VFP_DREG_D(rd, insn);
> + VFP_DREG_N(rn, insn);
> + VFP_DREG_M(rm, insn);
> + if ((rd | rn | rm) & q) {
> + return 1;
> + }
> +
> + if (extract32(insn, 21, 1)) {
> + /* VCMLA */
> + rot = extract32(insn, 23, 2);
> + fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
> + } else if (extract32(insn, 23, 1)) {
> + /* VCADD */
> + rot = extract32(insn, 24, 1);
> + fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
> + } else {
> + /* Assuming the register fields remain, only bit 24 remains undecoded:
> + * 1111_110x_0d0s_nnnn_dddd_1000_nqm0_mmmm
> + */
> + return 1;
> + }
> +
> + opr_sz = (1 + q) * 8;
> + fpst = get_fpstatus_ptr(1);
> + tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
> + vfp_reg_offset(1, rn),
> + vfp_reg_offset(1, rm), fpst,
> + opr_sz, opr_sz, rot, fn_gvec_ptr);
> + tcg_temp_free_ptr(fpst);
> + return 0;
> +}
> +
> static int disas_coproc_insn(DisasContext *s, uint32_t insn)
> {
> int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
> @@ -8406,6 +8465,12 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
> }
> }
> }
> + } else if ((insn & 0x0e000f10) == 0x0c000800) {
I think we should guard this with an ARM_FEATURE_V8 check, so
that for pre-v8 CPUs we fall back to the usual "treat it as a
coprocessor" codepath. (In theory it should work out the same
either way, but specifically limiting this to v8 makes it easier
to be sure that it's not changing the behaviour where it shouldn't.)
> + /* ARMv8.3 neon ldc2/stc2 coprocessor 8 extension. */
> + if (disas_neon_insn_cp8_3same(s, insn)) {
> + goto illegal_op;
> + }
This doesn't seem to line up with the Arm ARM decode. Your
pattern and mask give
op0 = 0x, op1 = 100, op2 = 0 and also bit 8 = 0.
The ARM has 3reg-same decoded with
op0 = 0x, op1 = 1x0, op2 = x
(and some insns in the 3reg-same group have bit 8 == 1, like
VSDOT and VUDOT.)
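A standalone check of what the mask/value pairs in patches 10 and 11 actually accept. It models only the mask test itself. The `have_v8` parameter stands in for the suggested ARM_FEATURE_V8 guard, and the function names are hypothetical. Bit 8 is in both masks and must be 0, so an otherwise identical encoding with bit 8 set (which, per the review, VSDOT/VUDOT have) is not claimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask/value pairs as written in patches 10 and 11. */
#define CP8_3SAME_MASK   0x0e000f10u  /* bits [27:25], [11:8], 4 */
#define CP8_3SAME_VALUE  0x0c000800u  /* 110, 1000, 0 */
#define CP8_INDEX_MASK   0x0f000f10u  /* bits [27:24], [11:8], 4 */
#define CP8_INDEX_VALUE  0x0e000800u  /* 1110, 1000, 0 */

/* Hypothetical: would the 3-same decoder claim this insn? */
static bool claims_cp8_3same(uint32_t insn, bool have_v8)
{
    return have_v8 && (insn & CP8_3SAME_MASK) == CP8_3SAME_VALUE;
}

/* Hypothetical: would the indexed decoder claim this insn? */
static bool claims_cp8_index(uint32_t insn, bool have_v8)
{
    return have_v8 && (insn & CP8_INDEX_MASK) == CP8_INDEX_VALUE;
}
```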
> + return;
> } else if ((insn & 0x0fe00000) == 0x0c400000) {
> /* Coprocessor double register transfer. */
> ARCH(5TE);
How are you proposing to do the Thumb decoding? Try to share
some of the 3same-vs-2reg+scalar decode part, or just have
them both do a similar kind of decode and call the
disas_neon_insn_cp8_3same/cp8_index functions?
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v2 10/11] target/arm: Decode aa32 armv8.3 3-same
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 10/11] target/arm: Decode aa32 armv8.3 3-same Richard Henderson
2018-01-15 18:46 ` Peter Maydell
@ 2018-01-15 18:49 ` Peter Maydell
1 sibling, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2018-01-15 18:49 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 65 insertions(+)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 1a0b0eaced..e57844c019 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -7662,6 +7662,65 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
> return 0;
> }
>
> +/* ARMv8.3 reclaims a portion of the LDC2/STC2 coprocessor 8 space. */
> +
> +static int disas_neon_insn_cp8_3same(DisasContext *s, uint32_t insn)
> +{
> + void (*fn_gvec_ptr)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
> + int rd, rn, rm, rot, size, opr_sz;
> + TCGv_ptr fpst;
> + bool q;
> +
> + /* FIXME: this access check should not take precedence over UNDEF
> + * for invalid encodings; we will generate incorrect syndrome information
> + * for attempts to execute invalid vfp/neon encodings with FP disabled.
> + */
(Forgot this bit before hitting send on the other email...)
Unlike the sprawling disas_vfp_insn(), we're in a position to get the
order of checks right here. Just move it and the vfp_enabled test a
bit further down...
> + if (s->fp_excp_el) {
> + gen_exception_insn(s, 4, EXCP_UDEF,
> + syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
> + return 0;
> + }
> + if (!s->vfp_enabled) {
> + return 1;
> + }
> + if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)) {
> + return 1;
> + }
> +
> + q = extract32(insn, 6, 1);
> + size = extract32(insn, 20, 1);
> + VFP_DREG_D(rd, insn);
> + VFP_DREG_N(rn, insn);
> + VFP_DREG_M(rm, insn);
> + if ((rd | rn | rm) & q) {
> + return 1;
> + }
> +
> + if (extract32(insn, 21, 1)) {
> + /* VCMLA */
> + rot = extract32(insn, 23, 2);
> + fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
> + } else if (extract32(insn, 23, 1)) {
> + /* VCADD */
> + rot = extract32(insn, 24, 1);
> + fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
> + } else {
> + /* Assuming the register fields remain, only bit 24 remains undecoded:
> + * 1111_110x_0d0s_nnnn_dddd_1000_nqm0_mmmm
> + */
> + return 1;
> + }
...to here.
> +
> + opr_sz = (1 + q) * 8;
> + fpst = get_fpstatus_ptr(1);
> + tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
> + vfp_reg_offset(1, rn),
> + vfp_reg_offset(1, rm), fpst,
> + opr_sz, opr_sz, rot, fn_gvec_ptr);
> + tcg_temp_free_ptr(fpst);
> + return 0;
> +}
> +
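A minimal model of the suggested ordering. The checks are reduced to booleans, and an enum stands in for the three outcomes. All names are hypothetical. Encoding-level UNDEF conditions are evaluated first. Only an otherwise valid encoding reaches the FP access trap and the vfp_enabled test.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CP8_OK,        /* emit the gvec operation */
    CP8_UNDEF,     /* caller jumps to illegal_op */
    CP8_FP_TRAP,   /* FP access trap to fp_excp_el */
} Cp8Result;

/* Hypothetical model of the checks in disas_neon_insn_cp8_3same,
 * reordered so that encoding-level UNDEFs win over the FP access trap. */
static Cp8Result cp8_check_order(bool have_fcma, bool regs_ok,
                                 bool op_allocated, bool fp_trap_pending,
                                 bool vfp_enabled)
{
    if (!have_fcma) {
        return CP8_UNDEF;
    }
    if (!regs_ok || !op_allocated) {  /* e.g. odd D reg with Q set */
        return CP8_UNDEF;
    }
    if (fp_trap_pending) {            /* s->fp_excp_el != 0 */
        return CP8_FP_TRAP;
    }
    if (!vfp_enabled) {
        return CP8_UNDEF;
    }
    return CP8_OK;
}
```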
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v2 11/11] target/arm: Decode aa32 armv8.3 2-reg-index
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 11/11] target/arm: Decode aa32 armv8.3 2-reg-index Richard Henderson
@ 2018-01-15 18:51 ` Peter Maydell
0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2018-01-15 18:51 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 51 insertions(+)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index e57844c019..490e120b0b 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -7721,6 +7721,51 @@ static int disas_neon_insn_cp8_3same(DisasContext *s, uint32_t insn)
> return 0;
> }
>
> +/* ARMv8.3 reclaims a portion of the CDP2 coprocessor 8 space. */
> +
> +static int disas_neon_insn_cp8_index(DisasContext *s, uint32_t insn)
> +{
> + int rd, rn, rm, rot, size, opr_sz;
> + TCGv_ptr fpst;
> + bool q;
> +
> + /* FIXME: this access check should not take precedence over UNDEF
> + * for invalid encodings; we will generate incorrect syndrome information
> + * for attempts to execute invalid vfp/neon encodings with FP disabled.
> + */
Again, we can push this test down to where it belongs in this new code.
> + if (s->fp_excp_el) {
> + gen_exception_insn(s, 4, EXCP_UDEF,
> + syn_fp_access_trap(1, 0xe, false), s->fp_excp_el);
> + return 0;
> + }
> + if (!s->vfp_enabled || !arm_dc_feature(s, ARM_FEATURE_V8_FCMA)) {
> + return 1;
> + }
> +
> + q = extract32(insn, 6, 1);
> + size = extract32(insn, 23, 1);
> +
> + VFP_DREG_D(rd, insn);
> + VFP_DREG_N(rn, insn);
> + VFP_DREG_M(rm, insn);
> + if ((rd | rn) & q) {
> + return 1;
> + }
> +
> + /* This entire space is VCMLA (indexed). */
> + rot = extract32(insn, 20, 2);
> + opr_sz = (1 + q) * 8;
> + fpst = get_fpstatus_ptr(1);
> + tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd),
> + vfp_reg_offset(1, rn),
> + vfp_reg_offset(1, rm), fpst,
> + opr_sz, opr_sz, rot,
> + size ? gen_helper_gvec_fcmlas_idx
> + : gen_helper_gvec_fcmlah_idx);
> + tcg_temp_free_ptr(fpst);
> + return 0;
> +}
> +
> static int disas_coproc_insn(DisasContext *s, uint32_t insn)
> {
> int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
> @@ -8471,6 +8516,12 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
> goto illegal_op;
> }
> return;
> + } else if ((insn & 0x0f000f10) == 0x0e000800) {
> + /* ARMv8.3 neon cdp2 coprocessor 8 extension. */
> + if (disas_neon_insn_cp8_index(s, insn)) {
> + goto illegal_op;
> + }
> + return;
> } else if ((insn & 0x0fe00000) == 0x0c400000) {
> /* Coprocessor double register transfer. */
> ARCH(5TE);
Similar remarks as for patch 10 about decode, and how we're going
to deal with thumb.
thanks
-- PMM
* Re: [Qemu-devel] [PATCH v2 10/11] target/arm: Decode aa32 armv8.3 3-same
2018-01-15 18:46 ` Peter Maydell
@ 2018-01-15 19:10 ` Richard Henderson
0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-01-15 19:10 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, qemu-arm
On 01/15/2018 10:46 AM, Peter Maydell wrote:
> This doesn't seem to line up with the Arm ARM decode. Your
> pattern and mask give
> op0 = 0x, op1 = 100, op2 = 0 and also bit 8 = 0.
> The ARM has 3reg-same decoded with
> op0 = 0x, op1 = 1x0, op2 = x
>
> (and some insns in the 3reg-same group have bit 8 == 1, like
> VSDOT and VUDOT.)
Ah, more v8.2 instructions that I wasn't even looking at...
> How are you proposing to do the Thumb decoding? Try to share
> some of the 3same-vs-2reg+scalar decode part, or just have
> them both do a similar kind of decode and call the
> disas_neon_insn_cp8_3same/cp8_index functions?
Hmm. I thought this was working via the "translate into the equivalent ARM
encoding" path. But it couldn't possibly be doing so, since
disas_neon_insn_cp8_3same is not a subroutine of disas_neon_data_insn.
I guess I'll have to verify that RISU is testing what I thought it was...
r~
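The mismatch Peter points out is between a fixed field value (op1 = 100) and one with a don't-care bit (op1 = 1x0). A small hypothetical helper, not part of QEMU, makes the difference concrete. Each '0' or '1' in a pattern string becomes a mask bit, while 'x' is left out of the mask, so `(insn & mask) == value` accepts both settings of that bit. The bit positions here are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fold a pattern such as "1x0" into *mask / *value.
 * '0' and '1' are significant bits; 'x' is don't-care and stays out of
 * the mask.  The last character of the pattern lands on bit `lsb`.
 */
static void pattern_to_mask(const char *pat, int lsb,
                            uint32_t *mask, uint32_t *value)
{
    size_t len = strlen(pat);

    *mask = 0;
    *value = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t bit = 1u << (lsb + (int)(len - 1 - i));

        assert(pat[i] == '0' || pat[i] == '1' || pat[i] == 'x');
        if (pat[i] != 'x') {
            *mask |= bit;
        }
        if (pat[i] == '1') {
            *value |= bit;
        }
    }
}

static bool insn_matches(uint32_t insn, uint32_t mask, uint32_t value)
{
    return (insn & mask) == value;
}
```

An encoding with op1 = 110 is accepted by the 1x0 pattern but rejected by the narrower 100 one.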
* Re: [Qemu-devel] [PATCH v2 06/11] target/arm: Decode aa32 armv8.1 two reg and a scalar
2018-01-15 17:47 ` Peter Maydell
@ 2018-01-26 7:18 ` Richard Henderson
2018-01-26 10:05 ` Peter Maydell
0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2018-01-26 7:18 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, qemu-arm
On 01/15/2018 09:47 AM, Peter Maydell wrote:
> On 18 December 2017 at 17:24, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/arm/translate.c | 38 +++++++++++++++++++++++++++++++++++---
>> 1 file changed, 35 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/arm/translate.c b/target/arm/translate.c
>> index a9587ae242..1a0b0eaced 100644
>> --- a/target/arm/translate.c
>> +++ b/target/arm/translate.c
>> @@ -6973,11 +6973,43 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
>> }
>> neon_store_reg64(cpu_V0, rd + pass);
>> }
>> + break;
>> + case 14: /* VQRDMLAH scalar */
>> + case 15: /* VQRDMLSH scalar */
>> + if (!arm_dc_feature(s, ARM_FEATURE_V8_1_SIMD)) {
>> + return 1;
>> + }
>> + if (u && ((rd | rn) & 1)) {
>> + return 1;
>> + }
>
> The pseudocode also has UNDEF if Q==1 && Vm<0> == 1 ....
Not for the indexed version, encoding A2.
>
>> + tmp2 = neon_get_scalar(size, rm);
>> + for (pass = 0; pass < (u ? 4 : 2); pass++) {
>> + void (*fn)(TCGv_i32, TCGv_env, TCGv_i32,
>> + TCGv_i32, TCGv_i32);
>
> Can we define a typedef for this, please ?
What would you name it?
r~
* Re: [Qemu-devel] [PATCH v2 09/11] target/arm: Decode aa64 armv8.3 fcmla
2018-01-15 18:18 ` Peter Maydell
@ 2018-01-26 7:29 ` Richard Henderson
2018-01-26 10:07 ` Peter Maydell
0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2018-01-26 7:29 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, qemu-arm
On 01/15/2018 10:18 AM, Peter Maydell wrote:
>> +void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm,
>> + void *vfpst, uint32_t desc)
>> +{
>> + uintptr_t opr_sz = simd_oprsz(desc);
>> + float16 *d = vd;
>> + float16 *n = vn;
>> + float16 *m = vm;
>> + float_status *fpst = vfpst;
>> + intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
>> + uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
>> + uint32_t neg_real = flip ^ neg_imag;
>> + uintptr_t i;
>> +
>> + neg_real <<= 15;
>> + neg_imag <<= 15;
>> +
>> + for (i = 0; i < opr_sz / 2; i += 2) {
>> + float16 e0 = n[H2(i + flip)];
>> + float16 e1 = m[H2(i + flip)] ^ neg_real;
>> + float16 e2 = e0;
>> + float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
>
> This is again rather confusing to compare against the pseudocode.
> What order are your e0/e1/e2/e3 compared to the pseudocode's
> element1/element2/element3/element4 ?
The SVE pseudocode for the same operation is clearer than that in the main ARM
ARM, and is nearer to what I used:
for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        pair = e - (e MOD 2); // index of first element in pair
        addend = Elem[result, e, esize];
        if IsEven(e) then // real part
            // realD = realA [+-] flip ? (imagN * imagM) : (realN * realM)
            element1 = Elem[operand1, pair + flip, esize];
            element2 = Elem[operand2, pair + flip, esize];
            if neg_real then element2 = FPNeg(element2);
        else // imaginary part
            // imagD = imagA [+-] flip ? (imagN * realM) : (realN * imagM)
            element1 = Elem[operand1, pair + flip, esize];
            element2 = Elem[operand2, pair + (1 - flip), esize];
            if neg_imag then element2 = FPNeg(element2);
        Elem[result, e, esize] = FPMulAdd(addend, element1, element2, FPCR);
In my version, e0/e1 are element1/element2 (real) and e2/e3 are
element1/element2 (imag).
r~
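To check the real/imaginary mapping, here is a direct C transcription of the SVE pseudocode above, operating on plain floats. As in the patch's helper, flip is taken from bit 0 of the 2-bit rotation and neg_imag from bit 1, with neg_real = flip ^ neg_imag. The fused FPMulAdd is approximated by an unfused multiply-add, which is exact for the small integer test values. The function name and signature are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * FCMLA per-element form, transcribed from the SVE pseudocode.
 * d is both the addend and the result; rot is the 2-bit rotation.
 * Unfused a + b * c stands in for FPMulAdd (exact for small integers).
 */
static void fcmla_sve(float *d, const float *n, const float *m,
                      const bool *mask, size_t elements, unsigned rot)
{
    unsigned flip = rot & 1;            /* bit 0, as in the patch */
    unsigned neg_imag = (rot >> 1) & 1; /* bit 1 */
    unsigned neg_real = flip ^ neg_imag;

    assert(elements % 2 == 0);          /* complex numbers come in pairs */
    for (size_t e = 0; e < elements; e++) {
        if (!mask[e]) {
            continue;
        }
        size_t pair = e - (e % 2);      /* index of first element in pair */
        float addend = d[e];
        float element1, element2;

        if (e % 2 == 0) {               /* real part */
            element1 = n[pair + flip];
            element2 = m[pair + flip];
            if (neg_real) {
                element2 = -element2;
            }
        } else {                        /* imaginary part */
            element1 = n[pair + flip];
            element2 = m[pair + (1 - flip)];
            if (neg_imag) {
                element2 = -element2;
            }
        }
        d[e] = addend + element1 * element2;
    }
}
```

Applying rotations 0 and 90 (rot = 0, then rot = 1) to the same accumulator yields a full complex multiply-accumulate. With n = 1 + 2i and m = 3 + 4i, a zeroed d ends up as -5 + 10i.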
* Re: [Qemu-devel] [PATCH v2 06/11] target/arm: Decode aa32 armv8.1 two reg and a scalar
2018-01-26 7:18 ` Richard Henderson
@ 2018-01-26 10:05 ` Peter Maydell
0 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2018-01-26 10:05 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 26 January 2018 at 07:18, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 01/15/2018 09:47 AM, Peter Maydell wrote:
>> On 18 December 2017 at 17:24, Richard Henderson
>> <richard.henderson@linaro.org> wrote:
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>> target/arm/translate.c | 38 +++++++++++++++++++++++++++++++++++---
>>> 1 file changed, 35 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/target/arm/translate.c b/target/arm/translate.c
>>> index a9587ae242..1a0b0eaced 100644
>>> --- a/target/arm/translate.c
>>> +++ b/target/arm/translate.c
>>> @@ -6973,11 +6973,43 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
>>> }
>>> neon_store_reg64(cpu_V0, rd + pass);
>>> }
>>> + break;
>>> + case 14: /* VQRDMLAH scalar */
>>> + case 15: /* VQRDMLSH scalar */
>>> + if (!arm_dc_feature(s, ARM_FEATURE_V8_1_SIMD)) {
>>> + return 1;
>>> + }
>>> + if (u && ((rd | rn) & 1)) {
>>> + return 1;
>>> + }
>>
>> The pseudocode also has UNDEF if Q==1 && Vm<0> == 1 ....
>
> Not for the indexed version, encoding A2.
Ah, yes.
>>
>>> + tmp2 = neon_get_scalar(size, rm);
>>> + for (pass = 0; pass < (u ? 4 : 2); pass++) {
>>> + void (*fn)(TCGv_i32, TCGv_env, TCGv_i32,
>>> + TCGv_i32, TCGv_i32);
>>
>> Can we define a typedef for this, please ?
>
> What would you name it?
NeonGenThreeOpEnvFn would fit the naming scheme we've got
in translate-a64.c.
thanks
-- PMM
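Here is a sketch of the suggested typedef, written as a function type in the same style as the NeonGen*Fn typedefs in translate-a64.c. To compile standalone, TCGv_i32 and TCGv_env are replaced by placeholder typedefs, and the four gen_helper_neon_qrdml*_s16/_s32 calls are replaced by empty stubs. The two-dimensional table indexed by op and size is one possible alternative to the nested if/else in the patch, not what the series does.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholders so the sketch compiles outside QEMU. */
typedef int32_t *TCGv_i32;
typedef void *TCGv_env;

/* Function type, like the NeonGen*Fn typedefs in translate-a64.c. */
typedef void NeonGenThreeOpEnvFn(TCGv_i32, TCGv_env, TCGv_i32,
                                 TCGv_i32, TCGv_i32);

/* Empty stand-ins for gen_helper_neon_qrdml{a,s}h_s{16,32}. */
#define DEFINE_STUB(name)                                          \
    static void name(TCGv_i32 d, TCGv_env env, TCGv_i32 n,        \
                     TCGv_i32 m, TCGv_i32 a)                      \
    {                                                              \
        (void)d; (void)env; (void)n; (void)m; (void)a;             \
    }

DEFINE_STUB(stub_qrdmlah_s16)
DEFINE_STUB(stub_qrdmlah_s32)
DEFINE_STUB(stub_qrdmlsh_s16)
DEFINE_STUB(stub_qrdmlsh_s32)

/* op: 14 = VQRDMLAH, 15 = VQRDMLSH; size: 1 = 16-bit, 2 = 32-bit. */
static NeonGenThreeOpEnvFn *select_qrdml_fn(int op, int size)
{
    static NeonGenThreeOpEnvFn *const fns[2][2] = {
        { stub_qrdmlah_s16, stub_qrdmlah_s32 },
        { stub_qrdmlsh_s16, stub_qrdmlsh_s32 },
    };

    assert((op == 14 || op == 15) && (size == 1 || size == 2));
    return fns[op - 14][size - 1];
}
```

In disas_neon_data_insn() the selection could then be made once, before the per-pass loop, as `NeonGenThreeOpEnvFn *fn = ...;`, leaving only `fn(tmp, cpu_env, tmp, tmp2, tmp3);` inside it.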
* Re: [Qemu-devel] [PATCH v2 09/11] target/arm: Decode aa64 armv8.3 fcmla
2018-01-26 7:29 ` Richard Henderson
@ 2018-01-26 10:07 ` Peter Maydell
2018-01-26 19:03 ` Richard Henderson
0 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2018-01-26 10:07 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 26 January 2018 at 07:29, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 01/15/2018 10:18 AM, Peter Maydell wrote:
>>> +void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm,
>>> + void *vfpst, uint32_t desc)
>>> +{
>>> + uintptr_t opr_sz = simd_oprsz(desc);
>>> + float16 *d = vd;
>>> + float16 *n = vn;
>>> + float16 *m = vm;
>>> + float_status *fpst = vfpst;
>>> + intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
>>> + uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
>>> + uint32_t neg_real = flip ^ neg_imag;
>>> + uintptr_t i;
>>> +
>>> + neg_real <<= 15;
>>> + neg_imag <<= 15;
>>> +
>>> + for (i = 0; i < opr_sz / 2; i += 2) {
>>> + float16 e0 = n[H2(i + flip)];
>>> + float16 e1 = m[H2(i + flip)] ^ neg_real;
>>> + float16 e2 = e0;
>>> + float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
>>
>> This is again rather confusing to compare against the pseudocode.
>> What order are your e0/e1/e2/e3 compared to the pseudocode's
>> element1/element2/element3/element4 ?
>
> The SVE pseudocode for the same operation is clearer than that in the main ARM
> ARM, and is nearer to what I used:
>
> for e = 0 to elements-1
>     if ElemP[mask, e, esize] == '1' then
>         pair = e - (e MOD 2); // index of first element in pair
>         addend = Elem[result, e, esize];
>         if IsEven(e) then // real part
>             // realD = realA [+-] flip ? (imagN * imagM) : (realN * realM)
>             element1 = Elem[operand1, pair + flip, esize];
>             element2 = Elem[operand2, pair + flip, esize];
>             if neg_real then element2 = FPNeg(element2);
>         else // imaginary part
>             // imagD = imagA [+-] flip ? (imagN * realM) : (realN * imagM)
>             element1 = Elem[operand1, pair + flip, esize];
>             element2 = Elem[operand2, pair + (1 - flip), esize];
>             if neg_imag then element2 = FPNeg(element2);
>         Elem[result, e, esize] = FPMulAdd(addend, element1, element2, FPCR);
>
> In my version, e0/e1 are element1/element2 (real) and e2/e3 are
> element1/element2 (imag).
Thanks. Could we use the same indexing (1/2/3/4) as the final Arm ARM
pseudocode?
thanks
-- PMM
* Re: [Qemu-devel] [Qemu-arm] [PATCH v2 06/11] target/arm: Decode aa32 armv8.1 two reg and a scalar
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 06/11] target/arm: Decode aa32 armv8.1 two reg and a scalar Richard Henderson
2018-01-15 17:47 ` Peter Maydell
@ 2018-01-26 13:41 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 30+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-01-26 13:41 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, peter.maydell, qemu-arm
On 12/18/2017 02:24 PM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
> target/arm/translate.c | 38 +++++++++++++++++++++++++++++++++++---
> 1 file changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index a9587ae242..1a0b0eaced 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -6973,11 +6973,43 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
> }
> neon_store_reg64(cpu_V0, rd + pass);
> }
> + break;
> + case 14: /* VQRDMLAH scalar */
> + case 15: /* VQRDMLSH scalar */
> + if (!arm_dc_feature(s, ARM_FEATURE_V8_1_SIMD)) {
> + return 1;
> + }
> + if (u && ((rd | rn) & 1)) {
> + return 1;
> + }
> + tmp2 = neon_get_scalar(size, rm);
> + for (pass = 0; pass < (u ? 4 : 2); pass++) {
> + void (*fn)(TCGv_i32, TCGv_env, TCGv_i32,
> + TCGv_i32, TCGv_i32);
>
> -
> + tmp = neon_load_reg(rn, pass);
> + tmp3 = neon_load_reg(rd, pass);
> + if (op == 14) {
> + if (size == 1) {
> + fn = gen_helper_neon_qrdmlah_s16;
> + } else {
> + fn = gen_helper_neon_qrdmlah_s32;
> + }
> + } else {
> + if (size == 1) {
> + fn = gen_helper_neon_qrdmlsh_s16;
> + } else {
> + fn = gen_helper_neon_qrdmlsh_s32;
> + }
> + }
> + fn(tmp, cpu_env, tmp, tmp2, tmp3);
> + tcg_temp_free_i32(tmp3);
> + neon_store_reg(rd, pass, tmp);
> + }
> + tcg_temp_free_i32(tmp2);
> break;
> - default: /* 14 and 15 are RESERVED */
> - return 1;
> + default:
> + g_assert_not_reached();
> }
> }
> } else { /* size == 3 */
>
* Re: [Qemu-devel] [PATCH v2 09/11] target/arm: Decode aa64 armv8.3 fcmla
2018-01-26 10:07 ` Peter Maydell
@ 2018-01-26 19:03 ` Richard Henderson
0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2018-01-26 19:03 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, qemu-arm
On 01/26/2018 02:07 AM, Peter Maydell wrote:
>> The SVE pseudocode for the same operation is clearer than that in the main ARM
>> ARM, and is nearer to what I used:
>>
>> for e = 0 to elements-1
>>     if ElemP[mask, e, esize] == '1' then
>>         pair = e - (e MOD 2); // index of first element in pair
>>         addend = Elem[result, e, esize];
>>         if IsEven(e) then // real part
>>             // realD = realA [+-] flip ? (imagN * imagM) : (realN * realM)
>>             element1 = Elem[operand1, pair + flip, esize];
>>             element2 = Elem[operand2, pair + flip, esize];
>>             if neg_real then element2 = FPNeg(element2);
>>         else // imaginary part
>>             // imagD = imagA [+-] flip ? (imagN * realM) : (realN * imagM)
>>             element1 = Elem[operand1, pair + flip, esize];
>>             element2 = Elem[operand2, pair + (1 - flip), esize];
>>             if neg_imag then element2 = FPNeg(element2);
>>         Elem[result, e, esize] = FPMulAdd(addend, element1, element2, FPCR);
>>
>> In my version, e0/e1 are element1/element2 (real) and e2/e3 are
>> element1/element2 (imag).
>
> Thanks. Could we use the same indexing (1/2/3/4) as the final Arm ARM
> pseudocode?
Done.
r~
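Here is the pairwise loop from the patch, with its temporaries renamed e1..e4 as requested. The assignment assumed here is that e2/e4 are taken from n and e1/e3 from m, so each product is e2 * e1 and e4 * e3. This is one reading of the Arm ARM's element1..element4 and should be checked against the manual. The sketch uses binary32 instead of float16 so it runs without softfloat. It keeps the patch's trick of negating by XOR-ing the sign bit into the raw encoding (bit 31 here, bit 15 for fp16), which, like FPNeg(), only inverts the sign bit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Negate by flipping the sign bit of the encoding, like `m ^ neg_real`. */
static float xor_sign(float x, uint32_t sign_mask)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof(bits));
    bits ^= sign_mask;
    memcpy(&x, &bits, sizeof(bits));
    return x;
}

/* Pairwise FCMLA on binary32; d is both addend and result. */
static void fcmla_pairs(float *d, const float *n, const float *m,
                        size_t elements, unsigned rot)
{
    uint32_t flip = rot & 1;
    uint32_t neg_imag = (rot >> 1) & 1;
    uint32_t neg_real = flip ^ neg_imag;

    neg_real <<= 31;    /* binary32 sign bit; the fp16 helper uses 15 */
    neg_imag <<= 31;

    assert(elements % 2 == 0);
    for (size_t i = 0; i < elements; i += 2) {
        float e2 = n[i + flip];
        float e1 = xor_sign(m[i + flip], neg_real);
        float e4 = e2;
        float e3 = xor_sign(m[i + 1 - flip], neg_imag);

        d[i] = d[i] + e2 * e1;          /* real part */
        d[i + 1] = d[i + 1] + e4 * e3;  /* imaginary part */
    }
}
```

Rotations 180 and 270 (rot = 2, then 3) subtract the product instead. For n = 1 + 2i and m = 3 + 4i they take a zeroed d to 5 - 10i.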
end of thread, other threads:[~2018-01-26 19:33 UTC | newest]
Thread overview: 30+ messages
2017-12-18 17:24 [Qemu-devel] [PATCH v2 00/11] ARM v8.1 simd + v8.3 complex insns Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 01/11] target/arm: Add ARM_FEATURE_V8_1_SIMD Richard Henderson
2018-01-15 17:21 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 02/11] target/arm: Decode aa64 armv8.1 scalar three same extra Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 03/11] target/arm: Decode aa64 armv8.1 " Richard Henderson
2018-01-15 17:21 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 04/11] target/arm: Decode aa64 armv8.1 scalar/vector x indexed element Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 05/11] target/arm: Decode aa32 armv8.1 three same Richard Henderson
2018-01-15 17:37 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 06/11] target/arm: Decode aa32 armv8.1 two reg and a scalar Richard Henderson
2018-01-15 17:47 ` Peter Maydell
2018-01-26 7:18 ` Richard Henderson
2018-01-26 10:05 ` Peter Maydell
2018-01-26 13:41 ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 07/11] target/arm: Add ARM_FEATURE_V8_FCMA Richard Henderson
2018-01-15 17:53 ` Peter Maydell
2018-01-15 18:03 ` Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 08/11] target/arm: Decode aa64 armv8.3 fcadd Richard Henderson
2018-01-15 18:11 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 09/11] target/arm: Decode aa64 armv8.3 fcmla Richard Henderson
2018-01-15 18:18 ` Peter Maydell
2018-01-26 7:29 ` Richard Henderson
2018-01-26 10:07 ` Peter Maydell
2018-01-26 19:03 ` Richard Henderson
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 10/11] target/arm: Decode aa32 armv8.3 3-same Richard Henderson
2018-01-15 18:46 ` Peter Maydell
2018-01-15 19:10 ` Richard Henderson
2018-01-15 18:49 ` Peter Maydell
2017-12-18 17:24 ` [Qemu-devel] [PATCH v2 11/11] target/arm: Decode aa32 armv8.3 2-reg-index Richard Henderson
2018-01-15 18:51 ` Peter Maydell