* [PULL 095/114] target/arm: Implement SVE2 FCVTLT
From: Peter Maydell @ 2021-05-25 15:07 UTC
To: qemu-devel
From: Stephen Long <steplong@quicinc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stephen Long <steplong@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-74-richard.henderson@linaro.org
Message-Id: <20200428174332.17162-3-steplong@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/helper-sve.h | 5 +++++
target/arm/sve.decode | 2 ++
target/arm/sve_helper.c | 23 +++++++++++++++++++++++
target/arm/translate-sve.c | 16 ++++++++++++++++
4 files changed, 46 insertions(+)
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 7aa365d5659..be4b17f1c2e 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -2749,3 +2749,8 @@ DEF_HELPER_FLAGS_5(sve2_fcvtnt_sh, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(sve2_fcvtnt_ds, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve2_fcvtlt_hs, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve2_fcvtlt_sd, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 94cdc6ff15a..1be35154708 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -1583,4 +1583,6 @@ RAX1 01000101 00 1 ..... 11110 1 ..... ..... @rd_rn_rm_e0
### SVE2 floating-point convert precision odd elements
FCVTNT_sh 01100100 10 0010 00 101 ... ..... ..... @rd_pg_rn_e0
+FCVTLT_hs 01100100 10 0010 01 101 ... ..... ..... @rd_pg_rn_e0
FCVTNT_ds 01100100 11 0010 10 101 ... ..... ..... @rd_pg_rn_e0
+FCVTLT_sd 01100100 11 0010 11 101 ... ..... ..... @rd_pg_rn_e0
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index d44bcfa44aa..88823935156 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -7622,3 +7622,26 @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc) \
DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16)
DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t, , H1_4, float64_to_float32)
+
+#define DO_FCVTLT(NAME, TYPEW, TYPEN, HW, HN, OP) \
+void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc) \
+{ \
+ intptr_t i = simd_oprsz(desc); \
+ uint64_t *g = vg; \
+ do { \
+ uint64_t pg = g[(i - 1) >> 6]; \
+ do { \
+ i -= sizeof(TYPEW); \
+ if (likely((pg >> (i & 63)) & 1)) { \
+ TYPEN nn = *(TYPEN *)(vn + HN(i + sizeof(TYPEN))); \
+ *(TYPEW *)(vd + HW(i)) = OP(nn, status); \
+ } \
+ } while (i & 63); \
+ } while (i != 0); \
+}
+
+DO_FCVTLT(sve2_fcvtlt_hs, uint32_t, uint16_t, H1_4, H1_2, sve_f16_to_f32)
+DO_FCVTLT(sve2_fcvtlt_sd, uint64_t, uint32_t, , H1_4, float32_to_float64)
+
+#undef DO_FCVTLT
+#undef DO_FCVTNT
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 700b02814c4..7490094d172 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -8262,3 +8262,19 @@ static bool trans_FCVTNT_ds(DisasContext *s, arg_rpr_esz *a)
}
return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtnt_ds);
}
+
+static bool trans_FCVTLT_hs(DisasContext *s, arg_rpr_esz *a)
+{
+ if (!dc_isar_feature(aa64_sve2, s)) {
+ return false;
+ }
+ return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtlt_hs);
+}
+
+static bool trans_FCVTLT_sd(DisasContext *s, arg_rpr_esz *a)
+{
+ if (!dc_isar_feature(aa64_sve2, s)) {
+ return false;
+ }
+ return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtlt_sd);
+}
--
2.20.1
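
The predicate walk in DO_FCVTLT can be hard to follow inside the macro, so here is a minimal stand-alone sketch of the single-to-double case. It is not the QEMU helper: model_fcvtlt_sd is a hypothetical name, the host is assumed to be little-endian (so the H* byte-swizzle macros reduce to the identity), float and double are assumed to be IEEE binary32 and binary64, and a plain C cast stands in for float32_to_float64, so float_status and exception flags are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of DO_FCVTLT for float -> double.
 * The predicate holds one bit per vector byte; the 64-bit destination
 * element at byte offset i is governed by bit (i & 63) of its
 * predicate word.
 */
static void model_fcvtlt_sd(void *vd, const void *vn,
                            const uint64_t *g, size_t oprsz)
{
    unsigned char *d = vd;
    const unsigned char *n = vn;
    size_t i = oprsz;

    /* Assumption: a non-empty vector made of whole 64-bit elements. */
    assert(oprsz != 0 && oprsz % sizeof(uint64_t) == 0);

    do {
        /* Fetch the predicate word covering the next 64 vector bytes. */
        uint64_t pg = g[(i - 1) >> 6];
        do {
            i -= sizeof(uint64_t);
            if ((pg >> (i & 63)) & 1) {
                float narrow;
                double wide;

                /* The source is the odd ("top") half of the wide slot. */
                memcpy(&narrow, n + i + sizeof(float), sizeof(narrow));
                wide = (double)narrow;
                memcpy(d + i, &wide, sizeof(wide));
            }
        } while (i & 63);
    } while (i != 0);
}
```

With a 16-byte vector holding the floats 1, 2, 3, 4 and predicate bits 0 and 8 set, this model writes 2.0 and 4.0 to the two double slots. Inactive destination elements are left untouched.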
* [PULL 096/114] target/arm: Implement SVE2 FCVTXNT, FCVTX
From: Peter Maydell @ 2021-05-25 15:07 UTC
To: qemu-devel
From: Stephen Long <steplong@quicinc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stephen Long <steplong@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-75-richard.henderson@linaro.org
Message-Id: <20200428174332.17162-4-steplong@quicinc.com>
[rth: Use do_frint_mode, which avoids a specific runtime helper.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/sve.decode | 2 ++
target/arm/translate-sve.c | 49 ++++++++++++++++++++++++++++++--------
2 files changed, 41 insertions(+), 10 deletions(-)
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 1be35154708..5dcc79759e0 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -1582,6 +1582,8 @@ SM4EKEY 01000101 00 1 ..... 11110 0 ..... ..... @rd_rn_rm_e0
RAX1 01000101 00 1 ..... 11110 1 ..... ..... @rd_rn_rm_e0
### SVE2 floating-point convert precision odd elements
+FCVTXNT_ds 01100100 00 0010 10 101 ... ..... ..... @rd_pg_rn_e0
+FCVTX_ds 01100101 00 0010 10 101 ... ..... ..... @rd_pg_rn_e0
FCVTNT_sh 01100100 10 0010 00 101 ... ..... ..... @rd_pg_rn_e0
FCVTLT_hs 01100100 10 0010 01 101 ... ..... ..... @rd_pg_rn_e0
FCVTNT_ds 01100100 11 0010 10 101 ... ..... ..... @rd_pg_rn_e0
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 7490094d172..0a2718c4810 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4777,11 +4777,9 @@ static bool trans_FRINTX(DisasContext *s, arg_rpr_esz *a)
return do_zpz_ptr(s, a->rd, a->rn, a->pg, a->esz == MO_16, fns[a->esz - 1]);
}
-static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a, int mode)
+static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a,
+ int mode, gen_helper_gvec_3_ptr *fn)
{
- if (a->esz == 0) {
- return false;
- }
if (sve_access_check(s)) {
unsigned vsz = vec_full_reg_size(s);
TCGv_i32 tmode = tcg_const_i32(mode);
@@ -4792,7 +4790,7 @@ static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a, int mode)
tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
vec_full_reg_offset(s, a->rn),
pred_full_reg_offset(s, a->pg),
- status, vsz, vsz, 0, frint_fns[a->esz - 1]);
+ status, vsz, vsz, 0, fn);
gen_helper_set_rmode(tmode, tmode, status);
tcg_temp_free_i32(tmode);
@@ -4803,27 +4801,42 @@ static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a, int mode)
static bool trans_FRINTN(DisasContext *s, arg_rpr_esz *a)
{
- return do_frint_mode(s, a, float_round_nearest_even);
+ if (a->esz == 0) {
+ return false;
+ }
+ return do_frint_mode(s, a, float_round_nearest_even, frint_fns[a->esz - 1]);
}
static bool trans_FRINTP(DisasContext *s, arg_rpr_esz *a)
{
- return do_frint_mode(s, a, float_round_up);
+ if (a->esz == 0) {
+ return false;
+ }
+ return do_frint_mode(s, a, float_round_up, frint_fns[a->esz - 1]);
}
static bool trans_FRINTM(DisasContext *s, arg_rpr_esz *a)
{
- return do_frint_mode(s, a, float_round_down);
+ if (a->esz == 0) {
+ return false;
+ }
+ return do_frint_mode(s, a, float_round_down, frint_fns[a->esz - 1]);
}
static bool trans_FRINTZ(DisasContext *s, arg_rpr_esz *a)
{
- return do_frint_mode(s, a, float_round_to_zero);
+ if (a->esz == 0) {
+ return false;
+ }
+ return do_frint_mode(s, a, float_round_to_zero, frint_fns[a->esz - 1]);
}
static bool trans_FRINTA(DisasContext *s, arg_rpr_esz *a)
{
- return do_frint_mode(s, a, float_round_ties_away);
+ if (a->esz == 0) {
+ return false;
+ }
+ return do_frint_mode(s, a, float_round_ties_away, frint_fns[a->esz - 1]);
}
static bool trans_FRECPX(DisasContext *s, arg_rpr_esz *a)
@@ -8278,3 +8291,19 @@ static bool trans_FCVTLT_sd(DisasContext *s, arg_rpr_esz *a)
}
return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtlt_sd);
}
+
+static bool trans_FCVTX_ds(DisasContext *s, arg_rpr_esz *a)
+{
+ if (!dc_isar_feature(aa64_sve2, s)) {
+ return false;
+ }
+ return do_frint_mode(s, a, float_round_to_odd, gen_helper_sve_fcvt_ds);
+}
+
+static bool trans_FCVTXNT_ds(DisasContext *s, arg_rpr_esz *a)
+{
+ if (!dc_isar_feature(aa64_sve2, s)) {
+ return false;
+ }
+ return do_frint_mode(s, a, float_round_to_odd, gen_helper_sve2_fcvtnt_ds);
+}
--
2.20.1
* [PULL 097/114] target/arm: Implement SVE2 FLOGB
From: Peter Maydell @ 2021-05-25 15:07 UTC
To: qemu-devel
From: Stephen Long <steplong@quicinc.com>
Signed-off-by: Stephen Long <steplong@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-76-richard.henderson@linaro.org
Message-Id: <20200430191405.21641-1-steplong@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/helper-sve.h | 4 ++
target/arm/sve.decode | 3 ++
target/arm/sve_helper.c | 88 ++++++++++++++++++++++++++++++++++++++
target/arm/translate-sve.c | 24 +++++++++++
4 files changed, 119 insertions(+)
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index be4b17f1c2e..342bb837214 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -2754,3 +2754,7 @@ DEF_HELPER_FLAGS_5(sve2_fcvtlt_hs, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(sve2_fcvtlt_sd, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(flogb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(flogb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(flogb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 5dcc79759e0..5a1cceccb60 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -1588,3 +1588,6 @@ FCVTNT_sh 01100100 10 0010 00 101 ... ..... ..... @rd_pg_rn_e0
FCVTLT_hs 01100100 10 0010 01 101 ... ..... ..... @rd_pg_rn_e0
FCVTNT_ds 01100100 11 0010 10 101 ... ..... ..... @rd_pg_rn_e0
FCVTLT_sd 01100100 11 0010 11 101 ... ..... ..... @rd_pg_rn_e0
+
+### SVE2 floating-point convert to integer
+FLOGB 01100101 00 011 esz:2 0101 pg:3 rn:5 rd:5 &rpr_esz
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 88823935156..a0518549849 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4729,6 +4729,94 @@ DO_ZPZ_FP(sve_ucvt_dh, uint64_t, , uint64_to_float16)
DO_ZPZ_FP(sve_ucvt_ds, uint64_t, , uint64_to_float32)
DO_ZPZ_FP(sve_ucvt_dd, uint64_t, , uint64_to_float64)
+static int16_t do_float16_logb_as_int(float16 a, float_status *s)
+{
+ /* Extract frac to the top of the uint32_t. */
+ uint32_t frac = (uint32_t)a << (16 + 6);
+ int16_t exp = extract32(a, 10, 5);
+
+ if (unlikely(exp == 0)) {
+ if (frac != 0) {
+ if (!get_flush_inputs_to_zero(s)) {
+ /* denormal: bias - fractional_zeros */
+ return -15 - clz32(frac);
+ }
+ /* flush to zero */
+ float_raise(float_flag_input_denormal, s);
+ }
+ } else if (unlikely(exp == 0x1f)) {
+ if (frac == 0) {
+ return INT16_MAX; /* infinity */
+ }
+ } else {
+ /* normal: exp - bias */
+ return exp - 15;
+ }
+ /* nan or zero */
+ float_raise(float_flag_invalid, s);
+ return INT16_MIN;
+}
+
+static int32_t do_float32_logb_as_int(float32 a, float_status *s)
+{
+ /* Extract frac to the top of the uint32_t. */
+ uint32_t frac = a << 9;
+ int32_t exp = extract32(a, 23, 8);
+
+ if (unlikely(exp == 0)) {
+ if (frac != 0) {
+ if (!get_flush_inputs_to_zero(s)) {
+ /* denormal: bias - fractional_zeros */
+ return -127 - clz32(frac);
+ }
+ /* flush to zero */
+ float_raise(float_flag_input_denormal, s);
+ }
+ } else if (unlikely(exp == 0xff)) {
+ if (frac == 0) {
+ return INT32_MAX; /* infinity */
+ }
+ } else {
+ /* normal: exp - bias */
+ return exp - 127;
+ }
+ /* nan or zero */
+ float_raise(float_flag_invalid, s);
+ return INT32_MIN;
+}
+
+static int64_t do_float64_logb_as_int(float64 a, float_status *s)
+{
+ /* Extract frac to the top of the uint64_t. */
+ uint64_t frac = a << 12;
+ int64_t exp = extract64(a, 52, 11);
+
+ if (unlikely(exp == 0)) {
+ if (frac != 0) {
+ if (!get_flush_inputs_to_zero(s)) {
+ /* denormal: bias - fractional_zeros */
+ return -1023 - clz64(frac);
+ }
+ /* flush to zero */
+ float_raise(float_flag_input_denormal, s);
+ }
+ } else if (unlikely(exp == 0x7ff)) {
+ if (frac == 0) {
+ return INT64_MAX; /* infinity */
+ }
+ } else {
+ /* normal: exp - bias */
+ return exp - 1023;
+ }
+ /* nan or zero */
+ float_raise(float_flag_invalid, s);
+ return INT64_MIN;
+}
+
+DO_ZPZ_FP(flogb_h, float16, H1_2, do_float16_logb_as_int)
+DO_ZPZ_FP(flogb_s, float32, H1_4, do_float32_logb_as_int)
+DO_ZPZ_FP(flogb_d, float64, , do_float64_logb_as_int)
+
#undef DO_ZPZ_FP
static void do_fmla_zpzzz_h(void *vd, void *vn, void *vm, void *va, void *vg,
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 0a2718c4810..3ea51a73d36 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -8307,3 +8307,27 @@ static bool trans_FCVTXNT_ds(DisasContext *s, arg_rpr_esz *a)
}
return do_frint_mode(s, a, float_round_to_odd, gen_helper_sve2_fcvtnt_ds);
}
+
+static bool trans_FLOGB(DisasContext *s, arg_rpr_esz *a)
+{
+ static gen_helper_gvec_3_ptr * const fns[] = {
+ NULL, gen_helper_flogb_h,
+ gen_helper_flogb_s, gen_helper_flogb_d
+ };
+
+ if (!dc_isar_feature(aa64_sve2, s) || fns[a->esz] == NULL) {
+ return false;
+ }
+ if (sve_access_check(s)) {
+ TCGv_ptr status =
+ fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+ unsigned vsz = vec_full_reg_size(s);
+
+ tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->rn),
+ pred_full_reg_offset(s, a->pg),
+ status, vsz, vsz, 0, fns[a->esz]);
+ tcg_temp_free_ptr(status);
+ }
+ return true;
+}
--
2.20.1
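
For readers who want to check the FLOGB case analysis by hand, the following is a minimal stand-alone sketch of the single-precision path, working on raw IEEE binary32 bit patterns. The names model_clz32 and model_flogb_s are hypothetical. The sketch simplifies the helper above: flush-to-zero mode is assumed to be off and no exception flags are raised.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros; the argument must be non-zero. */
static int model_clz32(uint32_t x)
{
    int n = 0;

    assert(x != 0);
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/*
 * Hypothetical model of do_float32_logb_as_int.
 * Simplifications: inputs are never flushed to zero, and neither
 * input-denormal nor invalid is recorded anywhere.
 */
static int32_t model_flogb_s(uint32_t a)
{
    uint32_t frac = a << 9;            /* fraction moved to the top bits */
    int32_t exp = (a >> 23) & 0xff;    /* biased exponent */

    if (exp == 0) {
        if (frac != 0) {
            /* denormal: -bias minus the leading zeros of the fraction */
            return -127 - model_clz32(frac);
        }
    } else if (exp == 0xff) {
        if (frac == 0) {
            return INT32_MAX;          /* infinity */
        }
    } else {
        return exp - 127;              /* normal: exp - bias */
    }
    return INT32_MIN;                  /* NaN or zero */
}
```

As a sanity check, the smallest denormal (bit pattern 0x00000001, value 2^-149) has 22 leading zeros in the shifted fraction, giving -127 - 22 = -149. The largest denormal gives -127.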
* [PULL 098/114] target/arm: Share table of sve load functions
From: Peter Maydell @ 2021-05-25 15:07 UTC
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
The table used by do_ldrq is a subset of the table used by do_ld_zpa;
we can share them by passing dtype instead of msz to do_ldrq.
The lack of MTE handling in do_ldrq was a bug, fixed by this change.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-77-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/translate-sve.c | 254 ++++++++++++++++++-------------------
1 file changed, 126 insertions(+), 128 deletions(-)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 3ea51a73d36..54c50349aba 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5215,128 +5215,130 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
tcg_temp_free_i32(t_desc);
}
+/* Indexed by [mte][be][dtype][nreg] */
+static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = {
+ { /* mte inactive, little-endian */
+ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
+ gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
+ { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1sds_le_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hh_le_r, gen_helper_sve_ld2hh_le_r,
+ gen_helper_sve_ld3hh_le_r, gen_helper_sve_ld4hh_le_r },
+ { gen_helper_sve_ld1hsu_le_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hdu_le_r, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1hds_le_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hss_le_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld2ss_le_r,
+ gen_helper_sve_ld3ss_le_r, gen_helper_sve_ld4ss_le_r },
+ { gen_helper_sve_ld1sdu_le_r, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1dd_le_r, gen_helper_sve_ld2dd_le_r,
+ gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r } },
+
+ /* mte inactive, big-endian */
+ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
+ gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
+ { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1sds_be_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hh_be_r, gen_helper_sve_ld2hh_be_r,
+ gen_helper_sve_ld3hh_be_r, gen_helper_sve_ld4hh_be_r },
+ { gen_helper_sve_ld1hsu_be_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hdu_be_r, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1hds_be_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hss_be_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld2ss_be_r,
+ gen_helper_sve_ld3ss_be_r, gen_helper_sve_ld4ss_be_r },
+ { gen_helper_sve_ld1sdu_be_r, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1dd_be_r, gen_helper_sve_ld2dd_be_r,
+ gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r } } },
+
+ { /* mte active, little-endian */
+ { { gen_helper_sve_ld1bb_r_mte,
+ gen_helper_sve_ld2bb_r_mte,
+ gen_helper_sve_ld3bb_r_mte,
+ gen_helper_sve_ld4bb_r_mte },
+ { gen_helper_sve_ld1bhu_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bsu_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bdu_r_mte, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1sds_le_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hh_le_r_mte,
+ gen_helper_sve_ld2hh_le_r_mte,
+ gen_helper_sve_ld3hh_le_r_mte,
+ gen_helper_sve_ld4hh_le_r_mte },
+ { gen_helper_sve_ld1hsu_le_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hdu_le_r_mte, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1hds_le_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hss_le_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1ss_le_r_mte,
+ gen_helper_sve_ld2ss_le_r_mte,
+ gen_helper_sve_ld3ss_le_r_mte,
+ gen_helper_sve_ld4ss_le_r_mte },
+ { gen_helper_sve_ld1sdu_le_r_mte, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1bds_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bss_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bhs_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1dd_le_r_mte,
+ gen_helper_sve_ld2dd_le_r_mte,
+ gen_helper_sve_ld3dd_le_r_mte,
+ gen_helper_sve_ld4dd_le_r_mte } },
+
+ /* mte active, big-endian */
+ { { gen_helper_sve_ld1bb_r_mte,
+ gen_helper_sve_ld2bb_r_mte,
+ gen_helper_sve_ld3bb_r_mte,
+ gen_helper_sve_ld4bb_r_mte },
+ { gen_helper_sve_ld1bhu_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bsu_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bdu_r_mte, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1sds_be_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hh_be_r_mte,
+ gen_helper_sve_ld2hh_be_r_mte,
+ gen_helper_sve_ld3hh_be_r_mte,
+ gen_helper_sve_ld4hh_be_r_mte },
+ { gen_helper_sve_ld1hsu_be_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hdu_be_r_mte, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1hds_be_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hss_be_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1ss_be_r_mte,
+ gen_helper_sve_ld2ss_be_r_mte,
+ gen_helper_sve_ld3ss_be_r_mte,
+ gen_helper_sve_ld4ss_be_r_mte },
+ { gen_helper_sve_ld1sdu_be_r_mte, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1bds_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bss_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bhs_r_mte, NULL, NULL, NULL },
+ { gen_helper_sve_ld1dd_be_r_mte,
+ gen_helper_sve_ld2dd_be_r_mte,
+ gen_helper_sve_ld3dd_be_r_mte,
+ gen_helper_sve_ld4dd_be_r_mte } } },
+};
+
static void do_ld_zpa(DisasContext *s, int zt, int pg,
TCGv_i64 addr, int dtype, int nreg)
{
- static gen_helper_gvec_mem * const fns[2][2][16][4] = {
- { /* mte inactive, little-endian */
- { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
- gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
- { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1sds_le_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1hh_le_r, gen_helper_sve_ld2hh_le_r,
- gen_helper_sve_ld3hh_le_r, gen_helper_sve_ld4hh_le_r },
- { gen_helper_sve_ld1hsu_le_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1hdu_le_r, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1hds_le_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1hss_le_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld2ss_le_r,
- gen_helper_sve_ld3ss_le_r, gen_helper_sve_ld4ss_le_r },
- { gen_helper_sve_ld1sdu_le_r, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1dd_le_r, gen_helper_sve_ld2dd_le_r,
- gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r } },
-
- /* mte inactive, big-endian */
- { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
- gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
- { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1sds_be_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1hh_be_r, gen_helper_sve_ld2hh_be_r,
- gen_helper_sve_ld3hh_be_r, gen_helper_sve_ld4hh_be_r },
- { gen_helper_sve_ld1hsu_be_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1hdu_be_r, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1hds_be_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1hss_be_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld2ss_be_r,
- gen_helper_sve_ld3ss_be_r, gen_helper_sve_ld4ss_be_r },
- { gen_helper_sve_ld1sdu_be_r, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1dd_be_r, gen_helper_sve_ld2dd_be_r,
- gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r } } },
-
- { /* mte active, little-endian */
- { { gen_helper_sve_ld1bb_r_mte,
- gen_helper_sve_ld2bb_r_mte,
- gen_helper_sve_ld3bb_r_mte,
- gen_helper_sve_ld4bb_r_mte },
- { gen_helper_sve_ld1bhu_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1bsu_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1bdu_r_mte, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1sds_le_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1hh_le_r_mte,
- gen_helper_sve_ld2hh_le_r_mte,
- gen_helper_sve_ld3hh_le_r_mte,
- gen_helper_sve_ld4hh_le_r_mte },
- { gen_helper_sve_ld1hsu_le_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1hdu_le_r_mte, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1hds_le_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1hss_le_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1ss_le_r_mte,
- gen_helper_sve_ld2ss_le_r_mte,
- gen_helper_sve_ld3ss_le_r_mte,
- gen_helper_sve_ld4ss_le_r_mte },
- { gen_helper_sve_ld1sdu_le_r_mte, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1bds_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1bss_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1bhs_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1dd_le_r_mte,
- gen_helper_sve_ld2dd_le_r_mte,
- gen_helper_sve_ld3dd_le_r_mte,
- gen_helper_sve_ld4dd_le_r_mte } },
-
- /* mte active, big-endian */
- { { gen_helper_sve_ld1bb_r_mte,
- gen_helper_sve_ld2bb_r_mte,
- gen_helper_sve_ld3bb_r_mte,
- gen_helper_sve_ld4bb_r_mte },
- { gen_helper_sve_ld1bhu_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1bsu_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1bdu_r_mte, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1sds_be_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1hh_be_r_mte,
- gen_helper_sve_ld2hh_be_r_mte,
- gen_helper_sve_ld3hh_be_r_mte,
- gen_helper_sve_ld4hh_be_r_mte },
- { gen_helper_sve_ld1hsu_be_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1hdu_be_r_mte, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1hds_be_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1hss_be_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1ss_be_r_mte,
- gen_helper_sve_ld2ss_be_r_mte,
- gen_helper_sve_ld3ss_be_r_mte,
- gen_helper_sve_ld4ss_be_r_mte },
- { gen_helper_sve_ld1sdu_be_r_mte, NULL, NULL, NULL },
-
- { gen_helper_sve_ld1bds_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1bss_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1bhs_r_mte, NULL, NULL, NULL },
- { gen_helper_sve_ld1dd_be_r_mte,
- gen_helper_sve_ld2dd_be_r_mte,
- gen_helper_sve_ld3dd_be_r_mte,
- gen_helper_sve_ld4dd_be_r_mte } } },
- };
gen_helper_gvec_mem *fn
- = fns[s->mte_active[0]][s->be_data == MO_BE][dtype][nreg];
+ = ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][nreg];
/*
* While there are holes in the table, they are not
@@ -5574,14 +5576,8 @@ static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a)
return true;
}
-static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
+static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
{
- static gen_helper_gvec_mem * const fns[2][4] = {
- { gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_le_r,
- gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld1dd_le_r },
- { gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_be_r,
- gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld1dd_be_r },
- };
unsigned vsz = vec_full_reg_size(s);
TCGv_ptr t_pg;
TCGv_i32 t_desc;
@@ -5613,7 +5609,9 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
t_pg = tcg_temp_new_ptr();
tcg_gen_addi_ptr(t_pg, cpu_env, poff);
- fns[s->be_data == MO_BE][msz](cpu_env, t_pg, addr, t_desc);
+ gen_helper_gvec_mem *fn
+ = ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
+ fn(cpu_env, t_pg, addr, t_desc);
tcg_temp_free_ptr(t_pg);
tcg_temp_free_i32(t_desc);
@@ -5635,7 +5633,7 @@ static bool trans_LD1RQ_zprr(DisasContext *s, arg_rprr_load *a)
TCGv_i64 addr = new_tmp_a64(s);
tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), msz);
tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
- do_ldrq(s, a->rd, a->pg, addr, msz);
+ do_ldrq(s, a->rd, a->pg, addr, a->dtype);
}
return true;
}
@@ -5645,7 +5643,7 @@ static bool trans_LD1RQ_zpri(DisasContext *s, arg_rpri_load *a)
if (sve_access_check(s)) {
TCGv_i64 addr = new_tmp_a64(s);
tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), a->imm * 16);
- do_ldrq(s, a->rd, a->pg, addr, dtype_msz(a->dtype));
+ do_ldrq(s, a->rd, a->pg, addr, a->dtype);
}
return true;
}
--
2.20.1
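
To illustrate why the old four-entry do_ldrq table is a subset of the shared one, here is a small stand-alone sketch. It lists the nreg == 0 helper stems in the dtype order of ldr_fns and picks out the entries where memory size and element size match. The names model_ld1_stem, model_msz_to_dtype and model_ldrq_stem are hypothetical. The msz * 5 mapping is read off the table order in this patch (indices 0, 5, 10, 15) and is an assumption about how the decoder derives dtype for LD1RQ.

```c
#include <assert.h>

/*
 * Hypothetical model: helper-name stems in dtype order, for the
 * nreg == 0 column of the [mte][be][dtype][nreg] table.
 */
static const char *const model_ld1_stem[16] = {
    "ld1bb",  "ld1bhu", "ld1bsu", "ld1bdu",
    "ld1sds", "ld1hh",  "ld1hsu", "ld1hdu",
    "ld1hds", "ld1hss", "ld1ss",  "ld1sdu",
    "ld1bds", "ld1bss", "ld1bhs", "ld1dd",
};

/* dtype whose memory size and element size both equal msz (0..3). */
static int model_msz_to_dtype(int msz)
{
    assert(msz >= 0 && msz <= 3);
    return msz * 5;
}

/* The entry the old msz-indexed do_ldrq table would have selected. */
static const char *model_ldrq_stem(int msz)
{
    return model_ld1_stem[model_msz_to_dtype(msz)];
}
```

Under these assumptions, msz values 0 to 3 select ld1bb, ld1hh, ld1ss and ld1dd, which are the same helpers the removed table held. Indexing the shared table by mte_active as well then picks up the _mte variants.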
* [PULL 099/114] target/arm: Tidy do_ldrq
From: Peter Maydell @ 2021-05-25 15:07 UTC
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
Use tcg_constant_i32 for passing the simd descriptor,
as this hashed value does not need to be freed.
Rename dofs to doff to match poff.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-78-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/translate-sve.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 54c50349aba..a213450583b 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5580,13 +5580,9 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
{
unsigned vsz = vec_full_reg_size(s);
TCGv_ptr t_pg;
- TCGv_i32 t_desc;
- int desc, poff;
+ int poff;
/* Load the first quadword using the normal predicated load helpers. */
- desc = simd_desc(16, 16, zt);
- t_desc = tcg_const_i32(desc);
-
poff = pred_full_reg_offset(s, pg);
if (vsz > 16) {
/*
@@ -5611,15 +5607,14 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
gen_helper_gvec_mem *fn
= ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
- fn(cpu_env, t_pg, addr, t_desc);
+ fn(cpu_env, t_pg, addr, tcg_constant_i32(simd_desc(16, 16, zt)));
tcg_temp_free_ptr(t_pg);
- tcg_temp_free_i32(t_desc);
/* Replicate that first quadword. */
if (vsz > 16) {
- unsigned dofs = vec_full_reg_offset(s, zt);
- tcg_gen_gvec_dup_mem(4, dofs + 16, dofs, vsz - 16, vsz - 16);
+ int doff = vec_full_reg_offset(s, zt);
+ tcg_gen_gvec_dup_mem(4, doff + 16, doff, vsz - 16, vsz - 16);
}
}
--
2.20.1
* [PULL 100/114] target/arm: Implement SVE2 LD1RO
From: Peter Maydell @ 2021-05-25 15:07 UTC
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-79-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/sve.decode | 4 ++
target/arm/translate-sve.c | 93 ++++++++++++++++++++++++++++++++++++++
2 files changed, 97 insertions(+)
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 5a1cceccb60..884c5358eb1 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -1126,11 +1126,15 @@ LD_zpri 1010010 .. nreg:2 0.... 111 ... ..... ..... @rpri_load_msz
# SVE load and broadcast quadword (scalar plus scalar)
LD1RQ_zprr 1010010 .. 00 ..... 000 ... ..... ..... \
@rprr_load_msz nreg=0
+LD1RO_zprr 1010010 .. 01 ..... 000 ... ..... ..... \
+ @rprr_load_msz nreg=0
# SVE load and broadcast quadword (scalar plus immediate)
# LD1RQB, LD1RQH, LD1RQS, LD1RQD
LD1RQ_zpri 1010010 .. 00 0.... 001 ... ..... ..... \
@rpri_load_msz nreg=0
+LD1RO_zpri 1010010 .. 01 0.... 001 ... ..... ..... \
+ @rpri_load_msz nreg=0
# SVE 32-bit gather prefetch (scalar plus 32-bit scaled offsets)
PRF 1000010 00 -1 ----- 0-- --- ----- 0 ----
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index a213450583b..1dcdbac0af0 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5643,6 +5643,99 @@ static bool trans_LD1RQ_zpri(DisasContext *s, arg_rpri_load *a)
return true;
}
+static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
+{
+ unsigned vsz = vec_full_reg_size(s);
+ unsigned vsz_r32;
+ TCGv_ptr t_pg;
+ int poff, doff;
+
+ if (vsz < 32) {
+ /*
+ * Note that this UNDEFINED check comes after CheckSVEEnabled()
+ * in the ARM pseudocode, which is the sve_access_check() done
+ * in our caller. We should not now return false from the caller.
+ */
+ unallocated_encoding(s);
+ return;
+ }
+
+ /* Load the first octaword using the normal predicated load helpers. */
+
+ poff = pred_full_reg_offset(s, pg);
+ if (vsz > 32) {
+ /*
+ * Zero-extend the first 32 bits of the predicate into a temporary.
+ * This avoids triggering an assert making sure we don't have bits
+ * set within a predicate beyond VQ, but we have lowered VQ to 2
+ * for this load operation.
+ */
+ TCGv_i64 tmp = tcg_temp_new_i64();
+#ifdef HOST_WORDS_BIGENDIAN
+ poff += 4;
+#endif
+ tcg_gen_ld32u_i64(tmp, cpu_env, poff);
+
+ poff = offsetof(CPUARMState, vfp.preg_tmp);
+ tcg_gen_st_i64(tmp, cpu_env, poff);
+ tcg_temp_free_i64(tmp);
+ }
+
+ t_pg = tcg_temp_new_ptr();
+ tcg_gen_addi_ptr(t_pg, cpu_env, poff);
+
+ gen_helper_gvec_mem *fn
+ = ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
+ fn(cpu_env, t_pg, addr, tcg_constant_i32(simd_desc(32, 32, zt)));
+
+ tcg_temp_free_ptr(t_pg);
+
+ /*
+ * Replicate that first octaword.
+ * The replication happens in units of 32; if the full vector size
+ * is not a multiple of 32, the final bits are zeroed.
+ */
+ doff = vec_full_reg_offset(s, zt);
+ vsz_r32 = QEMU_ALIGN_DOWN(vsz, 32);
+ if (vsz >= 64) {
+ tcg_gen_gvec_dup_mem(5, doff + 32, doff, vsz_r32 - 32, vsz_r32 - 32);
+ }
+ vsz -= vsz_r32;
+ if (vsz) {
+ tcg_gen_gvec_dup_imm(MO_64, doff + vsz_r32, vsz, vsz, 0);
+ }
+}
+
+static bool trans_LD1RO_zprr(DisasContext *s, arg_rprr_load *a)
+{
+ if (!dc_isar_feature(aa64_sve_f64mm, s)) {
+ return false;
+ }
+ if (a->rm == 31) {
+ return false;
+ }
+ if (sve_access_check(s)) {
+ TCGv_i64 addr = new_tmp_a64(s);
+ tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
+ tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
+ do_ldro(s, a->rd, a->pg, addr, a->dtype);
+ }
+ return true;
+}
+
+static bool trans_LD1RO_zpri(DisasContext *s, arg_rpri_load *a)
+{
+ if (!dc_isar_feature(aa64_sve_f64mm, s)) {
+ return false;
+ }
+ if (sve_access_check(s)) {
+ TCGv_i64 addr = new_tmp_a64(s);
+ tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), a->imm * 32);
+ do_ldro(s, a->rd, a->pg, addr, a->dtype);
+ }
+ return true;
+}
+
/* Load and broadcast element. */
static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a)
{
--
2.20.1
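Reader's note on the replication step in do_ldro(): the comment in the patch says the first octaword is replicated in units of 32 bytes and that any remainder of the vector is zeroed. A minimal sketch of that behaviour on a plain byte buffer follows. The name replicate_octaword and the flat uint8_t layout are assumptions made for illustration; the patch itself emits TCG gvec operations rather than calling anything like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the replication step in do_ldro(): `vec` holds
 * vsz bytes and its first 32 bytes have already been loaded.
 */
static void replicate_octaword(uint8_t *vec, size_t vsz)
{
    size_t vsz_r32 = vsz & ~(size_t)31;   /* round down to a multiple of 32 */
    size_t off;

    assert(vsz >= 32);                    /* the patch rejects vsz < 32 */

    /* Copy the first octaword into every following whole 32-byte slot. */
    for (off = 32; off < vsz_r32; off += 32) {
        memcpy(vec + off, vec, 32);
    }
    /* Zero whatever is left past the last whole octaword. */
    memset(vec + vsz_r32, 0, vsz - vsz_r32);
}
```

For an 80-byte vector this would copy bytes 0..31 into 32..63 and clear bytes 64..79.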
* [PULL 101/114] target/arm: Implement 128-bit ZIP, UZP, TRN
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-80-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/helper-sve.h | 3 ++
target/arm/sve.decode | 8 ++++++
target/arm/sve_helper.c | 29 +++++++++++++------
target/arm/translate-sve.c | 58 ++++++++++++++++++++++++++++++++++++++
4 files changed, 90 insertions(+), 8 deletions(-)
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 342bb837214..b43ffce23ac 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -689,16 +689,19 @@ DEF_HELPER_FLAGS_4(sve_zip_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_zip_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_zip_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_zip_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_zip_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_uzp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_uzp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_uzp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_uzp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_uzp_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_trn_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_trn_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_trn_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_trn_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_trn_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_compact_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(sve_compact_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 884c5358eb1..5469ce04143 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -590,6 +590,14 @@ UZP2_z 00000101 .. 1 ..... 011 011 ..... ..... @rd_rn_rm
TRN1_z 00000101 .. 1 ..... 011 100 ..... ..... @rd_rn_rm
TRN2_z 00000101 .. 1 ..... 011 101 ..... ..... @rd_rn_rm
+# SVE2 permute vector segments
+ZIP1_q 00000101 10 1 ..... 000 000 ..... ..... @rd_rn_rm_e0
+ZIP2_q 00000101 10 1 ..... 000 001 ..... ..... @rd_rn_rm_e0
+UZP1_q 00000101 10 1 ..... 000 010 ..... ..... @rd_rn_rm_e0
+UZP2_q 00000101 10 1 ..... 000 011 ..... ..... @rd_rn_rm_e0
+TRN1_q 00000101 10 1 ..... 000 110 ..... ..... @rd_rn_rm_e0
+TRN2_q 00000101 10 1 ..... 000 111 ..... ..... @rd_rn_rm_e0
+
### SVE Permute - Predicated Group
# SVE compress active elements
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index a0518549849..d088b1f74ce 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3492,36 +3492,45 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
*(TYPE *)(vd + H(2 * i + 0)) = *(TYPE *)(vn + H(i)); \
*(TYPE *)(vd + H(2 * i + sizeof(TYPE))) = *(TYPE *)(vm + H(i)); \
} \
+ if (sizeof(TYPE) == 16 && unlikely(oprsz & 16)) { \
+ memset(vd + oprsz - 16, 0, 16); \
+ } \
}
DO_ZIP(sve_zip_b, uint8_t, H1)
DO_ZIP(sve_zip_h, uint16_t, H1_2)
DO_ZIP(sve_zip_s, uint32_t, H1_4)
DO_ZIP(sve_zip_d, uint64_t, )
+DO_ZIP(sve2_zip_q, Int128, )
#define DO_UZP(NAME, TYPE, H) \
void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
{ \
intptr_t oprsz = simd_oprsz(desc); \
- intptr_t oprsz_2 = oprsz / 2; \
intptr_t odd_ofs = simd_data(desc); \
- intptr_t i; \
+ intptr_t i, p; \
ARMVectorReg tmp_m; \
if (unlikely((vm - vd) < (uintptr_t)oprsz)) { \
vm = memcpy(&tmp_m, vm, oprsz); \
} \
- for (i = 0; i < oprsz_2; i += sizeof(TYPE)) { \
- *(TYPE *)(vd + H(i)) = *(TYPE *)(vn + H(2 * i + odd_ofs)); \
- } \
- for (i = 0; i < oprsz_2; i += sizeof(TYPE)) { \
- *(TYPE *)(vd + H(oprsz_2 + i)) = *(TYPE *)(vm + H(2 * i + odd_ofs)); \
- } \
+ i = 0, p = odd_ofs; \
+ do { \
+ *(TYPE *)(vd + H(i)) = *(TYPE *)(vn + H(p)); \
+ i += sizeof(TYPE), p += 2 * sizeof(TYPE); \
+ } while (p < oprsz); \
+ p -= oprsz; \
+ do { \
+ *(TYPE *)(vd + H(i)) = *(TYPE *)(vm + H(p)); \
+ i += sizeof(TYPE), p += 2 * sizeof(TYPE); \
+ } while (p < oprsz); \
+ tcg_debug_assert(i == oprsz); \
}
DO_UZP(sve_uzp_b, uint8_t, H1)
DO_UZP(sve_uzp_h, uint16_t, H1_2)
DO_UZP(sve_uzp_s, uint32_t, H1_4)
DO_UZP(sve_uzp_d, uint64_t, )
+DO_UZP(sve2_uzp_q, Int128, )
#define DO_TRN(NAME, TYPE, H) \
void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
@@ -3535,12 +3544,16 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
*(TYPE *)(vd + H(i + 0)) = ae; \
*(TYPE *)(vd + H(i + sizeof(TYPE))) = be; \
} \
+ if (sizeof(TYPE) == 16 && unlikely(oprsz & 16)) { \
+ memset(vd + oprsz - 16, 0, 16); \
+ } \
}
DO_TRN(sve_trn_b, uint8_t, H1)
DO_TRN(sve_trn_h, uint16_t, H1_2)
DO_TRN(sve_trn_s, uint32_t, H1_4)
DO_TRN(sve_trn_d, uint64_t, )
+DO_TRN(sve2_trn_q, Int128, )
#undef DO_ZIP
#undef DO_UZP
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 1dcdbac0af0..b2aa9130b64 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2624,6 +2624,32 @@ static bool trans_ZIP2_z(DisasContext *s, arg_rrr_esz *a)
return do_zip(s, a, true);
}
+static bool do_zip_q(DisasContext *s, arg_rrr_esz *a, bool high)
+{
+ if (!dc_isar_feature(aa64_sve_f64mm, s)) {
+ return false;
+ }
+ if (sve_access_check(s)) {
+ unsigned vsz = vec_full_reg_size(s);
+ unsigned high_ofs = high ? QEMU_ALIGN_DOWN(vsz, 32) / 2 : 0;
+ tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->rn) + high_ofs,
+ vec_full_reg_offset(s, a->rm) + high_ofs,
+ vsz, vsz, 0, gen_helper_sve2_zip_q);
+ }
+ return true;
+}
+
+static bool trans_ZIP1_q(DisasContext *s, arg_rrr_esz *a)
+{
+ return do_zip_q(s, a, false);
+}
+
+static bool trans_ZIP2_q(DisasContext *s, arg_rrr_esz *a)
+{
+ return do_zip_q(s, a, true);
+}
+
static gen_helper_gvec_3 * const uzp_fns[4] = {
gen_helper_sve_uzp_b, gen_helper_sve_uzp_h,
gen_helper_sve_uzp_s, gen_helper_sve_uzp_d,
@@ -2639,6 +2665,22 @@ static bool trans_UZP2_z(DisasContext *s, arg_rrr_esz *a)
return do_zzz_data_ool(s, a, 1 << a->esz, uzp_fns[a->esz]);
}
+static bool trans_UZP1_q(DisasContext *s, arg_rrr_esz *a)
+{
+ if (!dc_isar_feature(aa64_sve_f64mm, s)) {
+ return false;
+ }
+ return do_zzz_data_ool(s, a, 0, gen_helper_sve2_uzp_q);
+}
+
+static bool trans_UZP2_q(DisasContext *s, arg_rrr_esz *a)
+{
+ if (!dc_isar_feature(aa64_sve_f64mm, s)) {
+ return false;
+ }
+ return do_zzz_data_ool(s, a, 16, gen_helper_sve2_uzp_q);
+}
+
static gen_helper_gvec_3 * const trn_fns[4] = {
gen_helper_sve_trn_b, gen_helper_sve_trn_h,
gen_helper_sve_trn_s, gen_helper_sve_trn_d,
@@ -2654,6 +2696,22 @@ static bool trans_TRN2_z(DisasContext *s, arg_rrr_esz *a)
return do_zzz_data_ool(s, a, 1 << a->esz, trn_fns[a->esz]);
}
+static bool trans_TRN1_q(DisasContext *s, arg_rrr_esz *a)
+{
+ if (!dc_isar_feature(aa64_sve_f64mm, s)) {
+ return false;
+ }
+ return do_zzz_data_ool(s, a, 0, gen_helper_sve2_trn_q);
+}
+
+static bool trans_TRN2_q(DisasContext *s, arg_rrr_esz *a)
+{
+ if (!dc_isar_feature(aa64_sve_f64mm, s)) {
+ return false;
+ }
+ return do_zzz_data_ool(s, a, 16, gen_helper_sve2_trn_q);
+}
+
/*
*** SVE Permute Vector - Predicated Group
*/
--
2.20.1
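Reader's note on the reworked DO_UZP loop: the patch replaces the two half-length loops with a single running destination index, which appears to be what lets the helper cope with an odd number of elements (as can happen with 128-bit elements). The sketch below mirrors that loop shape on uint32_t elements instead of Int128, purely to keep it small. The name uzp_sketch is hypothetical, and unlike the real helper it assumes d does not overlap m.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the new DO_UZP loop: take the even (odd == 0) or
 * odd (odd == 1) elements of the concatenation n:m and pack them into d.
 * Assumes d does not overlap m.
 */
static void uzp_sketch(uint32_t *d, const uint32_t *n, const uint32_t *m,
                       size_t count, size_t odd)
{
    size_t i = 0, p = odd;

    assert(count >= 2 && odd <= 1);

    /* First pass: walk n with a stride of two elements. */
    do {
        d[i] = n[p];
        i += 1, p += 2;
    } while (p < count);

    /* Carry the position over into m so the even/odd phase is preserved. */
    p -= count;
    do {
        d[i] = m[p];
        i += 1, p += 2;
    } while (p < count);

    assert(i == count);
}
```

With three elements per input, n = {0, 1, 2} and m = {10, 11, 12}, the even selection gives {0, 2, 11} and the odd selection gives {1, 10, 12}.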
* [PULL 102/114] target/arm: Implement SVE2 bitwise shift immediate
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Stephen Long <steplong@quicinc.com>
Implements SQSHL/UQSHL, SRSHR/URSHR, and SQSHLU
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stephen Long <steplong@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-81-richard.henderson@linaro.org
Message-Id: <20200430194159.24064-1-steplong@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/helper-sve.h | 33 +++++++++++++++++++++
target/arm/sve.decode | 5 ++++
target/arm/sve_helper.c | 35 ++++++++++++++++++++++
target/arm/translate-sve.c | 60 ++++++++++++++++++++++++++++++++++++++
4 files changed, 133 insertions(+)
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index b43ffce23ac..29a14a21f50 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -2761,3 +2761,36 @@ DEF_HELPER_FLAGS_5(sve2_fcvtlt_sd, TCG_CALL_NO_RWG,
DEF_HELPER_FLAGS_5(flogb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(flogb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(flogb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve2_sqshl_zpzi_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_sqshl_zpzi_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_sqshl_zpzi_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_sqshl_zpzi_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve2_uqshl_zpzi_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_uqshl_zpzi_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_uqshl_zpzi_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_uqshl_zpzi_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve2_srshr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_srshr_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_srshr_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_srshr_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve2_urshr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_urshr_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_urshr_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_urshr_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve2_sqshlu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_sqshlu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_sqshlu_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_sqshlu_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 5469ce04143..ea98508cddc 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -340,6 +340,11 @@ ASR_zpzi 00000100 .. 000 000 100 ... .. ... ..... @rdn_pg_tszimm_shr
LSR_zpzi 00000100 .. 000 001 100 ... .. ... ..... @rdn_pg_tszimm_shr
LSL_zpzi 00000100 .. 000 011 100 ... .. ... ..... @rdn_pg_tszimm_shl
ASRD 00000100 .. 000 100 100 ... .. ... ..... @rdn_pg_tszimm_shr
+SQSHL_zpzi 00000100 .. 000 110 100 ... .. ... ..... @rdn_pg_tszimm_shl
+UQSHL_zpzi 00000100 .. 000 111 100 ... .. ... ..... @rdn_pg_tszimm_shl
+SRSHR 00000100 .. 001 100 100 ... .. ... ..... @rdn_pg_tszimm_shr
+URSHR 00000100 .. 001 101 100 ... .. ... ..... @rdn_pg_tszimm_shr
+SQSHLU 00000100 .. 001 111 100 ... .. ... ..... @rdn_pg_tszimm_shl
# SVE bitwise shift by vector (predicated)
ASR_zpzz 00000100 .. 010 000 100 ... ..... ..... @rdn_pg_rm
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index d088b1f74ce..4afb06fb2a1 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -2238,6 +2238,41 @@ DO_ZPZI(sve_asrd_h, int16_t, H1_2, DO_ASRD)
DO_ZPZI(sve_asrd_s, int32_t, H1_4, DO_ASRD)
DO_ZPZI_D(sve_asrd_d, int64_t, DO_ASRD)
+/* SVE2 bitwise shift by immediate */
+DO_ZPZI(sve2_sqshl_zpzi_b, int8_t, H1, do_sqshl_b)
+DO_ZPZI(sve2_sqshl_zpzi_h, int16_t, H1_2, do_sqshl_h)
+DO_ZPZI(sve2_sqshl_zpzi_s, int32_t, H1_4, do_sqshl_s)
+DO_ZPZI_D(sve2_sqshl_zpzi_d, int64_t, do_sqshl_d)
+
+DO_ZPZI(sve2_uqshl_zpzi_b, uint8_t, H1, do_uqshl_b)
+DO_ZPZI(sve2_uqshl_zpzi_h, uint16_t, H1_2, do_uqshl_h)
+DO_ZPZI(sve2_uqshl_zpzi_s, uint32_t, H1_4, do_uqshl_s)
+DO_ZPZI_D(sve2_uqshl_zpzi_d, uint64_t, do_uqshl_d)
+
+DO_ZPZI(sve2_srshr_b, int8_t, H1, do_srshr)
+DO_ZPZI(sve2_srshr_h, int16_t, H1_2, do_srshr)
+DO_ZPZI(sve2_srshr_s, int32_t, H1_4, do_srshr)
+DO_ZPZI_D(sve2_srshr_d, int64_t, do_srshr)
+
+DO_ZPZI(sve2_urshr_b, uint8_t, H1, do_urshr)
+DO_ZPZI(sve2_urshr_h, uint16_t, H1_2, do_urshr)
+DO_ZPZI(sve2_urshr_s, uint32_t, H1_4, do_urshr)
+DO_ZPZI_D(sve2_urshr_d, uint64_t, do_urshr)
+
+#define do_suqrshl_b(n, m) \
+ ({ uint32_t discard; do_suqrshl_bhs(n, (int8_t)m, 8, false, &discard); })
+#define do_suqrshl_h(n, m) \
+ ({ uint32_t discard; do_suqrshl_bhs(n, (int16_t)m, 16, false, &discard); })
+#define do_suqrshl_s(n, m) \
+ ({ uint32_t discard; do_suqrshl_bhs(n, m, 32, false, &discard); })
+#define do_suqrshl_d(n, m) \
+ ({ uint32_t discard; do_suqrshl_d(n, m, false, &discard); })
+
+DO_ZPZI(sve2_sqshlu_b, int8_t, H1, do_suqrshl_b)
+DO_ZPZI(sve2_sqshlu_h, int16_t, H1_2, do_suqrshl_h)
+DO_ZPZI(sve2_sqshlu_s, int32_t, H1_4, do_suqrshl_s)
+DO_ZPZI_D(sve2_sqshlu_d, int64_t, do_suqrshl_d)
+
#undef DO_ASRD
#undef DO_ZPZI
#undef DO_ZPZI_D
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b2aa9130b64..92c0620bc8e 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1044,6 +1044,66 @@ static bool trans_ASRD(DisasContext *s, arg_rpri_esz *a)
}
}
+static bool trans_SQSHL_zpzi(DisasContext *s, arg_rpri_esz *a)
+{
+ static gen_helper_gvec_3 * const fns[4] = {
+ gen_helper_sve2_sqshl_zpzi_b, gen_helper_sve2_sqshl_zpzi_h,
+ gen_helper_sve2_sqshl_zpzi_s, gen_helper_sve2_sqshl_zpzi_d,
+ };
+ if (a->esz < 0 || !dc_isar_feature(aa64_sve2, s)) {
+ return false;
+ }
+ return do_zpzi_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_UQSHL_zpzi(DisasContext *s, arg_rpri_esz *a)
+{
+ static gen_helper_gvec_3 * const fns[4] = {
+ gen_helper_sve2_uqshl_zpzi_b, gen_helper_sve2_uqshl_zpzi_h,
+ gen_helper_sve2_uqshl_zpzi_s, gen_helper_sve2_uqshl_zpzi_d,
+ };
+ if (a->esz < 0 || !dc_isar_feature(aa64_sve2, s)) {
+ return false;
+ }
+ return do_zpzi_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_SRSHR(DisasContext *s, arg_rpri_esz *a)
+{
+ static gen_helper_gvec_3 * const fns[4] = {
+ gen_helper_sve2_srshr_b, gen_helper_sve2_srshr_h,
+ gen_helper_sve2_srshr_s, gen_helper_sve2_srshr_d,
+ };
+ if (a->esz < 0 || !dc_isar_feature(aa64_sve2, s)) {
+ return false;
+ }
+ return do_zpzi_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_URSHR(DisasContext *s, arg_rpri_esz *a)
+{
+ static gen_helper_gvec_3 * const fns[4] = {
+ gen_helper_sve2_urshr_b, gen_helper_sve2_urshr_h,
+ gen_helper_sve2_urshr_s, gen_helper_sve2_urshr_d,
+ };
+ if (a->esz < 0 || !dc_isar_feature(aa64_sve2, s)) {
+ return false;
+ }
+ return do_zpzi_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_SQSHLU(DisasContext *s, arg_rpri_esz *a)
+{
+ static gen_helper_gvec_3 * const fns[4] = {
+ gen_helper_sve2_sqshlu_b, gen_helper_sve2_sqshlu_h,
+ gen_helper_sve2_sqshlu_s, gen_helper_sve2_sqshlu_d,
+ };
+ if (a->esz < 0 || !dc_isar_feature(aa64_sve2, s)) {
+ return false;
+ }
+ return do_zpzi_ool(s, a, fns[a->esz]);
+}
+
/*
*** SVE Bitwise Shift - Predicated Group
*/
--
2.20.1
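Reader's note on the arithmetic behind SQSHL/UQSHL, SRSHR/URSHR and SQSHLU: the following is a rough per-element sketch for 8-bit elements only. The function names are hypothetical and this is not how the patch computes the results (it reuses the existing do_sqshl_*, do_uqshl_*, do_srshr, do_urshr and do_suqrshl_* helpers). It assumes left-shift immediates in the range 0..7 and right-shift immediates in the range 1..8.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturating shift left: clamp to the int8_t range. */
static int8_t sqshl8(int8_t x, unsigned sh)
{
    int32_t r;

    assert(sh < 8);
    r = (int32_t)x * (1 << sh);     /* multiply avoids shifting negatives */
    return r > INT8_MAX ? INT8_MAX : r < INT8_MIN ? INT8_MIN : (int8_t)r;
}

/* Unsigned saturating shift left: clamp to the uint8_t range. */
static uint8_t uqshl8(uint8_t x, unsigned sh)
{
    uint32_t r;

    assert(sh < 8);
    r = (uint32_t)x << sh;
    return r > UINT8_MAX ? UINT8_MAX : (uint8_t)r;
}

/* Signed input, unsigned saturated result: negative inputs clamp to 0. */
static uint8_t sqshlu8(int8_t x, unsigned sh)
{
    assert(sh < 8);
    return x < 0 ? 0 : uqshl8((uint8_t)x, sh);
}

/* Unsigned rounding shift right: add half of the divisor before shifting. */
static uint8_t urshr8(uint8_t x, unsigned sh)
{
    assert(sh >= 1 && sh <= 8);
    return (uint8_t)(((uint32_t)x + (1u << (sh - 1))) >> sh);
}

/* Signed rounding shift right, written as a floor division for portability. */
static int8_t srshr8(int8_t x, unsigned sh)
{
    int32_t d, t, q;

    assert(sh >= 1 && sh <= 8);
    d = 1 << sh;
    t = (int32_t)x + (d >> 1);
    q = t / d;
    if (t % d < 0) {
        q--;                        /* C division truncates; round down */
    }
    return (int8_t)q;
}
```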
* [PULL 103/114] target/arm: Move endian adjustment macros to vec_internal.h
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
We have two copies of these, one set of which is not complete.
Move them to a common header.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-82-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/vec_internal.h | 24 ++++++++++++++++++++++++
target/arm/sve_helper.c | 16 ----------------
target/arm/vec_helper.c | 12 ------------
3 files changed, 24 insertions(+), 28 deletions(-)
diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
index ff694d870ac..dba481e0012 100644
--- a/target/arm/vec_internal.h
+++ b/target/arm/vec_internal.h
@@ -20,6 +20,30 @@
#ifndef TARGET_ARM_VEC_INTERNALS_H
#define TARGET_ARM_VEC_INTERNALS_H
+/*
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing units smaller than that needs a host-endian fixup.
+ *
+ * The H<N> macros are used when indexing an array of elements of size N.
+ *
+ * The H1_<N> macros are used when performing byte arithmetic and then
+ * casting the final pointer to a type of size N.
+ */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x) ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x) ((x) ^ 3)
+#define H4(x) ((x) ^ 1)
+#else
+#define H1(x) (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x) (x)
+#define H4(x) (x)
+#endif
+
+
static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
{
uint64_t *d = vd + opr_sz;
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4afb06fb2a1..40af3024dfb 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -29,22 +29,6 @@
#include "vec_internal.h"
-/* Note that vector data is stored in host-endian 64-bit chunks,
- so addressing units smaller than that needs a host-endian fixup. */
-#ifdef HOST_WORDS_BIGENDIAN
-#define H1(x) ((x) ^ 7)
-#define H1_2(x) ((x) ^ 6)
-#define H1_4(x) ((x) ^ 4)
-#define H2(x) ((x) ^ 3)
-#define H4(x) ((x) ^ 1)
-#else
-#define H1(x) (x)
-#define H1_2(x) (x)
-#define H1_4(x) (x)
-#define H2(x) (x)
-#define H4(x) (x)
-#endif
-
/* Return a value for NZCV as per the ARM PredTest pseudofunction.
*
* The return value has bit 31 set if N is set, bit 1 set if Z is clear,
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 21ae1258f2e..f5af45375df 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -25,18 +25,6 @@
#include "qemu/int128.h"
#include "vec_internal.h"
-/* Note that vector data is stored in host-endian 64-bit chunks,
- so addressing units smaller than that needs a host-endian fixup. */
-#ifdef HOST_WORDS_BIGENDIAN
-#define H1(x) ((x) ^ 7)
-#define H2(x) ((x) ^ 3)
-#define H4(x) ((x) ^ 1)
-#else
-#define H1(x) (x)
-#define H2(x) (x)
-#define H4(x) (x)
-#endif
-
/* Signed saturating rounding doubling multiply-accumulate high half, 8-bit */
int8_t do_sqrdmlah_b(int8_t src1, int8_t src2, int8_t src3,
bool neg, bool round)
--
2.20.1
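Reader's note on the H1/H1_2 fixups: the comment moved into vec_internal.h says vector data is kept in host-endian 64-bit chunks, so sub-64-bit accesses need an index adjustment on big-endian hosts. The sketch below illustrates the idea with a runtime endianness probe. The real macros are selected at compile time by HOST_WORDS_BIGENDIAN; the lowercase function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Runtime probe; the real code uses a compile-time #ifdef instead. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Byte-offset fixups mirroring H1 and H1_2 from the patch. */
static size_t h1(size_t x)   { return host_is_big_endian() ? (x ^ 7) : x; }
static size_t h1_2(size_t x) { return host_is_big_endian() ? (x ^ 6) : x; }

/* Fetch guest byte element i from an array of host-endian 64-bit chunks. */
static uint8_t vec_get_u8(const uint64_t *chunks, size_t i)
{
    const uint8_t *bytes = (const uint8_t *)chunks;

    return bytes[h1(i)];
}

/* Fetch guest 16-bit element i: byte arithmetic first, then a typed load. */
static uint16_t vec_get_u16(const uint64_t *chunks, size_t i)
{
    uint16_t v;

    memcpy(&v, (const uint8_t *)chunks + h1_2(i * 2), sizeof(v));
    return v;
}
```

On either kind of host, element i of a chunk holding 0x0706050403020100 should read back as i, which is the property the fixups are there to preserve.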
* [PULL 104/114] target/arm: Implement SVE2 fp multiply-add long
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Stephen Long <steplong@quicinc.com>
Implements both vectored and indexed FMLALB, FMLALT, FMLSLB, FMLSLT
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stephen Long <steplong@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-83-richard.henderson@linaro.org
Message-Id: <20200504171240.11220-1-steplong@quicinc.com>
[rth: Rearrange to use float16_to_float32_by_bits.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/helper.h | 5 +++
target/arm/sve.decode | 14 +++++++
target/arm/translate-sve.c | 75 ++++++++++++++++++++++++++++++++++++++
target/arm/vec_helper.c | 47 ++++++++++++++++++++++++
4 files changed, 141 insertions(+)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 2e212ae96be..92b81bbabe4 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -986,6 +986,11 @@ DEF_HELPER_FLAGS_4(sve2_sqrdmulh_idx_s, TCG_CALL_NO_RWG,
DEF_HELPER_FLAGS_4(sve2_sqrdmulh_idx_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve2_fmlal_zzzw_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sve2_fmlal_zzxw_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, ptr, i32)
+
DEF_HELPER_FLAGS_4(gvec_xar_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
#ifdef TARGET_AARCH64
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index ea98508cddc..78a2a31ab19 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -132,6 +132,8 @@
&rrrr_esz ra=%reg_movprfx
# Four operand with unused vector element size
+@rda_rn_rm_e0 ........ ... rm:5 ... ... rn:5 rd:5 \
+ &rrrr_esz esz=0 ra=%reg_movprfx
@rdn_ra_rm_e0 ........ ... rm:5 ... ... ra:5 rd:5 \
&rrrr_esz esz=0 rn=%reg_movprfx
@@ -1608,3 +1610,15 @@ FCVTLT_sd 01100100 11 0010 11 101 ... ..... ..... @rd_pg_rn_e0
### SVE2 floating-point convert to integer
FLOGB 01100101 00 011 esz:2 0101 pg:3 rn:5 rd:5 &rpr_esz
+
+### SVE2 floating-point multiply-add long (vectors)
+FMLALB_zzzw 01100100 10 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0
+FMLALT_zzzw 01100100 10 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0
+FMLSLB_zzzw 01100100 10 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_e0
+FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_e0
+
+### SVE2 floating-point multiply-add long (indexed)
+FMLALB_zzxw 01100100 10 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2
+FMLALT_zzxw 01100100 10 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2
+FMLSLB_zzxw 01100100 10 1 ..... 0110.0 ..... ..... @rrxr_3a esz=2
+FMLSLT_zzxw 01100100 10 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 92c0620bc8e..428ae018a35 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -8535,3 +8535,78 @@ static bool trans_FLOGB(DisasContext *s, arg_rpr_esz *a)
}
return true;
}
+
+static bool do_FMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sub, bool sel)
+{
+ if (!dc_isar_feature(aa64_sve2, s)) {
+ return false;
+ }
+ if (sve_access_check(s)) {
+ unsigned vsz = vec_full_reg_size(s);
+ tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->rn),
+ vec_full_reg_offset(s, a->rm),
+ vec_full_reg_offset(s, a->ra),
+ cpu_env, vsz, vsz, (sel << 1) | sub,
+ gen_helper_sve2_fmlal_zzzw_s);
+ }
+ return true;
+}
+
+static bool trans_FMLALB_zzzw(DisasContext *s, arg_rrrr_esz *a)
+{
+ return do_FMLAL_zzzw(s, a, false, false);
+}
+
+static bool trans_FMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
+{
+ return do_FMLAL_zzzw(s, a, false, true);
+}
+
+static bool trans_FMLSLB_zzzw(DisasContext *s, arg_rrrr_esz *a)
+{
+ return do_FMLAL_zzzw(s, a, true, false);
+}
+
+static bool trans_FMLSLT_zzzw(DisasContext *s, arg_rrrr_esz *a)
+{
+ return do_FMLAL_zzzw(s, a, true, true);
+}
+
+static bool do_FMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sub, bool sel)
+{
+ if (!dc_isar_feature(aa64_sve2, s)) {
+ return false;
+ }
+ if (sve_access_check(s)) {
+ unsigned vsz = vec_full_reg_size(s);
+ tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->rn),
+ vec_full_reg_offset(s, a->rm),
+ vec_full_reg_offset(s, a->ra),
+ cpu_env, vsz, vsz,
+ (a->index << 2) | (sel << 1) | sub,
+ gen_helper_sve2_fmlal_zzxw_s);
+ }
+ return true;
+}
+
+static bool trans_FMLALB_zzxw(DisasContext *s, arg_rrxr_esz *a)
+{
+ return do_FMLAL_zzxw(s, a, false, false);
+}
+
+static bool trans_FMLALT_zzxw(DisasContext *s, arg_rrxr_esz *a)
+{
+ return do_FMLAL_zzxw(s, a, false, true);
+}
+
+static bool trans_FMLSLB_zzxw(DisasContext *s, arg_rrxr_esz *a)
+{
+ return do_FMLAL_zzxw(s, a, true, false);
+}
+
+static bool trans_FMLSLT_zzxw(DisasContext *s, arg_rrxr_esz *a)
+{
+ return do_FMLAL_zzxw(s, a, true, true);
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index f5af45375df..19c4ba1bdf5 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -1668,6 +1668,27 @@ void HELPER(gvec_fmlal_a64)(void *vd, void *vn, void *vm,
get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
}
+void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va,
+ void *venv, uint32_t desc)
+{
+ intptr_t i, oprsz = simd_oprsz(desc);
+ uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
+ intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
+ CPUARMState *env = venv;
+ float_status *status = &env->vfp.fp_status;
+ bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16);
+
+ for (i = 0; i < oprsz; i += sizeof(float32)) {
+ float16 nn_16 = *(float16 *)(vn + H1_2(i + sel)) ^ negn;
+ float16 mm_16 = *(float16 *)(vm + H1_2(i + sel));
+ float32 nn = float16_to_float32_by_bits(nn_16, fz16);
+ float32 mm = float16_to_float32_by_bits(mm_16, fz16);
+ float32 aa = *(float32 *)(va + H1_4(i));
+
+ *(float32 *)(vd + H1_4(i)) = float32_muladd(nn, mm, aa, 0, status);
+ }
+}
+
static void do_fmlal_idx(float32 *d, void *vn, void *vm, float_status *fpst,
uint32_t desc, bool fz16)
{
@@ -1712,6 +1733,32 @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
}
+void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va,
+ void *venv, uint32_t desc)
+{
+ intptr_t i, j, oprsz = simd_oprsz(desc);
+ uint16_t negn = extract32(desc, SIMD_DATA_SHIFT, 1) << 15;
+ intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16);
+ intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16);
+ CPUARMState *env = venv;
+ float_status *status = &env->vfp.fp_status;
+ bool fz16 = get_flush_inputs_to_zero(&env->vfp.fp_status_f16);
+
+ for (i = 0; i < oprsz; i += 16) {
+ float16 mm_16 = *(float16 *)(vm + i + idx);
+ float32 mm = float16_to_float32_by_bits(mm_16, fz16);
+
+ for (j = 0; j < 16; j += sizeof(float32)) {
+ float16 nn_16 = *(float16 *)(vn + H1_2(i + j + sel)) ^ negn;
+ float32 nn = float16_to_float32_by_bits(nn_16, fz16);
+ float32 aa = *(float32 *)(va + H1_4(i + j));
+
+ *(float32 *)(vd + H1_4(i + j)) =
+ float32_muladd(nn, mm, aa, 0, status);
+ }
+ }
+}
+
void HELPER(gvec_sshl_b)(void *vd, void *vn, void *vm, uint32_t desc)
{
intptr_t i, opr_sz = simd_oprsz(desc);
--
2.20.1
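Reader's note on the descriptor data used by the FMLAL helpers: the translator packs (index << 2) | (sel << 1) | sub into the simd data field, and the helpers turn those bits into a sign-flip mask and byte offsets. A small sketch of that packing and unpacking follows. The fmlal_* names are hypothetical, and the host-endian H1_2 adjustment that the real helpers apply is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack the fields the way do_FMLAL_zzxw() does (index is 0 for zzzw). */
static uint32_t fmlal_pack(unsigned index, bool sel, bool sub)
{
    assert(index < 8);              /* the helper extracts a 3-bit index */
    return (index << 2) | ((uint32_t)sel << 1) | (uint32_t)sub;
}

/* Bit 0: XOR mask that flips the float16 sign bit for FMLSL. */
static uint16_t fmlal_negn(uint32_t data)
{
    return (uint16_t)((data & 1) << 15);
}

/* Bit 1: 0 selects the bottom (even) half, 2 bytes selects the top (odd). */
static size_t fmlal_sel_bytes(uint32_t data)
{
    return ((data >> 1) & 1) * sizeof(uint16_t);
}

/* Bits 2..4: byte offset of the indexed float16 within a 16-byte segment. */
static size_t fmlal_idx_bytes(uint32_t data)
{
    return ((data >> 2) & 7) * sizeof(uint16_t);
}

/* Byte offset of the float16 source that feeds 32-bit result lane `lane`. */
static size_t fmlal_src_offset(size_t lane, uint32_t data)
{
    return lane * sizeof(uint32_t) + fmlal_sel_bytes(data);
}
```

As a sanity check on the sign trick: the float16 encoding of 1.0 is 0x3c00, and XOR with the 0x8000 mask gives 0xbc00, which is -1.0.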
* [PULL 105/114] target/arm: Implement aarch64 SUDOT, USDOT
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-84-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/cpu.h | 5 +++++
target/arm/translate-a64.c | 25 +++++++++++++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 55383cb0661..8ecb2a1c89e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -4216,6 +4216,11 @@ static inline bool isar_feature_aa64_rcpc_8_4(const ARMISARegisters *id)
return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, LRCPC) >= 2;
}
+static inline bool isar_feature_aa64_i8mm(const ARMISARegisters *id)
+{
+ return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, I8MM) != 0;
+}
+
static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
{
return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index a8edd2d2815..c8754817842 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -12175,6 +12175,13 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
feature = dc_isar_feature(aa64_dp, s);
break;
+ case 0x03: /* USDOT */
+ if (size != MO_32) {
+ unallocated_encoding(s);
+ return;
+ }
+ feature = dc_isar_feature(aa64_i8mm, s);
+ break;
case 0x18: /* FCMLA, #0 */
case 0x19: /* FCMLA, #90 */
case 0x1a: /* FCMLA, #180 */
@@ -12215,6 +12222,10 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b);
return;
+ case 0x3: /* USDOT */
+ gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_usdot_b);
+ return;
+
case 0x8: /* FCMLA, #0 */
case 0x9: /* FCMLA, #90 */
case 0xa: /* FCMLA, #180 */
@@ -13360,6 +13371,13 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
return;
}
break;
+ case 0x0f: /* SUDOT, USDOT */
+ if (is_scalar || (size & 1) || !dc_isar_feature(aa64_i8mm, s)) {
+ unallocated_encoding(s);
+ return;
+ }
+ size = MO_32;
+ break;
case 0x11: /* FCMLA #0 */
case 0x13: /* FCMLA #90 */
case 0x15: /* FCMLA #180 */
@@ -13474,6 +13492,13 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
u ? gen_helper_gvec_udot_idx_b
: gen_helper_gvec_sdot_idx_b);
return;
+ case 0x0f: /* SUDOT, USDOT */
+ gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+ extract32(insn, 23, 1)
+ ? gen_helper_gvec_usdot_idx_b
+ : gen_helper_gvec_sudot_idx_b);
+ return;
+
case 0x11: /* FCMLA #0 */
case 0x13: /* FCMLA #90 */
case 0x15: /* FCMLA #180 */
--
2.20.1
* [PULL 106/114] target/arm: Split out do_neon_ddda_fpst
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
Split out a helper that can handle the 4-register
format for helpers shared with SVE.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-85-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
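Note: the odd-register test in the new helper treats q as a 3-bit mask
(bit 2 for vd, bit 1 for vn, bit 0 for vm). The stand-alone sketch below
copies that expression into a hypothetical function, odd_qreg_undef, so
the effect of passing q * 7 versus q * 6 is easier to see. It is only an
illustration of the check, not code from the tree.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-alone copy of the check.  Returns true when the
 * encoding should UNDEF because an operand that q marks as a Q register
 * has an odd D-register number.
 */
static bool odd_qreg_undef(int q, int vd, int vn, int vm)
{
    assert(q >= 0 && q <= 7);
    return ((((vd & 1) * 4) | ((vn & 1) * 2) | (vm & 1)) & q) != 0;
}
```

With q = 7 all three operands must be even; with q = 6 only vd and vn
are checked, which suits the indexed forms where vm names a D register;
with q = 0 nothing is checked.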
target/arm/translate-neon.c | 98 ++++++++++++++++---------------------
1 file changed, 43 insertions(+), 55 deletions(-)
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index 45fa5166f34..1a8fc7fb390 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -151,24 +151,21 @@ static void neon_store_element64(int reg, int ele, MemOp size, TCGv_i64 var)
}
}
-static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
+static bool do_neon_ddda_fpst(DisasContext *s, int q, int vd, int vn, int vm,
+ int data, ARMFPStatusFlavour fp_flavour,
+ gen_helper_gvec_4_ptr *fn_gvec_ptr)
{
- int opr_sz;
- TCGv_ptr fpst;
- gen_helper_gvec_4_ptr *fn_gvec_ptr;
-
- if (!dc_isar_feature(aa32_vcma, s)
- || (a->size == MO_16 && !dc_isar_feature(aa32_fp16_arith, s))) {
- return false;
- }
-
/* UNDEF accesses to D16-D31 if they don't exist. */
- if (!dc_isar_feature(aa32_simd_r32, s) &&
- ((a->vd | a->vn | a->vm) & 0x10)) {
+ if (((vd | vn | vm) & 0x10) && !dc_isar_feature(aa32_simd_r32, s)) {
return false;
}
- if ((a->vn | a->vm | a->vd) & a->q) {
+ /*
+ * UNDEF accesses to odd registers for each bit of Q.
+ * Q will be 0b111 for all Q-reg instructions; otherwise, for
+ * mixed Q- and D-reg inputs, only the bits of the Q-reg operands are set.
+ */
+ if (((vd & 1) * 4 | (vn & 1) * 2 | (vm & 1)) & q) {
return false;
}
@@ -176,20 +173,34 @@ static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
return true;
}
- opr_sz = (1 + a->q) * 8;
- fpst = fpstatus_ptr(a->size == MO_16 ? FPST_STD_F16 : FPST_STD);
- fn_gvec_ptr = (a->size == MO_16) ?
- gen_helper_gvec_fcmlah : gen_helper_gvec_fcmlas;
- tcg_gen_gvec_4_ptr(vfp_reg_offset(1, a->vd),
- vfp_reg_offset(1, a->vn),
- vfp_reg_offset(1, a->vm),
- vfp_reg_offset(1, a->vd),
- fpst, opr_sz, opr_sz, a->rot,
- fn_gvec_ptr);
+ int opr_sz = q ? 16 : 8;
+ TCGv_ptr fpst = fpstatus_ptr(fp_flavour);
+
+ tcg_gen_gvec_4_ptr(vfp_reg_offset(1, vd),
+ vfp_reg_offset(1, vn),
+ vfp_reg_offset(1, vm),
+ vfp_reg_offset(1, vd),
+ fpst, opr_sz, opr_sz, data, fn_gvec_ptr);
tcg_temp_free_ptr(fpst);
return true;
}
+static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
+{
+ if (!dc_isar_feature(aa32_vcma, s)) {
+ return false;
+ }
+ if (a->size == MO_16) {
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
+ return false;
+ }
+ return do_neon_ddda_fpst(s, a->q * 7, a->vd, a->vn, a->vm, a->rot,
+ FPST_STD_F16, gen_helper_gvec_fcmlah);
+ }
+ return do_neon_ddda_fpst(s, a->q * 7, a->vd, a->vn, a->vm, a->rot,
+ FPST_STD, gen_helper_gvec_fcmlas);
+}
+
static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
{
int opr_sz;
@@ -294,43 +305,20 @@ static bool trans_VFML(DisasContext *s, arg_VFML *a)
static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
{
- gen_helper_gvec_4_ptr *fn_gvec_ptr;
- int opr_sz;
- TCGv_ptr fpst;
+ int data = (a->index << 2) | a->rot;
if (!dc_isar_feature(aa32_vcma, s)) {
return false;
}
- if (a->size == MO_16 && !dc_isar_feature(aa32_fp16_arith, s)) {
- return false;
+ if (a->size == MO_16) {
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
+ return false;
+ }
+ return do_neon_ddda_fpst(s, a->q * 6, a->vd, a->vn, a->vm, data,
+ FPST_STD_F16, gen_helper_gvec_fcmlah_idx);
}
-
- /* UNDEF accesses to D16-D31 if they don't exist. */
- if (!dc_isar_feature(aa32_simd_r32, s) &&
- ((a->vd | a->vn | a->vm) & 0x10)) {
- return false;
- }
-
- if ((a->vd | a->vn) & a->q) {
- return false;
- }
-
- if (!vfp_access_check(s)) {
- return true;
- }
-
- fn_gvec_ptr = (a->size == MO_16) ?
- gen_helper_gvec_fcmlah_idx : gen_helper_gvec_fcmlas_idx;
- opr_sz = (1 + a->q) * 8;
- fpst = fpstatus_ptr(a->size == MO_16 ? FPST_STD_F16 : FPST_STD);
- tcg_gen_gvec_4_ptr(vfp_reg_offset(1, a->vd),
- vfp_reg_offset(1, a->vn),
- vfp_reg_offset(1, a->vm),
- vfp_reg_offset(1, a->vd),
- fpst, opr_sz, opr_sz,
- (a->index << 2) | a->rot, fn_gvec_ptr);
- tcg_temp_free_ptr(fpst);
- return true;
+ return do_neon_ddda_fpst(s, a->q * 6, a->vd, a->vn, a->vm, data,
+ FPST_STD, gen_helper_gvec_fcmlas_idx);
}
static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
--
2.20.1
* [PULL 107/114] target/arm: Remove unused fpst from VDOT_scalar
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
The fpst temporary was a cut-and-paste error from another pattern;
VDOT_scalar never uses it.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-86-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/translate-neon.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index 1a8fc7fb390..14a9d0d4d30 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -325,7 +325,6 @@ static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
{
gen_helper_gvec_4 *fn_gvec;
int opr_sz;
- TCGv_ptr fpst;
if (!dc_isar_feature(aa32_dp, s)) {
return false;
@@ -347,13 +346,11 @@ static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
fn_gvec = a->u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
opr_sz = (1 + a->q) * 8;
- fpst = fpstatus_ptr(FPST_STD);
tcg_gen_gvec_4_ool(vfp_reg_offset(1, a->vd),
vfp_reg_offset(1, a->vn),
vfp_reg_offset(1, a->rm),
vfp_reg_offset(1, a->vd),
opr_sz, opr_sz, a->index, fn_gvec);
- tcg_temp_free_ptr(fpst);
return true;
}
--
2.20.1
* [PULL 108/114] target/arm: Fix decode for VDOT (indexed)
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
We were extracting the M register twice, once incorrectly
as M:vm and once correctly as rm. Remove the incorrect
name and remove the incorrect decode.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-87-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
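Note: a small sketch of why the two readings of the M register differ.
In the VDOT_scalar pattern, bit 5 is the element index and bits [3:0]
are the register number. The sketch assumes that the M:Vm reading named
in the commit message takes bit 5 as the top bit of a 5-bit register
number. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 'len' bits starting at bit 'start' (similar in spirit to extract32). */
static uint32_t bits(uint32_t insn, int start, int len)
{
    assert(start >= 0 && len > 0 && len < 32 && start + len <= 32);
    return (insn >> start) & ((1u << len) - 1);
}

/* Scalar operand register: the 4-bit Vm field only. */
static uint32_t vdot_idx_vm(uint32_t insn)
{
    return bits(insn, 0, 4);
}

/* Element index: the single bit at position 5. */
static uint32_t vdot_idx_index(uint32_t insn)
{
    return bits(insn, 5, 1);
}

/* The M:Vm reading folds the index bit into the register number. */
static uint32_t vdot_idx_m_vm(uint32_t insn)
{
    return (bits(insn, 5, 1) << 4) | bits(insn, 0, 4);
}
```

For low bits 0x23 the first reading gives register 3 with index 1,
whereas the M:Vm reading would give register 19.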
target/arm/neon-shared.decode | 4 ++--
target/arm/translate-neon.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index ca0c699072e..facb621450d 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -61,8 +61,8 @@ VCMLA_scalar 1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
VCMLA_scalar 1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
vm=%vm_dp vn=%vn_dp vd=%vd_dp size=2 index=0
-VDOT_scalar 1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
- vm=%vm_dp vn=%vn_dp vd=%vd_dp
+VDOT_scalar 1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 vm:4 \
+ vn=%vn_dp vd=%vd_dp
%vfml_scalar_q0_rm 0:3 5:1
%vfml_scalar_q1_index 5:1 3:1
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index 14a9d0d4d30..9f7a88aab1b 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -348,7 +348,7 @@ static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
opr_sz = (1 + a->q) * 8;
tcg_gen_gvec_4_ool(vfp_reg_offset(1, a->vd),
vfp_reg_offset(1, a->vn),
- vfp_reg_offset(1, a->rm),
+ vfp_reg_offset(1, a->vm),
vfp_reg_offset(1, a->vd),
opr_sz, opr_sz, a->index, fn_gvec);
return true;
--
2.20.1
* [PULL 109/114] target/arm: Split out do_neon_ddda
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
Split out a helper that can handle the 4-register
format for helpers shared with SVE.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-88-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/translate-neon.c | 90 ++++++++++++++++---------------------
1 file changed, 38 insertions(+), 52 deletions(-)
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index 9f7a88aab1b..dfa33912ab1 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -151,6 +151,36 @@ static void neon_store_element64(int reg, int ele, MemOp size, TCGv_i64 var)
}
}
+static bool do_neon_ddda(DisasContext *s, int q, int vd, int vn, int vm,
+ int data, gen_helper_gvec_4 *fn_gvec)
+{
+ /* UNDEF accesses to D16-D31 if they don't exist. */
+ if (((vd | vn | vm) & 0x10) && !dc_isar_feature(aa32_simd_r32, s)) {
+ return false;
+ }
+
+ /*
+ * UNDEF accesses to odd registers for each bit of Q.
+ * Q will be 0b111 for all Q-reg instructions; otherwise, for
+ * mixed Q- and D-reg inputs, only the bits of the Q-reg operands are set.
+ */
+ if (((vd & 1) * 4 | (vn & 1) * 2 | (vm & 1)) & q) {
+ return false;
+ }
+
+ if (!vfp_access_check(s)) {
+ return true;
+ }
+
+ int opr_sz = q ? 16 : 8;
+ tcg_gen_gvec_4_ool(vfp_reg_offset(1, vd),
+ vfp_reg_offset(1, vn),
+ vfp_reg_offset(1, vm),
+ vfp_reg_offset(1, vd),
+ opr_sz, opr_sz, data, fn_gvec);
+ return true;
+}
+
static bool do_neon_ddda_fpst(DisasContext *s, int q, int vd, int vn, int vm,
int data, ARMFPStatusFlavour fp_flavour,
gen_helper_gvec_4_ptr *fn_gvec_ptr)
@@ -241,35 +271,13 @@ static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
{
- int opr_sz;
- gen_helper_gvec_4 *fn_gvec;
-
if (!dc_isar_feature(aa32_dp, s)) {
return false;
}
-
- /* UNDEF accesses to D16-D31 if they don't exist. */
- if (!dc_isar_feature(aa32_simd_r32, s) &&
- ((a->vd | a->vn | a->vm) & 0x10)) {
- return false;
- }
-
- if ((a->vn | a->vm | a->vd) & a->q) {
- return false;
- }
-
- if (!vfp_access_check(s)) {
- return true;
- }
-
- opr_sz = (1 + a->q) * 8;
- fn_gvec = a->u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
- tcg_gen_gvec_4_ool(vfp_reg_offset(1, a->vd),
- vfp_reg_offset(1, a->vn),
- vfp_reg_offset(1, a->vm),
- vfp_reg_offset(1, a->vd),
- opr_sz, opr_sz, 0, fn_gvec);
- return true;
+ return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0,
+ a->u
+ ? gen_helper_gvec_udot_b
+ : gen_helper_gvec_sdot_b);
}
static bool trans_VFML(DisasContext *s, arg_VFML *a)
@@ -323,35 +331,13 @@ static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
{
- gen_helper_gvec_4 *fn_gvec;
- int opr_sz;
-
if (!dc_isar_feature(aa32_dp, s)) {
return false;
}
-
- /* UNDEF accesses to D16-D31 if they don't exist. */
- if (!dc_isar_feature(aa32_simd_r32, s) &&
- ((a->vd | a->vn) & 0x10)) {
- return false;
- }
-
- if ((a->vd | a->vn) & a->q) {
- return false;
- }
-
- if (!vfp_access_check(s)) {
- return true;
- }
-
- fn_gvec = a->u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
- opr_sz = (1 + a->q) * 8;
- tcg_gen_gvec_4_ool(vfp_reg_offset(1, a->vd),
- vfp_reg_offset(1, a->vn),
- vfp_reg_offset(1, a->vm),
- vfp_reg_offset(1, a->vd),
- opr_sz, opr_sz, a->index, fn_gvec);
- return true;
+ return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index,
+ a->u
+ ? gen_helper_gvec_udot_idx_b
+ : gen_helper_gvec_sdot_idx_b);
}
static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
--
2.20.1
* [PULL 110/114] target/arm: Split decode of VSDOT and VUDOT
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
Now that we have a common helper, sharing decode does not
save much. Also, this will solve an upcoming naming problem.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-89-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/neon-shared.decode | 9 ++++++---
target/arm/translate-neon.c | 30 ++++++++++++++++++++++--------
2 files changed, 28 insertions(+), 11 deletions(-)
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index facb621450d..2d94369750d 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -46,8 +46,9 @@ VCMLA 1111 110 rot:2 . 1 . .... .... 1000 . q:1 . 0 .... \
VCADD 1111 110 rot:1 1 . 0 . .... .... 1000 . q:1 . 0 .... \
vm=%vm_dp vn=%vn_dp vd=%vd_dp size=%vcadd_size
-# VUDOT and VSDOT
-VDOT 1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
+VSDOT 1111 110 00 . 10 .... .... 1101 . q:1 . 0 .... \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
+VUDOT 1111 110 00 . 10 .... .... 1101 . q:1 . 1 .... \
vm=%vm_dp vn=%vn_dp vd=%vd_dp
# VFM[AS]L
@@ -61,7 +62,9 @@ VCMLA_scalar 1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
VCMLA_scalar 1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
vm=%vm_dp vn=%vn_dp vd=%vd_dp size=2 index=0
-VDOT_scalar 1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 vm:4 \
+VSDOT_scalar 1111 1110 0 . 10 .... .... 1101 . q:1 index:1 0 vm:4 \
+ vn=%vn_dp vd=%vd_dp
+VUDOT_scalar 1111 1110 0 . 10 .... .... 1101 . q:1 index:1 1 vm:4 \
vn=%vn_dp vd=%vd_dp
%vfml_scalar_q0_rm 0:3 5:1
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index dfa33912ab1..386b42fe4b0 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -269,15 +269,22 @@ static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
return true;
}
-static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
+static bool trans_VSDOT(DisasContext *s, arg_VSDOT *a)
{
if (!dc_isar_feature(aa32_dp, s)) {
return false;
}
return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0,
- a->u
- ? gen_helper_gvec_udot_b
- : gen_helper_gvec_sdot_b);
+ gen_helper_gvec_sdot_b);
+}
+
+static bool trans_VUDOT(DisasContext *s, arg_VUDOT *a)
+{
+ if (!dc_isar_feature(aa32_dp, s)) {
+ return false;
+ }
+ return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0,
+ gen_helper_gvec_udot_b);
}
static bool trans_VFML(DisasContext *s, arg_VFML *a)
@@ -329,15 +336,22 @@ static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
FPST_STD, gen_helper_gvec_fcmlas_idx);
}
-static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
+static bool trans_VSDOT_scalar(DisasContext *s, arg_VSDOT_scalar *a)
{
if (!dc_isar_feature(aa32_dp, s)) {
return false;
}
return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index,
- a->u
- ? gen_helper_gvec_udot_idx_b
- : gen_helper_gvec_sdot_idx_b);
+ gen_helper_gvec_sdot_idx_b);
+}
+
+static bool trans_VUDOT_scalar(DisasContext *s, arg_VUDOT_scalar *a)
+{
+ if (!dc_isar_feature(aa32_dp, s)) {
+ return false;
+ }
+ return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index,
+ gen_helper_gvec_udot_idx_b);
}
static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
--
2.20.1
* [PULL 111/114] target/arm: Implement aarch32 VSUDOT, VUSDOT
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-90-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/cpu.h | 5 +++++
target/arm/neon-shared.decode | 6 ++++++
target/arm/translate-neon.c | 27 +++++++++++++++++++++++++++
3 files changed, 38 insertions(+)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 8ecb2a1c89e..04f8be35bf0 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3783,6 +3783,11 @@ static inline bool isar_feature_aa32_predinv(const ARMISARegisters *id)
return FIELD_EX32(id->id_isar6, ID_ISAR6, SPECRES) != 0;
}
+static inline bool isar_feature_aa32_i8mm(const ARMISARegisters *id)
+{
+ return FIELD_EX32(id->id_isar6, ID_ISAR6, I8MM) != 0;
+}
+
static inline bool isar_feature_aa32_ras(const ARMISARegisters *id)
{
return FIELD_EX32(id->id_pfr0, ID_PFR0, RAS) != 0;
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index 2d94369750d..5befaec87b1 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -50,6 +50,8 @@ VSDOT 1111 110 00 . 10 .... .... 1101 . q:1 . 0 .... \
vm=%vm_dp vn=%vn_dp vd=%vd_dp
VUDOT 1111 110 00 . 10 .... .... 1101 . q:1 . 1 .... \
vm=%vm_dp vn=%vn_dp vd=%vd_dp
+VUSDOT 1111 110 01 . 10 .... .... 1101 . q:1 . 0 .... \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
# VFM[AS]L
VFML 1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
@@ -66,6 +68,10 @@ VSDOT_scalar 1111 1110 0 . 10 .... .... 1101 . q:1 index:1 0 vm:4 \
vn=%vn_dp vd=%vd_dp
VUDOT_scalar 1111 1110 0 . 10 .... .... 1101 . q:1 index:1 1 vm:4 \
vn=%vn_dp vd=%vd_dp
+VUSDOT_scalar 1111 1110 1 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
+ vn=%vn_dp vd=%vd_dp
+VSUDOT_scalar 1111 1110 1 . 00 .... .... 1101 . q:1 index:1 1 vm:4 \
+ vn=%vn_dp vd=%vd_dp
%vfml_scalar_q0_rm 0:3 5:1
%vfml_scalar_q1_index 5:1 3:1
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index 386b42fe4b0..b6ca29c25ca 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -287,6 +287,15 @@ static bool trans_VUDOT(DisasContext *s, arg_VUDOT *a)
gen_helper_gvec_udot_b);
}
+static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a)
+{
+ if (!dc_isar_feature(aa32_i8mm, s)) {
+ return false;
+ }
+ return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0,
+ gen_helper_gvec_usdot_b);
+}
+
static bool trans_VFML(DisasContext *s, arg_VFML *a)
{
int opr_sz;
@@ -354,6 +363,24 @@ static bool trans_VUDOT_scalar(DisasContext *s, arg_VUDOT_scalar *a)
gen_helper_gvec_udot_idx_b);
}
+static bool trans_VUSDOT_scalar(DisasContext *s, arg_VUSDOT_scalar *a)
+{
+ if (!dc_isar_feature(aa32_i8mm, s)) {
+ return false;
+ }
+ return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index,
+ gen_helper_gvec_usdot_idx_b);
+}
+
+static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a)
+{
+ if (!dc_isar_feature(aa32_i8mm, s)) {
+ return false;
+ }
+ return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index,
+ gen_helper_gvec_sudot_idx_b);
+}
+
static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
{
int opr_sz;
--
2.20.1
* [PULL 112/114] target/arm: Implement integer matrix multiply accumulate
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
This is {S,U,US}MMLA for both AArch64 AdvSIMD and SVE,
and V{S,U,US}MMLA.S8 for AArch32 NEON.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-91-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
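Note: a minimal reference sketch of one 128-bit segment of the signed
variant, to make the index pattern in do_mmla_b easier to follow. Each
source holds a 2x8 matrix of bytes and the accumulator holds a 2x2
matrix of 32-bit values, so the result is acc + n * transpose(m). The
name smmla_segment_ref is hypothetical, and the host-endian adjustments
(the H1/H4 macros) are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reference model of one segment of SMMLA.
 * n and m are row-major 2x8 matrices of signed bytes; acc and out are
 * row-major 2x2 matrices of 32-bit values.
 */
static void smmla_segment_ref(uint32_t out[4], const uint32_t acc[4],
                              const int8_t n[16], const int8_t m[16])
{
    uint32_t tmp[4];

    assert(out && acc && n && m);
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            uint32_t sum = acc[2 * i + j];

            /* Row i of n dotted with row j of m. */
            for (int k = 0; k < 8; k++) {
                sum += (uint32_t)((int32_t)n[8 * i + k] * m[8 * j + k]);
            }
            tmp[2 * i + j] = sum;
        }
    }
    /* Write back only after all inputs are consumed, so out may alias acc. */
    for (int e = 0; e < 4; e++) {
        out[e] = tmp[e];
    }
}
```

The unsigned and mixed-sign variants would differ only in the element
types of n and m.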
target/arm/helper.h | 7 ++++
target/arm/neon-shared.decode | 7 ++++
target/arm/sve.decode | 6 +++
target/arm/translate-a64.c | 18 ++++++++
target/arm/translate-neon.c | 27 ++++++++++++
target/arm/translate-sve.c | 27 ++++++++++++
target/arm/vec_helper.c | 77 +++++++++++++++++++++++++++++++++++
7 files changed, 169 insertions(+)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 92b81bbabe4..23ccb0f72f6 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -993,6 +993,13 @@ DEF_HELPER_FLAGS_6(sve2_fmlal_zzxw_s, TCG_CALL_NO_RWG,
DEF_HELPER_FLAGS_4(gvec_xar_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_smmla_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_ummla_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
#ifdef TARGET_AARCH64
#include "helper-a64.h"
#include "helper-sve.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index 5befaec87b1..cc9f4cdd85b 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -59,6 +59,13 @@ VFML 1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
VFML 1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
+VSMMLA 1111 1100 0.10 .... .... 1100 .1.0 .... \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
+VUMMLA 1111 1100 0.10 .... .... 1100 .1.1 .... \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
+VUSMMLA 1111 1100 1.10 .... .... 1100 .1.0 .... \
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
VCMLA_scalar 1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
vn=%vn_dp vd=%vd_dp size=1
VCMLA_scalar 1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 78a2a31ab19..cb077bfde90 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -1413,6 +1413,12 @@ USHLLT 01000101 .. 0 ..... 1010 11 ..... ..... @rd_rn_tszimm_shl
EORBT 01000101 .. 0 ..... 10010 0 ..... ..... @rd_rn_rm
EORTB 01000101 .. 0 ..... 10010 1 ..... ..... @rd_rn_rm
+## SVE integer matrix multiply accumulate
+
+SMMLA 01000101 00 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0
+USMMLA 01000101 10 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0
+UMMLA 01000101 11 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0
+
## SVE2 bitwise permute
BEXT 01000101 .. 0 ..... 1011 00 ..... ..... @rd_rn_rm
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index c8754817842..ceac0ee2bd6 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -12182,6 +12182,15 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
}
feature = dc_isar_feature(aa64_i8mm, s);
break;
+ case 0x04: /* SMMLA */
+ case 0x14: /* UMMLA */
+ case 0x05: /* USMMLA */
+ if (!is_q || size != MO_32) {
+ unallocated_encoding(s);
+ return;
+ }
+ feature = dc_isar_feature(aa64_i8mm, s);
+ break;
case 0x18: /* FCMLA, #0 */
case 0x19: /* FCMLA, #90 */
case 0x1a: /* FCMLA, #180 */
@@ -12226,6 +12235,15 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_usdot_b);
return;
+ case 0x04: /* SMMLA, UMMLA */
+ gen_gvec_op4_ool(s, 1, rd, rn, rm, rd, 0,
+ u ? gen_helper_gvec_ummla_b
+ : gen_helper_gvec_smmla_b);
+ return;
+ case 0x05: /* USMMLA */
+ gen_gvec_op4_ool(s, 1, rd, rn, rm, rd, 0, gen_helper_gvec_usmmla_b);
+ return;
+
case 0x8: /* FCMLA, #0 */
case 0x9: /* FCMLA, #90 */
case 0xa: /* FCMLA, #180 */
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index b6ca29c25ca..9e990b41eda 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -4036,3 +4036,30 @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
tcg_temp_free_i32(tmp2);
return true;
}
+
+static bool trans_VSMMLA(DisasContext *s, arg_VSMMLA *a)
+{
+ if (!dc_isar_feature(aa32_i8mm, s)) {
+ return false;
+ }
+ return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
+ gen_helper_gvec_smmla_b);
+}
+
+static bool trans_VUMMLA(DisasContext *s, arg_VUMMLA *a)
+{
+ if (!dc_isar_feature(aa32_i8mm, s)) {
+ return false;
+ }
+ return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
+ gen_helper_gvec_ummla_b);
+}
+
+static bool trans_VUSMMLA(DisasContext *s, arg_VUSMMLA *a)
+{
+ if (!dc_isar_feature(aa32_i8mm, s)) {
+ return false;
+ }
+ return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
+ gen_helper_gvec_usmmla_b);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 428ae018a35..9574efe9578 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -8610,3 +8610,30 @@ static bool trans_FMLSLT_zzxw(DisasContext *s, arg_rrxr_esz *a)
{
return do_FMLAL_zzxw(s, a, true, true);
}
+
+static bool do_i8mm_zzzz_ool(DisasContext *s, arg_rrrr_esz *a,
+ gen_helper_gvec_4 *fn, int data)
+{
+ if (!dc_isar_feature(aa64_sve_i8mm, s)) {
+ return false;
+ }
+ if (sve_access_check(s)) {
+ gen_gvec_ool_zzzz(s, fn, a->rd, a->rn, a->rm, a->ra, data);
+ }
+ return true;
+}
+
+static bool trans_SMMLA(DisasContext *s, arg_rrrr_esz *a)
+{
+ return do_i8mm_zzzz_ool(s, a, gen_helper_gvec_smmla_b, 0);
+}
+
+static bool trans_USMMLA(DisasContext *s, arg_rrrr_esz *a)
+{
+ return do_i8mm_zzzz_ool(s, a, gen_helper_gvec_usmmla_b, 0);
+}
+
+static bool trans_UMMLA(DisasContext *s, arg_rrrr_esz *a)
+{
+ return do_i8mm_zzzz_ool(s, a, gen_helper_gvec_ummla_b, 0);
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 19c4ba1bdf5..e84b438340e 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -2335,3 +2335,80 @@ void HELPER(gvec_xar_d)(void *vd, void *vn, void *vm, uint32_t desc)
}
clear_tail(d, opr_sz * 8, simd_maxsz(desc));
}
+
+/*
+ * Integer matrix-multiply accumulate
+ */
+
+static uint32_t do_smmla_b(uint32_t sum, void *vn, void *vm)
+{
+ int8_t *n = vn, *m = vm;
+
+ for (intptr_t k = 0; k < 8; ++k) {
+ sum += n[H1(k)] * m[H1(k)];
+ }
+ return sum;
+}
+
+static uint32_t do_ummla_b(uint32_t sum, void *vn, void *vm)
+{
+ uint8_t *n = vn, *m = vm;
+
+ for (intptr_t k = 0; k < 8; ++k) {
+ sum += n[H1(k)] * m[H1(k)];
+ }
+ return sum;
+}
+
+static uint32_t do_usmmla_b(uint32_t sum, void *vn, void *vm)
+{
+ uint8_t *n = vn;
+ int8_t *m = vm;
+
+ for (intptr_t k = 0; k < 8; ++k) {
+ sum += n[H1(k)] * m[H1(k)];
+ }
+ return sum;
+}
+
+static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
+ uint32_t (*inner_loop)(uint32_t, void *, void *))
+{
+ intptr_t seg, opr_sz = simd_oprsz(desc);
+
+ for (seg = 0; seg < opr_sz; seg += 16) {
+ uint32_t *d = vd + seg;
+ uint32_t *a = va + seg;
+ uint32_t sum0, sum1, sum2, sum3;
+
+ /*
+ * Process the entire segment at once, writing back the
+ * results only after we've consumed all of the inputs.
+ *
+ * Key to indices by column:
+ * i j i j
+ */
+ sum0 = a[H4(0 + 0)];
+ sum0 = inner_loop(sum0, vn + seg + 0, vm + seg + 0);
+ sum1 = a[H4(0 + 1)];
+ sum1 = inner_loop(sum1, vn + seg + 0, vm + seg + 8);
+ sum2 = a[H4(2 + 0)];
+ sum2 = inner_loop(sum2, vn + seg + 8, vm + seg + 0);
+ sum3 = a[H4(2 + 1)];
+ sum3 = inner_loop(sum3, vn + seg + 8, vm + seg + 8);
+
+ d[H4(0)] = sum0;
+ d[H4(1)] = sum1;
+ d[H4(2)] = sum2;
+ d[H4(3)] = sum3;
+ }
+ clear_tail(vd, opr_sz, simd_maxsz(desc));
+}
+
+#define DO_MMLA_B(NAME, INNER) \
+ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
+ { do_mmla_b(vd, vn, vm, va, desc, INNER); }
+
+DO_MMLA_B(gvec_smmla_b, do_smmla_b)
+DO_MMLA_B(gvec_ummla_b, do_ummla_b)
+DO_MMLA_B(gvec_usmmla_b, do_usmmla_b)
--
2.20.1
* [PULL 113/114] linux-user/aarch64: Enable hwcap bits for sve2 and related extensions
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-92-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
linux-user/elfload.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 0e832b2649f..1ab97e38e08 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -648,8 +648,18 @@ static uint32_t get_elf_hwcap2(void)
uint32_t hwcaps = 0;
GET_FEATURE_ID(aa64_dcpodp, ARM_HWCAP2_A64_DCPODP);
+ GET_FEATURE_ID(aa64_sve2, ARM_HWCAP2_A64_SVE2);
+ GET_FEATURE_ID(aa64_sve2_aes, ARM_HWCAP2_A64_SVEAES);
+ GET_FEATURE_ID(aa64_sve2_pmull128, ARM_HWCAP2_A64_SVEPMULL);
+ GET_FEATURE_ID(aa64_sve2_bitperm, ARM_HWCAP2_A64_SVEBITPERM);
+ GET_FEATURE_ID(aa64_sve2_sha3, ARM_HWCAP2_A64_SVESHA3);
+ GET_FEATURE_ID(aa64_sve2_sm4, ARM_HWCAP2_A64_SVESM4);
GET_FEATURE_ID(aa64_condm_5, ARM_HWCAP2_A64_FLAGM2);
GET_FEATURE_ID(aa64_frint, ARM_HWCAP2_A64_FRINT);
+ GET_FEATURE_ID(aa64_sve_i8mm, ARM_HWCAP2_A64_SVEI8MM);
+ GET_FEATURE_ID(aa64_sve_f32mm, ARM_HWCAP2_A64_SVEF32MM);
+ GET_FEATURE_ID(aa64_sve_f64mm, ARM_HWCAP2_A64_SVEF64MM);
+ GET_FEATURE_ID(aa64_i8mm, ARM_HWCAP2_A64_I8MM);
GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
--
2.20.1
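
The pattern in get_elf_hwcap2() above is one feature test per hwcap bit, OR-ed into a single word. A minimal sketch of that pattern follows, under stated assumptions: struct demo_features, the DEMO_* constants and DEMO_GET_FEATURE are hypothetical stand-ins for the CPU ISAR feature tests and the ARM_HWCAP2_A64_* values, and the bit positions are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions; the real values come from the target's
 * hwcap definitions, not from this sketch. */
enum {
    DEMO_HWCAP2_SVE2    = 1u << 1,
    DEMO_HWCAP2_SVEAES  = 1u << 2,
    DEMO_HWCAP2_SVEI8MM = 1u << 9,
    DEMO_HWCAP2_I8MM    = 1u << 13,
};

/* Hypothetical stand-in for the emulated CPU's feature tests. */
struct demo_features {
    bool sve2;
    bool sve2_aes;
    bool sve_i8mm;
    bool i8mm;
};

/* Set one hwcap bit if the matching feature is present. */
#define DEMO_GET_FEATURE(field, bit) \
    do { if (f->field) { hwcaps |= (bit); } } while (0)

static uint32_t demo_get_hwcap2(const struct demo_features *f)
{
    uint32_t hwcaps = 0;

    assert(f != NULL);
    DEMO_GET_FEATURE(sve2, DEMO_HWCAP2_SVE2);
    DEMO_GET_FEATURE(sve2_aes, DEMO_HWCAP2_SVEAES);
    DEMO_GET_FEATURE(sve_i8mm, DEMO_HWCAP2_SVEI8MM);
    DEMO_GET_FEATURE(i8mm, DEMO_HWCAP2_I8MM);
    return hwcaps;
}
```

A guest program would normally read the resulting word from the auxiliary vector and test individual bits before using the corresponding instructions.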
* [PULL 114/114] target/arm: Enable SVE2 and related extensions
2021-05-25 15:07 [PULL 095/114] target/arm: Implement SVE2 FCVTLT Peter Maydell
` (17 preceding siblings ...)
2021-05-25 15:07 ` [PULL 113/114] linux-user/aarch64: Enable hwcap bits for sve2 and related extensions Peter Maydell
@ 2021-05-25 15:07 ` Peter Maydell
18 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2021-05-25 15:07 UTC (permalink / raw)
To: qemu-devel
From: Richard Henderson <richard.henderson@linaro.org>
Disable I8MM again for !have_neon during realize.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525010358.152808-93-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/cpu.c | 2 ++
target/arm/cpu64.c | 13 +++++++++++++
target/arm/cpu_tcg.c | 1 +
3 files changed, 16 insertions(+)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 4eb0d2f85c4..7aeb4b13816 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1503,6 +1503,7 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
t = cpu->isar.id_aa64isar1;
t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 0);
+ t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 0);
cpu->isar.id_aa64isar1 = t;
t = cpu->isar.id_aa64pfr0;
@@ -1517,6 +1518,7 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
u = cpu->isar.id_isar6;
u = FIELD_DP32(u, ID_ISAR6, DP, 0);
u = FIELD_DP32(u, ID_ISAR6, FHM, 0);
+ u = FIELD_DP32(u, ID_ISAR6, I8MM, 0);
cpu->isar.id_isar6 = u;
if (!arm_feature(env, ARM_FEATURE_M)) {
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index f42803ecaf1..d561dc7accc 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -663,6 +663,7 @@ static void aarch64_max_initfn(Object *obj)
t = FIELD_DP64(t, ID_AA64ISAR1, SPECRES, 1);
t = FIELD_DP64(t, ID_AA64ISAR1, FRINTTS, 1);
t = FIELD_DP64(t, ID_AA64ISAR1, LRCPC, 2); /* ARMv8.4-RCPC */
+ t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 1);
cpu->isar.id_aa64isar1 = t;
t = cpu->isar.id_aa64pfr0;
@@ -703,6 +704,17 @@ static void aarch64_max_initfn(Object *obj)
t = FIELD_DP64(t, ID_AA64MMFR2, ST, 1); /* TTST */
cpu->isar.id_aa64mmfr2 = t;
+ t = cpu->isar.id_aa64zfr0;
+ t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1);
+ t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2); /* PMULL */
+ t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1);
+ t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1);
+ t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1);
+ t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1);
+ t = FIELD_DP64(t, ID_AA64ZFR0, F32MM, 1);
+ t = FIELD_DP64(t, ID_AA64ZFR0, F64MM, 1);
+ cpu->isar.id_aa64zfr0 = t;
+
/* Replicate the same data to the 32-bit id registers. */
u = cpu->isar.id_isar5;
u = FIELD_DP32(u, ID_ISAR5, AES, 2); /* AES + PMULL */
@@ -719,6 +731,7 @@ static void aarch64_max_initfn(Object *obj)
u = FIELD_DP32(u, ID_ISAR6, FHM, 1);
u = FIELD_DP32(u, ID_ISAR6, SB, 1);
u = FIELD_DP32(u, ID_ISAR6, SPECRES, 1);
+ u = FIELD_DP32(u, ID_ISAR6, I8MM, 1);
cpu->isar.id_isar6 = u;
u = cpu->isar.id_pfr0;
diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
index 046e476f65f..d3458335ed9 100644
--- a/target/arm/cpu_tcg.c
+++ b/target/arm/cpu_tcg.c
@@ -968,6 +968,7 @@ static void arm_max_initfn(Object *obj)
t = FIELD_DP32(t, ID_ISAR6, FHM, 1);
t = FIELD_DP32(t, ID_ISAR6, SB, 1);
t = FIELD_DP32(t, ID_ISAR6, SPECRES, 1);
+ t = FIELD_DP32(t, ID_ISAR6, I8MM, 1);
cpu->isar.id_isar6 = t;
t = cpu->isar.mvfr1;
--
2.20.1
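
The FIELD_DP64/FIELD_DP32 lines in this patch each overwrite one bitfield of an ID register value and leave the rest untouched. The following sketch shows that read-modify-write step in isolation. It is an assumption-laden model, not QEMU's implementation: demo_deposit64 and demo_extract64 are hypothetical helpers, and the field position used in the usage note (bits 7:4, 4 bits wide) is chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replace `length` bits of `value`, starting at
 * bit `start`, with the low bits of `field`. */
static uint64_t demo_deposit64(uint64_t value, int start, int length,
                               uint64_t field)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    uint64_t mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Hypothetical helper: read back the same bitfield. */
static uint64_t demo_extract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}
```

For example, t = demo_deposit64(t, 4, 4, 2) would set an illustrative 4-bit field at bits 7:4 to 2, while depositing 0 into the same field clears it, which is the shape of the "disable again during realize" change in target/arm/cpu.c.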
Thread overview: 20+ messages
2021-05-25 15:07 [PULL 095/114] target/arm: Implement SVE2 FCVTLT Peter Maydell
2021-05-25 15:07 ` [PULL 096/114] target/arm: Implement SVE2 FCVTXNT, FCVTX Peter Maydell
2021-05-25 15:07 ` [PULL 097/114] target/arm: Implement SVE2 FLOGB Peter Maydell
2021-05-25 15:07 ` [PULL 098/114] target/arm: Share table of sve load functions Peter Maydell
2021-05-25 15:07 ` [PULL 099/114] target/arm: Tidy do_ldrq Peter Maydell
2021-05-25 15:07 ` [PULL 100/114] target/arm: Implement SVE2 LD1RO Peter Maydell
2021-05-25 15:07 ` [PULL 101/114] target/arm: Implement 128-bit ZIP, UZP, TRN Peter Maydell
2021-05-25 15:07 ` [PULL 102/114] target/arm: Implement SVE2 bitwise shift immediate Peter Maydell
2021-05-25 15:07 ` [PULL 103/114] target/arm: Move endian adjustment macros to vec_internal.h Peter Maydell
2021-05-25 15:07 ` [PULL 104/114] target/arm: Implement SVE2 fp multiply-add long Peter Maydell
2021-05-25 15:07 ` [PULL 105/114] target/arm: Implement aarch64 SUDOT, USDOT Peter Maydell
2021-05-25 15:07 ` [PULL 106/114] target/arm: Split out do_neon_ddda_fpst Peter Maydell
2021-05-25 15:07 ` [PULL 107/114] target/arm: Remove unused fpst from VDOT_scalar Peter Maydell
2021-05-25 15:07 ` [PULL 108/114] target/arm: Fix decode for VDOT (indexed) Peter Maydell
2021-05-25 15:07 ` [PULL 109/114] target/arm: Split out do_neon_ddda Peter Maydell
2021-05-25 15:07 ` [PULL 110/114] target/arm: Split decode of VSDOT and VUDOT Peter Maydell
2021-05-25 15:07 ` [PULL 111/114] target/arm: Implement aarch32 VSUDOT, VUSDOT Peter Maydell
2021-05-25 15:07 ` [PULL 112/114] target/arm: Implement integer matrix multiply accumulate Peter Maydell
2021-05-25 15:07 ` [PULL 113/114] linux-user/aarch64: Enable hwcap bits for sve2 and related extensions Peter Maydell
2021-05-25 15:07 ` [PULL 114/114] target/arm: Enable SVE2 " Peter Maydell