* [PATCH 0/6] target/arm: Convert crypto to gvec
@ 2020-05-14 21:28 Richard Henderson
From: Richard Henderson @ 2020-05-14 21:28 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, alex.bennee
In addition to the gvec conversion, this series fixes the missing
tail clearing for SVE. The sha1, sha256, and sm3 routines that are
not fully generalized are not used by SVE, which only supports the
newer algorithms.
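To illustrate what "tail clearing" means here, the following is a minimal sketch rather than the QEMU code: a 128-bit operation writes the low 16 bytes of a wider register and must then zero everything above them. The names sketch_clear_tail and sketch_op128 are hypothetical, and memset stands in for the 64-bit store loop used by clear_tail in patch 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero bytes [opr_sz, max_sz) of a vector register. */
static void sketch_clear_tail(void *vd, size_t opr_sz, size_t max_sz)
{
    assert(opr_sz <= max_sz);
    memset((unsigned char *)vd + opr_sz, 0, max_sz - opr_sz);
}

/*
 * Hypothetical 128-bit operation on a register that is max_sz bytes
 * wide: only the low 16 bytes carry a result, the rest must be zero.
 */
static void sketch_op128(uint64_t *reg, uint64_t lo, uint64_t hi,
                         size_t max_sz)
{
    reg[0] = lo;
    reg[1] = hi;
    sketch_clear_tail(reg, 16, max_sz);
}
```

Without the final call, stale data from an earlier, wider SVE operation would remain visible in the upper part of the register.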
I'm not sure that this:
Based-on: <20200508151055.5832-1-richard.henderson@linaro.org>
("tcg vector rotate operations")
will be sufficient for patchew, because it also relies on
today's target-arm.next merge to master. But you get the idea.
r~
Richard Henderson (6):
target/arm: Convert aes and sm4 to gvec helpers
target/arm: Convert rax1 to gvec helpers
target/arm: Convert sha512 and sm3 to gvec helpers
target/arm: Convert sha1 and sha256 to gvec helpers
target/arm: Split helper_crypto_sha1_3reg
target/arm: Split helper_crypto_sm3tt
target/arm/helper.h | 45 ++++--
target/arm/translate-a64.h | 3 +
target/arm/vec_internal.h | 33 ++++
target/arm/neon-dp.decode | 18 ++-
target/arm/crypto_helper.c | 267 +++++++++++++++++++++++---------
target/arm/translate-a64.c | 198 ++++++++++-------------
target/arm/translate-neon.inc.c | 172 ++++----------------
target/arm/translate.c | 51 +++---
target/arm/vec_helper.c | 12 +-
9 files changed, 403 insertions(+), 396 deletions(-)
create mode 100644 target/arm/vec_internal.h
--
2.20.1
* [PATCH 1/6] target/arm: Convert aes and sm4 to gvec helpers
From: Richard Henderson @ 2020-05-14 21:28 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, alex.bennee
With this conversion, we will be able to use the same helpers
with SVE. In particular, pass three vector parameters for the
three-operand operations; for AdvSIMD the destination register
is also an input, so it is passed as both the destination and
the first source.
This also fixes a bug in which we failed to clear the high bits
of the SVE register after an AdvSIMD operation.
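The calling convention described above can be sketched as follows. This is an illustration under stated assumptions, not the helper from the patch: struct toy_desc is a hypothetical stand-in for the packed gvec descriptor, toy_block_op is a placeholder for the real AES or SM4 round, and toy_helper3 is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the packed gvec descriptor. */
struct toy_desc {
    size_t opr_sz;   /* bytes the operation covers */
    size_t max_sz;   /* full register width in bytes */
    int data;        /* per-instruction immediate, e.g. a decrypt flag */
};

/*
 * Placeholder 128-bit block operation. Both inputs are read into
 * locals before rd is written, so rd may alias rn.
 */
static void toy_block_op(uint64_t *rd, const uint64_t *rn,
                         const uint64_t *rm, int flag)
{
    uint64_t lo = rn[0] ^ rm[0];
    uint64_t hi = rn[1] ^ rm[1];

    rd[0] = flag ? ~lo : lo;
    rd[1] = flag ? ~hi : hi;
}

/* Apply the block operation to each 16-byte lane, then zero the tail. */
static void toy_helper3(void *vd, void *vn, void *vm, struct toy_desc desc)
{
    uint64_t *d = vd, *n = vn, *m = vm;
    size_t i;

    assert(desc.opr_sz % 16 == 0 && desc.opr_sz <= desc.max_sz);
    for (i = 0; i < desc.opr_sz / 8; i += 2) {
        toy_block_op(d + i, n + i, m + i, desc.data);
    }
    memset(d + desc.opr_sz / 8, 0, desc.max_sz - desc.opr_sz);
}
```

An AdvSIMD-style caller would use toy_helper3(rd, rd, rm, desc) with opr_sz set to 16, while an SVE-style caller could pass three distinct registers and a larger opr_sz.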
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 6 ++--
target/arm/vec_internal.h | 33 +++++++++++++++++
target/arm/crypto_helper.c | 72 +++++++++++++++++++++++++++-----------
target/arm/translate-a64.c | 55 ++++++++++++++++++-----------
target/arm/translate.c | 27 +++++++-------
target/arm/vec_helper.c | 12 +------
6 files changed, 138 insertions(+), 67 deletions(-)
create mode 100644 target/arm/vec_internal.h
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 49336dc432..42759f82aa 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -510,7 +510,7 @@ DEF_HELPER_FLAGS_2(neon_qzip8, TCG_CALL_NO_RWG, void, ptr, ptr)
DEF_HELPER_FLAGS_2(neon_qzip16, TCG_CALL_NO_RWG, void, ptr, ptr)
DEF_HELPER_FLAGS_2(neon_qzip32, TCG_CALL_NO_RWG, void, ptr, ptr)
-DEF_HELPER_FLAGS_3(crypto_aese, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_aese, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_3(crypto_aesmc, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(crypto_sha1_3reg, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
@@ -531,8 +531,8 @@ DEF_HELPER_FLAGS_5(crypto_sm3tt, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32)
DEF_HELPER_FLAGS_3(crypto_sm3partw1, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
DEF_HELPER_FLAGS_3(crypto_sm3partw2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
-DEF_HELPER_FLAGS_2(crypto_sm4e, TCG_CALL_NO_RWG, void, ptr, ptr)
-DEF_HELPER_FLAGS_3(crypto_sm4ekey, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(crypto_sm4e, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sm4ekey, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_3(crc32, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
DEF_HELPER_FLAGS_3(crc32c, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
new file mode 100644
index 0000000000..00a8277765
--- /dev/null
+++ b/target/arm/vec_internal.h
@@ -0,0 +1,33 @@
+/*
+ * ARM AdvSIMD / SVE Vector Helpers
+ *
+ * Copyright (c) 2020 Linaro
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TARGET_ARM_VEC_INTERNALS_H
+#define TARGET_ARM_VEC_INTERNALS_H
+
+static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
+{
+ uint64_t *d = vd + opr_sz;
+ uintptr_t i;
+
+ for (i = opr_sz; i < max_sz; i += 8) {
+ *d++ = 0;
+ }
+}
+
+#endif /* TARGET_ARM_VEC_INTERNALS_H */
diff --git a/target/arm/crypto_helper.c b/target/arm/crypto_helper.c
index f800266727..6bd5a3d2d0 100644
--- a/target/arm/crypto_helper.c
+++ b/target/arm/crypto_helper.c
@@ -13,7 +13,9 @@
#include "cpu.h"
#include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
#include "crypto/aes.h"
+#include "vec_internal.h"
union CRYPTO_STATE {
uint8_t bytes[16];
@@ -29,18 +31,15 @@ union CRYPTO_STATE {
#define CR_ST_WORD(state, i) (state.words[i])
#endif
-void HELPER(crypto_aese)(void *vd, void *vm, uint32_t decrypt)
+static void do_crypto_aese(uint64_t *rd, uint64_t *rn,
+ uint64_t *rm, bool decrypt)
{
static uint8_t const * const sbox[2] = { AES_sbox, AES_isbox };
static uint8_t const * const shift[2] = { AES_shifts, AES_ishifts };
- uint64_t *rd = vd;
- uint64_t *rm = vm;
union CRYPTO_STATE rk = { .l = { rm[0], rm[1] } };
- union CRYPTO_STATE st = { .l = { rd[0], rd[1] } };
+ union CRYPTO_STATE st = { .l = { rn[0], rn[1] } };
int i;
- assert(decrypt < 2);
-
/* xor state vector with round key */
rk.l[0] ^= st.l[0];
rk.l[1] ^= st.l[1];
@@ -54,7 +53,18 @@ void HELPER(crypto_aese)(void *vd, void *vm, uint32_t decrypt)
rd[1] = st.l[1];
}
-void HELPER(crypto_aesmc)(void *vd, void *vm, uint32_t decrypt)
+void HELPER(crypto_aese)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+ intptr_t i, opr_sz = simd_oprsz(desc);
+ bool decrypt = simd_data(desc);
+
+ for (i = 0; i < opr_sz; i += 16) {
+ do_crypto_aese(vd + i, vn + i, vm + i, decrypt);
+ }
+ clear_tail(vd, opr_sz, simd_maxsz(desc));
+}
+
+static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, bool decrypt)
{
static uint32_t const mc[][256] = { {
/* MixColumns lookup table */
@@ -190,13 +200,9 @@ void HELPER(crypto_aesmc)(void *vd, void *vm, uint32_t decrypt)
0xbe805d9f, 0xb58d5491, 0xa89a4f83, 0xa397468d,
} };
- uint64_t *rd = vd;
- uint64_t *rm = vm;
union CRYPTO_STATE st = { .l = { rm[0], rm[1] } };
int i;
- assert(decrypt < 2);
-
for (i = 0; i < 16; i += 4) {
CR_ST_WORD(st, i >> 2) =
mc[decrypt][CR_ST_BYTE(st, i)] ^
@@ -209,6 +215,17 @@ void HELPER(crypto_aesmc)(void *vd, void *vm, uint32_t decrypt)
rd[1] = st.l[1];
}
+void HELPER(crypto_aesmc)(void *vd, void *vm, uint32_t desc)
+{
+ intptr_t i, opr_sz = simd_oprsz(desc);
+ bool decrypt = simd_data(desc);
+
+ for (i = 0; i < opr_sz; i += 16) {
+ do_crypto_aesmc(vd + i, vm + i, decrypt);
+ }
+ clear_tail(vd, opr_sz, simd_maxsz(desc));
+}
+
/*
* SHA-1 logical functions
*/
@@ -638,12 +655,10 @@ static uint8_t const sm4_sbox[] = {
0x79, 0xee, 0x5f, 0x3e, 0xd7, 0xcb, 0x39, 0x48,
};
-void HELPER(crypto_sm4e)(void *vd, void *vn)
+static void do_crypto_sm4e(uint64_t *rd, uint64_t *rn, uint64_t *rm)
{
- uint64_t *rd = vd;
- uint64_t *rn = vn;
- union CRYPTO_STATE d = { .l = { rd[0], rd[1] } };
- union CRYPTO_STATE n = { .l = { rn[0], rn[1] } };
+ union CRYPTO_STATE d = { .l = { rn[0], rn[1] } };
+ union CRYPTO_STATE n = { .l = { rm[0], rm[1] } };
uint32_t t, i;
for (i = 0; i < 4; i++) {
@@ -665,11 +680,18 @@ void HELPER(crypto_sm4e)(void *vd, void *vn)
rd[1] = d.l[1];
}
-void HELPER(crypto_sm4ekey)(void *vd, void *vn, void* vm)
+void HELPER(crypto_sm4e)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+ intptr_t i, opr_sz = simd_oprsz(desc);
+
+ for (i = 0; i < opr_sz; i += 16) {
+ do_crypto_sm4e(vd + i, vn + i, vm + i);
+ }
+ clear_tail(vd, opr_sz, simd_maxsz(desc));
+}
+
+static void do_crypto_sm4ekey(uint64_t *rd, uint64_t *rn, uint64_t *rm)
{
- uint64_t *rd = vd;
- uint64_t *rn = vn;
- uint64_t *rm = vm;
union CRYPTO_STATE d;
union CRYPTO_STATE n = { .l = { rn[0], rn[1] } };
union CRYPTO_STATE m = { .l = { rm[0], rm[1] } };
@@ -693,3 +715,13 @@ void HELPER(crypto_sm4ekey)(void *vd, void *vn, void* vm)
rd[0] = d.l[0];
rd[1] = d.l[1];
}
+
+void HELPER(crypto_sm4ekey)(void *vd, void *vn, void* vm, uint32_t desc)
+{
+ intptr_t i, opr_sz = simd_oprsz(desc);
+
+ for (i = 0; i < opr_sz; i += 16) {
+ do_crypto_sm4ekey(vd + i, vn + i, vm + i);
+ }
+ clear_tail(vd, opr_sz, simd_maxsz(desc));
+}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 991e451644..1e511529b8 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -577,6 +577,15 @@ static void gen_gvec_fn4(DisasContext *s, bool is_q, int rd, int rn, int rm,
is_q ? 16 : 8, vec_full_reg_size(s));
}
+/* Expand a 2-operand operation using an out-of-line helper. */
+static void gen_gvec_op2_ool(DisasContext *s, bool is_q, int rd,
+ int rn, int data, gen_helper_gvec_2 *fn)
+{
+ tcg_gen_gvec_2_ool(vec_full_reg_offset(s, rd),
+ vec_full_reg_offset(s, rn),
+ is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
+}
+
/* Expand a 3-operand operation using an out-of-line helper. */
static void gen_gvec_op3_ool(DisasContext *s, bool is_q, int rd,
int rn, int rm, int data, gen_helper_gvec_3 *fn)
@@ -13398,9 +13407,8 @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn)
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
int decrypt;
- TCGv_ptr tcg_rd_ptr, tcg_rn_ptr;
- TCGv_i32 tcg_decrypt;
- CryptoThreeOpIntFn *genfn;
+ gen_helper_gvec_2 *genfn2 = NULL;
+ gen_helper_gvec_3 *genfn3 = NULL;
if (!dc_isar_feature(aa64_aes, s) || size != 0) {
unallocated_encoding(s);
@@ -13410,19 +13418,19 @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn)
switch (opcode) {
case 0x4: /* AESE */
decrypt = 0;
- genfn = gen_helper_crypto_aese;
+ genfn3 = gen_helper_crypto_aese;
break;
case 0x6: /* AESMC */
decrypt = 0;
- genfn = gen_helper_crypto_aesmc;
+ genfn2 = gen_helper_crypto_aesmc;
break;
case 0x5: /* AESD */
decrypt = 1;
- genfn = gen_helper_crypto_aese;
+ genfn3 = gen_helper_crypto_aese;
break;
case 0x7: /* AESIMC */
decrypt = 1;
- genfn = gen_helper_crypto_aesmc;
+ genfn2 = gen_helper_crypto_aesmc;
break;
default:
unallocated_encoding(s);
@@ -13432,16 +13440,11 @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn)
if (!fp_access_check(s)) {
return;
}
-
- tcg_rd_ptr = vec_full_reg_ptr(s, rd);
- tcg_rn_ptr = vec_full_reg_ptr(s, rn);
- tcg_decrypt = tcg_const_i32(decrypt);
-
- genfn(tcg_rd_ptr, tcg_rn_ptr, tcg_decrypt);
-
- tcg_temp_free_ptr(tcg_rd_ptr);
- tcg_temp_free_ptr(tcg_rn_ptr);
- tcg_temp_free_i32(tcg_decrypt);
+ if (genfn2) {
+ gen_gvec_op2_ool(s, true, rd, rn, decrypt, genfn2);
+ } else {
+ gen_gvec_op3_ool(s, true, rd, rd, rn, decrypt, genfn3);
+ }
}
/* Crypto three-reg SHA
@@ -13590,7 +13593,8 @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
bool feature;
- CryptoThreeOpFn *genfn;
+ CryptoThreeOpFn *genfn = NULL;
+ gen_helper_gvec_3 *oolfn = NULL;
if (o == 0) {
switch (opcode) {
@@ -13625,7 +13629,7 @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
break;
case 2: /* SM4EKEY */
feature = dc_isar_feature(aa64_sm4, s);
- genfn = gen_helper_crypto_sm4ekey;
+ oolfn = gen_helper_crypto_sm4ekey;
break;
default:
unallocated_encoding(s);
@@ -13642,6 +13646,11 @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
return;
}
+ if (oolfn) {
+ gen_gvec_op3_ool(s, true, rd, rn, rm, 0, oolfn);
+ return;
+ }
+
if (genfn) {
TCGv_ptr tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr;
@@ -13694,6 +13703,7 @@ static void disas_crypto_two_reg_sha512(DisasContext *s, uint32_t insn)
TCGv_ptr tcg_rd_ptr, tcg_rn_ptr;
bool feature;
CryptoTwoOpFn *genfn;
+ gen_helper_gvec_3 *oolfn = NULL;
switch (opcode) {
case 0: /* SHA512SU0 */
@@ -13702,7 +13712,7 @@ static void disas_crypto_two_reg_sha512(DisasContext *s, uint32_t insn)
break;
case 1: /* SM4E */
feature = dc_isar_feature(aa64_sm4, s);
- genfn = gen_helper_crypto_sm4e;
+ oolfn = gen_helper_crypto_sm4e;
break;
default:
unallocated_encoding(s);
@@ -13718,6 +13728,11 @@ static void disas_crypto_two_reg_sha512(DisasContext *s, uint32_t insn)
return;
}
+ if (oolfn) {
+ gen_gvec_op3_ool(s, true, rd, rd, rn, 0, oolfn);
+ return;
+ }
+
tcg_rd_ptr = vec_full_reg_ptr(s, rd);
tcg_rn_ptr = vec_full_reg_ptr(s, rn);
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 4c9bb8b5ac..921359dfd4 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6373,22 +6373,23 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
if (!dc_isar_feature(aa32_aes, s) || ((rm | rd) & 1)) {
return 1;
}
- ptr1 = vfp_reg_ptr(true, rd);
- ptr2 = vfp_reg_ptr(true, rm);
-
- /* Bit 6 is the lowest opcode bit; it distinguishes between
- * encryption (AESE/AESMC) and decryption (AESD/AESIMC)
- */
- tmp3 = tcg_const_i32(extract32(insn, 6, 1));
-
+ /*
+ * Bit 6 is the lowest opcode bit; it distinguishes
+ * between encryption (AESE/AESMC) and decryption
+ * (AESD/AESIMC).
+ */
if (op == NEON_2RM_AESE) {
- gen_helper_crypto_aese(ptr1, ptr2, tmp3);
+ tcg_gen_gvec_3_ool(vfp_reg_offset(true, rd),
+ vfp_reg_offset(true, rd),
+ vfp_reg_offset(true, rm),
+ 16, 16, extract32(insn, 6, 1),
+ gen_helper_crypto_aese);
} else {
- gen_helper_crypto_aesmc(ptr1, ptr2, tmp3);
+ tcg_gen_gvec_2_ool(vfp_reg_offset(true, rd),
+ vfp_reg_offset(true, rm),
+ 16, 16, extract32(insn, 6, 1),
+ gen_helper_crypto_aesmc);
}
- tcg_temp_free_ptr(ptr1);
- tcg_temp_free_ptr(ptr2);
- tcg_temp_free_i32(tmp3);
break;
case NEON_2RM_SHA1H:
if (!dc_isar_feature(aa32_sha1, s) || ((rm | rd) & 1)) {
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 50a499299f..7d76412ee0 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -22,7 +22,7 @@
#include "exec/helper-proto.h"
#include "tcg/tcg-gvec-desc.h"
#include "fpu/softfloat.h"
-
+#include "vec_internal.h"
/* Note that vector data is stored in host-endian 64-bit chunks,
so addressing units smaller than that needs a host-endian fixup. */
@@ -36,16 +36,6 @@
#define H4(x) (x)
#endif
-static void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz)
-{
- uint64_t *d = vd + opr_sz;
- uintptr_t i;
-
- for (i = opr_sz; i < max_sz; i += 8) {
- *d++ = 0;
- }
-}
-
/* Signed saturating rounding doubling multiply-accumulate high half, 16-bit */
static int16_t inl_qrdmlah_s16(int16_t src1, int16_t src2,
int16_t src3, uint32_t *sat)
--
2.20.1
* [PATCH 2/6] target/arm: Convert rax1 to gvec helpers
From: Richard Henderson @ 2020-05-14 21:28 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, alex.bennee
With this conversion, we will be able to use the same helpers
with SVE. This also fixes a bug in which we failed to clear
the high bits of the SVE register after an AdvSIMD operation.
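As a plain C reference for the operation being converted, here is a small sketch of the RAX1 computation (each 64-bit lane of the result is n XOR m rotated left by one), followed by tail clearing. The names sketch_rol64 and sketch_rax1 and the explicit size parameters are assumptions for illustration; the helper in this patch takes a gvec descriptor instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 64-bit value left by n bits (n is reduced modulo 64). */
static uint64_t sketch_rol64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* d[i] = n[i] ^ rol64(m[i], 1) per 64-bit lane, then zero the tail. */
static void sketch_rax1(uint64_t *d, const uint64_t *n, const uint64_t *m,
                        size_t opr_sz, size_t max_sz)
{
    size_t i;

    assert(opr_sz % 8 == 0 && opr_sz <= max_sz);
    for (i = 0; i < opr_sz / 8; ++i) {
        d[i] = n[i] ^ sketch_rol64(m[i], 1);
    }
    memset(d + opr_sz / 8, 0, max_sz - opr_sz);
}
```

Because each lane depends only on the same lane of its inputs, the operation maps naturally onto an inline 64-bit or host-vector expansion, with the out-of-line helper as a fallback.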
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 2 ++
target/arm/translate-a64.h | 3 ++
target/arm/crypto_helper.c | 11 +++++++
target/arm/translate-a64.c | 59 ++++++++++++++++++++------------------
4 files changed, 47 insertions(+), 28 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 42759f82aa..6c4eb9befb 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -534,6 +534,8 @@ DEF_HELPER_FLAGS_3(crypto_sm3partw2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
DEF_HELPER_FLAGS_4(crypto_sm4e, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(crypto_sm4ekey, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_rax1, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
DEF_HELPER_FLAGS_3(crc32, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
DEF_HELPER_FLAGS_3(crc32c, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index f02fbb63a4..da0f59a2ce 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -115,4 +115,7 @@ static inline int vec_full_reg_size(DisasContext *s)
bool disas_sve(DisasContext *, uint32_t);
+void gen_gvec_rax1(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+
#endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/crypto_helper.c b/target/arm/crypto_helper.c
index 6bd5a3d2d0..372d8350e4 100644
--- a/target/arm/crypto_helper.c
+++ b/target/arm/crypto_helper.c
@@ -725,3 +725,14 @@ void HELPER(crypto_sm4ekey)(void *vd, void *vn, void* vm, uint32_t desc)
}
clear_tail(vd, opr_sz, simd_maxsz(desc));
}
+
+void HELPER(crypto_rax1)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+ intptr_t i, opr_sz = simd_oprsz(desc);
+ uint64_t *d = vd, *n = vn, *m = vm;
+
+ for (i = 0; i < opr_sz / 8; ++i) {
+ d[i] = n[i] ^ rol64(m[i], 1);
+ }
+ clear_tail(vd, opr_sz, simd_maxsz(desc));
+}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 1e511529b8..4d7a8fd2bb 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -13579,6 +13579,32 @@ static void disas_crypto_two_reg_sha(DisasContext *s, uint32_t insn)
tcg_temp_free_ptr(tcg_rn_ptr);
}
+static void gen_rax1_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m)
+{
+ tcg_gen_rotli_i64(d, m, 1);
+ tcg_gen_xor_i64(d, d, n);
+}
+
+static void gen_rax1_vec(unsigned vece, TCGv_vec d, TCGv_vec n, TCGv_vec m)
+{
+ tcg_gen_rotli_vec(vece, d, m, 1);
+ tcg_gen_xor_vec(vece, d, d, n);
+}
+
+void gen_gvec_rax1(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+ static const TCGOpcode vecop_list[] = { INDEX_op_rotli_vec, 0 };
+ static const GVecGen3 op = {
+ .fni8 = gen_rax1_i64,
+ .fniv = gen_rax1_vec,
+ .opt_opc = vecop_list,
+ .fno = gen_helper_crypto_rax1,
+ .vece = MO_64,
+ };
+ tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &op);
+}
+
/* Crypto three-reg SHA512
* 31 21 20 16 15 14 13 12 11 10 9 5 4 0
* +-----------------------+------+---+---+-----+--------+------+------+
@@ -13595,6 +13621,7 @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
bool feature;
CryptoThreeOpFn *genfn = NULL;
gen_helper_gvec_3 *oolfn = NULL;
+ GVecGen3Fn *gvecfn = NULL;
if (o == 0) {
switch (opcode) {
@@ -13612,7 +13639,7 @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
break;
case 3: /* RAX1 */
feature = dc_isar_feature(aa64_sha3, s);
- genfn = NULL;
+ gvecfn = gen_gvec_rax1;
break;
default:
g_assert_not_reached();
@@ -13648,10 +13675,9 @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
if (oolfn) {
gen_gvec_op3_ool(s, true, rd, rn, rm, 0, oolfn);
- return;
- }
-
- if (genfn) {
+ } else if (gvecfn) {
+ gen_gvec_fn3(s, true, rd, rn, rm, gvecfn, MO_64);
+ } else {
TCGv_ptr tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr;
tcg_rd_ptr = vec_full_reg_ptr(s, rd);
@@ -13663,29 +13689,6 @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
tcg_temp_free_ptr(tcg_rd_ptr);
tcg_temp_free_ptr(tcg_rn_ptr);
tcg_temp_free_ptr(tcg_rm_ptr);
- } else {
- TCGv_i64 tcg_op1, tcg_op2, tcg_res[2];
- int pass;
-
- tcg_op1 = tcg_temp_new_i64();
- tcg_op2 = tcg_temp_new_i64();
- tcg_res[0] = tcg_temp_new_i64();
- tcg_res[1] = tcg_temp_new_i64();
-
- for (pass = 0; pass < 2; pass++) {
- read_vec_element(s, tcg_op1, rn, pass, MO_64);
- read_vec_element(s, tcg_op2, rm, pass, MO_64);
-
- tcg_gen_rotli_i64(tcg_res[pass], tcg_op2, 1);
- tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
- }
- write_vec_element(s, tcg_res[0], rd, 0, MO_64);
- write_vec_element(s, tcg_res[1], rd, 1, MO_64);
-
- tcg_temp_free_i64(tcg_op1);
- tcg_temp_free_i64(tcg_op2);
- tcg_temp_free_i64(tcg_res[0]);
- tcg_temp_free_i64(tcg_res[1]);
}
}
--
2.20.1
* [PATCH 3/6] target/arm: Convert sha512 and sm3 to gvec helpers
From: Richard Henderson @ 2020-05-14 21:28 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, alex.bennee
This does not yet convert the helpers to loop over opr_sz, but
passing the descriptor allows the vector tail to be cleared,
which fixes an existing bug vs SVE.
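The intermediate state described above, a helper that still computes only 16 bytes but uses the descriptor to clear the rest, might look like the sketch below. It borrows the s1_512 expression and the SHA512SU1 update from the patch context; struct toy_desc and all sketch_ names are hypothetical, and the real descriptor is a packed 32-bit value rather than a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the packed gvec descriptor. */
struct toy_desc {
    size_t opr_sz;   /* bytes the operation covers */
    size_t max_sz;   /* full register width in bytes */
};

/* Rotate a 64-bit value right by n bits (n is reduced modulo 64). */
static uint64_t sketch_ror64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x >> n) | (x << (64 - n)) : x;
}

/* Same expression as s1_512() in the patch context. */
static uint64_t sketch_s1_512(uint64_t x)
{
    return sketch_ror64(x, 19) ^ sketch_ror64(x, 61) ^ (x >> 6);
}

/* The operation is fixed at 16 bytes; only the tail size varies. */
static void sketch_clear_tail_16(uint64_t *rd, struct toy_desc desc)
{
    assert(desc.opr_sz == 16 && desc.max_sz >= 16);
    memset(rd + 2, 0, desc.max_sz - 16);
}

/* Fixed-width update of the low two lanes, followed by tail clearing. */
static void sketch_sha512su1(uint64_t *rd, const uint64_t *rn,
                             const uint64_t *rm, struct toy_desc desc)
{
    rd[0] += sketch_s1_512(rn[0]) + rm[0];
    rd[1] += sketch_s1_512(rn[1]) + rm[1];
    sketch_clear_tail_16(rd, desc);
}
```

The assertion documents that such a helper is not yet safe to call with a wider operation size; lifting that restriction would require looping over opr_sz as in patch 1.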
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 15 +++++++-----
target/arm/crypto_helper.c | 37 +++++++++++++++++++++++-----
target/arm/translate-a64.c | 50 ++++++++++++--------------------------
3 files changed, 55 insertions(+), 47 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 6c4eb9befb..784dc29ce2 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -522,14 +522,17 @@ DEF_HELPER_FLAGS_3(crypto_sha256h2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
DEF_HELPER_FLAGS_2(crypto_sha256su0, TCG_CALL_NO_RWG, void, ptr, ptr)
DEF_HELPER_FLAGS_3(crypto_sha256su1, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
-DEF_HELPER_FLAGS_3(crypto_sha512h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
-DEF_HELPER_FLAGS_3(crypto_sha512h2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
-DEF_HELPER_FLAGS_2(crypto_sha512su0, TCG_CALL_NO_RWG, void, ptr, ptr)
-DEF_HELPER_FLAGS_3(crypto_sha512su1, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(crypto_sha512h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sha512h2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(crypto_sha512su0, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sha512su1, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(crypto_sm3tt, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32)
-DEF_HELPER_FLAGS_3(crypto_sm3partw1, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
-DEF_HELPER_FLAGS_3(crypto_sm3partw2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(crypto_sm3partw1, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sm3partw2, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(crypto_sm4e, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(crypto_sm4ekey, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/crypto_helper.c b/target/arm/crypto_helper.c
index 372d8350e4..637e4c00bb 100644
--- a/target/arm/crypto_helper.c
+++ b/target/arm/crypto_helper.c
@@ -31,6 +31,19 @@ union CRYPTO_STATE {
#define CR_ST_WORD(state, i) (state.words[i])
#endif
+/*
+ * The caller has not been converted to full gvec, and so only
+ * modifies the low 16 bytes of the vector register.
+ */
+static void clear_tail_16(void *vd, uint32_t desc)
+{
+ int opr_sz = simd_oprsz(desc);
+ int max_sz = simd_maxsz(desc);
+
+ assert(opr_sz == 16);
+ clear_tail(vd, opr_sz, max_sz);
+}
+
static void do_crypto_aese(uint64_t *rd, uint64_t *rn,
uint64_t *rm, bool decrypt)
{
@@ -470,7 +483,7 @@ static uint64_t s1_512(uint64_t x)
return ror64(x, 19) ^ ror64(x, 61) ^ (x >> 6);
}
-void HELPER(crypto_sha512h)(void *vd, void *vn, void *vm)
+void HELPER(crypto_sha512h)(void *vd, void *vn, void *vm, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rn = vn;
@@ -483,9 +496,11 @@ void HELPER(crypto_sha512h)(void *vd, void *vn, void *vm)
rd[0] = d0;
rd[1] = d1;
+
+ clear_tail_16(vd, desc);
}
-void HELPER(crypto_sha512h2)(void *vd, void *vn, void *vm)
+void HELPER(crypto_sha512h2)(void *vd, void *vn, void *vm, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rn = vn;
@@ -498,9 +513,11 @@ void HELPER(crypto_sha512h2)(void *vd, void *vn, void *vm)
rd[0] = d0;
rd[1] = d1;
+
+ clear_tail_16(vd, desc);
}
-void HELPER(crypto_sha512su0)(void *vd, void *vn)
+void HELPER(crypto_sha512su0)(void *vd, void *vn, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rn = vn;
@@ -512,9 +529,11 @@ void HELPER(crypto_sha512su0)(void *vd, void *vn)
rd[0] = d0;
rd[1] = d1;
+
+ clear_tail_16(vd, desc);
}
-void HELPER(crypto_sha512su1)(void *vd, void *vn, void *vm)
+void HELPER(crypto_sha512su1)(void *vd, void *vn, void *vm, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rn = vn;
@@ -522,9 +541,11 @@ void HELPER(crypto_sha512su1)(void *vd, void *vn, void *vm)
rd[0] += s1_512(rn[0]) + rm[0];
rd[1] += s1_512(rn[1]) + rm[1];
+
+ clear_tail_16(vd, desc);
}
-void HELPER(crypto_sm3partw1)(void *vd, void *vn, void *vm)
+void HELPER(crypto_sm3partw1)(void *vd, void *vn, void *vm, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rn = vn;
@@ -548,9 +569,11 @@ void HELPER(crypto_sm3partw1)(void *vd, void *vn, void *vm)
rd[0] = d.l[0];
rd[1] = d.l[1];
+
+ clear_tail_16(vd, desc);
}
-void HELPER(crypto_sm3partw2)(void *vd, void *vn, void *vm)
+void HELPER(crypto_sm3partw2)(void *vd, void *vn, void *vm, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rn = vn;
@@ -568,6 +591,8 @@ void HELPER(crypto_sm3partw2)(void *vd, void *vn, void *vm)
rd[0] = d.l[0];
rd[1] = d.l[1];
+
+ clear_tail_16(vd, desc);
}
void HELPER(crypto_sm3tt)(void *vd, void *vn, void *vm, uint32_t imm2,
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 4d7a8fd2bb..96e20fa401 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -13619,7 +13619,6 @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
bool feature;
- CryptoThreeOpFn *genfn = NULL;
gen_helper_gvec_3 *oolfn = NULL;
GVecGen3Fn *gvecfn = NULL;
@@ -13627,15 +13626,15 @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
switch (opcode) {
case 0: /* SHA512H */
feature = dc_isar_feature(aa64_sha512, s);
- genfn = gen_helper_crypto_sha512h;
+ oolfn = gen_helper_crypto_sha512h;
break;
case 1: /* SHA512H2 */
feature = dc_isar_feature(aa64_sha512, s);
- genfn = gen_helper_crypto_sha512h2;
+ oolfn = gen_helper_crypto_sha512h2;
break;
case 2: /* SHA512SU1 */
feature = dc_isar_feature(aa64_sha512, s);
- genfn = gen_helper_crypto_sha512su1;
+ oolfn = gen_helper_crypto_sha512su1;
break;
case 3: /* RAX1 */
feature = dc_isar_feature(aa64_sha3, s);
@@ -13648,11 +13647,11 @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
switch (opcode) {
case 0: /* SM3PARTW1 */
feature = dc_isar_feature(aa64_sm3, s);
- genfn = gen_helper_crypto_sm3partw1;
+ oolfn = gen_helper_crypto_sm3partw1;
break;
case 1: /* SM3PARTW2 */
feature = dc_isar_feature(aa64_sm3, s);
- genfn = gen_helper_crypto_sm3partw2;
+ oolfn = gen_helper_crypto_sm3partw2;
break;
case 2: /* SM4EKEY */
feature = dc_isar_feature(aa64_sm4, s);
@@ -13675,20 +13674,8 @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
if (oolfn) {
gen_gvec_op3_ool(s, true, rd, rn, rm, 0, oolfn);
- } else if (gvecfn) {
- gen_gvec_fn3(s, true, rd, rn, rm, gvecfn, MO_64);
} else {
- TCGv_ptr tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr;
-
- tcg_rd_ptr = vec_full_reg_ptr(s, rd);
- tcg_rn_ptr = vec_full_reg_ptr(s, rn);
- tcg_rm_ptr = vec_full_reg_ptr(s, rm);
-
- genfn(tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr);
-
- tcg_temp_free_ptr(tcg_rd_ptr);
- tcg_temp_free_ptr(tcg_rn_ptr);
- tcg_temp_free_ptr(tcg_rm_ptr);
+ gen_gvec_fn3(s, true, rd, rn, rm, gvecfn, MO_64);
}
}
@@ -13703,19 +13690,14 @@ static void disas_crypto_two_reg_sha512(DisasContext *s, uint32_t insn)
int opcode = extract32(insn, 10, 2);
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
- TCGv_ptr tcg_rd_ptr, tcg_rn_ptr;
bool feature;
- CryptoTwoOpFn *genfn;
- gen_helper_gvec_3 *oolfn = NULL;
switch (opcode) {
case 0: /* SHA512SU0 */
feature = dc_isar_feature(aa64_sha512, s);
- genfn = gen_helper_crypto_sha512su0;
break;
case 1: /* SM4E */
feature = dc_isar_feature(aa64_sm4, s);
- oolfn = gen_helper_crypto_sm4e;
break;
default:
unallocated_encoding(s);
@@ -13731,18 +13713,16 @@ static void disas_crypto_two_reg_sha512(DisasContext *s, uint32_t insn)
return;
}
- if (oolfn) {
- gen_gvec_op3_ool(s, true, rd, rd, rn, 0, oolfn);
- return;
+ switch (opcode) {
+ case 0: /* SHA512SU0 */
+ gen_gvec_op2_ool(s, true, rd, rn, 0, gen_helper_crypto_sha512su0);
+ break;
+ case 1: /* SM4E */
+ gen_gvec_op3_ool(s, true, rd, rd, rn, 0, gen_helper_crypto_sm4e);
+ break;
+ default:
+ g_assert_not_reached();
}
-
- tcg_rd_ptr = vec_full_reg_ptr(s, rd);
- tcg_rn_ptr = vec_full_reg_ptr(s, rn);
-
- genfn(tcg_rd_ptr, tcg_rn_ptr);
-
- tcg_temp_free_ptr(tcg_rd_ptr);
- tcg_temp_free_ptr(tcg_rn_ptr);
}
/* Crypto four-register
--
2.20.1
* [PATCH 4/6] target/arm: Convert sha1 and sha256 to gvec helpers
From: Richard Henderson @ 2020-05-14 21:28 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, alex.bennee
This does not yet convert the helpers to loop over opr_sz, but
passing the descriptor allows the vector tail to be cleared,
which fixes an existing bug vs SVE.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 12 ++--
target/arm/neon-dp.decode | 12 ++--
target/arm/crypto_helper.c | 24 +++++--
target/arm/translate-a64.c | 34 ++++-----
target/arm/translate-neon.inc.c | 124 +++++---------------------------
target/arm/translate.c | 24 ++-----
6 files changed, 67 insertions(+), 163 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 784dc29ce2..cee23adbfc 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -514,13 +514,13 @@ DEF_HELPER_FLAGS_4(crypto_aese, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_3(crypto_aesmc, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(crypto_sha1_3reg, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-DEF_HELPER_FLAGS_2(crypto_sha1h, TCG_CALL_NO_RWG, void, ptr, ptr)
-DEF_HELPER_FLAGS_2(crypto_sha1su1, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_3(crypto_sha1h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(crypto_sha1su1, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
-DEF_HELPER_FLAGS_3(crypto_sha256h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
-DEF_HELPER_FLAGS_3(crypto_sha256h2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
-DEF_HELPER_FLAGS_2(crypto_sha256su0, TCG_CALL_NO_RWG, void, ptr, ptr)
-DEF_HELPER_FLAGS_3(crypto_sha256su1, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(crypto_sha256h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sha256h2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(crypto_sha256su0, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sha256su1, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(crypto_sha512h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(crypto_sha512h2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 8beb1db768..5b2fc65d72 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -165,14 +165,14 @@ VPADD_3s 1111 001 0 0 . .. .... .... 1011 . . . 1 .... @3same_q0
VQRDMLAH_3s 1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
+@3same_crypto .... .... .... .... .... .... .... .... \
+ &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0 q=1
+
SHA1_3s 1111 001 0 0 . optype:2 .... .... 1100 . 1 . 0 .... \
vm=%vm_dp vn=%vn_dp vd=%vd_dp
-SHA256H_3s 1111 001 1 0 . 00 .... .... 1100 . 1 . 0 .... \
- vm=%vm_dp vn=%vn_dp vd=%vd_dp
-SHA256H2_3s 1111 001 1 0 . 01 .... .... 1100 . 1 . 0 .... \
- vm=%vm_dp vn=%vn_dp vd=%vd_dp
-SHA256SU1_3s 1111 001 1 0 . 10 .... .... 1100 . 1 . 0 .... \
- vm=%vm_dp vn=%vn_dp vd=%vd_dp
+SHA256H_3s 1111 001 1 0 . 00 .... .... 1100 . 1 . 0 .... @3same_crypto
+SHA256H2_3s 1111 001 1 0 . 01 .... .... 1100 . 1 . 0 .... @3same_crypto
+SHA256SU1_3s 1111 001 1 0 . 10 .... .... 1100 . 1 . 0 .... @3same_crypto
VFMA_fp_3s 1111 001 0 0 . 0 . .... .... 1100 ... 1 .... @3same_fp
VFMS_fp_3s 1111 001 0 0 . 1 . .... .... 1100 ... 1 .... @3same_fp
diff --git a/target/arm/crypto_helper.c b/target/arm/crypto_helper.c
index 637e4c00bb..7124745c32 100644
--- a/target/arm/crypto_helper.c
+++ b/target/arm/crypto_helper.c
@@ -303,7 +303,7 @@ void HELPER(crypto_sha1_3reg)(void *vd, void *vn, void *vm, uint32_t op)
rd[1] = d.l[1];
}
-void HELPER(crypto_sha1h)(void *vd, void *vm)
+void HELPER(crypto_sha1h)(void *vd, void *vm, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rm = vm;
@@ -314,9 +314,11 @@ void HELPER(crypto_sha1h)(void *vd, void *vm)
rd[0] = m.l[0];
rd[1] = m.l[1];
+
+ clear_tail_16(vd, desc);
}
-void HELPER(crypto_sha1su1)(void *vd, void *vm)
+void HELPER(crypto_sha1su1)(void *vd, void *vm, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rm = vm;
@@ -330,6 +332,8 @@ void HELPER(crypto_sha1su1)(void *vd, void *vm)
rd[0] = d.l[0];
rd[1] = d.l[1];
+
+ clear_tail_16(vd, desc);
}
/*
@@ -357,7 +361,7 @@ static uint32_t s1(uint32_t x)
return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10);
}
-void HELPER(crypto_sha256h)(void *vd, void *vn, void *vm)
+void HELPER(crypto_sha256h)(void *vd, void *vn, void *vm, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rn = vn;
@@ -388,9 +392,11 @@ void HELPER(crypto_sha256h)(void *vd, void *vn, void *vm)
rd[0] = d.l[0];
rd[1] = d.l[1];
+
+ clear_tail_16(vd, desc);
}
-void HELPER(crypto_sha256h2)(void *vd, void *vn, void *vm)
+void HELPER(crypto_sha256h2)(void *vd, void *vn, void *vm, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rn = vn;
@@ -413,9 +419,11 @@ void HELPER(crypto_sha256h2)(void *vd, void *vn, void *vm)
rd[0] = d.l[0];
rd[1] = d.l[1];
+
+ clear_tail_16(vd, desc);
}
-void HELPER(crypto_sha256su0)(void *vd, void *vm)
+void HELPER(crypto_sha256su0)(void *vd, void *vm, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rm = vm;
@@ -429,9 +437,11 @@ void HELPER(crypto_sha256su0)(void *vd, void *vm)
rd[0] = d.l[0];
rd[1] = d.l[1];
+
+ clear_tail_16(vd, desc);
}
-void HELPER(crypto_sha256su1)(void *vd, void *vn, void *vm)
+void HELPER(crypto_sha256su1)(void *vd, void *vn, void *vm, uint32_t desc)
{
uint64_t *rd = vd;
uint64_t *rn = vn;
@@ -447,6 +457,8 @@ void HELPER(crypto_sha256su1)(void *vd, void *vn, void *vm)
rd[0] = d.l[0];
rd[1] = d.l[1];
+
+ clear_tail_16(vd, desc);
}
/*
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 96e20fa401..d3094d5dfd 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -13460,8 +13460,7 @@ static void disas_crypto_three_reg_sha(DisasContext *s, uint32_t insn)
int rm = extract32(insn, 16, 5);
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
- CryptoThreeOpFn *genfn;
- TCGv_ptr tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr;
+ gen_helper_gvec_3 *genfn;
bool feature;
if (size != 0) {
@@ -13503,23 +13502,22 @@ static void disas_crypto_three_reg_sha(DisasContext *s, uint32_t insn)
return;
}
- tcg_rd_ptr = vec_full_reg_ptr(s, rd);
- tcg_rn_ptr = vec_full_reg_ptr(s, rn);
- tcg_rm_ptr = vec_full_reg_ptr(s, rm);
-
if (genfn) {
- genfn(tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr);
+ gen_gvec_op3_ool(s, true, rd, rn, rm, 0, genfn);
} else {
TCGv_i32 tcg_opcode = tcg_const_i32(opcode);
+ TCGv_ptr tcg_rd_ptr = vec_full_reg_ptr(s, rd);
+ TCGv_ptr tcg_rn_ptr = vec_full_reg_ptr(s, rn);
+ TCGv_ptr tcg_rm_ptr = vec_full_reg_ptr(s, rm);
gen_helper_crypto_sha1_3reg(tcg_rd_ptr, tcg_rn_ptr,
tcg_rm_ptr, tcg_opcode);
- tcg_temp_free_i32(tcg_opcode);
- }
- tcg_temp_free_ptr(tcg_rd_ptr);
- tcg_temp_free_ptr(tcg_rn_ptr);
- tcg_temp_free_ptr(tcg_rm_ptr);
+ tcg_temp_free_i32(tcg_opcode);
+ tcg_temp_free_ptr(tcg_rd_ptr);
+ tcg_temp_free_ptr(tcg_rn_ptr);
+ tcg_temp_free_ptr(tcg_rm_ptr);
+ }
}
/* Crypto two-reg SHA
@@ -13534,9 +13532,8 @@ static void disas_crypto_two_reg_sha(DisasContext *s, uint32_t insn)
int opcode = extract32(insn, 12, 5);
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
- CryptoTwoOpFn *genfn;
+ gen_helper_gvec_2 *genfn;
bool feature;
- TCGv_ptr tcg_rd_ptr, tcg_rn_ptr;
if (size != 0) {
unallocated_encoding(s);
@@ -13569,14 +13566,7 @@ static void disas_crypto_two_reg_sha(DisasContext *s, uint32_t insn)
if (!fp_access_check(s)) {
return;
}
-
- tcg_rd_ptr = vec_full_reg_ptr(s, rd);
- tcg_rn_ptr = vec_full_reg_ptr(s, rn);
-
- genfn(tcg_rd_ptr, tcg_rn_ptr);
-
- tcg_temp_free_ptr(tcg_rd_ptr);
- tcg_temp_free_ptr(tcg_rn_ptr);
+ gen_gvec_op2_ool(s, true, rd, rn, 0, genfn);
}
static void gen_rax1_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m)
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 3fe65a0b08..205877ca48 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -661,12 +661,14 @@ DO_3SAME_CMP(VCGE_S, TCG_COND_GE)
DO_3SAME_CMP(VCGE_U, TCG_COND_GEU)
DO_3SAME_CMP(VCEQ, TCG_COND_EQ)
-static void gen_VMUL_p_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
- uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
-{
- tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz,
- 0, gen_helper_gvec_pmul_b);
-}
+#define WRAP_OOL_FN(WRAPNAME, FUNC) \
+ static void WRAPNAME(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, \
+ uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz) \
+ { \
+ tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, 0, FUNC); \
+ }
+
+WRAP_OOL_FN(gen_VMUL_p_3s, gen_helper_gvec_pmul_b)
static bool trans_VMUL_p_3s(DisasContext *s, arg_3same *a)
{
@@ -728,107 +730,19 @@ static bool trans_SHA1_3s(DisasContext *s, arg_SHA1_3s *a)
return true;
}
-static bool trans_SHA256H_3s(DisasContext *s, arg_SHA256H_3s *a)
-{
- TCGv_ptr ptr1, ptr2, ptr3;
-
- if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
- !dc_isar_feature(aa32_sha2, s)) {
- return false;
+#define DO_SHA2(NAME, FUNC) \
+ WRAP_OOL_FN(gen_##NAME##_3s, FUNC) \
+ static bool trans_##NAME##_3s(DisasContext *s, arg_3same *a) \
+ { \
+ if (!dc_isar_feature(aa32_sha2, s)) { \
+ return false; \
+ } \
+ return do_3same(s, a, gen_##NAME##_3s); \
}
- /* UNDEF accesses to D16-D31 if they don't exist. */
- if (!dc_isar_feature(aa32_simd_r32, s) &&
- ((a->vd | a->vn | a->vm) & 0x10)) {
- return false;
- }
-
- if ((a->vn | a->vm | a->vd) & 1) {
- return false;
- }
-
- if (!vfp_access_check(s)) {
- return true;
- }
-
- ptr1 = vfp_reg_ptr(true, a->vd);
- ptr2 = vfp_reg_ptr(true, a->vn);
- ptr3 = vfp_reg_ptr(true, a->vm);
- gen_helper_crypto_sha256h(ptr1, ptr2, ptr3);
- tcg_temp_free_ptr(ptr1);
- tcg_temp_free_ptr(ptr2);
- tcg_temp_free_ptr(ptr3);
-
- return true;
-}
-
-static bool trans_SHA256H2_3s(DisasContext *s, arg_SHA256H2_3s *a)
-{
- TCGv_ptr ptr1, ptr2, ptr3;
-
- if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
- !dc_isar_feature(aa32_sha2, s)) {
- return false;
- }
-
- /* UNDEF accesses to D16-D31 if they don't exist. */
- if (!dc_isar_feature(aa32_simd_r32, s) &&
- ((a->vd | a->vn | a->vm) & 0x10)) {
- return false;
- }
-
- if ((a->vn | a->vm | a->vd) & 1) {
- return false;
- }
-
- if (!vfp_access_check(s)) {
- return true;
- }
-
- ptr1 = vfp_reg_ptr(true, a->vd);
- ptr2 = vfp_reg_ptr(true, a->vn);
- ptr3 = vfp_reg_ptr(true, a->vm);
- gen_helper_crypto_sha256h2(ptr1, ptr2, ptr3);
- tcg_temp_free_ptr(ptr1);
- tcg_temp_free_ptr(ptr2);
- tcg_temp_free_ptr(ptr3);
-
- return true;
-}
-
-static bool trans_SHA256SU1_3s(DisasContext *s, arg_SHA256SU1_3s *a)
-{
- TCGv_ptr ptr1, ptr2, ptr3;
-
- if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
- !dc_isar_feature(aa32_sha2, s)) {
- return false;
- }
-
- /* UNDEF accesses to D16-D31 if they don't exist. */
- if (!dc_isar_feature(aa32_simd_r32, s) &&
- ((a->vd | a->vn | a->vm) & 0x10)) {
- return false;
- }
-
- if ((a->vn | a->vm | a->vd) & 1) {
- return false;
- }
-
- if (!vfp_access_check(s)) {
- return true;
- }
-
- ptr1 = vfp_reg_ptr(true, a->vd);
- ptr2 = vfp_reg_ptr(true, a->vn);
- ptr3 = vfp_reg_ptr(true, a->vm);
- gen_helper_crypto_sha256su1(ptr1, ptr2, ptr3);
- tcg_temp_free_ptr(ptr1);
- tcg_temp_free_ptr(ptr2);
- tcg_temp_free_ptr(ptr3);
-
- return true;
-}
+DO_SHA2(SHA256H, gen_helper_crypto_sha256h)
+DO_SHA2(SHA256H2, gen_helper_crypto_sha256h2)
+DO_SHA2(SHA256SU1, gen_helper_crypto_sha256su1)
#define DO_3SAME_64(INSN, FUNC) \
static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 921359dfd4..d5c97a8e3c 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5280,7 +5280,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
int vec_size;
uint32_t imm;
TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
- TCGv_ptr ptr1, ptr2;
+ TCGv_ptr ptr1;
TCGv_i64 tmp64;
if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -6395,13 +6395,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
if (!dc_isar_feature(aa32_sha1, s) || ((rm | rd) & 1)) {
return 1;
}
- ptr1 = vfp_reg_ptr(true, rd);
- ptr2 = vfp_reg_ptr(true, rm);
-
- gen_helper_crypto_sha1h(ptr1, ptr2);
-
- tcg_temp_free_ptr(ptr1);
- tcg_temp_free_ptr(ptr2);
+ tcg_gen_gvec_2_ool(rd_ofs, rm_ofs, 16, 16, 0,
+ gen_helper_crypto_sha1h);
break;
case NEON_2RM_SHA1SU1:
if ((rm | rd) & 1) {
@@ -6415,17 +6410,10 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
} else if (!dc_isar_feature(aa32_sha1, s)) {
return 1;
}
- ptr1 = vfp_reg_ptr(true, rd);
- ptr2 = vfp_reg_ptr(true, rm);
- if (q) {
- gen_helper_crypto_sha256su0(ptr1, ptr2);
- } else {
- gen_helper_crypto_sha1su1(ptr1, ptr2);
- }
- tcg_temp_free_ptr(ptr1);
- tcg_temp_free_ptr(ptr2);
+ tcg_gen_gvec_2_ool(rd_ofs, rm_ofs, 16, 16, 0,
+ q ? gen_helper_crypto_sha256su0
+ : gen_helper_crypto_sha1su1);
break;
-
case NEON_2RM_VMVN:
tcg_gen_gvec_not(0, rd_ofs, rm_ofs, vec_size, vec_size);
break;
--
2.20.1
* [PATCH 5/6] target/arm: Split helper_crypto_sha1_3reg
2020-05-14 21:28 [PATCH 0/6] target/arm: Convert crypto to gvec Richard Henderson
` (3 preceding siblings ...)
2020-05-14 21:28 ` [PATCH 4/6] target/arm: Convert sha1 and sha256 " Richard Henderson
@ 2020-05-14 21:28 ` Richard Henderson
2020-05-14 21:28 ` [PATCH 6/6] target/arm: Split helper_crypto_sm3tt Richard Henderson
` (2 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2020-05-14 21:28 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, alex.bennee
Rather than passing an opcode to a helper, fully decode the
operation at translate time. Use clear_tail_16 to zap the
balance of the SVE register with the AdvSIMD write.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
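Illustrative note, not part of the patch: the refactoring pattern used here
(replace a runtime opcode switch inside the loop with a per-instruction
selector function passed to a shared body) can be sketched as below. The
demo_* names and the simplified state rotation are assumptions for this
sketch; only the three bitwise functions follow the usual SHA-1 choose,
parity and majority definitions.

```c
#include <assert.h>
#include <stdint.h>

/* The three SHA-1 round functions, in their usual bitwise form. */
static inline uint32_t demo_cho(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & (y ^ z)) ^ z;       /* choose: bit of y where x is set, else z */
}

static inline uint32_t demo_par(uint32_t x, uint32_t y, uint32_t z)
{
    return x ^ y ^ z;               /* parity */
}

static inline uint32_t demo_maj(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | ((x | y) & z); /* majority */
}

typedef uint32_t demo_round_fn(const uint32_t w[4]);

static uint32_t demo_do_cho(const uint32_t w[4])
{
    return demo_cho(w[1], w[2], w[3]);
}

static uint32_t demo_do_par(const uint32_t w[4])
{
    return demo_par(w[1], w[2], w[3]);
}

static uint32_t demo_do_maj(const uint32_t w[4])
{
    return demo_maj(w[1], w[2], w[3]);
}

/* Shared body: the selector is fixed by the caller, so there is no
 * opcode test per iteration. The rotation is deliberately simplified
 * and is NOT the real SHA-1 round. */
static inline void demo_3reg(uint32_t d[4], demo_round_fn *fn)
{
    int i;

    assert(fn != 0);
    for (i = 0; i < 4; i++) {
        uint32_t t = fn(d);

        d[3] = d[2];
        d[2] = d[1];
        d[1] = d[0];
        d[0] = t;
    }
}

/* One thin entry point per instruction, fully decoded by the caller. */
void demo_sha1c_like(uint32_t d[4]) { demo_3reg(d, demo_do_cho); }
void demo_sha1p_like(uint32_t d[4]) { demo_3reg(d, demo_do_par); }
void demo_sha1m_like(uint32_t d[4]) { demo_3reg(d, demo_do_maj); }
```

Because each wrapper passes a fixed function, a compiler that inlines the
shared body can specialize it per instruction, and the unreachable default
case of the old switch disappears.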
target/arm/helper.h | 5 +-
target/arm/neon-dp.decode | 6 +-
target/arm/crypto_helper.c | 99 +++++++++++++++++++++------------
target/arm/translate-a64.c | 29 ++++------
target/arm/translate-neon.inc.c | 46 ++++-----------
5 files changed, 93 insertions(+), 92 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index cee23adbfc..13475ecf81 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -513,7 +513,10 @@ DEF_HELPER_FLAGS_2(neon_qzip32, TCG_CALL_NO_RWG, void, ptr, ptr)
DEF_HELPER_FLAGS_4(crypto_aese, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_3(crypto_aesmc, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
-DEF_HELPER_FLAGS_4(crypto_sha1_3reg, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sha1su0, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sha1c, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sha1p, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sha1m, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_3(crypto_sha1h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
DEF_HELPER_FLAGS_3(crypto_sha1su1, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index 5b2fc65d72..8af7c53d8b 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -168,8 +168,10 @@ VQRDMLAH_3s 1111 001 1 0 . .. .... .... 1011 ... 1 .... @3same
@3same_crypto .... .... .... .... .... .... .... .... \
&3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0 q=1
-SHA1_3s 1111 001 0 0 . optype:2 .... .... 1100 . 1 . 0 .... \
- vm=%vm_dp vn=%vn_dp vd=%vd_dp
+SHA1C_3s 1111 001 0 0 . 00 .... .... 1100 . 1 . 0 .... @3same_crypto
+SHA1P_3s 1111 001 0 0 . 01 .... .... 1100 . 1 . 0 .... @3same_crypto
+SHA1M_3s 1111 001 0 0 . 10 .... .... 1100 . 1 . 0 .... @3same_crypto
+SHA1SU0_3s 1111 001 0 0 . 11 .... .... 1100 . 1 . 0 .... @3same_crypto
SHA256H_3s 1111 001 1 0 . 00 .... .... 1100 . 1 . 0 .... @3same_crypto
SHA256H2_3s 1111 001 1 0 . 01 .... .... 1100 . 1 . 0 .... @3same_crypto
SHA256SU1_3s 1111 001 1 0 . 10 .... .... 1100 . 1 . 0 .... @3same_crypto
diff --git a/target/arm/crypto_helper.c b/target/arm/crypto_helper.c
index 7124745c32..636683d0f1 100644
--- a/target/arm/crypto_helper.c
+++ b/target/arm/crypto_helper.c
@@ -24,11 +24,11 @@ union CRYPTO_STATE {
};
#ifdef HOST_WORDS_BIGENDIAN
-#define CR_ST_BYTE(state, i) (state.bytes[(15 - (i)) ^ 8])
-#define CR_ST_WORD(state, i) (state.words[(3 - (i)) ^ 2])
+#define CR_ST_BYTE(state, i) ((state).bytes[(15 - (i)) ^ 8])
+#define CR_ST_WORD(state, i) ((state).words[(3 - (i)) ^ 2])
#else
-#define CR_ST_BYTE(state, i) (state.bytes[i])
-#define CR_ST_WORD(state, i) (state.words[i])
+#define CR_ST_BYTE(state, i) ((state).bytes[i])
+#define CR_ST_WORD(state, i) ((state).words[i])
#endif
/*
@@ -258,49 +258,74 @@ static uint32_t maj(uint32_t x, uint32_t y, uint32_t z)
return (x & y) | ((x | y) & z);
}
-void HELPER(crypto_sha1_3reg)(void *vd, void *vn, void *vm, uint32_t op)
+void HELPER(crypto_sha1su0)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+ uint64_t *d = vd, *n = vn, *m = vm;
+ uint64_t d0, d1;
+
+ d0 = d[1] ^ d[0] ^ m[0];
+ d1 = n[0] ^ d[1] ^ m[1];
+ d[0] = d0;
+ d[1] = d1;
+
+ clear_tail_16(vd, desc);
+}
+
+static inline void crypto_sha1_3reg(uint64_t *rd, uint64_t *rn,
+ uint64_t *rm, uint32_t desc,
+ uint32_t (*fn)(union CRYPTO_STATE *d))
{
- uint64_t *rd = vd;
- uint64_t *rn = vn;
- uint64_t *rm = vm;
union CRYPTO_STATE d = { .l = { rd[0], rd[1] } };
union CRYPTO_STATE n = { .l = { rn[0], rn[1] } };
union CRYPTO_STATE m = { .l = { rm[0], rm[1] } };
+ int i;
- if (op == 3) { /* sha1su0 */
- d.l[0] ^= d.l[1] ^ m.l[0];
- d.l[1] ^= n.l[0] ^ m.l[1];
- } else {
- int i;
+ for (i = 0; i < 4; i++) {
+ uint32_t t = fn(&d);
- for (i = 0; i < 4; i++) {
- uint32_t t;
+ t += rol32(CR_ST_WORD(d, 0), 5) + CR_ST_WORD(n, 0)
+ + CR_ST_WORD(m, i);
- switch (op) {
- case 0: /* sha1c */
- t = cho(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
- break;
- case 1: /* sha1p */
- t = par(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
- break;
- case 2: /* sha1m */
- t = maj(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
- break;
- default:
- g_assert_not_reached();
- }
- t += rol32(CR_ST_WORD(d, 0), 5) + CR_ST_WORD(n, 0)
- + CR_ST_WORD(m, i);
-
- CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3);
- CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
- CR_ST_WORD(d, 2) = ror32(CR_ST_WORD(d, 1), 2);
- CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
- CR_ST_WORD(d, 0) = t;
- }
+ CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3);
+ CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
+ CR_ST_WORD(d, 2) = ror32(CR_ST_WORD(d, 1), 2);
+ CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
+ CR_ST_WORD(d, 0) = t;
}
rd[0] = d.l[0];
rd[1] = d.l[1];
+
+ clear_tail_16(rd, desc);
+}
+
+static uint32_t do_sha1c(union CRYPTO_STATE *d)
+{
+ return cho(CR_ST_WORD(*d, 1), CR_ST_WORD(*d, 2), CR_ST_WORD(*d, 3));
+}
+
+void HELPER(crypto_sha1c)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+ crypto_sha1_3reg(vd, vn, vm, desc, do_sha1c);
+}
+
+static uint32_t do_sha1p(union CRYPTO_STATE *d)
+{
+ return par(CR_ST_WORD(*d, 1), CR_ST_WORD(*d, 2), CR_ST_WORD(*d, 3));
+}
+
+void HELPER(crypto_sha1p)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+ crypto_sha1_3reg(vd, vn, vm, desc, do_sha1p);
+}
+
+static uint32_t do_sha1m(union CRYPTO_STATE *d)
+{
+ return maj(CR_ST_WORD(*d, 1), CR_ST_WORD(*d, 2), CR_ST_WORD(*d, 3));
+}
+
+void HELPER(crypto_sha1m)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+ crypto_sha1_3reg(vd, vn, vm, desc, do_sha1m);
}
void HELPER(crypto_sha1h)(void *vd, void *vm, uint32_t desc)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index d3094d5dfd..49ca7ac76e 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -13470,10 +13470,19 @@ static void disas_crypto_three_reg_sha(DisasContext *s, uint32_t insn)
switch (opcode) {
case 0: /* SHA1C */
+ genfn = gen_helper_crypto_sha1c;
+ feature = dc_isar_feature(aa64_sha1, s);
+ break;
case 1: /* SHA1P */
+ genfn = gen_helper_crypto_sha1p;
+ feature = dc_isar_feature(aa64_sha1, s);
+ break;
case 2: /* SHA1M */
+ genfn = gen_helper_crypto_sha1m;
+ feature = dc_isar_feature(aa64_sha1, s);
+ break;
case 3: /* SHA1SU0 */
- genfn = NULL;
+ genfn = gen_helper_crypto_sha1su0;
feature = dc_isar_feature(aa64_sha1, s);
break;
case 4: /* SHA256H */
@@ -13501,23 +13510,7 @@ static void disas_crypto_three_reg_sha(DisasContext *s, uint32_t insn)
if (!fp_access_check(s)) {
return;
}
-
- if (genfn) {
- gen_gvec_op3_ool(s, true, rd, rn, rm, 0, genfn);
- } else {
- TCGv_i32 tcg_opcode = tcg_const_i32(opcode);
- TCGv_ptr tcg_rd_ptr = vec_full_reg_ptr(s, rd);
- TCGv_ptr tcg_rn_ptr = vec_full_reg_ptr(s, rn);
- TCGv_ptr tcg_rm_ptr = vec_full_reg_ptr(s, rm);
-
- gen_helper_crypto_sha1_3reg(tcg_rd_ptr, tcg_rn_ptr,
- tcg_rm_ptr, tcg_opcode);
-
- tcg_temp_free_i32(tcg_opcode);
- tcg_temp_free_ptr(tcg_rd_ptr);
- tcg_temp_free_ptr(tcg_rn_ptr);
- tcg_temp_free_ptr(tcg_rm_ptr);
- }
+ gen_gvec_op3_ool(s, true, rd, rn, rm, 0, genfn);
}
/* Crypto two-reg SHA
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index 205877ca48..7b19753c8c 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -693,42 +693,20 @@ static bool trans_VMUL_p_3s(DisasContext *s, arg_3same *a)
DO_VQRDMLAH(VQRDMLAH, gen_gvec_sqrdmlah_qc)
DO_VQRDMLAH(VQRDMLSH, gen_gvec_sqrdmlsh_qc)
-static bool trans_SHA1_3s(DisasContext *s, arg_SHA1_3s *a)
-{
- TCGv_ptr ptr1, ptr2, ptr3;
- TCGv_i32 tmp;
-
- if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
- !dc_isar_feature(aa32_sha1, s)) {
- return false;
+#define DO_SHA1(NAME, FUNC) \
+ WRAP_OOL_FN(gen_##NAME##_3s, FUNC) \
+ static bool trans_##NAME##_3s(DisasContext *s, arg_3same *a) \
+ { \
+ if (!dc_isar_feature(aa32_sha1, s)) { \
+ return false; \
+ } \
+ return do_3same(s, a, gen_##NAME##_3s); \
}
- /* UNDEF accesses to D16-D31 if they don't exist. */
- if (!dc_isar_feature(aa32_simd_r32, s) &&
- ((a->vd | a->vn | a->vm) & 0x10)) {
- return false;
- }
-
- if ((a->vn | a->vm | a->vd) & 1) {
- return false;
- }
-
- if (!vfp_access_check(s)) {
- return true;
- }
-
- ptr1 = vfp_reg_ptr(true, a->vd);
- ptr2 = vfp_reg_ptr(true, a->vn);
- ptr3 = vfp_reg_ptr(true, a->vm);
- tmp = tcg_const_i32(a->optype);
- gen_helper_crypto_sha1_3reg(ptr1, ptr2, ptr3, tmp);
- tcg_temp_free_i32(tmp);
- tcg_temp_free_ptr(ptr1);
- tcg_temp_free_ptr(ptr2);
- tcg_temp_free_ptr(ptr3);
-
- return true;
-}
+DO_SHA1(SHA1C, gen_helper_crypto_sha1c)
+DO_SHA1(SHA1P, gen_helper_crypto_sha1p)
+DO_SHA1(SHA1M, gen_helper_crypto_sha1m)
+DO_SHA1(SHA1SU0, gen_helper_crypto_sha1su0)
#define DO_SHA2(NAME, FUNC) \
WRAP_OOL_FN(gen_##NAME##_3s, FUNC) \
--
2.20.1
* [PATCH 6/6] target/arm: Split helper_crypto_sm3tt
2020-05-14 21:28 [PATCH 0/6] target/arm: Convert crypto to gvec Richard Henderson
` (4 preceding siblings ...)
2020-05-14 21:28 ` [PATCH 5/6] target/arm: Split helper_crypto_sha1_3reg Richard Henderson
@ 2020-05-14 21:28 ` Richard Henderson
2020-06-02 19:16 ` [PATCH 0/6] target/arm: Convert crypto to gvec Peter Maydell
2020-06-05 13:24 ` Peter Maydell
7 siblings, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2020-05-14 21:28 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, alex.bennee
Rather than passing an opcode to a helper, fully decode the
operation at translate time. Use clear_tail_16 to zap the
balance of the SVE register with the AdvSIMD write.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
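Illustrative note, not part of the patch: the shape of this change (an
inline body taking a compile-time-constant opcode, a macro stamping out one
wrapper per opcode, the immediate carried in the descriptor, and a table
indexed by the decoded opcode) can be sketched as below. All demo_* names,
the descriptor packing and the arithmetic are assumptions for this sketch
and do not reproduce the SM3 computation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing: size in the low 8 bits, immediate above it. */
#define DEMO_DATA_SHIFT 8

static inline uint32_t demo_desc(uint32_t maxsz, uint32_t data)
{
    return (maxsz & 0xff) | (data << DEMO_DATA_SHIFT);
}

static inline uint32_t demo_data(uint32_t desc)
{
    return desc >> DEMO_DATA_SHIFT;
}

/* Shared body. Each wrapper passes a literal opcode, so after inlining
 * only one branch of the chain below remains per wrapper. (The patch
 * forces inlining; plain "static inline" is used here for portability.) */
static inline uint32_t demo_tt(const uint32_t *m, uint32_t desc,
                               uint32_t opcode)
{
    uint32_t imm2 = demo_data(desc);
    uint32_t t;

    assert(imm2 < 4);

    if (opcode == 0) {
        t = m[imm2] + 1;
    } else if (opcode == 1) {
        t = m[imm2] ^ 0xff;
    } else if (opcode == 2) {
        t = m[imm2] << 1;
    } else {
        t = ~m[imm2];
    }
    return t;
}

#define DEMO_TT(NAME, OPCODE)                                   \
    static uint32_t NAME(const uint32_t *m, uint32_t desc)      \
    { return demo_tt(m, desc, OPCODE); }

DEMO_TT(demo_tt_a, 0)
DEMO_TT(demo_tt_b, 1)
DEMO_TT(demo_tt_c, 2)
DEMO_TT(demo_tt_d, 3)

#undef DEMO_TT

typedef uint32_t demo_tt_fn(const uint32_t *m, uint32_t desc);

/* Translate-time selection: index by the decoded 2-bit opcode field. */
static demo_tt_fn * const demo_fns[4] = {
    demo_tt_a, demo_tt_b, demo_tt_c, demo_tt_d,
};
```

A caller would decode the opcode once, pick demo_fns[opcode], and pass the
immediate through the descriptor, so the helper signature stays uniform
across all four variants.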
target/arm/helper.h | 5 ++++-
target/arm/crypto_helper.c | 24 ++++++++++++++++++------
target/arm/translate-a64.c | 21 +++++----------------
3 files changed, 27 insertions(+), 23 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 13475ecf81..2a20c8174c 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -531,7 +531,10 @@ DEF_HELPER_FLAGS_3(crypto_sha512su0, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(crypto_sha512su1, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, i32)
-DEF_HELPER_FLAGS_5(crypto_sm3tt, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32)
+DEF_HELPER_FLAGS_4(crypto_sm3tt1a, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sm3tt1b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sm3tt2a, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_sm3tt2b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(crypto_sm3partw1, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(crypto_sm3partw2, TCG_CALL_NO_RWG,
diff --git a/target/arm/crypto_helper.c b/target/arm/crypto_helper.c
index 636683d0f1..c76806dc8d 100644
--- a/target/arm/crypto_helper.c
+++ b/target/arm/crypto_helper.c
@@ -632,15 +632,14 @@ void HELPER(crypto_sm3partw2)(void *vd, void *vn, void *vm, uint32_t desc)
clear_tail_16(vd, desc);
}
-void HELPER(crypto_sm3tt)(void *vd, void *vn, void *vm, uint32_t imm2,
- uint32_t opcode)
+static inline void QEMU_ALWAYS_INLINE
+crypto_sm3tt(uint64_t *rd, uint64_t *rn, uint64_t *rm,
+ uint32_t desc, uint32_t opcode)
{
- uint64_t *rd = vd;
- uint64_t *rn = vn;
- uint64_t *rm = vm;
union CRYPTO_STATE d = { .l = { rd[0], rd[1] } };
union CRYPTO_STATE n = { .l = { rn[0], rn[1] } };
union CRYPTO_STATE m = { .l = { rm[0], rm[1] } };
+ uint32_t imm2 = simd_data(desc);
uint32_t t;
assert(imm2 < 4);
@@ -655,7 +654,7 @@ void HELPER(crypto_sm3tt)(void *vd, void *vn, void *vm, uint32_t imm2,
/* SM3TT2B */
t = cho(CR_ST_WORD(d, 3), CR_ST_WORD(d, 2), CR_ST_WORD(d, 1));
} else {
- g_assert_not_reached();
+ qemu_build_not_reached();
}
t += CR_ST_WORD(d, 0) + CR_ST_WORD(m, imm2);
@@ -680,8 +679,21 @@ void HELPER(crypto_sm3tt)(void *vd, void *vn, void *vm, uint32_t imm2,
rd[0] = d.l[0];
rd[1] = d.l[1];
+
+ clear_tail_16(rd, desc);
}
+#define DO_SM3TT(NAME, OPCODE) \
+ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
+ { crypto_sm3tt(vd, vn, vm, desc, OPCODE); }
+
+DO_SM3TT(crypto_sm3tt1a, 0)
+DO_SM3TT(crypto_sm3tt1b, 1)
+DO_SM3TT(crypto_sm3tt2a, 2)
+DO_SM3TT(crypto_sm3tt2b, 3)
+
+#undef DO_SM3TT
+
static uint8_t const sm4_sbox[] = {
0xd6, 0x90, 0xe9, 0xfe, 0xcc, 0xe1, 0x3d, 0xb7,
0x16, 0xb6, 0x14, 0xc2, 0x28, 0xfb, 0x2c, 0x05,
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 49ca7ac76e..9c1ebcc8e3 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -13861,13 +13861,15 @@ static void disas_crypto_xar(DisasContext *s, uint32_t insn)
*/
static void disas_crypto_three_reg_imm2(DisasContext *s, uint32_t insn)
{
+ static gen_helper_gvec_3 * const fns[4] = {
+ gen_helper_crypto_sm3tt1a, gen_helper_crypto_sm3tt1b,
+ gen_helper_crypto_sm3tt2a, gen_helper_crypto_sm3tt2b,
+ };
int opcode = extract32(insn, 10, 2);
int imm2 = extract32(insn, 12, 2);
int rm = extract32(insn, 16, 5);
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
- TCGv_ptr tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr;
- TCGv_i32 tcg_imm2, tcg_opcode;
if (!dc_isar_feature(aa64_sm3, s)) {
unallocated_encoding(s);
@@ -13878,20 +13880,7 @@ static void disas_crypto_three_reg_imm2(DisasContext *s, uint32_t insn)
return;
}
- tcg_rd_ptr = vec_full_reg_ptr(s, rd);
- tcg_rn_ptr = vec_full_reg_ptr(s, rn);
- tcg_rm_ptr = vec_full_reg_ptr(s, rm);
- tcg_imm2 = tcg_const_i32(imm2);
- tcg_opcode = tcg_const_i32(opcode);
-
- gen_helper_crypto_sm3tt(tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr, tcg_imm2,
- tcg_opcode);
-
- tcg_temp_free_ptr(tcg_rd_ptr);
- tcg_temp_free_ptr(tcg_rn_ptr);
- tcg_temp_free_ptr(tcg_rm_ptr);
- tcg_temp_free_i32(tcg_imm2);
- tcg_temp_free_i32(tcg_opcode);
+ gen_gvec_op3_ool(s, true, rd, rn, rm, imm2, fns[opcode]);
}
/* C3.6 Data processing - SIMD, inc Crypto
--
2.20.1
* Re: [PATCH 0/6] target/arm: Convert crypto to gvec
2020-05-14 21:28 [PATCH 0/6] target/arm: Convert crypto to gvec Richard Henderson
` (5 preceding siblings ...)
2020-05-14 21:28 ` [PATCH 6/6] target/arm: Split helper_crypto_sm3tt Richard Henderson
@ 2020-06-02 19:16 ` Peter Maydell
2020-06-02 19:21 ` Richard Henderson
2020-06-05 13:24 ` Peter Maydell
7 siblings, 1 reply; 10+ messages in thread
From: Peter Maydell @ 2020-06-02 19:16 UTC (permalink / raw)
To: Richard Henderson; +Cc: Alex Bennée, QEMU Developers
On Thu, 14 May 2020 at 22:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> In addition, this fixes the missing tail clearing for SVE.
>
> The sha1, sha256, sm3 routines that are not fully generalized
> are not used by sve -- it only supports the newer algorithms.
>
> I'm not sure that this:
>
> Based-on: <20200508151055.5832-1-richard.henderson@linaro.org>
> ("tcg vector rotate operations")
>
> will be sufficient for patchew, because it also relies on
> today's target-arm.next merge to master. But you get the idea.
Now I've just applied your latest tcg pull, are all the
prerequisites for this series in master?
thanks
-- PMM
* Re: [PATCH 0/6] target/arm: Convert crypto to gvec
2020-06-02 19:16 ` [PATCH 0/6] target/arm: Convert crypto to gvec Peter Maydell
@ 2020-06-02 19:21 ` Richard Henderson
0 siblings, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2020-06-02 19:21 UTC (permalink / raw)
To: Peter Maydell; +Cc: Alex Bennée, QEMU Developers
On 6/2/20 12:16 PM, Peter Maydell wrote:
> On Thu, 14 May 2020 at 22:28, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> In addition, this fixes the missing tail clearing for SVE.
>>
>> The sha1, sha256, sm3 routines that are not fully generalized
>> are not used by sve -- it only supports the newer algorithms.
>>
>> I'm not sure that this:
>>
>> Based-on: <20200508151055.5832-1-richard.henderson@linaro.org>
>> ("tcg vector rotate operations")
>>
>> will be sufficient for patchew, because it also relies on
>> today's target-arm.next merge to master. But you get the idea.
>
> Now I've just applied your latest tcg pull, are all the
> prerequisites for this series in master?
Yes. My branch rebased and rebuilt without incident.
r~
* Re: [PATCH 0/6] target/arm: Convert crypto to gvec
2020-05-14 21:28 [PATCH 0/6] target/arm: Convert crypto to gvec Richard Henderson
` (6 preceding siblings ...)
2020-06-02 19:16 ` [PATCH 0/6] target/arm: Convert crypto to gvec Peter Maydell
@ 2020-06-05 13:24 ` Peter Maydell
7 siblings, 0 replies; 10+ messages in thread
From: Peter Maydell @ 2020-06-05 13:24 UTC (permalink / raw)
To: Richard Henderson; +Cc: Alex Bennée, QEMU Developers
On Thu, 14 May 2020 at 22:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> In addition, this fixes the missing tail clearing for SVE.
>
> The sha1, sha256, sm3 routines that are not fully generalized
> are not used by sve -- it only supports the newer algorithms.
>
> I'm not sure that this:
>
> Based-on: <20200508151055.5832-1-richard.henderson@linaro.org>
> ("tcg vector rotate operations")
>
> will be sufficient for patchew, because it also relies on
> today's target-arm.next merge to master. But you get the idea.
>
Applied to target-arm.next, thanks.
-- PMM