All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/39] Add RISC-V vector cryptography extensions
@ 2023-02-02 12:41 Lawrence Hunter
  2023-02-02 12:41 ` [PATCH 01/39] target/riscv: add zvkb cpu property Lawrence Hunter
                   ` (38 more replies)
  0 siblings, 39 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

This patch series introduces an implementation for the six instruction sets
of the draft RISC-V vector cryptography extensions specification.

This patch set implements the instruction sets as per the 20221202
version of the specification (1). We plan to update to the latest spec
once stabilised.

Work performed by Dickon, Lawrence, Nazar, Kiran, and William from Codethink
sponsored by SiFive, as well as Max Chou and Frank Chang from SiFive.

For convenience we have created a git repo with our patches on top of a
recent master. https://github.com/CodethinkLabs/qemu-ct

1. https://github.com/riscv/riscv-crypto/releases

Dickon Hood (1):
  target/riscv: Add vrol.[vv,vx] and vror.[vv,vx,vi] decoding,
    translation and execution support

Kiran Ostrolenk (4):
  target/riscv: Add vsha2ms.vv decoding, translation and execution
    support
  target/riscv: add zvksh cpu property
  target/riscv: Add vsm3c.vi decoding, translation and execution support
  target/riscv: expose zvksh cpu property

Lawrence Hunter (16):
  target/riscv: Add vclmul.vv decoding, translation and execution
    support
  target/riscv: Add vclmul.vx decoding, translation and execution
    support
  target/riscv: Add vclmulh.vv decoding, translation and execution
    support
  target/riscv: Add vclmulh.vx decoding, translation and execution
    support
  target/riscv: Add vaesef.vv decoding, translation and execution
    support
  target/riscv: Add vaesef.vs decoding, translation and execution
    support
  target/riscv: Add vaesdf.vv decoding, translation and execution
    support
  target/riscv: Add vaesdf.vs decoding, translation and execution
    support
  target/riscv: Add vaesdm.vv decoding, translation and execution
    support
  target/riscv: Add vaesdm.vs decoding, translation and execution
    support
  target/riscv: Add vaesz.vs decoding, translation and execution support
  target/riscv: Add vsha2c[hl].vv decoding, translation and execution
    support
  target/riscv: Add vsm3me.vv decoding, translation and execution
    support
  target/riscv: add zvkg cpu property
  target/riscv: Add vghmac.vv decoding, translation and execution
    support
  target/riscv: expose zvkg cpu property

Max Chou (5):
  crypto: Move SM4_SBOXWORD from target/riscv
  crypto: Add SM4 constant parameter CK.
  target/riscv: Add zvksed cfg property
  target/riscv: Add Zvksed support
  target/riscv: Expose Zvksed property

Nazar Kazakov (10):
  target/riscv: add zvkb cpu property
  target/riscv: Add vrev8.v decoding, translation and execution support
  target/riscv: Add vandn.[vv,vx,vi] decoding, translation and execution
    support
  target/riscv: expose zvkb cpu property
  target/riscv: add zvkns cpu property
  target/riscv: Add vaeskf1.vi decoding, translation and execution
    support
  target/riscv: Add vaeskf2.vi decoding, translation and execution
    support
  target/riscv: expose zvkns cpu property
  target/riscv: add zvknh cpu properties
  target/riscv: expose zvknh cpu properties

William Salmon (3):
  target/riscv: Add vbrev8.v decoding, translation and execution support
  target/riscv: Add vaesem.vv decoding, translation and execution
    support
  target/riscv: Add vaesem.vs decoding, translation and execution
    support

 crypto/sm4.c                                 |   10 +
 include/crypto/sm4.h                         |    8 +
 include/qemu/bitops.h                        |   32 +
 target/arm/crypto_helper.c                   |   10 +-
 target/riscv/cpu.c                           |   33 +
 target/riscv/cpu.h                           |    7 +
 target/riscv/crypto_helper.c                 |    1 +
 target/riscv/helper.h                        |   69 ++
 target/riscv/insn32.decode                   |   48 +
 target/riscv/insn_trans/trans_rvzvkb.c.inc   |  162 +++
 target/riscv/insn_trans/trans_rvzvkg.c.inc   |    9 +
 target/riscv/insn_trans/trans_rvzvknh.c.inc  |   47 +
 target/riscv/insn_trans/trans_rvzvkns.c.inc  |  119 ++
 target/riscv/insn_trans/trans_rvzvksed.c.inc |   35 +
 target/riscv/insn_trans/trans_rvzvksh.c.inc  |   20 +
 target/riscv/meson.build                     |    4 +-
 target/riscv/translate.c                     |    6 +
 target/riscv/vcrypto_helper.c                | 1013 ++++++++++++++++++
 target/riscv/vector_helper.c                 |  242 +----
 target/riscv/vector_internals.c              |   63 ++
 target/riscv/vector_internals.h              |  226 ++++
 21 files changed, 1914 insertions(+), 250 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvzvkb.c.inc
 create mode 100644 target/riscv/insn_trans/trans_rvzvkg.c.inc
 create mode 100644 target/riscv/insn_trans/trans_rvzvknh.c.inc
 create mode 100644 target/riscv/insn_trans/trans_rvzvkns.c.inc
 create mode 100644 target/riscv/insn_trans/trans_rvzvksed.c.inc
 create mode 100644 target/riscv/insn_trans/trans_rvzvksh.c.inc
 create mode 100644 target/riscv/vcrypto_helper.c
 create mode 100644 target/riscv/vector_internals.c
 create mode 100644 target/riscv/vector_internals.h

-- 
2.39.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 01/39] target/riscv: add zvkb cpu property
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
@ 2023-02-02 12:41 ` Lawrence Hunter
  2023-02-02 12:41 ` [PATCH 02/39] target/riscv: Add vclmul.vv decoding, translation and execution support Lawrence Hunter
                   ` (37 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
---
 target/riscv/cpu.c | 12 ++++++++++++
 target/riscv/cpu.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index cc75ca7667..bd34119c75 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -100,6 +100,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
     ISA_EXT_DATA_ENTRY(zkt, true, PRIV_VERSION_1_12_0, ext_zkt),
     ISA_EXT_DATA_ENTRY(zve32f, true, PRIV_VERSION_1_12_0, ext_zve32f),
     ISA_EXT_DATA_ENTRY(zve64f, true, PRIV_VERSION_1_12_0, ext_zve64f),
+    ISA_EXT_DATA_ENTRY(zvkb, true, PRIV_VERSION_1_12_0, ext_zvkb),
     ISA_EXT_DATA_ENTRY(zhinx, true, PRIV_VERSION_1_12_0, ext_zhinx),
     ISA_EXT_DATA_ENTRY(zhinxmin, true, PRIV_VERSION_1_12_0, ext_zhinxmin),
     ISA_EXT_DATA_ENTRY(smaia, true, PRIV_VERSION_1_12_0, ext_smaia),
@@ -792,6 +793,17 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
             return;
         }
 
+        /*
+         * In principle zve*{x,d} would also suffice here, were they supported
+         * in qemu
+         */
+        if (cpu->cfg.ext_zvkb &&
+            !(cpu->cfg.ext_zve32f || cpu->cfg.ext_zve64f || cpu->cfg.ext_v)) {
+            error_setg(
+                errp, "Vector crypto extensions require V or Zve* extensions");
+            return;
+        }
+
         /* Set the ISA extensions, checks should have happened above */
         if (cpu->cfg.ext_zdinx || cpu->cfg.ext_zhinx ||
             cpu->cfg.ext_zhinxmin) {
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index f5609b62a2..d4824ad0bb 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -461,6 +461,7 @@ struct RISCVCPUConfig {
     bool ext_zhinxmin;
     bool ext_zve32f;
     bool ext_zve64f;
+    bool ext_zvkb;
     bool ext_zmmul;
     bool ext_smaia;
     bool ext_ssaia;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 02/39] target/riscv: Add vclmul.vv decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
  2023-02-02 12:41 ` [PATCH 01/39] target/riscv: add zvkb cpu property Lawrence Hunter
@ 2023-02-02 12:41 ` Lawrence Hunter
  2023-02-02 13:53   ` Philipp Tomsich
  2023-02-02 12:41 ` [PATCH 03/39] target/riscv: Add vclmul.vx " Lawrence Hunter
                   ` (36 subsequent siblings)
  38 siblings, 1 reply; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter, Max Chou

Co-authored-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
Co-authored-by: Max Chou <max.chou@sifive.com>
Signed-off-by: Max Chou <max.chou@sifive.com>
Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                      |   3 +
 target/riscv/insn32.decode                 |   3 +
 target/riscv/insn_trans/trans_rvzvkb.c.inc |  41 +++++++
 target/riscv/meson.build                   |   4 +-
 target/riscv/translate.c                   |   1 +
 target/riscv/vcrypto_helper.c              |  23 ++++
 target/riscv/vector_helper.c               | 120 +--------------------
 target/riscv/vector_internals.c            |  39 +++++++
 target/riscv/vector_internals.h            | 116 ++++++++++++++++++++
 9 files changed, 230 insertions(+), 120 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvzvkb.c.inc
 create mode 100644 target/riscv/vcrypto_helper.c
 create mode 100644 target/riscv/vector_internals.c
 create mode 100644 target/riscv/vector_internals.h

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 227c7122ef..e9127c9ccb 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1136,3 +1136,6 @@ DEF_HELPER_FLAGS_1(aes64im, TCG_CALL_NO_RWG_SE, tl, tl)
 
 DEF_HELPER_FLAGS_3(sm4ed, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
 DEF_HELPER_FLAGS_3(sm4ks, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
+
+/* Vector crypto functions */
+DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b7e7613ea2..5ddee69d60 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -890,3 +890,6 @@ sm3p1       00 01000 01001 ..... 001 ..... 0010011 @r2
 # *** RV32 Zksed Standard Extension ***
 sm4ed       .. 11000 ..... ..... 000 ..... 0110011 @k_aes
 sm4ks       .. 11010 ..... ..... 000 ..... 0110011 @k_aes
+
+# *** RV64 Zvkb vector crypto extension ***
+vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
new file mode 100644
index 0000000000..fb1995f737
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
@@ -0,0 +1,41 @@
+#define GEN_VV_MASKED_TRANS(NAME, CHECK)                               \
+static bool trans_##NAME(DisasContext *s, arg_rmrr * a)                \
+{                                                                      \
+    if (CHECK(s, a)) {                                                 \
+        uint32_t data = 0;                                             \
+        TCGLabel *over = gen_new_label();                              \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);              \
+        tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);     \
+                                                                       \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                     \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                 \
+        data = FIELD_DP32(data, VDATA, VTA, s->vta);                   \
+        data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); \
+        data = FIELD_DP32(data, VDATA, VMA, s->vma);                   \
+                                                                       \
+        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),         \
+                           vreg_ofs(s, a->rs1),                        \
+                           vreg_ofs(s, a->rs2), cpu_env,               \
+                           s->cfg_ptr->vlen / 8,                       \
+                           s->cfg_ptr->vlen / 8, data,                 \
+                           gen_helper_##NAME);                         \
+                                                                       \
+        mark_vs_dirty(s);                                              \
+        gen_set_label(over);                                           \
+        return true;                                                   \
+    }                                                                  \
+    return false;                                                      \
+}
+
+static bool zvkb_vv_check(DisasContext *s, arg_rmrr *a)
+{
+    return opivv_check(s, a) &&
+           s->cfg_ptr->ext_zvkb == true;
+}
+
+static bool vclmul_vv_check(DisasContext *s, arg_rmrr *a)
+{
+    return zvkb_vv_check(s, a) && s->sew == MO_64;
+}
+
+GEN_VV_MASKED_TRANS(vclmul_vv, vclmul_vv_check)
diff --git a/target/riscv/meson.build b/target/riscv/meson.build
index ba25164d74..5313b01e5f 100644
--- a/target/riscv/meson.build
+++ b/target/riscv/meson.build
@@ -15,10 +15,12 @@ riscv_ss.add(files(
   'gdbstub.c',
   'op_helper.c',
   'vector_helper.c',
+  'vector_internals.c',
   'bitmanip_helper.c',
   'translate.c',
   'm128_helper.c',
-  'crypto_helper.c'
+  'crypto_helper.c',
+  'vcrypto_helper.c'
 ))
 riscv_ss.add(when: 'CONFIG_KVM', if_true: files('kvm.c'), if_false: files('kvm-stub.c'))
 
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index df38db7553..71684c10f3 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -1063,6 +1063,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
 #include "insn_trans/trans_rvzawrs.c.inc"
 #include "insn_trans/trans_rvzfh.c.inc"
 #include "insn_trans/trans_rvk.c.inc"
+#include "insn_trans/trans_rvzvkb.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
 #include "insn_trans/trans_svinval.c.inc"
 #include "insn_trans/trans_xventanacondops.c.inc"
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
new file mode 100644
index 0000000000..8a11e56754
--- /dev/null
+++ b/target/riscv/vcrypto_helper.c
@@ -0,0 +1,23 @@
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/bitops.h"
+#include "cpu.h"
+#include "exec/memop.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
+#include "internals.h"
+#include "vector_internals.h"
+
+static void do_vclmul_vv(void *vd, void *vs1, void *vs2, int i)
+{
+    uint64_t result = 0;
+    for (int j = 63; j >= 0; j--) {
+        if ((((uint64_t *)vs1)[i] >> j) & 1) {
+            result ^= (((uint64_t *)vs2)[i] << j);
+        }
+    }
+    ((uint64_t *)vd)[i] = result;
+}
+
+GEN_VEXT_VV(vclmul_vv, 8)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 00de879787..def1b21414 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -26,6 +26,7 @@
 #include "fpu/softfloat.h"
 #include "tcg/tcg-gvec-desc.h"
 #include "internals.h"
+#include "vector_internals.h"
 #include <math.h>
 
 target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
@@ -95,48 +96,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
 #define H8(x)   (x)
 #endif
 
-static inline uint32_t vext_nf(uint32_t desc)
-{
-    return FIELD_EX32(simd_data(desc), VDATA, NF);
-}
-
-static inline uint32_t vext_vm(uint32_t desc)
-{
-    return FIELD_EX32(simd_data(desc), VDATA, VM);
-}
-
-/*
- * Encode LMUL to lmul as following:
- *     LMUL    vlmul    lmul
- *      1       000       0
- *      2       001       1
- *      4       010       2
- *      8       011       3
- *      -       100       -
- *     1/8      101      -3
- *     1/4      110      -2
- *     1/2      111      -1
- */
-static inline int32_t vext_lmul(uint32_t desc)
-{
-    return sextract32(FIELD_EX32(simd_data(desc), VDATA, LMUL), 0, 3);
-}
-
-static inline uint32_t vext_vta(uint32_t desc)
-{
-    return FIELD_EX32(simd_data(desc), VDATA, VTA);
-}
-
-static inline uint32_t vext_vma(uint32_t desc)
-{
-    return FIELD_EX32(simd_data(desc), VDATA, VMA);
-}
-
-static inline uint32_t vext_vta_all_1s(uint32_t desc)
-{
-    return FIELD_EX32(simd_data(desc), VDATA, VTA_ALL_1S);
-}
-
 /*
  * Get the maximum number of elements can be operated.
  *
@@ -155,21 +114,6 @@ static inline uint32_t vext_max_elems(uint32_t desc, uint32_t log2_esz)
     return scale < 0 ? vlenb >> -scale : vlenb << scale;
 }
 
-/*
- * Get number of total elements, including prestart, body and tail elements.
- * Note that when LMUL < 1, the tail includes the elements past VLMAX that
- * are held in the same vector register.
- */
-static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
-                                            uint32_t esz)
-{
-    uint32_t vlenb = simd_maxsz(desc);
-    uint32_t sew = 1 << FIELD_EX64(env->vtype, VTYPE, VSEW);
-    int8_t emul = ctzl(esz) - ctzl(sew) + vext_lmul(desc) < 0 ? 0 :
-                  ctzl(esz) - ctzl(sew) + vext_lmul(desc);
-    return (vlenb << emul) / esz;
-}
-
 static inline target_ulong adjust_addr(CPURISCVState *env, target_ulong addr)
 {
     return (addr & env->cur_pmmask) | env->cur_pmbase;
@@ -202,20 +146,6 @@ static void probe_pages(CPURISCVState *env, target_ulong addr,
     }
 }
 
-/* set agnostic elements to 1s */
-static void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
-                              uint32_t tot)
-{
-    if (is_agnostic == 0) {
-        /* policy undisturbed */
-        return;
-    }
-    if (tot - cnt == 0) {
-        return;
-    }
-    memset(base + cnt, -1, tot - cnt);
-}
-
 static inline void vext_set_elem_mask(void *v0, int index,
                                       uint8_t value)
 {
@@ -225,18 +155,6 @@ static inline void vext_set_elem_mask(void *v0, int index,
     ((uint64_t *)v0)[idx] = deposit64(old, pos, 1, value);
 }
 
-/*
- * Earlier designs (pre-0.9) had a varying number of bits
- * per mask value (MLEN). In the 0.9 design, MLEN=1.
- * (Section 4.5)
- */
-static inline int vext_elem_mask(void *v0, int index)
-{
-    int idx = index / 64;
-    int pos = index  % 64;
-    return (((uint64_t *)v0)[idx] >> pos) & 1;
-}
-
 /* elements operations for load and store */
 typedef void vext_ldst_elem_fn(CPURISCVState *env, target_ulong addr,
                                uint32_t idx, void *vd, uintptr_t retaddr);
@@ -800,8 +718,6 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
 #define NOP_UUU_H uint16_t, uint16_t, uint32_t, uint16_t, uint32_t
 #define NOP_UUU_W uint32_t, uint32_t, uint64_t, uint32_t, uint64_t
 
-/* operation of two vector elements */
-typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
 
 #define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
 static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
@@ -822,40 +738,6 @@ RVVCALL(OPIVV2, vsub_vv_h, OP_SSS_H, H2, H2, H2, DO_SUB)
 RVVCALL(OPIVV2, vsub_vv_w, OP_SSS_W, H4, H4, H4, DO_SUB)
 RVVCALL(OPIVV2, vsub_vv_d, OP_SSS_D, H8, H8, H8, DO_SUB)
 
-static void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
-                       CPURISCVState *env, uint32_t desc,
-                       opivv2_fn *fn, uint32_t esz)
-{
-    uint32_t vm = vext_vm(desc);
-    uint32_t vl = env->vl;
-    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
-    uint32_t vta = vext_vta(desc);
-    uint32_t vma = vext_vma(desc);
-    uint32_t i;
-
-    for (i = env->vstart; i < vl; i++) {
-        if (!vm && !vext_elem_mask(v0, i)) {
-            /* set masked-off elements to 1s */
-            vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz);
-            continue;
-        }
-        fn(vd, vs1, vs2, i);
-    }
-    env->vstart = 0;
-    /* set tail elements to 1s */
-    vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
-}
-
-/* generate the helpers for OPIVV */
-#define GEN_VEXT_VV(NAME, ESZ)                            \
-void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
-                  void *vs2, CPURISCVState *env,          \
-                  uint32_t desc)                          \
-{                                                         \
-    do_vext_vv(vd, v0, vs1, vs2, env, desc,               \
-               do_##NAME, ESZ);                           \
-}
-
 GEN_VEXT_VV(vadd_vv_b, 1)
 GEN_VEXT_VV(vadd_vv_h, 2)
 GEN_VEXT_VV(vadd_vv_w, 4)
diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
new file mode 100644
index 0000000000..a264797882
--- /dev/null
+++ b/target/riscv/vector_internals.c
@@ -0,0 +1,39 @@
+#include "vector_internals.h"
+
+/* set agnostic elements to 1s */
+void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
+                       uint32_t tot)
+{
+    if (is_agnostic == 0) {
+        /* policy undisturbed */
+        return;
+    }
+    if (tot - cnt == 0) {
+        return ;
+    }
+    memset(base + cnt, -1, tot - cnt);
+}
+
+void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
+                CPURISCVState *env, uint32_t desc,
+                opivv2_fn *fn, uint32_t esz)
+{
+    uint32_t vm = vext_vm(desc);
+    uint32_t vl = env->vl;
+    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+    uint32_t vta = vext_vta(desc);
+    uint32_t vma = vext_vma(desc);
+    uint32_t i;
+
+    for (i = env->vstart; i < vl; i++) {
+        if (!vm && !vext_elem_mask(v0, i)) {
+            /* set masked-off elements to 1s */
+            vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz);
+            continue;
+        }
+        fn(vd, vs1, vs2, i);
+    }
+    env->vstart = 0;
+    /* set tail elements to 1s */
+    vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
+}
diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
new file mode 100644
index 0000000000..f61803acc0
--- /dev/null
+++ b/target/riscv/vector_internals.h
@@ -0,0 +1,116 @@
+/*
+ * RISC-V Vector Extension Internals
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TARGET_RISCV_VECTOR_INTERNAL_H
+#define TARGET_RISCV_VECTOR_INTERNAL_H
+
+#include "qemu/osdep.h"
+#include "qemu/bitops.h"
+#include "cpu.h"
+#include "tcg/tcg-gvec-desc.h"
+#include "internals.h"
+
+static inline uint32_t vext_nf(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, NF);
+}
+
+/*
+ * Encode LMUL to lmul as following:
+ *     LMUL    vlmul    lmul
+ *      1       000       0
+ *      2       001       1
+ *      4       010       2
+ *      8       011       3
+ *      -       100       -
+ *     1/8      101      -3
+ *     1/4      110      -2
+ *     1/2      111      -1
+ */
+static inline int32_t vext_lmul(uint32_t desc)
+{
+    return sextract32(FIELD_EX32(simd_data(desc), VDATA, LMUL), 0, 3);
+}
+
+static inline uint32_t vext_vm(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, VM);
+}
+
+static inline uint32_t vext_vma(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, VMA);
+}
+
+static inline uint32_t vext_vta(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, VTA);
+}
+
+static inline uint32_t vext_vta_all_1s(uint32_t desc)
+{
+    return FIELD_EX32(simd_data(desc), VDATA, VTA_ALL_1S);
+}
+
+/*
+ * Earlier designs (pre-0.9) had a varying number of bits
+ * per mask value (MLEN). In the 0.9 design, MLEN=1.
+ * (Section 4.5)
+ */
+static inline int vext_elem_mask(void *v0, int index)
+{
+    int idx = index / 64;
+    int pos = index  % 64;
+    return (((uint64_t *)v0)[idx] >> pos) & 1;
+}
+
+/*
+ * Get number of total elements, including prestart, body and tail elements.
+ * Note that when LMUL < 1, the tail includes the elements past VLMAX that
+ * are held in the same vector register.
+ */
+static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
+                                            uint32_t esz)
+{
+    uint32_t vlenb = simd_maxsz(desc);
+    uint32_t sew = 1 << FIELD_EX64(env->vtype, VTYPE, VSEW);
+    int8_t emul = ctzl(esz) - ctzl(sew) + vext_lmul(desc) < 0 ? 0 :
+                  ctzl(esz) - ctzl(sew) + vext_lmul(desc);
+    return (vlenb << emul) / esz;
+}
+
+/* set agnostic elements to 1s */
+void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
+                       uint32_t tot);
+
+/* operation of two vector elements */
+typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
+
+void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
+                CPURISCVState *env, uint32_t desc,
+                opivv2_fn *fn, uint32_t esz);
+
+/* generate the helpers for OPIVV */
+#define GEN_VEXT_VV(NAME, ESZ)                            \
+void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
+                  void *vs2, CPURISCVState *env,          \
+                  uint32_t desc)                          \
+{                                                         \
+    do_vext_vv(vd, v0, vs1, vs2, env, desc,               \
+               do_##NAME, ESZ);                           \
+}
+
+#endif /* TARGET_RISCV_VECTOR_INTERNAL_H */
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 03/39] target/riscv: Add vclmul.vx decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
  2023-02-02 12:41 ` [PATCH 01/39] target/riscv: add zvkb cpu property Lawrence Hunter
  2023-02-02 12:41 ` [PATCH 02/39] target/riscv: Add vclmul.vv decoding, translation and execution support Lawrence Hunter
@ 2023-02-02 12:41 ` Lawrence Hunter
  2023-02-02 13:59   ` Philipp Tomsich
  2023-02-02 12:41 ` [PATCH 04/39] target/riscv: Add vclmulh.vv " Lawrence Hunter
                   ` (35 subsequent siblings)
  38 siblings, 1 reply; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
Co-authored-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                      |  1 +
 target/riscv/insn32.decode                 |  1 +
 target/riscv/insn_trans/trans_rvzvkb.c.inc | 54 ++++++++++++++++++++++
 target/riscv/vcrypto_helper.c              | 12 +++++
 target/riscv/vector_helper.c               | 36 ---------------
 target/riscv/vector_internals.c            | 24 ++++++++++
 target/riscv/vector_internals.h            | 16 +++++++
 7 files changed, 108 insertions(+), 36 deletions(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index e9127c9ccb..6c786ef6f3 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1139,3 +1139,4 @@ DEF_HELPER_FLAGS_3(sm4ks, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
 
 /* Vector crypto functions */
 DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5ddee69d60..4a7421354d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -893,3 +893,4 @@ sm4ks       .. 11010 ..... ..... 000 ..... 0110011 @k_aes
 
 # *** RV64 Zvkb vector crypto extension ***
 vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
+vclmul_vx       001100 . ..... ..... 110 ..... 1010111 @r_vm
diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
index fb1995f737..6e8b81136c 100644
--- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
@@ -39,3 +39,57 @@ static bool vclmul_vv_check(DisasContext *s, arg_rmrr *a)
 }
 
 GEN_VV_MASKED_TRANS(vclmul_vv, vclmul_vv_check)
+
+#define GEN_VX_MASKED_TRANS(NAME, CHECK)                                \
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                  \
+{                                                                       \
+    if (CHECK(s, a)) {                                                  \
+        TCGv_ptr rd_v, v0_v, rs2_v;                                     \
+        TCGv rs1;                                                       \
+        TCGv_i32 desc;                                                  \
+        uint32_t data = 0;                                              \
+                                                                        \
+        TCGLabel *over = gen_new_label();                               \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);               \
+        tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);      \
+                                                                        \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                      \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                  \
+        data = FIELD_DP32(data, VDATA, VTA, s->vta);                    \
+        data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s);  \
+        data = FIELD_DP32(data, VDATA, VMA, s->vma);                    \
+                                                                        \
+        rd_v = tcg_temp_new_ptr();                                      \
+        v0_v = tcg_temp_new_ptr();                                      \
+        rs1 = get_gpr(s, a->rs1, EXT_ZERO);                             \
+        rs2_v = tcg_temp_new_ptr();                                     \
+        desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlen / 8,         \
+                                          s->cfg_ptr->vlen / 8, data)); \
+        tcg_gen_addi_ptr(rd_v, cpu_env, vreg_ofs(s, a->rd));            \
+        tcg_gen_addi_ptr(v0_v, cpu_env, vreg_ofs(s, 0));                \
+        tcg_gen_addi_ptr(rs2_v, cpu_env, vreg_ofs(s, a->rs2));          \
+        gen_helper_##NAME(rd_v, v0_v, rs1, rs2_v, cpu_env, desc);       \
+        tcg_temp_free_ptr(rd_v);                                        \
+        tcg_temp_free_ptr(v0_v);                                        \
+        tcg_temp_free_ptr(rs2_v);                                       \
+                                                                        \
+        mark_vs_dirty(s);                                               \
+        gen_set_label(over);                                            \
+        return true;                                                    \
+    }                                                                   \
+    return false;                                                       \
+}
+
+static bool zvkb_vx_check(DisasContext *s, arg_rmrr *a)
+{
+    return opivx_check(s, a) &&
+           s->cfg_ptr->ext_zvkb == true;
+}
+
+static bool vclmul_vx_check(DisasContext *s, arg_rmrr *a)
+{
+    return zvkb_vx_check(s, a) &&
+           s->sew == MO_64;
+}
+
+GEN_VX_MASKED_TRANS(vclmul_vx, vclmul_vx_check)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 8a11e56754..c453d348ad 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -20,4 +20,16 @@ static void do_vclmul_vv(void *vd, void *vs1, void *vs2, int i)
     ((uint64_t *)vd)[i] = result;
 }
 
+static void do_vclmul_vx(void *vd, target_long rs1, void *vs2, int i)
+{
+    uint64_t result = 0;
+    for (int j = 63; j >= 0; j--) {
+        if ((rs1 >> j) & 1) {
+            result ^= (((uint64_t *)vs2)[i] << j);
+        }
+    }
+    ((uint64_t *)vd)[i] = result;
+}
+
 GEN_VEXT_VV(vclmul_vv, 8)
+GEN_VEXT_VX(vclmul_vx, 8)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index def1b21414..ab470092f6 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -747,8 +747,6 @@ GEN_VEXT_VV(vsub_vv_h, 2)
 GEN_VEXT_VV(vsub_vv_w, 4)
 GEN_VEXT_VV(vsub_vv_d, 8)
 
-typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i);
-
 /*
  * (T1)s1 gives the real operator type.
  * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
@@ -773,40 +771,6 @@ RVVCALL(OPIVX2, vrsub_vx_h, OP_SSS_H, H2, H2, DO_RSUB)
 RVVCALL(OPIVX2, vrsub_vx_w, OP_SSS_W, H4, H4, DO_RSUB)
 RVVCALL(OPIVX2, vrsub_vx_d, OP_SSS_D, H8, H8, DO_RSUB)
 
-static void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
-                       CPURISCVState *env, uint32_t desc,
-                       opivx2_fn fn, uint32_t esz)
-{
-    uint32_t vm = vext_vm(desc);
-    uint32_t vl = env->vl;
-    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
-    uint32_t vta = vext_vta(desc);
-    uint32_t vma = vext_vma(desc);
-    uint32_t i;
-
-    for (i = env->vstart; i < vl; i++) {
-        if (!vm && !vext_elem_mask(v0, i)) {
-            /* set masked-off elements to 1s */
-            vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz);
-            continue;
-        }
-        fn(vd, s1, vs2, i);
-    }
-    env->vstart = 0;
-    /* set tail elements to 1s */
-    vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
-}
-
-/* generate the helpers for OPIVX */
-#define GEN_VEXT_VX(NAME, ESZ)                            \
-void HELPER(NAME)(void *vd, void *v0, target_ulong s1,    \
-                  void *vs2, CPURISCVState *env,          \
-                  uint32_t desc)                          \
-{                                                         \
-    do_vext_vx(vd, v0, s1, vs2, env, desc,                \
-               do_##NAME, ESZ);                           \
-}
-
 GEN_VEXT_VX(vadd_vx_b, 1)
 GEN_VEXT_VX(vadd_vx_h, 2)
 GEN_VEXT_VX(vadd_vx_w, 4)
diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
index a264797882..b23fa4dd74 100644
--- a/target/riscv/vector_internals.c
+++ b/target/riscv/vector_internals.c
@@ -37,3 +37,27 @@ void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
     /* set tail elements to 1s */
     vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
 }
+
+void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
+                CPURISCVState *env, uint32_t desc,
+                opivx2_fn fn, uint32_t esz)
+{
+    uint32_t vm = vext_vm(desc);
+    uint32_t vl = env->vl;
+    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+    uint32_t vta = vext_vta(desc);
+    uint32_t vma = vext_vma(desc);
+    uint32_t i;
+
+    for (i = env->vstart; i < vl; i++) {
+        if (!vm && !vext_elem_mask(v0, i)) {
+            /* set masked-off elements to 1s */
+            vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz);
+            continue;
+        }
+        fn(vd, s1, vs2, i);
+    }
+    env->vstart = 0;
+    /* set tail elements to 1s */
+    vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
+}
diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
index f61803acc0..49529d2379 100644
--- a/target/riscv/vector_internals.h
+++ b/target/riscv/vector_internals.h
@@ -113,4 +113,20 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
                do_##NAME, ESZ);                           \
 }
 
+typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i);
+
+void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
+                CPURISCVState *env, uint32_t desc,
+                opivx2_fn fn, uint32_t esz);
+
+/* generate the helpers for OPIVX */
+#define GEN_VEXT_VX(NAME, ESZ)                            \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1,    \
+                  void *vs2, CPURISCVState *env,          \
+                  uint32_t desc)                          \
+{                                                         \
+    do_vext_vx(vd, v0, s1, vs2, env, desc,                \
+               do_##NAME, ESZ);                           \
+}
+
 #endif /* TARGET_RISCV_VECTOR_INTERNAL_H */
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 04/39] target/riscv: Add vclmulh.vv decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (2 preceding siblings ...)
  2023-02-02 12:41 ` [PATCH 03/39] target/riscv: Add vclmul.vx " Lawrence Hunter
@ 2023-02-02 12:41 ` Lawrence Hunter
  2023-02-02 14:03   ` Philipp Tomsich
  2023-02-02 16:54   ` Richard Henderson
  2023-02-02 12:41 ` [PATCH 05/39] target/riscv: Add vclmulh.vx " Lawrence Hunter
                   ` (34 subsequent siblings)
  38 siblings, 2 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                      |  1 +
 target/riscv/insn32.decode                 |  1 +
 target/riscv/insn_trans/trans_rvzvkb.c.inc |  1 +
 target/riscv/vcrypto_helper.c              | 12 ++++++++++++
 4 files changed, 15 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 6c786ef6f3..a155272701 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1140,3 +1140,4 @@ DEF_HELPER_FLAGS_3(sm4ks, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
 /* Vector crypto functions */
 DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 4a7421354d..e26ea1df08 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -894,3 +894,4 @@ sm4ks       .. 11010 ..... ..... 000 ..... 0110011 @k_aes
 # *** RV64 Zvkb vector crypto extension ***
 vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
 vclmul_vx       001100 . ..... ..... 110 ..... 1010111 @r_vm
+vclmulh_vv      001101 . ..... ..... 010 ..... 1010111 @r_vm
diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
index 6e8b81136c..19ce4c7431 100644
--- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
@@ -39,6 +39,7 @@ static bool vclmul_vv_check(DisasContext *s, arg_rmrr *a)
 }
 
 GEN_VV_MASKED_TRANS(vclmul_vv, vclmul_vv_check)
+GEN_VV_MASKED_TRANS(vclmulh_vv, vclmul_vv_check)
 
 #define GEN_VX_MASKED_TRANS(NAME, CHECK)                                \
 static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                  \
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index c453d348ad..022b941131 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -31,5 +31,17 @@ static void do_vclmul_vx(void *vd, target_long rs1, void *vs2, int i)
     ((uint64_t *)vd)[i] = result;
 }
 
+static void do_vclmulh_vv(void *vd, void *vs1, void *vs2, int i)
+{
+    __uint128_t result = 0;
+    for (int j = 63; j >= 0; j--) {
+        if ((((uint64_t *)vs1)[i] >> j) & 1) {
+            result ^= (((__uint128_t)(((uint64_t *)vs2)[i])) << j);
+        }
+    }
+    ((uint64_t *)vd)[i] = (result >> 64);
+}
+
 GEN_VEXT_VV(vclmul_vv, 8)
 GEN_VEXT_VX(vclmul_vx, 8)
+GEN_VEXT_VV(vclmulh_vv, 8)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 05/39] target/riscv: Add vclmulh.vx decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (3 preceding siblings ...)
  2023-02-02 12:41 ` [PATCH 04/39] target/riscv: Add vclmulh.vv " Lawrence Hunter
@ 2023-02-02 12:41 ` Lawrence Hunter
  2023-02-02 12:41   ` [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] " Lawrence Hunter
                   ` (33 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                      |  1 +
 target/riscv/insn32.decode                 |  1 +
 target/riscv/insn_trans/trans_rvzvkb.c.inc |  1 +
 target/riscv/vcrypto_helper.c              | 12 ++++++++++++
 4 files changed, 15 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a155272701..32f1179e29 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1141,3 +1141,4 @@ DEF_HELPER_FLAGS_3(sm4ks, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
 DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vclmulh_vx, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e26ea1df08..b4d88dd1cb 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -895,3 +895,4 @@ sm4ks       .. 11010 ..... ..... 000 ..... 0110011 @k_aes
 vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
 vclmul_vx       001100 . ..... ..... 110 ..... 1010111 @r_vm
 vclmulh_vv      001101 . ..... ..... 010 ..... 1010111 @r_vm
+vclmulh_vx      001101 . ..... ..... 110 ..... 1010111 @r_vm
diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
index 19ce4c7431..533141e559 100644
--- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
@@ -94,3 +94,4 @@ static bool vclmul_vx_check(DisasContext *s, arg_rmrr *a)
 }
 
 GEN_VX_MASKED_TRANS(vclmul_vx, vclmul_vx_check)
+GEN_VX_MASKED_TRANS(vclmulh_vx, vclmul_vx_check)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 022b941131..46e2e510c5 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -42,6 +42,18 @@ static void do_vclmulh_vv(void *vd, void *vs1, void *vs2, int i)
     ((uint64_t *)vd)[i] = (result >> 64);
 }
 
+static void do_vclmulh_vx(void *vd, target_long rs1, void *vs2, int i)
+{
+    __uint128_t result = 0;
+    for (int j = 63; j >= 0; j--) {
+        if ((rs1 >> j) & 1) {
+            result ^= (((__uint128_t)(((uint64_t *)vs2)[i])) << j);
+        }
+    }
+    ((uint64_t *)vd)[i] = (result >> 64);
+}
+
 GEN_VEXT_VV(vclmul_vv, 8)
 GEN_VEXT_VX(vclmul_vx, 8)
 GEN_VEXT_VV(vclmulh_vv, 8)
+GEN_VEXT_VX(vclmulh_vx, 8)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 06/39] target/riscv: Add vrol.[vv,vx] and vror.[vv,vx,vi] decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
@ 2023-02-02 12:41   ` Lawrence Hunter
  2023-02-02 12:41 ` [PATCH 02/39] target/riscv: Add vclmul.vv decoding, translation and execution support Lawrence Hunter
                     ` (37 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Dickon Hood <dickon.hood@codethink.co.uk>

Add an initial implementation of the vrol.* and vror.* instructions,
with mappings between the RISC-V instructions and their internal TCG
accelerated implmentations.

There are some missing ror helpers, so I've bodged it by converting them
to rols.

Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
Signed-off-by: Dickon Hood <dickon.hood@codethink.co.uk>
---
 target/riscv/helper.h                      | 20 ++++++++
 target/riscv/insn32.decode                 |  6 +++
 target/riscv/insn_trans/trans_rvzvkb.c.inc | 20 ++++++++
 target/riscv/vcrypto_helper.c              | 58 ++++++++++++++++++++++
 target/riscv/vector_helper.c               | 45 -----------------
 target/riscv/vector_internals.h            | 52 +++++++++++++++++++
 6 files changed, 156 insertions(+), 45 deletions(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 32f1179e29..e5b6b3360f 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1142,3 +1142,23 @@ DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vclmulh_vx, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vror_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vror_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vror_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vror_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vror_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vror_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vror_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vror_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vrol_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrol_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrol_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrol_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vrol_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b4d88dd1cb..725f907ad1 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -896,3 +896,9 @@ vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
 vclmul_vx       001100 . ..... ..... 110 ..... 1010111 @r_vm
 vclmulh_vv      001101 . ..... ..... 010 ..... 1010111 @r_vm
 vclmulh_vx      001101 . ..... ..... 110 ..... 1010111 @r_vm
+vrol_vv         010101 . ..... ..... 000 ..... 1010111 @r_vm
+vrol_vx         010101 . ..... ..... 100 ..... 1010111 @r_vm
+vror_vv         010100 . ..... ..... 000 ..... 1010111 @r_vm
+vror_vx         010100 . ..... ..... 100 ..... 1010111 @r_vm
+vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
+vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm
diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
index 533141e559..d2a7a92d42 100644
--- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
@@ -95,3 +95,23 @@ static bool vclmul_vx_check(DisasContext *s, arg_rmrr *a)
 
 GEN_VX_MASKED_TRANS(vclmul_vx, vclmul_vx_check)
 GEN_VX_MASKED_TRANS(vclmulh_vx, vclmul_vx_check)
+
+GEN_OPIVV_TRANS(vror_vv, zvkb_vv_check)
+GEN_OPIVX_TRANS(vror_vx, zvkb_vx_check)
+GEN_OPIVV_TRANS(vrol_vv, zvkb_vv_check)
+GEN_OPIVX_TRANS(vrol_vx, zvkb_vx_check)
+
+GEN_OPIVI_TRANS(vror_vi, IMM_TRUNC_SEW, vror_vx, zvkb_vx_check)
+
+/*
+ * Immediates are 5b long, and we need six for the rotate-immediate.  The
+ * decision has been taken to remove the vrol.vi instruction -- you can
+ * emulate it with a ror, after all -- and use the bottom bit of the funct6
+ * part of the opcode to encode the extra bit.  I've chosen to implement it
+ * like this because it's easy and reasonably clean.
+ */
+static bool trans_vror_vi2(DisasContext *s, arg_rmrr *a)
+{
+    a->rs1 += 32;
+    return trans_vror_vi(s, a);
+}
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 46e2e510c5..7ec75c5589 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -57,3 +57,61 @@ GEN_VEXT_VV(vclmul_vv, 8)
 GEN_VEXT_VX(vclmul_vx, 8)
 GEN_VEXT_VV(vclmulh_vv, 8)
 GEN_VEXT_VX(vclmulh_vx, 8)
+
+/*
+ *  Looks a mess, but produces reasonable (aarch32) code on clang:
+ * https://godbolt.org/z/jchjsTda8
+ */
+#define DO_ROR(x, n)                       \
+    ((x >> (n & ((sizeof(x) << 3) - 1))) | \
+     (x << ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
+#define DO_ROL(x, n)                       \
+    ((x << (n & ((sizeof(x) << 3) - 1))) | \
+     (x >> ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
+
+RVVCALL(OPIVV2, vror_vv_b, OP_UUU_B, H1, H1, H1, DO_ROR)
+RVVCALL(OPIVV2, vror_vv_h, OP_UUU_H, H2, H2, H2, DO_ROR)
+RVVCALL(OPIVV2, vror_vv_w, OP_UUU_W, H4, H4, H4, DO_ROR)
+RVVCALL(OPIVV2, vror_vv_d, OP_UUU_D, H8, H8, H8, DO_ROR)
+GEN_VEXT_VV(vror_vv_b, 1)
+GEN_VEXT_VV(vror_vv_h, 2)
+GEN_VEXT_VV(vror_vv_w, 4)
+GEN_VEXT_VV(vror_vv_d, 8)
+
+/*
+ * There's a missing tcg_gen_gvec_rotrs() helper function.
+ */
+#define GEN_VEXT_VX_RTOL(NAME, ESZ)                                      \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,        \
+                  CPURISCVState *env, uint32_t desc)                     \
+{                                                                        \
+    do_vext_vx(vd, v0, (ESZ << 3) - s1, vs2, env, desc, do_##NAME, ESZ); \
+}
+
+/* DO_ROL because GEN_VEXT_VX_RTOL() converts from R to L */
+RVVCALL(OPIVX2, vror_vx_b, OP_UUU_B, H1, H1, DO_ROL)
+RVVCALL(OPIVX2, vror_vx_h, OP_UUU_H, H2, H2, DO_ROL)
+RVVCALL(OPIVX2, vror_vx_w, OP_UUU_W, H4, H4, DO_ROL)
+RVVCALL(OPIVX2, vror_vx_d, OP_UUU_D, H8, H8, DO_ROL)
+GEN_VEXT_VX_RTOL(vror_vx_b, 1)
+GEN_VEXT_VX_RTOL(vror_vx_h, 2)
+GEN_VEXT_VX_RTOL(vror_vx_w, 4)
+GEN_VEXT_VX_RTOL(vror_vx_d, 8)
+
+RVVCALL(OPIVV2, vrol_vv_b, OP_UUU_B, H1, H1, H1, DO_ROL)
+RVVCALL(OPIVV2, vrol_vv_h, OP_UUU_H, H2, H2, H2, DO_ROL)
+RVVCALL(OPIVV2, vrol_vv_w, OP_UUU_W, H4, H4, H4, DO_ROL)
+RVVCALL(OPIVV2, vrol_vv_d, OP_UUU_D, H8, H8, H8, DO_ROL)
+GEN_VEXT_VV(vrol_vv_b, 1)
+GEN_VEXT_VV(vrol_vv_h, 2)
+GEN_VEXT_VV(vrol_vv_w, 4)
+GEN_VEXT_VV(vrol_vv_d, 8)
+
+RVVCALL(OPIVX2, vrol_vx_b, OP_UUU_B, H1, H1, DO_ROL)
+RVVCALL(OPIVX2, vrol_vx_h, OP_UUU_H, H2, H2, DO_ROL)
+RVVCALL(OPIVX2, vrol_vx_w, OP_UUU_W, H4, H4, DO_ROL)
+RVVCALL(OPIVX2, vrol_vx_d, OP_UUU_D, H8, H8, DO_ROL)
+GEN_VEXT_VX(vrol_vx_b, 1)
+GEN_VEXT_VX(vrol_vx_h, 2)
+GEN_VEXT_VX(vrol_vx_w, 4)
+GEN_VEXT_VX(vrol_vx_d, 8)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index ab470092f6..ff7b03cbe3 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -76,26 +76,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
     return vl;
 }
 
-/*
- * Note that vector data is stored in host-endian 64-bit chunks,
- * so addressing units smaller than that needs a host-endian fixup.
- */
-#if HOST_BIG_ENDIAN
-#define H1(x)   ((x) ^ 7)
-#define H1_2(x) ((x) ^ 6)
-#define H1_4(x) ((x) ^ 4)
-#define H2(x)   ((x) ^ 3)
-#define H4(x)   ((x) ^ 1)
-#define H8(x)   ((x))
-#else
-#define H1(x)   (x)
-#define H1_2(x) (x)
-#define H1_4(x) (x)
-#define H2(x)   (x)
-#define H4(x)   (x)
-#define H8(x)   (x)
-#endif
-
 /*
  * Get the maximum number of elements can be operated.
  *
@@ -683,18 +663,11 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
  *** Vector Integer Arithmetic Instructions
  */
 
-/* expand macro args before macro */
-#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
-
 /* (TD, T1, T2, TX1, TX2) */
 #define OP_SSS_B int8_t, int8_t, int8_t, int8_t, int8_t
 #define OP_SSS_H int16_t, int16_t, int16_t, int16_t, int16_t
 #define OP_SSS_W int32_t, int32_t, int32_t, int32_t, int32_t
 #define OP_SSS_D int64_t, int64_t, int64_t, int64_t, int64_t
-#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
-#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
-#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
-#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
 #define OP_SUS_B int8_t, uint8_t, int8_t, uint8_t, int8_t
 #define OP_SUS_H int16_t, uint16_t, int16_t, uint16_t, int16_t
 #define OP_SUS_W int32_t, uint32_t, int32_t, uint32_t, int32_t
@@ -718,14 +691,6 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
 #define NOP_UUU_H uint16_t, uint16_t, uint32_t, uint16_t, uint32_t
 #define NOP_UUU_W uint32_t, uint32_t, uint64_t, uint32_t, uint64_t
 
-
-#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
-static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
-{                                                               \
-    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
-    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
-    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
-}
 #define DO_SUB(N, M) (N - M)
 #define DO_RSUB(N, M) (M - N)
 
@@ -747,16 +712,6 @@ GEN_VEXT_VV(vsub_vv_h, 2)
 GEN_VEXT_VV(vsub_vv_w, 4)
 GEN_VEXT_VV(vsub_vv_d, 8)
 
-/*
- * (T1)s1 gives the real operator type.
- * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
- */
-#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
-static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
-{                                                                   \
-    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
-    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
-}
 
 RVVCALL(OPIVX2, vadd_vx_b, OP_SSS_B, H1, H1, DO_ADD)
 RVVCALL(OPIVX2, vadd_vx_h, OP_SSS_H, H2, H2, DO_ADD)
diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
index 49529d2379..a0fbac7bf3 100644
--- a/target/riscv/vector_internals.h
+++ b/target/riscv/vector_internals.h
@@ -28,6 +28,26 @@ static inline uint32_t vext_nf(uint32_t desc)
     return FIELD_EX32(simd_data(desc), VDATA, NF);
 }
 
+/*
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing units smaller than that needs a host-endian fixup.
+ */
+#if HOST_BIG_ENDIAN
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
+#define H8(x)   ((x))
+#else
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
+#define H8(x)   (x)
+#endif
+
 /*
  * Encode LMUL to lmul as following:
  *     LMUL    vlmul    lmul
@@ -96,9 +116,30 @@ static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
 void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
                        uint32_t tot);
 
+/*
+ *** Vector Integer Arithmetic Instructions
+ */
+
+/* expand macro args before macro */
+#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
+
+/* (TD, T1, T2, TX1, TX2) */
+#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
+#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
+#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
+#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
+
 /* operation of two vector elements */
 typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
 
+#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
+static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
+{                                                               \
+    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
+    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
+}
+
 void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
                 CPURISCVState *env, uint32_t desc,
                 opivv2_fn *fn, uint32_t esz);
@@ -115,6 +156,17 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
 
 typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i);
 
+/*
+ * (T1)s1 gives the real operator type.
+ * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
+ */
+#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
+static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
+{                                                                   \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
+    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
+}
+
 void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
                 CPURISCVState *env, uint32_t desc,
                 opivx2_fn fn, uint32_t esz);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] decoding, translation and execution support
@ 2023-02-02 12:41   ` Lawrence Hunter
  0 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Dickon Hood <dickon.hood@codethink.co.uk>

Add an initial implementation of the vrol.* and vror.* instructions,
with mappings between the RISC-V instructions and their internal TCG
accelerated implmentations.

There are some missing ror helpers, so I've bodged it by converting them
to rols.

Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
Signed-off-by: Dickon Hood <dickon.hood@codethink.co.uk>
---
 target/riscv/helper.h                      | 20 ++++++++
 target/riscv/insn32.decode                 |  6 +++
 target/riscv/insn_trans/trans_rvzvkb.c.inc | 20 ++++++++
 target/riscv/vcrypto_helper.c              | 58 ++++++++++++++++++++++
 target/riscv/vector_helper.c               | 45 -----------------
 target/riscv/vector_internals.h            | 52 +++++++++++++++++++
 6 files changed, 156 insertions(+), 45 deletions(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 32f1179e29..e5b6b3360f 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1142,3 +1142,23 @@ DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_6(vclmulh_vx, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vror_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vror_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vror_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vror_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vror_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vror_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vror_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vror_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vrol_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrol_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrol_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrol_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vrol_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b4d88dd1cb..725f907ad1 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -896,3 +896,9 @@ vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
 vclmul_vx       001100 . ..... ..... 110 ..... 1010111 @r_vm
 vclmulh_vv      001101 . ..... ..... 010 ..... 1010111 @r_vm
 vclmulh_vx      001101 . ..... ..... 110 ..... 1010111 @r_vm
+vrol_vv         010101 . ..... ..... 000 ..... 1010111 @r_vm
+vrol_vx         010101 . ..... ..... 100 ..... 1010111 @r_vm
+vror_vv         010100 . ..... ..... 000 ..... 1010111 @r_vm
+vror_vx         010100 . ..... ..... 100 ..... 1010111 @r_vm
+vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
+vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm
diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
index 533141e559..d2a7a92d42 100644
--- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
@@ -95,3 +95,23 @@ static bool vclmul_vx_check(DisasContext *s, arg_rmrr *a)
 
 GEN_VX_MASKED_TRANS(vclmul_vx, vclmul_vx_check)
 GEN_VX_MASKED_TRANS(vclmulh_vx, vclmul_vx_check)
+
+GEN_OPIVV_TRANS(vror_vv, zvkb_vv_check)
+GEN_OPIVX_TRANS(vror_vx, zvkb_vx_check)
+GEN_OPIVV_TRANS(vrol_vv, zvkb_vv_check)
+GEN_OPIVX_TRANS(vrol_vx, zvkb_vx_check)
+
+GEN_OPIVI_TRANS(vror_vi, IMM_TRUNC_SEW, vror_vx, zvkb_vx_check)
+
+/*
+ * Immediates are 5b long, and we need six for the rotate-immediate.  The
+ * decision has been taken to remove the vrol.vi instruction -- you can
+ * emulate it with a ror, after all -- and use the bottom bit of the funct6
+ * part of the opcode to encode the extra bit.  I've chosen to implement it
+ * like this because it's easy and reasonably clean.
+ */
+static bool trans_vror_vi2(DisasContext *s, arg_rmrr *a)
+{
+    a->rs1 += 32;
+    return trans_vror_vi(s, a);
+}
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 46e2e510c5..7ec75c5589 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -57,3 +57,61 @@ GEN_VEXT_VV(vclmul_vv, 8)
 GEN_VEXT_VX(vclmul_vx, 8)
 GEN_VEXT_VV(vclmulh_vv, 8)
 GEN_VEXT_VX(vclmulh_vx, 8)
+
+/*
+ *  Looks a mess, but produces reasonable (aarch32) code on clang:
+ * https://godbolt.org/z/jchjsTda8
+ */
+#define DO_ROR(x, n)                       \
+    ((x >> (n & ((sizeof(x) << 3) - 1))) | \
+     (x << ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
+#define DO_ROL(x, n)                       \
+    ((x << (n & ((sizeof(x) << 3) - 1))) | \
+     (x >> ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
+
+RVVCALL(OPIVV2, vror_vv_b, OP_UUU_B, H1, H1, H1, DO_ROR)
+RVVCALL(OPIVV2, vror_vv_h, OP_UUU_H, H2, H2, H2, DO_ROR)
+RVVCALL(OPIVV2, vror_vv_w, OP_UUU_W, H4, H4, H4, DO_ROR)
+RVVCALL(OPIVV2, vror_vv_d, OP_UUU_D, H8, H8, H8, DO_ROR)
+GEN_VEXT_VV(vror_vv_b, 1)
+GEN_VEXT_VV(vror_vv_h, 2)
+GEN_VEXT_VV(vror_vv_w, 4)
+GEN_VEXT_VV(vror_vv_d, 8)
+
+/*
+ * There's a missing tcg_gen_gvec_rotrs() helper function.
+ */
+#define GEN_VEXT_VX_RTOL(NAME, ESZ)                                      \
+void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,        \
+                  CPURISCVState *env, uint32_t desc)                     \
+{                                                                        \
+    do_vext_vx(vd, v0, (ESZ << 3) - s1, vs2, env, desc, do_##NAME, ESZ); \
+}
+
+/* DO_ROL because GEN_VEXT_VX_RTOL() converts from R to L */
+RVVCALL(OPIVX2, vror_vx_b, OP_UUU_B, H1, H1, DO_ROL)
+RVVCALL(OPIVX2, vror_vx_h, OP_UUU_H, H2, H2, DO_ROL)
+RVVCALL(OPIVX2, vror_vx_w, OP_UUU_W, H4, H4, DO_ROL)
+RVVCALL(OPIVX2, vror_vx_d, OP_UUU_D, H8, H8, DO_ROL)
+GEN_VEXT_VX_RTOL(vror_vx_b, 1)
+GEN_VEXT_VX_RTOL(vror_vx_h, 2)
+GEN_VEXT_VX_RTOL(vror_vx_w, 4)
+GEN_VEXT_VX_RTOL(vror_vx_d, 8)
+
+RVVCALL(OPIVV2, vrol_vv_b, OP_UUU_B, H1, H1, H1, DO_ROL)
+RVVCALL(OPIVV2, vrol_vv_h, OP_UUU_H, H2, H2, H2, DO_ROL)
+RVVCALL(OPIVV2, vrol_vv_w, OP_UUU_W, H4, H4, H4, DO_ROL)
+RVVCALL(OPIVV2, vrol_vv_d, OP_UUU_D, H8, H8, H8, DO_ROL)
+GEN_VEXT_VV(vrol_vv_b, 1)
+GEN_VEXT_VV(vrol_vv_h, 2)
+GEN_VEXT_VV(vrol_vv_w, 4)
+GEN_VEXT_VV(vrol_vv_d, 8)
+
+RVVCALL(OPIVX2, vrol_vx_b, OP_UUU_B, H1, H1, DO_ROL)
+RVVCALL(OPIVX2, vrol_vx_h, OP_UUU_H, H2, H2, DO_ROL)
+RVVCALL(OPIVX2, vrol_vx_w, OP_UUU_W, H4, H4, DO_ROL)
+RVVCALL(OPIVX2, vrol_vx_d, OP_UUU_D, H8, H8, DO_ROL)
+GEN_VEXT_VX(vrol_vx_b, 1)
+GEN_VEXT_VX(vrol_vx_h, 2)
+GEN_VEXT_VX(vrol_vx_w, 4)
+GEN_VEXT_VX(vrol_vx_d, 8)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index ab470092f6..ff7b03cbe3 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -76,26 +76,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
     return vl;
 }
 
-/*
- * Note that vector data is stored in host-endian 64-bit chunks,
- * so addressing units smaller than that needs a host-endian fixup.
- */
-#if HOST_BIG_ENDIAN
-#define H1(x)   ((x) ^ 7)
-#define H1_2(x) ((x) ^ 6)
-#define H1_4(x) ((x) ^ 4)
-#define H2(x)   ((x) ^ 3)
-#define H4(x)   ((x) ^ 1)
-#define H8(x)   ((x))
-#else
-#define H1(x)   (x)
-#define H1_2(x) (x)
-#define H1_4(x) (x)
-#define H2(x)   (x)
-#define H4(x)   (x)
-#define H8(x)   (x)
-#endif
-
 /*
  * Get the maximum number of elements can be operated.
  *
@@ -683,18 +663,11 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
  *** Vector Integer Arithmetic Instructions
  */
 
-/* expand macro args before macro */
-#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
-
 /* (TD, T1, T2, TX1, TX2) */
 #define OP_SSS_B int8_t, int8_t, int8_t, int8_t, int8_t
 #define OP_SSS_H int16_t, int16_t, int16_t, int16_t, int16_t
 #define OP_SSS_W int32_t, int32_t, int32_t, int32_t, int32_t
 #define OP_SSS_D int64_t, int64_t, int64_t, int64_t, int64_t
-#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
-#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
-#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
-#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
 #define OP_SUS_B int8_t, uint8_t, int8_t, uint8_t, int8_t
 #define OP_SUS_H int16_t, uint16_t, int16_t, uint16_t, int16_t
 #define OP_SUS_W int32_t, uint32_t, int32_t, uint32_t, int32_t
@@ -718,14 +691,6 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
 #define NOP_UUU_H uint16_t, uint16_t, uint32_t, uint16_t, uint32_t
 #define NOP_UUU_W uint32_t, uint32_t, uint64_t, uint32_t, uint64_t
 
-
-#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
-static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
-{                                                               \
-    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
-    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
-    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
-}
 #define DO_SUB(N, M) (N - M)
 #define DO_RSUB(N, M) (M - N)
 
@@ -747,16 +712,6 @@ GEN_VEXT_VV(vsub_vv_h, 2)
 GEN_VEXT_VV(vsub_vv_w, 4)
 GEN_VEXT_VV(vsub_vv_d, 8)
 
-/*
- * (T1)s1 gives the real operator type.
- * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
- */
-#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
-static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
-{                                                                   \
-    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
-    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
-}
 
 RVVCALL(OPIVX2, vadd_vx_b, OP_SSS_B, H1, H1, DO_ADD)
 RVVCALL(OPIVX2, vadd_vx_h, OP_SSS_H, H2, H2, DO_ADD)
diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
index 49529d2379..a0fbac7bf3 100644
--- a/target/riscv/vector_internals.h
+++ b/target/riscv/vector_internals.h
@@ -28,6 +28,26 @@ static inline uint32_t vext_nf(uint32_t desc)
     return FIELD_EX32(simd_data(desc), VDATA, NF);
 }
 
+/*
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing units smaller than that needs a host-endian fixup.
+ */
+#if HOST_BIG_ENDIAN
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
+#define H8(x)   ((x))
+#else
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
+#define H8(x)   (x)
+#endif
+
 /*
  * Encode LMUL to lmul as following:
  *     LMUL    vlmul    lmul
@@ -96,9 +116,30 @@ static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
 void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
                        uint32_t tot);
 
+/*
+ *** Vector Integer Arithmetic Instructions
+ */
+
+/* expand macro args before macro */
+#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
+
+/* (TD, T1, T2, TX1, TX2) */
+#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
+#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
+#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
+#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
+
 /* operation of two vector elements */
 typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
 
+#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
+static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
+{                                                               \
+    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
+    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
+}
+
 void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
                 CPURISCVState *env, uint32_t desc,
                 opivv2_fn *fn, uint32_t esz);
@@ -115,6 +156,17 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
 
 typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i);
 
+/*
+ * (T1)s1 gives the real operator type.
+ * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
+ */
+#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
+static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
+{                                                                   \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
+    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
+}
+
 void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
                 CPURISCVState *env, uint32_t desc,
                 opivx2_fn fn, uint32_t esz);
-- 
2.39.1



^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 07/39] target/riscv: Add vbrev8.v decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (5 preceding siblings ...)
  2023-02-02 12:41   ` [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] " Lawrence Hunter
@ 2023-02-02 12:41 ` Lawrence Hunter
  2023-02-02 14:21   ` Philipp Tomsich
  2023-02-02 12:41 ` [PATCH 08/39] target/riscv: Add vrev8.v " Lawrence Hunter
                   ` (31 subsequent siblings)
  38 siblings, 1 reply; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	William Salmon

From: William Salmon <will.salmon@codethink.co.uk>

Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
Signed-off-by: William Salmon <will.salmon@codethink.co.uk>
---
 include/qemu/bitops.h                      | 32 +++++++++++++++++
 target/riscv/helper.h                      |  5 +++
 target/riscv/insn32.decode                 |  1 +
 target/riscv/insn_trans/trans_rvzvkb.c.inc | 39 ++++++++++++++++++++
 target/riscv/vcrypto_helper.c              | 10 ++++++
 target/riscv/vector_helper.c               | 41 ---------------------
 target/riscv/vector_internals.h            | 42 ++++++++++++++++++++++
 7 files changed, 129 insertions(+), 41 deletions(-)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index 03213ce952..dfce1cb10c 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -618,4 +618,36 @@ static inline uint64_t half_unshuffle64(uint64_t x)
     return x;
 }
 
+static inline uint8_t reverse_bits_byte(uint8_t x)
+{
+    x = (((x & 0b10101010) >> 1) | ((x & 0b01010101) << 1));
+    x = (((x & 0b11001100) >> 2) | ((x & 0b00110011) << 2));
+    return ((x & 0b11110000) >> 4) | ((x & 0b00001111) << 4);
+}
+
+static inline uint16_t reverse_bits_byte_2(uint16_t x)
+{
+    return (uint16_t)reverse_bits_byte(x & 0xFF) | \
+       (uint16_t)reverse_bits_byte((x >> 8) & 0xFF) << 8;
+}
+
+static inline uint32_t reverse_bits_byte_4(uint32_t x)
+{
+    return (uint32_t)reverse_bits_byte(x & 0xFF) | \
+       (uint32_t)reverse_bits_byte((x >> 8) & 0xFF) << 8 | \
+       (uint32_t)reverse_bits_byte((x >> 16) & 0xFF) << 16 | \
+       (uint32_t)reverse_bits_byte((x >> 24) & 0xFF) << 24;
+}
+
+static inline uint64_t reverse_bits_byte_8(uint64_t x)
+{
+    return (uint64_t)reverse_bits_byte(x & 0xFF) | \
+       (uint64_t)reverse_bits_byte((x >> 8) & 0xFF) << 8 | \
+       (uint64_t)reverse_bits_byte((x >> 16) & 0xFF) << 16 | \
+       (uint64_t)reverse_bits_byte((x >> 24) & 0xFF) << 24 | \
+       (uint64_t)reverse_bits_byte((x >> 32) & 0xFF) << 32 | \
+       (uint64_t)reverse_bits_byte((x >> 40) & 0xFF) << 40 | \
+       (uint64_t)reverse_bits_byte((x >> 48) & 0xFF) << 48 | \
+       (uint64_t)reverse_bits_byte((x >> 56) & 0xFF) << 56;
+}
 #endif
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index e5b6b3360f..c94627d8a4 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1162,3 +1162,8 @@ DEF_HELPER_6(vrol_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_5(vbrev8_v_b, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vbrev8_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vbrev8_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vbrev8_v_d, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 725f907ad1..782632a165 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -902,3 +902,4 @@ vror_vv         010100 . ..... ..... 000 ..... 1010111 @r_vm
 vror_vx         010100 . ..... ..... 100 ..... 1010111 @r_vm
 vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
 vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm
+vbrev8_v        010010 . ..... 01000 010 ..... 1010111 @r2_vm
diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
index d2a7a92d42..591980459a 100644
--- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
@@ -115,3 +115,42 @@ static bool trans_vror_vi2(DisasContext *s, arg_rmrr *a)
     a->rs1 += 32;
     return trans_vror_vi(s, a);
 }
+
+#define GEN_OPIV_TRANS(NAME, CHECK)                                    \
+static bool trans_##NAME(DisasContext *s, arg_rmr * a)                 \
+{                                                                      \
+    if (CHECK(s, a)) {                                                 \
+        uint32_t data = 0;                                             \
+        static gen_helper_gvec_3_ptr * const fns[4] = {                \
+            gen_helper_##NAME##_b,                                     \
+            gen_helper_##NAME##_h,                                     \
+            gen_helper_##NAME##_w,                                     \
+            gen_helper_##NAME##_d,                                     \
+        };                                                             \
+        TCGLabel *over = gen_new_label();                              \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);              \
+        tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);     \
+                                                                       \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                     \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                 \
+        data = FIELD_DP32(data, VDATA, VTA, s->vta);                   \
+        data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); \
+        data = FIELD_DP32(data, VDATA, VMA, s->vma);                   \
+        tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),         \
+                           vreg_ofs(s, a->rs2), cpu_env,               \
+                           s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8, \
+                           data, fns[s->sew]);                         \
+        mark_vs_dirty(s);                                              \
+        gen_set_label(over);                                           \
+        return true;                                                   \
+    }                                                                  \
+    return false;                                                      \
+}
+
+static bool vxrev8_check(DisasContext *s, arg_rmr *a)
+{
+    return s->cfg_ptr->ext_zvkb == true && vext_check_isa_ill(s) &&
+           vext_check_ss(s, a->rd, a->rs2, a->vm);
+}
+
+GEN_OPIV_TRANS(vbrev8_v, vxrev8_check)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 7ec75c5589..303a656141 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -1,6 +1,7 @@
 #include "qemu/osdep.h"
 #include "qemu/host-utils.h"
 #include "qemu/bitops.h"
+#include "qemu/bswap.h"
 #include "cpu.h"
 #include "exec/memop.h"
 #include "exec/exec-all.h"
@@ -115,3 +116,12 @@ GEN_VEXT_VX(vrol_vx_b, 1)
 GEN_VEXT_VX(vrol_vx_h, 2)
 GEN_VEXT_VX(vrol_vx_w, 4)
 GEN_VEXT_VX(vrol_vx_d, 8)
+
+RVVCALL(OPIVV1, vbrev8_v_b, OP_UU_B, H1, H1, reverse_bits_byte)
+RVVCALL(OPIVV1, vbrev8_v_h, OP_UU_H, H2, H2, reverse_bits_byte_2)
+RVVCALL(OPIVV1, vbrev8_v_w, OP_UU_W, H4, H4, reverse_bits_byte_4)
+RVVCALL(OPIVV1, vbrev8_v_d, OP_UU_D, H8, H8, reverse_bits_byte_8)
+GEN_VEXT_V(vbrev8_v_b, 1)
+GEN_VEXT_V(vbrev8_v_h, 2)
+GEN_VEXT_V(vbrev8_v_w, 4)
+GEN_VEXT_V(vbrev8_v_d, 8)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index ff7b03cbe3..07da4b5e16 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -3437,12 +3437,6 @@ RVVCALL(OPFVF3, vfwnmsac_vf_w, WOP_UUU_W, H8, H4, fwnmsac32)
 GEN_VEXT_VF(vfwnmsac_vf_h, 4)
 GEN_VEXT_VF(vfwnmsac_vf_w, 8)
 
-/* Vector Floating-Point Square-Root Instruction */
-/* (TD, T2, TX2) */
-#define OP_UU_H uint16_t, uint16_t, uint16_t
-#define OP_UU_W uint32_t, uint32_t, uint32_t
-#define OP_UU_D uint64_t, uint64_t, uint64_t
-
 #define OPFVV1(NAME, TD, T2, TX2, HD, HS2, OP)        \
 static void do_##NAME(void *vd, void *vs2, int i,      \
         CPURISCVState *env)                            \
@@ -4134,41 +4128,6 @@ GEN_VEXT_CMP_VF(vmfge_vf_h, uint16_t, H2, vmfge16)
 GEN_VEXT_CMP_VF(vmfge_vf_w, uint32_t, H4, vmfge32)
 GEN_VEXT_CMP_VF(vmfge_vf_d, uint64_t, H8, vmfge64)
 
-/* Vector Floating-Point Classify Instruction */
-#define OPIVV1(NAME, TD, T2, TX2, HD, HS2, OP)         \
-static void do_##NAME(void *vd, void *vs2, int i)      \
-{                                                      \
-    TX2 s2 = *((T2 *)vs2 + HS2(i));                    \
-    *((TD *)vd + HD(i)) = OP(s2);                      \
-}
-
-#define GEN_VEXT_V(NAME, ESZ)                          \
-void HELPER(NAME)(void *vd, void *v0, void *vs2,       \
-                  CPURISCVState *env, uint32_t desc)   \
-{                                                      \
-    uint32_t vm = vext_vm(desc);                       \
-    uint32_t vl = env->vl;                             \
-    uint32_t total_elems =                             \
-        vext_get_total_elems(env, desc, ESZ);          \
-    uint32_t vta = vext_vta(desc);                     \
-    uint32_t vma = vext_vma(desc);                     \
-    uint32_t i;                                        \
-                                                       \
-    for (i = env->vstart; i < vl; i++) {               \
-        if (!vm && !vext_elem_mask(v0, i)) {           \
-            /* set masked-off elements to 1s */        \
-            vext_set_elems_1s(vd, vma, i * ESZ,        \
-                              (i + 1) * ESZ);          \
-            continue;                                  \
-        }                                              \
-        do_##NAME(vd, vs2, i);                         \
-    }                                                  \
-    env->vstart = 0;                                   \
-    /* set tail elements to 1s */                      \
-    vext_set_elems_1s(vd, vta, vl * ESZ,               \
-                      total_elems * ESZ);              \
-}
-
 target_ulong fclass_h(uint64_t frs1)
 {
     float16 f = frs1;
diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
index a0fbac7bf3..f1f16453dc 100644
--- a/target/riscv/vector_internals.h
+++ b/target/riscv/vector_internals.h
@@ -123,12 +123,54 @@ void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
 /* expand macro args before macro */
 #define RVVCALL(macro, ...)  macro(__VA_ARGS__)
 
+/* Vector Floating-Point Square-Root Instruction */
+/* (TD, T2, TX2) */
+#define OP_UU_B uint8_t, uint8_t, uint8_t
+#define OP_UU_H uint16_t, uint16_t, uint16_t
+#define OP_UU_W uint32_t, uint32_t, uint32_t
+#define OP_UU_D uint64_t, uint64_t, uint64_t
+
 /* (TD, T1, T2, TX1, TX2) */
 #define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
 #define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
 #define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
 #define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
 
+/* Vector Floating-Point Classify Instruction */
+#define OPIVV1(NAME, TD, T2, TX2, HD, HS2, OP)         \
+static void do_##NAME(void *vd, void *vs2, int i)      \
+{                                                      \
+    TX2 s2 = *((T2 *)vs2 + HS2(i));                    \
+    *((TD *)vd + HD(i)) = OP(s2);                      \
+}
+
+#define GEN_VEXT_V(NAME, ESZ)                          \
+void HELPER(NAME)(void *vd, void *v0, void *vs2,       \
+                  CPURISCVState *env, uint32_t desc)   \
+{                                                      \
+    uint32_t vm = vext_vm(desc);                       \
+    uint32_t vl = env->vl;                             \
+    uint32_t total_elems =                             \
+        vext_get_total_elems(env, desc, ESZ);          \
+    uint32_t vta = vext_vta(desc);                     \
+    uint32_t vma = vext_vma(desc);                     \
+    uint32_t i;                                        \
+                                                       \
+    for (i = env->vstart; i < vl; i++) {               \
+        if (!vm && !vext_elem_mask(v0, i)) {           \
+            /* set masked-off elements to 1s */        \
+            vext_set_elems_1s(vd, vma, i * ESZ,        \
+                              (i + 1) * ESZ);          \
+            continue;                                  \
+        }                                              \
+        do_##NAME(vd, vs2, i);                         \
+    }                                                  \
+    env->vstart = 0;                                   \
+    /* set tail elements to 1s */                      \
+    vext_set_elems_1s(vd, vta, vl * ESZ,               \
+                      total_elems * ESZ);              \
+}
+
 /* operation of two vector elements */
 typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 08/39] target/riscv: Add vrev8.v decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (6 preceding siblings ...)
  2023-02-02 12:41 ` [PATCH 07/39] target/riscv: Add vbrev8.v " Lawrence Hunter
@ 2023-02-02 12:41 ` Lawrence Hunter
  2023-02-02 14:22   ` Philipp Tomsich
  2023-02-02 12:42   ` [PATCH 09/39] target/riscv: Add vandn.[vv, vx, vi] " Lawrence Hunter
                   ` (30 subsequent siblings)
  38 siblings, 1 reply; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
---
 target/riscv/helper.h                      |  4 ++++
 target/riscv/insn32.decode                 |  1 +
 target/riscv/insn_trans/trans_rvzvkb.c.inc |  1 +
 target/riscv/vcrypto_helper.c              | 10 ++++++++++
 4 files changed, 16 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c94627d8a4..c980d52828 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1163,6 +1163,10 @@ DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32)
 
+DEF_HELPER_5(vrev8_v_b, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vrev8_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vrev8_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vrev8_v_d, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vbrev8_v_b, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vbrev8_v_h, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vbrev8_v_w, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 782632a165..342199abc0 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -903,3 +903,4 @@ vror_vx         010100 . ..... ..... 100 ..... 1010111 @r_vm
 vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
 vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm
 vbrev8_v        010010 . ..... 01000 010 ..... 1010111 @r2_vm
+vrev8_v         010010 . ..... 01001 010 ..... 1010111 @r2_vm
diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
index 591980459a..18b362db92 100644
--- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
@@ -154,3 +154,4 @@ static bool vxrev8_check(DisasContext *s, arg_rmr *a)
 }
 
 GEN_OPIV_TRANS(vbrev8_v, vxrev8_check)
+GEN_OPIV_TRANS(vrev8_v, vxrev8_check)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 303a656141..b09fe5fa2a 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -125,3 +125,13 @@ GEN_VEXT_V(vbrev8_v_b, 1)
 GEN_VEXT_V(vbrev8_v_h, 2)
 GEN_VEXT_V(vbrev8_v_w, 4)
 GEN_VEXT_V(vbrev8_v_d, 8)
+
+#define DO_VREV8_B(a) (a)
+RVVCALL(OPIVV1, vrev8_v_b, OP_UU_B, H1, H1, DO_VREV8_B)
+RVVCALL(OPIVV1, vrev8_v_h, OP_UU_H, H2, H2, bswap16)
+RVVCALL(OPIVV1, vrev8_v_w, OP_UU_W, H4, H4, bswap32)
+RVVCALL(OPIVV1, vrev8_v_d, OP_UU_D, H8, H8, bswap64)
+GEN_VEXT_V(vrev8_v_b, 1)
+GEN_VEXT_V(vrev8_v_h, 2)
+GEN_VEXT_V(vrev8_v_w, 4)
+GEN_VEXT_V(vrev8_v_d, 8)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 09/39] target/riscv: Add vandn.[vv,vx,vi] decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
@ 2023-02-02 12:42   ` Lawrence Hunter
  2023-02-02 12:41 ` [PATCH 02/39] target/riscv: Add vclmul.vv decoding, translation and execution support Lawrence Hunter
                     ` (37 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
---
 target/riscv/helper.h                      |  9 +++++++++
 target/riscv/insn32.decode                 |  3 +++
 target/riscv/insn_trans/trans_rvzvkb.c.inc |  5 +++++
 target/riscv/vcrypto_helper.c              | 19 +++++++++++++++++++
 4 files changed, 36 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c980d52828..5de615ea78 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1171,3 +1171,12 @@ DEF_HELPER_5(vbrev8_v_b, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vbrev8_v_h, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vbrev8_v_w, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vbrev8_v_d, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vandn_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vandn_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vandn_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vandn_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vandn_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vandn_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vandn_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vandn_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 342199abc0..d6f5e4d198 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -904,3 +904,6 @@ vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
 vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm
 vbrev8_v        010010 . ..... 01000 010 ..... 1010111 @r2_vm
 vrev8_v         010010 . ..... 01001 010 ..... 1010111 @r2_vm
+vandn_vi        000001 . ..... ..... 011 ..... 1010111 @r_vm
+vandn_vv        000001 . ..... ..... 000 ..... 1010111 @r_vm
+vandn_vx        000001 . ..... ..... 100 ..... 1010111 @r_vm
diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
index 18b362db92..a973b27bdd 100644
--- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
@@ -147,6 +147,11 @@ static bool trans_##NAME(DisasContext *s, arg_rmr * a)                 \
     return false;                                                      \
 }
 
+
+GEN_OPIVV_TRANS(vandn_vv, zvkb_vv_check)
+GEN_OPIVX_TRANS(vandn_vx, zvkb_vx_check)
+GEN_OPIVI_TRANS(vandn_vi, IMM_SX, vandn_vx, zvkb_vx_check)
+
 static bool vxrev8_check(DisasContext *s, arg_rmr *a)
 {
     return s->cfg_ptr->ext_zvkb == true && vext_check_isa_ill(s) &&
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index b09fe5fa2a..900e68dfb0 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -135,3 +135,22 @@ GEN_VEXT_V(vrev8_v_b, 1)
 GEN_VEXT_V(vrev8_v_h, 2)
 GEN_VEXT_V(vrev8_v_w, 4)
 GEN_VEXT_V(vrev8_v_d, 8)
+
+#define DO_ANDN(a, b) ((b) & ~(a))
+RVVCALL(OPIVV2, vandn_vv_b, OP_UUU_B, H1, H1, H1, DO_ANDN)
+RVVCALL(OPIVV2, vandn_vv_h, OP_UUU_H, H2, H2, H2, DO_ANDN)
+RVVCALL(OPIVV2, vandn_vv_w, OP_UUU_W, H4, H4, H4, DO_ANDN)
+RVVCALL(OPIVV2, vandn_vv_d, OP_UUU_D, H8, H8, H8, DO_ANDN)
+GEN_VEXT_VV(vandn_vv_b, 1)
+GEN_VEXT_VV(vandn_vv_h, 2)
+GEN_VEXT_VV(vandn_vv_w, 4)
+GEN_VEXT_VV(vandn_vv_d, 8)
+
+RVVCALL(OPIVX2, vandn_vx_b, OP_UUU_B, H1, H1, DO_ANDN)
+RVVCALL(OPIVX2, vandn_vx_h, OP_UUU_H, H2, H2, DO_ANDN)
+RVVCALL(OPIVX2, vandn_vx_w, OP_UUU_W, H4, H4, DO_ANDN)
+RVVCALL(OPIVX2, vandn_vx_d, OP_UUU_D, H8, H8, DO_ANDN)
+GEN_VEXT_VX(vandn_vx_b, 1)
+GEN_VEXT_VX(vandn_vx_h, 2)
+GEN_VEXT_VX(vandn_vx_w, 4)
+GEN_VEXT_VX(vandn_vx_d, 8)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 09/39] target/riscv: Add vandn.[vv, vx, vi] decoding, translation and execution support
@ 2023-02-02 12:42   ` Lawrence Hunter
  0 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
---
 target/riscv/helper.h                      |  9 +++++++++
 target/riscv/insn32.decode                 |  3 +++
 target/riscv/insn_trans/trans_rvzvkb.c.inc |  5 +++++
 target/riscv/vcrypto_helper.c              | 19 +++++++++++++++++++
 4 files changed, 36 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c980d52828..5de615ea78 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1171,3 +1171,12 @@ DEF_HELPER_5(vbrev8_v_b, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vbrev8_v_h, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vbrev8_v_w, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vbrev8_v_d, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vandn_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vandn_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vandn_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vandn_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vandn_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vandn_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vandn_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vandn_vx_d, void, ptr, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 342199abc0..d6f5e4d198 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -904,3 +904,6 @@ vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
 vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm
 vbrev8_v        010010 . ..... 01000 010 ..... 1010111 @r2_vm
 vrev8_v         010010 . ..... 01001 010 ..... 1010111 @r2_vm
+vandn_vi        000001 . ..... ..... 011 ..... 1010111 @r_vm
+vandn_vv        000001 . ..... ..... 000 ..... 1010111 @r_vm
+vandn_vx        000001 . ..... ..... 100 ..... 1010111 @r_vm
diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
index 18b362db92..a973b27bdd 100644
--- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
@@ -147,6 +147,11 @@ static bool trans_##NAME(DisasContext *s, arg_rmr * a)                 \
     return false;                                                      \
 }
 
+
+GEN_OPIVV_TRANS(vandn_vv, zvkb_vv_check)
+GEN_OPIVX_TRANS(vandn_vx, zvkb_vx_check)
+GEN_OPIVI_TRANS(vandn_vi, IMM_SX, vandn_vx, zvkb_vx_check)
+
 static bool vxrev8_check(DisasContext *s, arg_rmr *a)
 {
     return s->cfg_ptr->ext_zvkb == true && vext_check_isa_ill(s) &&
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index b09fe5fa2a..900e68dfb0 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -135,3 +135,22 @@ GEN_VEXT_V(vrev8_v_b, 1)
 GEN_VEXT_V(vrev8_v_h, 2)
 GEN_VEXT_V(vrev8_v_w, 4)
 GEN_VEXT_V(vrev8_v_d, 8)
+
+#define DO_ANDN(a, b) ((b) & ~(a))
+RVVCALL(OPIVV2, vandn_vv_b, OP_UUU_B, H1, H1, H1, DO_ANDN)
+RVVCALL(OPIVV2, vandn_vv_h, OP_UUU_H, H2, H2, H2, DO_ANDN)
+RVVCALL(OPIVV2, vandn_vv_w, OP_UUU_W, H4, H4, H4, DO_ANDN)
+RVVCALL(OPIVV2, vandn_vv_d, OP_UUU_D, H8, H8, H8, DO_ANDN)
+GEN_VEXT_VV(vandn_vv_b, 1)
+GEN_VEXT_VV(vandn_vv_h, 2)
+GEN_VEXT_VV(vandn_vv_w, 4)
+GEN_VEXT_VV(vandn_vv_d, 8)
+
+RVVCALL(OPIVX2, vandn_vx_b, OP_UUU_B, H1, H1, DO_ANDN)
+RVVCALL(OPIVX2, vandn_vx_h, OP_UUU_H, H2, H2, DO_ANDN)
+RVVCALL(OPIVX2, vandn_vx_w, OP_UUU_W, H4, H4, DO_ANDN)
+RVVCALL(OPIVX2, vandn_vx_d, OP_UUU_D, H8, H8, DO_ANDN)
+GEN_VEXT_VX(vandn_vx_b, 1)
+GEN_VEXT_VX(vandn_vx_h, 2)
+GEN_VEXT_VX(vandn_vx_w, 4)
+GEN_VEXT_VX(vandn_vx_d, 8)
-- 
2.39.1



^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 10/39] target/riscv: expose zvkb cpu property
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (8 preceding siblings ...)
  2023-02-02 12:42   ` [PATCH 09/39] target/riscv: Add vandn.[vv, vx, vi] " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 14:23   ` Philipp Tomsich
  2023-02-02 12:42 ` [PATCH 11/39] target/riscv: add zvkns " Lawrence Hunter
                   ` (28 subsequent siblings)
  38 siblings, 1 reply; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
---
 target/riscv/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index bd34119c75..35790befc0 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1082,6 +1082,8 @@ static Property riscv_cpu_extensions[] = {
 
     DEFINE_PROP_BOOL("zmmul", RISCVCPU, cfg.ext_zmmul, false),
 
+    DEFINE_PROP_BOOL("zvkb", RISCVCPU, cfg.ext_zvkb, false),
+
     /* Vendor-specific custom extensions */
     DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, false),
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 11/39] target/riscv: add zvkns cpu property
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (9 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 10/39] target/riscv: expose zvkb cpu property Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 12/39] target/riscv: Add vaesef.vv decoding, translation and execution support Lawrence Hunter
                   ` (27 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
---
 target/riscv/cpu.c | 3 ++-
 target/riscv/cpu.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 35790befc0..fd09822b4f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -101,6 +101,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
     ISA_EXT_DATA_ENTRY(zve32f, true, PRIV_VERSION_1_12_0, ext_zve32f),
     ISA_EXT_DATA_ENTRY(zve64f, true, PRIV_VERSION_1_12_0, ext_zve64f),
     ISA_EXT_DATA_ENTRY(zvkb, true, PRIV_VERSION_1_12_0, ext_zvkb),
+    ISA_EXT_DATA_ENTRY(zvkns, true, PRIV_VERSION_1_12_0, ext_zvkns),
     ISA_EXT_DATA_ENTRY(zhinx, true, PRIV_VERSION_1_12_0, ext_zhinx),
     ISA_EXT_DATA_ENTRY(zhinxmin, true, PRIV_VERSION_1_12_0, ext_zhinxmin),
     ISA_EXT_DATA_ENTRY(smaia, true, PRIV_VERSION_1_12_0, ext_smaia),
@@ -797,7 +798,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
          * In principle zve*{x,d} would also suffice here, were they supported
          * in qemu
          */
-        if (cpu->cfg.ext_zvkb &&
+        if ((cpu->cfg.ext_zvkb || cpu->cfg.ext_zvkns) &&
             !(cpu->cfg.ext_zve32f || cpu->cfg.ext_zve64f || cpu->cfg.ext_v)) {
             error_setg(
                 errp, "Vector crypto extensions require V or Zve* extensions");
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index d4824ad0bb..56008ef9b9 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -462,6 +462,7 @@ struct RISCVCPUConfig {
     bool ext_zve32f;
     bool ext_zve64f;
     bool ext_zvkb;
+    bool ext_zvkns;
     bool ext_zmmul;
     bool ext_smaia;
     bool ext_ssaia;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 12/39] target/riscv: Add vaesef.vv decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (10 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 11/39] target/riscv: add zvkns " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 13/39] target/riscv: Add vaesef.vs " Lawrence Hunter
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Co-authored-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                       |  2 +
 target/riscv/insn32.decode                  |  4 ++
 target/riscv/insn_trans/trans_rvzvkns.c.inc | 40 +++++++++++
 target/riscv/translate.c                    |  1 +
 target/riscv/vcrypto_helper.c               | 75 +++++++++++++++++++++
 5 files changed, 122 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvzvkns.c.inc

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 5de615ea78..bdfa63302e 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1180,3 +1180,5 @@ DEF_HELPER_6(vandn_vx_b, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vandn_vx_h, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vandn_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vandn_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_4(vaesef_vv, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index d6f5e4d198..fdaab9a189 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -74,6 +74,7 @@
 @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... &r2 %rs1 %rd
+@r2_vm_1 ...... . ..... ..... ... ..... ....... &rmr vm=1 %rs2 %rd
 @r2_nfvm ... ... vm:1 ..... ..... ... ..... ....... &r2nfvm %nf %rs1 %rd
 @r2_vm   ...... vm:1 ..... ..... ... ..... ....... &rmr %rs2 %rd
 @r1_vm   ...... vm:1 ..... ..... ... ..... ....... %rd
@@ -907,3 +908,6 @@ vrev8_v         010010 . ..... 01001 010 ..... 1010111 @r2_vm
 vandn_vi        000001 . ..... ..... 011 ..... 1010111 @r_vm
 vandn_vv        000001 . ..... ..... 000 ..... 1010111 @r_vm
 vandn_vx        000001 . ..... ..... 100 ..... 1010111 @r_vm
+
+# *** RV64 Zvkns vector crypto extension ***
+vaesef_vv       101000 1 ..... 00011 010 ..... 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkns.c.inc b/target/riscv/insn_trans/trans_rvzvkns.c.inc
new file mode 100644
index 0000000000..16e2c58001
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvzvkns.c.inc
@@ -0,0 +1,40 @@
+#define GEN_V_UNMASKED_TRANS(NAME, CHECK)                                 \
+static bool trans_##NAME(DisasContext *s, arg_##NAME * a)                 \
+{                                                                         \
+    if (CHECK(s, a)) {                                                    \
+        TCGv_ptr rd_v, rs2_v;                                             \
+        TCGv_i32 desc;                                                    \
+        uint32_t data = 0;                                                \
+                                                                          \
+        TCGLabel *over = gen_new_label();                                 \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);                 \
+        tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);        \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                        \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                    \
+        data = FIELD_DP32(data, VDATA, VTA, s->vta);                      \
+        data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s);    \
+        data = FIELD_DP32(data, VDATA, VMA, s->vma);                      \
+        rd_v = tcg_temp_new_ptr();                                        \
+        rs2_v = tcg_temp_new_ptr();                                       \
+        desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlen / 8,           \
+                                          s->cfg_ptr->vlen / 8, data));   \
+        tcg_gen_addi_ptr(rd_v, cpu_env, vreg_ofs(s, a->rd));              \
+        tcg_gen_addi_ptr(rs2_v, cpu_env, vreg_ofs(s, a->rs2));            \
+        gen_helper_##NAME(rd_v, rs2_v, cpu_env, desc);                    \
+        tcg_temp_free_ptr(rd_v);                                          \
+        tcg_temp_free_ptr(rs2_v);                                         \
+        mark_vs_dirty(s);                                                 \
+        gen_set_label(over);                                              \
+        return true;                                                      \
+    }                                                                     \
+    return false;                                                         \
+}
+
+
+static bool vaes_check_vv(DisasContext *s, arg_rmr *a)
+{
+    return s->cfg_ptr->ext_zvkns == true && vext_check_isa_ill(s) &&
+           require_align(a->rd, s->lmul) && require_align(a->rs2, s->lmul) &&
+           s->vstart % 4 == 0 && s->sew == MO_32;
+}
+GEN_V_UNMASKED_TRANS(vaesef_vv, vaes_check_vv)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 71684c10f3..8d39743de2 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -1064,6 +1064,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
 #include "insn_trans/trans_rvzfh.c.inc"
 #include "insn_trans/trans_rvk.c.inc"
 #include "insn_trans/trans_rvzvkb.c.inc"
+#include "insn_trans/trans_rvzvkns.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
 #include "insn_trans/trans_svinval.c.inc"
 #include "insn_trans/trans_xventanacondops.c.inc"
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 900e68dfb0..59d7d32a05 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -3,6 +3,7 @@
 #include "qemu/bitops.h"
 #include "qemu/bswap.h"
 #include "cpu.h"
+#include "crypto/aes.h"
 #include "exec/memop.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
@@ -154,3 +155,77 @@ GEN_VEXT_VX(vandn_vx_b, 1)
 GEN_VEXT_VX(vandn_vx_h, 2)
 GEN_VEXT_VX(vandn_vx_w, 4)
 GEN_VEXT_VX(vandn_vx_d, 8)
+
+static inline void aes_sub_bytes(uint8_t round_state[4][4])
+{
+    for (int j = 0; j < 16; j++) {
+        round_state[j / 4][j % 4] = AES_sbox[round_state[j / 4][j % 4]];
+    }
+}
+
+static inline void aes_shift_bytes(uint8_t round_state[4][4])
+{
+    uint8_t temp;
+    temp = round_state[0][1];
+    round_state[0][1] = round_state[1][1];
+    round_state[1][1] = round_state[2][1];
+    round_state[2][1] = round_state[3][1];
+    round_state[3][1] = temp;
+    temp = round_state[0][2];
+    round_state[0][2] = round_state[2][2];
+    round_state[2][2] = temp;
+    temp = round_state[1][2];
+    round_state[1][2] = round_state[3][2];
+    round_state[3][2] = temp;
+    temp = round_state[0][3];
+    round_state[0][3] = round_state[3][3];
+    round_state[3][3] = round_state[2][3];
+    round_state[2][3] = round_state[1][3];
+    round_state[1][3] = temp;
+}
+
+static inline void xor_round_key(uint8_t round_state[4][4], uint8_t *round_key)
+{
+    for (int j = 0; j < 16; j++) {
+        round_state[j / 4][j % 4] = round_state[j / 4][j % 4] ^ (round_key)[j];
+    }
+}
+
+#define GEN_ZVKNS_HELPER_VV(NAME, ...)                                    \
+void HELPER(NAME)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env,      \
+                  uint32_t desc)                                          \
+{                                                                         \
+    uint64_t *vd = vd_vptr;                                               \
+    uint64_t *vs2 = vs2_vptr;                                             \
+    uint32_t vl = env->vl;                                                \
+    uint32_t total_elems = vext_get_total_elems(env, desc, 4);            \
+    uint32_t vta = vext_vta(desc);                                        \
+    if (env->vl % 4 != 0) {                                               \
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());     \
+    }                                                                     \
+    for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {            \
+        uint64_t round_key[2] = {                                         \
+            cpu_to_le64(vs2[i * 2 + 0]),                                  \
+            cpu_to_le64(vs2[i * 2 + 1]),                                  \
+        };                                                                \
+        uint8_t round_state[4][4];                                        \
+        cpu_to_le64s(vd + i * 2 + 0);                                     \
+        cpu_to_le64s(vd + i * 2 + 1);                                     \
+        for (int j = 0; j < 16; j++) {                                    \
+            round_state[j / 4][j % 4] = ((uint8_t *)(vd + i * 2))[j];     \
+        }                                                                 \
+        __VA_ARGS__;                                                      \
+        for (int j = 0; j < 16; j++) {                                    \
+            ((uint8_t *)(vd + i * 2))[j] = round_state[j / 4][j % 4];     \
+        }                                                                 \
+        le64_to_cpus(vd + i * 2 + 0);                                     \
+        le64_to_cpus(vd + i * 2 + 1);                                     \
+    }                                                                     \
+    env->vstart = 0;                                                      \
+    /* set tail elements to 1s */                                         \
+    vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);                  \
+}
+
+GEN_ZVKNS_HELPER_VV(vaesef_vv, aes_sub_bytes(round_state);
+                    aes_shift_bytes(round_state);
+                    xor_round_key(round_state, (uint8_t *)round_key);)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 13/39] target/riscv: Add vaesef.vs decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (11 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 12/39] target/riscv: Add vaesef.vv decoding, translation and execution support Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 14/39] target/riscv: Add vaesdf.vv " Lawrence Hunter
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                       |  1 +
 target/riscv/insn32.decode                  |  1 +
 target/riscv/insn_trans/trans_rvzvkns.c.inc | 16 +++++++++
 target/riscv/vcrypto_helper.c               | 38 +++++++++++++++++++++
 4 files changed, 56 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index bdfa63302e..42349837ef 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1182,3 +1182,4 @@ DEF_HELPER_6(vandn_vx_w, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_6(vandn_vx_d, void, ptr, ptr, tl, ptr, env, i32)
 
 DEF_HELPER_4(vaesef_vv, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesef_vs, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index fdaab9a189..0d65c2ea27 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -911,3 +911,4 @@ vandn_vx        000001 . ..... ..... 100 ..... 1010111 @r_vm
 
 # *** RV64 Zvkns vector crypto extension ***
 vaesef_vv       101000 1 ..... 00011 010 ..... 1110111 @r2_vm_1
+vaesef_vs       101001 1 ..... 00011 010 ..... 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkns.c.inc b/target/riscv/insn_trans/trans_rvzvkns.c.inc
index 16e2c58001..2317af02a9 100644
--- a/target/riscv/insn_trans/trans_rvzvkns.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkns.c.inc
@@ -37,4 +37,20 @@ static bool vaes_check_vv(DisasContext *s, arg_rmr *a)
            require_align(a->rd, s->lmul) && require_align(a->rs2, s->lmul) &&
            s->vstart % 4 == 0 && s->sew == MO_32;
 }
+
+static bool vaes_check_overlap(DisasContext *s, int vd, int vs2)
+{
+    int8_t op_size = s->lmul <= 0 ? 1 : 1 << s->lmul;
+    return !is_overlapped(vd, op_size, vs2, op_size);
+}
+
+static bool vaes_check_vs(DisasContext *s, arg_rmr *a)
+{
+    return vaes_check_overlap(s, a->rd, a->rs2) &&
+           s->cfg_ptr->ext_zvkns == true && vext_check_isa_ill(s) &&
+           require_align(a->rd, s->lmul) && s->vstart % 4 == 0 &&
+           s->sew == MO_32;
+}
+
 GEN_V_UNMASKED_TRANS(vaesef_vv, vaes_check_vv)
+GEN_V_UNMASKED_TRANS(vaesef_vs, vaes_check_vs)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 59d7d32a05..f59e090c03 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -226,6 +226,44 @@ void HELPER(NAME)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env,      \
     vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);                  \
 }
 
+#define GEN_ZVKNS_HELPER_VS(NAME, ...)                                    \
+void HELPER(NAME)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env,      \
+                  uint32_t desc)                                          \
+{                                                                         \
+    uint64_t *vd = vd_vptr;                                               \
+    uint64_t *vs2 = vs2_vptr;                                             \
+    uint32_t vl = env->vl;                                                \
+    uint32_t total_elems = vext_get_total_elems(env, desc, 4);            \
+    uint32_t vta = vext_vta(desc);                                        \
+    if (env->vl % 4 != 0) {                                               \
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());     \
+    }                                                                     \
+    for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {            \
+        uint64_t round_key[2] = {                                         \
+            cpu_to_le64(vs2[0]),                                          \
+            cpu_to_le64(vs2[1]),                                          \
+        };                                                                \
+        uint8_t round_state[4][4];                                        \
+        cpu_to_le64s(vd + i * 2 + 0);                                     \
+        cpu_to_le64s(vd + i * 2 + 1);                                     \
+        for (int j = 0; j < 16; j++) {                                    \
+            round_state[j / 4][j % 4] = ((uint8_t *)(vd + i * 2))[j];     \
+        }                                                                 \
+        __VA_ARGS__;                                                      \
+        for (int j = 0; j < 16; j++) {                                    \
+            ((uint8_t *)(vd + i * 2))[j] = round_state[j / 4][j % 4];     \
+        }                                                                 \
+        le64_to_cpus(vd + i * 2 + 0);                                     \
+        le64_to_cpus(vd + i * 2 + 1);                                     \
+    }                                                                     \
+    env->vstart = 0;                                                      \
+    /* set tail elements to 1s */                                         \
+    vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);                  \
+}
+
 GEN_ZVKNS_HELPER_VV(vaesef_vv, aes_sub_bytes(round_state);
                     aes_shift_bytes(round_state);
                     xor_round_key(round_state, (uint8_t *)round_key);)
+GEN_ZVKNS_HELPER_VS(vaesef_vs, aes_sub_bytes(round_state);
+                    aes_shift_bytes(round_state);
+                    xor_round_key(round_state, (uint8_t *)round_key);)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 14/39] target/riscv: Add vaesdf.vv decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (12 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 13/39] target/riscv: Add vaesef.vs " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 15/39] target/riscv: Add vaesdf.vs " Lawrence Hunter
                   ` (24 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                       |  1 +
 target/riscv/insn32.decode                  |  1 +
 target/riscv/insn_trans/trans_rvzvkns.c.inc |  1 +
 target/riscv/vcrypto_helper.c               | 31 +++++++++++++++++++++
 4 files changed, 34 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 42349837ef..b7696cb6c4 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1183,3 +1183,4 @@ DEF_HELPER_6(vandn_vx_d, void, ptr, ptr, tl, ptr, env, i32)
 
 DEF_HELPER_4(vaesef_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesef_vs, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesdf_vv, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 0d65c2ea27..ca907dac85 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -912,3 +912,4 @@ vandn_vx        000001 . ..... ..... 100 ..... 1010111 @r_vm
 # *** RV64 Zvkns vector crypto extension ***
 vaesef_vv       101000 1 ..... 00011 010 ..... 1110111 @r2_vm_1
 vaesef_vs       101001 1 ..... 00011 010 ..... 1110111 @r2_vm_1
+vaesdf_vv       101000 1 ..... 00001 010 ..... 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkns.c.inc b/target/riscv/insn_trans/trans_rvzvkns.c.inc
index 2317af02a9..2e09deeb84 100644
--- a/target/riscv/insn_trans/trans_rvzvkns.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkns.c.inc
@@ -54,3 +54,4 @@ static bool vaes_check_vs(DisasContext *s, arg_rmr *a)
 
 GEN_V_UNMASKED_TRANS(vaesef_vv, vaes_check_vv)
 GEN_V_UNMASKED_TRANS(vaesef_vs, vaes_check_vs)
+GEN_V_UNMASKED_TRANS(vaesdf_vv, vaes_check_vv)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index f59e090c03..24b5336fa7 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -191,6 +191,34 @@ static inline void xor_round_key(uint8_t round_state[4][4], uint8_t *round_key)
     }
 }
 
+static inline void aes_inv_sub_bytes(uint8_t round_state[4][4])
+{
+    for (int j = 0; j < 16; j++) {
+        round_state[j / 4][j % 4] = AES_isbox[round_state[j / 4][j % 4]];
+    }
+}
+
+static inline void aes_inv_shift_bytes(uint8_t round_state[4][4])
+{
+    uint8_t temp;
+    temp = round_state[3][1];
+    round_state[3][1] = round_state[2][1];
+    round_state[2][1] = round_state[1][1];
+    round_state[1][1] = round_state[0][1];
+    round_state[0][1] = temp;
+    temp = round_state[0][2];
+    round_state[0][2] = round_state[2][2];
+    round_state[2][2] = temp;
+    temp = round_state[1][2];
+    round_state[1][2] = round_state[3][2];
+    round_state[3][2] = temp;
+    temp = round_state[0][3];
+    round_state[0][3] = round_state[1][3];
+    round_state[1][3] = round_state[2][3];
+    round_state[2][3] = round_state[3][3];
+    round_state[3][3] = temp;
+}
+
 #define GEN_ZVKNS_HELPER_VV(NAME, ...)                                    \
 void HELPER(NAME)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env,      \
                   uint32_t desc)                                          \
@@ -267,3 +295,6 @@ GEN_ZVKNS_HELPER_VV(vaesef_vv, aes_sub_bytes(round_state);
 GEN_ZVKNS_HELPER_VS(vaesef_vs, aes_sub_bytes(round_state);
                     aes_shift_bytes(round_state);
                     xor_round_key(round_state, (uint8_t *)round_key);)
+GEN_ZVKNS_HELPER_VV(vaesdf_vv, aes_inv_shift_bytes(round_state);
+                    aes_inv_sub_bytes(round_state);
+                    xor_round_key(round_state, (uint8_t *)round_key);)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 15/39] target/riscv: Add vaesdf.vs decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (13 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 14/39] target/riscv: Add vaesdf.vv " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 16/39] target/riscv: Add vaesdm.vv " Lawrence Hunter
                   ` (23 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                       | 1 +
 target/riscv/insn32.decode                  | 1 +
 target/riscv/insn_trans/trans_rvzvkns.c.inc | 1 +
 target/riscv/vcrypto_helper.c               | 3 +++
 4 files changed, 6 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index b7696cb6c4..020b9861cf 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1184,3 +1184,4 @@ DEF_HELPER_6(vandn_vx_d, void, ptr, ptr, tl, ptr, env, i32)
 DEF_HELPER_4(vaesef_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesef_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdf_vv, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesdf_vs, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ca907dac85..33e5883937 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -913,3 +913,4 @@ vandn_vx        000001 . ..... ..... 100 ..... 1010111 @r_vm
 vaesef_vv       101000 1 ..... 00011 010 ..... 1110111 @r2_vm_1
 vaesef_vs       101001 1 ..... 00011 010 ..... 1110111 @r2_vm_1
 vaesdf_vv       101000 1 ..... 00001 010 ..... 1110111 @r2_vm_1
+vaesdf_vs       101001 1 ..... 00001 010 ..... 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkns.c.inc b/target/riscv/insn_trans/trans_rvzvkns.c.inc
index 2e09deeb84..2d6a92c153 100644
--- a/target/riscv/insn_trans/trans_rvzvkns.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkns.c.inc
@@ -55,3 +55,4 @@ static bool vaes_check_vs(DisasContext *s, arg_rmr *a)
 GEN_V_UNMASKED_TRANS(vaesef_vv, vaes_check_vv)
 GEN_V_UNMASKED_TRANS(vaesef_vs, vaes_check_vs)
 GEN_V_UNMASKED_TRANS(vaesdf_vv, vaes_check_vv)
+GEN_V_UNMASKED_TRANS(vaesdf_vs, vaes_check_vs)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 24b5336fa7..d3ba5e9cb8 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -298,3 +298,6 @@ GEN_ZVKNS_HELPER_VS(vaesef_vs, aes_sub_bytes(round_state);
 GEN_ZVKNS_HELPER_VV(vaesdf_vv, aes_inv_shift_bytes(round_state);
                     aes_inv_sub_bytes(round_state);
                     xor_round_key(round_state, (uint8_t *)round_key);)
+GEN_ZVKNS_HELPER_VS(vaesdf_vs, aes_inv_shift_bytes(round_state);
+                    aes_inv_sub_bytes(round_state);
+                    xor_round_key(round_state, (uint8_t *)round_key);)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 16/39] target/riscv: Add vaesdm.vv decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (14 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 15/39] target/riscv: Add vaesdf.vs " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 17/39] target/riscv: Add vaesdm.vs " Lawrence Hunter
                   ` (22 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                       |  1 +
 target/riscv/insn32.decode                  |  1 +
 target/riscv/insn_trans/trans_rvzvkns.c.inc |  1 +
 target/riscv/vcrypto_helper.c               | 36 +++++++++++++++++++++
 4 files changed, 39 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 020b9861cf..328a026039 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1185,3 +1185,4 @@ DEF_HELPER_4(vaesef_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesef_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdf_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdf_vs, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesdm_vv, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 33e5883937..e5f3af999d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -914,3 +914,4 @@ vaesef_vv       101000 1 ..... 00011 010 ..... 1110111 @r2_vm_1
 vaesef_vs       101001 1 ..... 00011 010 ..... 1110111 @r2_vm_1
 vaesdf_vv       101000 1 ..... 00001 010 ..... 1110111 @r2_vm_1
 vaesdf_vs       101001 1 ..... 00001 010 ..... 1110111 @r2_vm_1
+vaesdm_vv       101000 1 ..... 00000 010 ..... 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkns.c.inc b/target/riscv/insn_trans/trans_rvzvkns.c.inc
index 2d6a92c153..6cb25d0d88 100644
--- a/target/riscv/insn_trans/trans_rvzvkns.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkns.c.inc
@@ -56,3 +56,4 @@ GEN_V_UNMASKED_TRANS(vaesef_vv, vaes_check_vv)
 GEN_V_UNMASKED_TRANS(vaesef_vs, vaes_check_vs)
 GEN_V_UNMASKED_TRANS(vaesdf_vv, vaes_check_vv)
 GEN_V_UNMASKED_TRANS(vaesdf_vs, vaes_check_vs)
+GEN_V_UNMASKED_TRANS(vaesdm_vv, vaes_check_vv)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index d3ba5e9cb8..bac21b5623 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -219,6 +219,38 @@ static inline void aes_inv_shift_bytes(uint8_t round_state[4][4])
     round_state[3][3] = temp;
 }
 
+static inline uint8_t xtime(uint8_t x)
+{
+    return (x << 1) ^ (((x >> 7) & 1) * 0x1b);
+}
+
+static inline uint8_t multiply(uint8_t x, uint8_t y)
+{
+    return (((y & 1) * x) ^ ((y >> 1 & 1) * xtime(x)) ^
+            ((y >> 2 & 1) * xtime(xtime(x))) ^
+            ((y >> 3 & 1) * xtime(xtime(xtime(x)))) ^
+            ((y >> 4 & 1) * xtime(xtime(xtime(xtime(x))))));
+}
+
+static inline void aes_inv_mix_cols(uint8_t round_state[4][4])
+{
+    uint8_t a, b, c, d;
+    for (int j = 0; j < 4; ++j) {
+        a = round_state[j][0];
+        b = round_state[j][1];
+        c = round_state[j][2];
+        d = round_state[j][3];
+        round_state[j][0] = multiply(a, 0x0e) ^ multiply(b, 0x0b) ^
+                            multiply(c, 0x0d) ^ multiply(d, 0x09);
+        round_state[j][1] = multiply(a, 0x09) ^ multiply(b, 0x0e) ^
+                            multiply(c, 0x0b) ^ multiply(d, 0x0d);
+        round_state[j][2] = multiply(a, 0x0d) ^ multiply(b, 0x09) ^
+                            multiply(c, 0x0e) ^ multiply(d, 0x0b);
+        round_state[j][3] = multiply(a, 0x0b) ^ multiply(b, 0x0d) ^
+                            multiply(c, 0x09) ^ multiply(d, 0x0e);
+    }
+}
+
 #define GEN_ZVKNS_HELPER_VV(NAME, ...)                                    \
 void HELPER(NAME)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env,      \
                   uint32_t desc)                                          \
@@ -301,3 +333,7 @@ GEN_ZVKNS_HELPER_VV(vaesdf_vv, aes_inv_shift_bytes(round_state);
 GEN_ZVKNS_HELPER_VS(vaesdf_vs, aes_inv_shift_bytes(round_state);
                     aes_inv_sub_bytes(round_state);
                     xor_round_key(round_state, (uint8_t *)round_key);)
+GEN_ZVKNS_HELPER_VV(vaesdm_vv, aes_inv_shift_bytes(round_state);
+                    aes_inv_sub_bytes(round_state);
+                    xor_round_key(round_state, (uint8_t *)round_key);
+                    aes_inv_mix_cols(round_state);)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 17/39] target/riscv: Add vaesdm.vs decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (15 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 16/39] target/riscv: Add vaesdm.vv " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 18/39] target/riscv: Add vaesz.vs " Lawrence Hunter
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                       | 1 +
 target/riscv/insn32.decode                  | 1 +
 target/riscv/insn_trans/trans_rvzvkns.c.inc | 1 +
 target/riscv/vcrypto_helper.c               | 4 ++++
 4 files changed, 7 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 328a026039..9895bf5712 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1186,3 +1186,4 @@ DEF_HELPER_4(vaesef_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdf_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdf_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdm_vv, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesdm_vs, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index e5f3af999d..753039e954 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -915,3 +915,4 @@ vaesef_vs       101001 1 ..... 00011 010 ..... 1110111 @r2_vm_1
 vaesdf_vv       101000 1 ..... 00001 010 ..... 1110111 @r2_vm_1
 vaesdf_vs       101001 1 ..... 00001 010 ..... 1110111 @r2_vm_1
 vaesdm_vv       101000 1 ..... 00000 010 ..... 1110111 @r2_vm_1
+vaesdm_vs       101001 1 ..... 00000 010 ..... 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkns.c.inc b/target/riscv/insn_trans/trans_rvzvkns.c.inc
index 6cb25d0d88..1459cb6d26 100644
--- a/target/riscv/insn_trans/trans_rvzvkns.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkns.c.inc
@@ -57,3 +57,4 @@ GEN_V_UNMASKED_TRANS(vaesef_vs, vaes_check_vs)
 GEN_V_UNMASKED_TRANS(vaesdf_vv, vaes_check_vv)
 GEN_V_UNMASKED_TRANS(vaesdf_vs, vaes_check_vs)
 GEN_V_UNMASKED_TRANS(vaesdm_vv, vaes_check_vv)
+GEN_V_UNMASKED_TRANS(vaesdm_vs, vaes_check_vs)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index bac21b5623..699bf25bbd 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -337,3 +337,7 @@ GEN_ZVKNS_HELPER_VV(vaesdm_vv, aes_inv_shift_bytes(round_state);
                     aes_inv_sub_bytes(round_state);
                     xor_round_key(round_state, (uint8_t *)round_key);
                     aes_inv_mix_cols(round_state);)
+GEN_ZVKNS_HELPER_VS(vaesdm_vs, aes_inv_shift_bytes(round_state);
+                    aes_inv_sub_bytes(round_state);
+                    xor_round_key(round_state, (uint8_t *)round_key);
+                    aes_inv_mix_cols(round_state);)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 18/39] target/riscv: Add vaesz.vs decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (16 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 17/39] target/riscv: Add vaesdm.vs " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 19/39] target/riscv: Add vaesem.vv " Lawrence Hunter
                   ` (20 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                       |  1 +
 target/riscv/insn32.decode                  |  1 +
 target/riscv/insn_trans/trans_rvzvkns.c.inc | 13 ++++++++-----
 target/riscv/vcrypto_helper.c               |  2 ++
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 9895bf5712..def126a59b 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1187,3 +1187,4 @@ DEF_HELPER_4(vaesdf_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdf_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdm_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdm_vs, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesz_vs, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 753039e954..b4ddc2586c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -916,3 +916,4 @@ vaesdf_vv       101000 1 ..... 00001 010 ..... 1110111 @r2_vm_1
 vaesdf_vs       101001 1 ..... 00001 010 ..... 1110111 @r2_vm_1
 vaesdm_vv       101000 1 ..... 00000 010 ..... 1110111 @r2_vm_1
 vaesdm_vs       101001 1 ..... 00000 010 ..... 1110111 @r2_vm_1
+vaesz_vs        101001 1 ..... 00111 010 ..... 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkns.c.inc b/target/riscv/insn_trans/trans_rvzvkns.c.inc
index 1459cb6d26..022fdeec00 100644
--- a/target/riscv/insn_trans/trans_rvzvkns.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkns.c.inc
@@ -41,15 +41,17 @@ static bool vaes_check_vv(DisasContext *s, arg_rmr *a)
 static bool vaes_check_overlap(DisasContext *s, int vd, int vs2)
 {
     int8_t op_size = s->lmul <= 0 ? 1 : 1 << s->lmul;
-    return !is_overlapped(vd, op_size, vs2, op_size);
+    return !is_overlapped(vd, op_size, vs2, 1);
 }
 
 static bool vaes_check_vs(DisasContext *s, arg_rmr *a)
 {
-    return vaes_check_overlap(s, a->rd, a->rs2) &&
-           s->cfg_ptr->ext_zvkns == true && vext_check_isa_ill(s) &&
-           require_align(a->rd, s->lmul) && s->vstart % 4 == 0 &&
-           s->sew == MO_32;
+    return require_rvv(s) &&
+           vaes_check_overlap(s, a->rd, a->rs2) &&
+           s->cfg_ptr->ext_zvkns == true &&
+           vext_check_isa_ill(s) &&
+           require_align(a->rd, s->lmul) &&
+           s->vstart % 4 == 0 && s->sew == MO_32;
 }
 
 GEN_V_UNMASKED_TRANS(vaesef_vv, vaes_check_vv)
@@ -58,3 +60,4 @@ GEN_V_UNMASKED_TRANS(vaesdf_vv, vaes_check_vv)
 GEN_V_UNMASKED_TRANS(vaesdf_vs, vaes_check_vs)
 GEN_V_UNMASKED_TRANS(vaesdm_vv, vaes_check_vv)
 GEN_V_UNMASKED_TRANS(vaesdm_vs, vaes_check_vs)
+GEN_V_UNMASKED_TRANS(vaesz_vs, vaes_check_vs)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 699bf25bbd..39e2498b7d 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -341,3 +341,5 @@ GEN_ZVKNS_HELPER_VS(vaesdm_vs, aes_inv_shift_bytes(round_state);
                     aes_inv_sub_bytes(round_state);
                     xor_round_key(round_state, (uint8_t *)round_key);
                     aes_inv_mix_cols(round_state);)
+GEN_ZVKNS_HELPER_VS(vaesz_vs,
+                    xor_round_key(round_state, (uint8_t *)round_key);)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 19/39] target/riscv: Add vaesem.vv decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (17 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 18/39] target/riscv: Add vaesz.vs " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 20/39] target/riscv: Add vaesem.vs " Lawrence Hunter
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	William Salmon

From: William Salmon <will.salmon@codethink.co.uk>

Signed-off-by: William Salmon <will.salmon@codethink.co.uk>
---
 target/riscv/helper.h                       |  1 +
 target/riscv/insn32.decode                  |  1 +
 target/riscv/insn_trans/trans_rvzvkns.c.inc |  1 +
 target/riscv/vcrypto_helper.c               | 17 +++++++++++++++++
 4 files changed, 20 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index def126a59b..4bfc9a3387 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1185,6 +1185,7 @@ DEF_HELPER_4(vaesef_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesef_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdf_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdf_vs, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesem_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdm_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdm_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesz_vs, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index b4ddc2586c..f31414f72c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -914,6 +914,7 @@ vaesef_vv       101000 1 ..... 00011 010 ..... 1110111 @r2_vm_1
 vaesef_vs       101001 1 ..... 00011 010 ..... 1110111 @r2_vm_1
 vaesdf_vv       101000 1 ..... 00001 010 ..... 1110111 @r2_vm_1
 vaesdf_vs       101001 1 ..... 00001 010 ..... 1110111 @r2_vm_1
+vaesem_vv       101000 1 ..... 00010 010 ..... 1110111 @r2_vm_1
 vaesdm_vv       101000 1 ..... 00000 010 ..... 1110111 @r2_vm_1
 vaesdm_vs       101001 1 ..... 00000 010 ..... 1110111 @r2_vm_1
 vaesz_vs        101001 1 ..... 00111 010 ..... 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkns.c.inc b/target/riscv/insn_trans/trans_rvzvkns.c.inc
index 022fdeec00..e9080c61d2 100644
--- a/target/riscv/insn_trans/trans_rvzvkns.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkns.c.inc
@@ -61,3 +61,4 @@ GEN_V_UNMASKED_TRANS(vaesdf_vs, vaes_check_vs)
 GEN_V_UNMASKED_TRANS(vaesdm_vv, vaes_check_vv)
 GEN_V_UNMASKED_TRANS(vaesdm_vs, vaes_check_vs)
 GEN_V_UNMASKED_TRANS(vaesz_vs, vaes_check_vs)
+GEN_V_UNMASKED_TRANS(vaesem_vv, vaes_check_vv)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 39e2498b7d..a1d66a64aa 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -251,6 +251,20 @@ static inline void aes_inv_mix_cols(uint8_t round_state[4][4])
     }
 }
 
+static inline void aes_mix_cols(uint8_t round_state[4][4])
+{
+    uint8_t a, b;
+    for (int j = 0; j < 4; ++j) {
+        a = round_state[j][0];
+        b = round_state[j][0] ^ round_state[j][1] ^ round_state[j][2] ^
+            round_state[j][3];
+        round_state[j][0] ^= xtime(round_state[j][0] ^ round_state[j][1]) ^ b;
+        round_state[j][1] ^= xtime(round_state[j][1] ^ round_state[j][2]) ^ b;
+        round_state[j][2] ^= xtime(round_state[j][2] ^ round_state[j][3]) ^ b;
+        round_state[j][3] ^= xtime(round_state[j][3] ^ a) ^ b;
+    }
+}
+
 #define GEN_ZVKNS_HELPER_VV(NAME, ...)                                    \
 void HELPER(NAME)(void *vd_vptr, void *vs2_vptr, CPURISCVState *env,      \
                   uint32_t desc)                                          \
@@ -333,6 +347,9 @@ GEN_ZVKNS_HELPER_VV(vaesdf_vv, aes_inv_shift_bytes(round_state);
 GEN_ZVKNS_HELPER_VS(vaesdf_vs, aes_inv_shift_bytes(round_state);
                     aes_inv_sub_bytes(round_state);
                     xor_round_key(round_state, (uint8_t *)round_key);)
+GEN_ZVKNS_HELPER_VV(vaesem_vv, aes_shift_bytes(round_state);
+                    aes_sub_bytes(round_state); aes_mix_cols(round_state);
+                    xor_round_key(round_state, (uint8_t *)round_key);)
 GEN_ZVKNS_HELPER_VV(vaesdm_vv, aes_inv_shift_bytes(round_state);
                     aes_inv_sub_bytes(round_state);
                     xor_round_key(round_state, (uint8_t *)round_key);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 20/39] target/riscv: Add vaesem.vs decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (18 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 19/39] target/riscv: Add vaesem.vv " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 21/39] target/riscv: Add vaeskf1.vi " Lawrence Hunter
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	William Salmon

From: William Salmon <will.salmon@codethink.co.uk>

Signed-off-by: William Salmon <will.salmon@codethink.co.uk>
---
 target/riscv/helper.h                       | 1 +
 target/riscv/insn32.decode                  | 1 +
 target/riscv/insn_trans/trans_rvzvkns.c.inc | 1 +
 target/riscv/vcrypto_helper.c               | 3 +++
 4 files changed, 6 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4bfc9a3387..85981f2cad 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1186,6 +1186,7 @@ DEF_HELPER_4(vaesef_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdf_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdf_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesem_vv, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesem_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdm_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdm_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesz_vs, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index f31414f72c..cc91ca8794 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -915,6 +915,7 @@ vaesef_vs       101001 1 ..... 00011 010 ..... 1110111 @r2_vm_1
 vaesdf_vv       101000 1 ..... 00001 010 ..... 1110111 @r2_vm_1
 vaesdf_vs       101001 1 ..... 00001 010 ..... 1110111 @r2_vm_1
 vaesem_vv       101000 1 ..... 00010 010 ..... 1110111 @r2_vm_1
+vaesem_vs       101001 1 ..... 00010 010 ..... 1110111 @r2_vm_1
 vaesdm_vv       101000 1 ..... 00000 010 ..... 1110111 @r2_vm_1
 vaesdm_vs       101001 1 ..... 00000 010 ..... 1110111 @r2_vm_1
 vaesz_vs        101001 1 ..... 00111 010 ..... 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkns.c.inc b/target/riscv/insn_trans/trans_rvzvkns.c.inc
index e9080c61d2..fd79b384ac 100644
--- a/target/riscv/insn_trans/trans_rvzvkns.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkns.c.inc
@@ -62,3 +62,4 @@ GEN_V_UNMASKED_TRANS(vaesdm_vv, vaes_check_vv)
 GEN_V_UNMASKED_TRANS(vaesdm_vs, vaes_check_vs)
 GEN_V_UNMASKED_TRANS(vaesz_vs, vaes_check_vs)
 GEN_V_UNMASKED_TRANS(vaesem_vv, vaes_check_vv)
+GEN_V_UNMASKED_TRANS(vaesem_vs, vaes_check_vs)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index a1d66a64aa..883739f4ac 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -350,6 +350,9 @@ GEN_ZVKNS_HELPER_VS(vaesdf_vs, aes_inv_shift_bytes(round_state);
 GEN_ZVKNS_HELPER_VV(vaesem_vv, aes_shift_bytes(round_state);
                     aes_sub_bytes(round_state); aes_mix_cols(round_state);
                     xor_round_key(round_state, (uint8_t *)round_key);)
+GEN_ZVKNS_HELPER_VS(vaesem_vs, aes_shift_bytes(round_state);
+                    aes_sub_bytes(round_state); aes_mix_cols(round_state);
+                    xor_round_key(round_state, (uint8_t *)round_key);)
 GEN_ZVKNS_HELPER_VV(vaesdm_vv, aes_inv_shift_bytes(round_state);
                     aes_inv_sub_bytes(round_state);
                     xor_round_key(round_state, (uint8_t *)round_key);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 21/39] target/riscv: Add vaeskf1.vi decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (19 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 20/39] target/riscv: Add vaesem.vs " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 22/39] target/riscv: Add vaeskf2.vi " Lawrence Hunter
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
---
 target/riscv/helper.h                       |  1 +
 target/riscv/insn32.decode                  |  1 +
 target/riscv/insn_trans/trans_rvzvkns.c.inc | 45 +++++++++++++++++++++
 target/riscv/vcrypto_helper.c               | 42 +++++++++++++++++++
 4 files changed, 89 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 85981f2cad..2ac02dde01 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1190,3 +1190,4 @@ DEF_HELPER_4(vaesem_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdm_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdm_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesz_vs, void, ptr, ptr, env, i32)
+DEF_HELPER_5(vaeskf1_vi, void, ptr, ptr, i32, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index cc91ca8794..325e2401c8 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -919,3 +919,4 @@ vaesem_vs       101001 1 ..... 00010 010 ..... 1110111 @r2_vm_1
 vaesdm_vv       101000 1 ..... 00000 010 ..... 1110111 @r2_vm_1
 vaesdm_vs       101001 1 ..... 00000 010 ..... 1110111 @r2_vm_1
 vaesz_vs        101001 1 ..... 00111 010 ..... 1110111 @r2_vm_1
+vaeskf1_vi      100010 1 ..... ..... 010 ..... 1110111 @r_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkns.c.inc b/target/riscv/insn_trans/trans_rvzvkns.c.inc
index fd79b384ac..27ff4f097c 100644
--- a/target/riscv/insn_trans/trans_rvzvkns.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkns.c.inc
@@ -63,3 +63,48 @@ GEN_V_UNMASKED_TRANS(vaesdm_vs, vaes_check_vs)
 GEN_V_UNMASKED_TRANS(vaesz_vs, vaes_check_vs)
 GEN_V_UNMASKED_TRANS(vaesem_vv, vaes_check_vv)
 GEN_V_UNMASKED_TRANS(vaesem_vs, vaes_check_vs)
+
+#define GEN_VI_UNMASKED_TRANS(NAME, CHECK)                                \
+static bool trans_##NAME(DisasContext *s, arg_##NAME * a)                 \
+{                                                                         \
+    if (CHECK(s, a)) {                                                    \
+        TCGv_ptr rd_v, rs2_v;                                             \
+        TCGv_i32 uimm_v, desc;                                            \
+        uint32_t data = 0;                                                \
+                                                                          \
+        TCGLabel *over = gen_new_label();                                 \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);                 \
+        tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);        \
+                                                                          \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                        \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                    \
+        data = FIELD_DP32(data, VDATA, VTA, s->vta);                      \
+        data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s);    \
+        data = FIELD_DP32(data, VDATA, VMA, s->vma);                      \
+                                                                          \
+        rd_v = tcg_temp_new_ptr();                                        \
+        rs2_v = tcg_temp_new_ptr();                                       \
+        uimm_v = tcg_constant_i32(a->rs1);                                \
+        desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlen / 8,           \
+                                          s->cfg_ptr->vlen / 8, data));   \
+        tcg_gen_addi_ptr(rd_v, cpu_env, vreg_ofs(s, a->rd));              \
+        tcg_gen_addi_ptr(rs2_v, cpu_env, vreg_ofs(s, a->rs2));            \
+        gen_helper_##NAME(rd_v, rs2_v, uimm_v, cpu_env, desc);            \
+        tcg_temp_free_ptr(rd_v);                                          \
+        tcg_temp_free_ptr(rs2_v);                                         \
+        mark_vs_dirty(s);                                                 \
+        gen_set_label(over);                                              \
+        return true;                                                      \
+    }                                                                     \
+    return false;                                                         \
+}
+
+static bool vaeskf1_check(DisasContext *s, arg_vaeskf1_vi * a)
+{
+    return s->cfg_ptr->ext_zvkns == true && vext_check_isa_ill(s) &&
+           s->vstart % 4 == 0 && s->sew == MO_32 &&
+           require_align(a->rd, s->lmul) && require_align(a->rs2, s->lmul) &&
+           a->rs1 >= 1 && a->rs1 <= 10;
+}
+
+GEN_VI_UNMASKED_TRANS(vaeskf1_vi, vaeskf1_check)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 883739f4ac..13ceb705cd 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -363,3 +363,45 @@ GEN_ZVKNS_HELPER_VS(vaesdm_vs, aes_inv_shift_bytes(round_state);
                     aes_inv_mix_cols(round_state);)
 GEN_ZVKNS_HELPER_VS(vaesz_vs,
                     xor_round_key(round_state, (uint8_t *)round_key);)
+
+void HELPER(vaeskf1_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
+                        CPURISCVState *env, uint32_t desc)
+{
+    uint32_t *vd = vd_vptr;
+    uint32_t *vs2 = vs2_vptr;
+    uint32_t vl = env->vl;
+    uint32_t total_elems = vext_get_total_elems(env, desc, 4);
+    uint32_t vta = vext_vta(desc);
+    if (vl % 4 != 0) {
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+    }
+
+    for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
+        uint32_t rk[8];
+        static const uint32_t rcon[] = {
+                0x01000000, 0x02000000, 0x04000000, 0x08000000, 0x10000000,
+                0x20000000, 0x40000000, 0x80000000, 0x1B000000, 0x36000000,
+        };
+
+        rk[0] = bswap32(vs2[i * 4 + H4(0)]);
+        rk[1] = bswap32(vs2[i * 4 + H4(1)]);
+        rk[2] = bswap32(vs2[i * 4 + H4(2)]);
+        rk[3] = bswap32(vs2[i * 4 + H4(3)]);
+
+        rk[4] = rk[0] ^ (AES_Te4[(rk[3] >> 16) & 0xff] & 0xff000000) ^
+                (AES_Te4[(rk[3] >> 8) & 0xff] & 0x00ff0000) ^
+                (AES_Te4[(rk[3] >> 0) & 0xff] & 0x0000ff00) ^
+                (AES_Te4[(rk[3] >> 24) & 0xff] & 0x000000ff) ^ rcon[uimm - 1];
+        rk[5] = rk[1] ^ rk[4];
+        rk[6] = rk[2] ^ rk[5];
+        rk[7] = rk[3] ^ rk[6];
+
+        vd[i * 4 + H4(0)] = bswap32(rk[4]);
+        vd[i * 4 + H4(1)] = bswap32(rk[5]);
+        vd[i * 4 + H4(2)] = bswap32(rk[6]);
+        vd[i * 4 + H4(3)] = bswap32(rk[7]);
+    }
+    env->vstart = 0;
+    /* set tail elements to 1s */
+    vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 22/39] target/riscv: Add vaeskf2.vi decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (20 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 21/39] target/riscv: Add vaeskf1.vi " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 23/39] target/riscv: expose zvkns cpu property Lawrence Hunter
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
---
 target/riscv/helper.h                       |  1 +
 target/riscv/insn32.decode                  |  1 +
 target/riscv/insn_trans/trans_rvzvkns.c.inc |  9 ++++
 target/riscv/vcrypto_helper.c               | 56 +++++++++++++++++++++
 4 files changed, 67 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 2ac02dde01..312f59bb38 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1191,3 +1191,4 @@ DEF_HELPER_4(vaesdm_vv, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesdm_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesz_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_5(vaeskf1_vi, void, ptr, ptr, i32, env, i32)
+DEF_HELPER_5(vaeskf2_vi, void, ptr, ptr, i32, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 325e2401c8..1eed0a6b26 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -920,3 +920,4 @@ vaesdm_vv       101000 1 ..... 00000 010 ..... 1110111 @r2_vm_1
 vaesdm_vs       101001 1 ..... 00000 010 ..... 1110111 @r2_vm_1
 vaesz_vs        101001 1 ..... 00111 010 ..... 1110111 @r2_vm_1
 vaeskf1_vi      100010 1 ..... ..... 010 ..... 1110111 @r_vm_1
+vaeskf2_vi      101010 1 ..... ..... 010 ..... 1110111 @r_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkns.c.inc b/target/riscv/insn_trans/trans_rvzvkns.c.inc
index 27ff4f097c..7daac1de5d 100644
--- a/target/riscv/insn_trans/trans_rvzvkns.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvkns.c.inc
@@ -107,4 +107,13 @@ static bool vaeskf1_check(DisasContext *s, arg_vaeskf1_vi * a)
            a->rs1 >= 1 && a->rs1 <= 10;
 }
 
+static bool vaeskf2_check(DisasContext *s, arg_vaeskf2_vi *a)
+{
+    return s->cfg_ptr->ext_zvkns == true && vext_check_isa_ill(s) &&
+           s->vstart % 4 == 0 && s->sew == MO_32 &&
+           require_align(a->rd, s->lmul) && require_align(a->rs2, s->lmul) &&
+           a->rs1 >= 2 && a->rs1 <= 14;
+}
+
 GEN_VI_UNMASKED_TRANS(vaeskf1_vi, vaeskf1_check)
+GEN_VI_UNMASKED_TRANS(vaeskf2_vi, vaeskf2_check)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 13ceb705cd..50207b4ff0 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -405,3 +405,59 @@ void HELPER(vaeskf1_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
     /* set tail elements to 1s */
     vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);
 }
+
+void HELPER(vaeskf2_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
+                        CPURISCVState *env, uint32_t desc)
+{
+    uint32_t *vd = vd_vptr;
+    uint32_t *vs2 = vs2_vptr;
+    uint32_t vl = env->vl;
+    uint32_t total_elems = vext_get_total_elems(env, desc, 4);
+    uint32_t vta = vext_vta(desc);
+    if (env->vl % 4 != 0) {
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+    }
+    for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
+        uint32_t rk[12];
+        static const uint32_t rcon[] = {
+                0x01000000, 0x02000000, 0x04000000, 0x08000000, 0x10000000,
+                0x20000000, 0x40000000, 0x80000000, 0x1B000000, 0x36000000,
+        };
+
+        rk[0] = bswap32(vd[i * 4 + H4(0)]);
+        rk[1] = bswap32(vd[i * 4 + H4(1)]);
+        rk[2] = bswap32(vd[i * 4 + H4(2)]);
+        rk[3] = bswap32(vd[i * 4 + H4(3)]);
+        rk[4] = bswap32(vs2[i * 4 + H4(0)]);
+        rk[5] = bswap32(vs2[i * 4 + H4(1)]);
+        rk[6] = bswap32(vs2[i * 4 + H4(2)]);
+        rk[7] = bswap32(vs2[i * 4 + H4(3)]);
+
+        if (uimm % 2 == 0) {
+            rk[8] = rk[0] ^ (AES_Te4[(rk[7] >> 16) & 0xff] & 0xff000000) ^
+                    (AES_Te4[(rk[7] >> 8) & 0xff] & 0x00ff0000) ^
+                    (AES_Te4[(rk[7] >> 0) & 0xff] & 0x0000ff00) ^
+                    (AES_Te4[(rk[7] >> 24) & 0xff] & 0x000000ff) ^
+                    rcon[(uimm - 1) / 2];
+            rk[9] = rk[1] ^ rk[8];
+            rk[10] = rk[2] ^ rk[9];
+            rk[11] = rk[3] ^ rk[10];
+        } else {
+            rk[8] = rk[0] ^ (AES_Te4[(rk[7] >> 24) & 0xff] & 0xff000000) ^
+                    (AES_Te4[(rk[7] >> 16) & 0xff] & 0x00ff0000) ^
+                    (AES_Te4[(rk[7] >> 8) & 0xff] & 0x0000ff00) ^
+                    (AES_Te4[(rk[7] >> 0) & 0xff] & 0x000000ff);
+            rk[9] = rk[1] ^ rk[8];
+            rk[10] = rk[2] ^ rk[9];
+            rk[11] = rk[3] ^ rk[10];
+        }
+
+        vd[i * 4 + H4(0)] = bswap32(rk[8]);
+        vd[i * 4 + H4(1)] = bswap32(rk[9]);
+        vd[i * 4 + H4(2)] = bswap32(rk[10]);
+        vd[i * 4 + H4(3)] = bswap32(rk[11]);
+    }
+    env->vstart = 0;
+    /* set tail elements to 1s */
+    vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 23/39] target/riscv: expose zvkns cpu property
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (21 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 22/39] target/riscv: Add vaeskf2.vi " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 24/39] target/riscv: add zvknh cpu properties Lawrence Hunter
                   ` (15 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
---
 target/riscv/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index fd09822b4f..0da04d0be1 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1084,6 +1084,7 @@ static Property riscv_cpu_extensions[] = {
     DEFINE_PROP_BOOL("zmmul", RISCVCPU, cfg.ext_zmmul, false),
 
     DEFINE_PROP_BOOL("zvkb", RISCVCPU, cfg.ext_zvkb, false),
+    DEFINE_PROP_BOOL("zvkns", RISCVCPU, cfg.ext_zvkns, false),
 
     /* Vendor-specific custom extensions */
     DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, false),
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 24/39] target/riscv: add zvknh cpu properties
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (22 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 23/39] target/riscv: expose zvkns cpu property Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 25/39] target/riscv: Add vsha2ms.vv decoding, translation and execution support Lawrence Hunter
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
---
 target/riscv/cpu.c | 10 +++++++++-
 target/riscv/cpu.h |  2 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 0da04d0be1..a78d9ae120 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -101,6 +101,8 @@ static const struct isa_ext_data isa_edata_arr[] = {
     ISA_EXT_DATA_ENTRY(zve32f, true, PRIV_VERSION_1_12_0, ext_zve32f),
     ISA_EXT_DATA_ENTRY(zve64f, true, PRIV_VERSION_1_12_0, ext_zve64f),
     ISA_EXT_DATA_ENTRY(zvkb, true, PRIV_VERSION_1_12_0, ext_zvkb),
+    ISA_EXT_DATA_ENTRY(zvknha, true, PRIV_VERSION_1_12_0, ext_zvknha),
+    ISA_EXT_DATA_ENTRY(zvknhb, true, PRIV_VERSION_1_12_0, ext_zvknhb),
     ISA_EXT_DATA_ENTRY(zvkns, true, PRIV_VERSION_1_12_0, ext_zvkns),
     ISA_EXT_DATA_ENTRY(zhinx, true, PRIV_VERSION_1_12_0, ext_zhinx),
     ISA_EXT_DATA_ENTRY(zhinxmin, true, PRIV_VERSION_1_12_0, ext_zhinxmin),
@@ -798,13 +800,19 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
          * In principle zve*{x,d} would also suffice here, were they supported
          * in qemu
          */
-        if ((cpu->cfg.ext_zvkb || cpu->cfg.ext_zvkns) &&
+        if ((cpu->cfg.ext_zvkb || cpu->cfg.ext_zvkns || cpu->cfg.ext_zvknha) &&
             !(cpu->cfg.ext_zve32f || cpu->cfg.ext_zve64f || cpu->cfg.ext_v)) {
             error_setg(
                 errp, "Vector crypto extensions require V or Zve* extensions");
             return;
         }
 
+        if (cpu->cfg.ext_zvknhb && !(cpu->cfg.ext_zve64f || cpu->cfg.ext_v)) {
+            error_setg(errp,
+                       "Zvknhb extension requires V or Zve64f extensions");
+            return;
+        }
+
         /* Set the ISA extensions, checks should have happened above */
         if (cpu->cfg.ext_zdinx || cpu->cfg.ext_zhinx ||
             cpu->cfg.ext_zhinxmin) {
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 56008ef9b9..ebee902806 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -462,6 +462,8 @@ struct RISCVCPUConfig {
     bool ext_zve32f;
     bool ext_zve64f;
     bool ext_zvkb;
+    bool ext_zvknha;
+    bool ext_zvknhb;
     bool ext_zvkns;
     bool ext_zmmul;
     bool ext_smaia;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 25/39] target/riscv: Add vsha2ms.vv decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (23 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 24/39] target/riscv: add zvknh cpu properties Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 26/39] target/riscv: Add vsha2c[hl].vv " Lawrence Hunter
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>

Co-authored-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
---
 target/riscv/helper.h                       |  2 +
 target/riscv/insn32.decode                  |  3 +
 target/riscv/insn_trans/trans_rvzvknh.c.inc | 45 ++++++++++++
 target/riscv/translate.c                    |  1 +
 target/riscv/vcrypto_helper.c               | 80 +++++++++++++++++++++
 5 files changed, 131 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvzvknh.c.inc

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 312f59bb38..6e7777c879 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1192,3 +1192,5 @@ DEF_HELPER_4(vaesdm_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_4(vaesz_vs, void, ptr, ptr, env, i32)
 DEF_HELPER_5(vaeskf1_vi, void, ptr, ptr, i32, env, i32)
 DEF_HELPER_5(vaeskf2_vi, void, ptr, ptr, i32, env, i32)
+
+DEF_HELPER_5(vsha2ms_vv, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 1eed0a6b26..57fbd63d91 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -921,3 +921,6 @@ vaesdm_vs       101001 1 ..... 00000 010 ..... 1110111 @r2_vm_1
 vaesz_vs        101001 1 ..... 00111 010 ..... 1110111 @r2_vm_1
 vaeskf1_vi      100010 1 ..... ..... 010 ..... 1110111 @r_vm_1
 vaeskf2_vi      101010 1 ..... ..... 010 ..... 1110111 @r_vm_1
+
+# *** RV64 Zvknh vector crypto extension ***
+vsha2ms_vv      101101 1 ..... ..... 010 ..... 1110111 @r_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvknh.c.inc b/target/riscv/insn_trans/trans_rvzvknh.c.inc
new file mode 100644
index 0000000000..ff30400100
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvzvknh.c.inc
@@ -0,0 +1,45 @@
+#define GEN_VV_UNMASKED_TRANS(NAME, CHECK)                             \
+static bool trans_##NAME(DisasContext *s, arg_rmrr * a)                \
+{                                                                      \
+    if (CHECK(s, a)) {                                                 \
+        uint32_t data = 0;                                             \
+        TCGLabel *over = gen_new_label();                              \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);              \
+        tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);     \
+                                                                       \
+        data = FIELD_DP32(data, VDATA, VM, a->vm);                     \
+        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                 \
+        data = FIELD_DP32(data, VDATA, VTA, s->vta);                   \
+        data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); \
+        data = FIELD_DP32(data, VDATA, VMA, s->vma);                   \
+                                                                       \
+        tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd),                         \
+                           vreg_ofs(s, a->rs1),                        \
+                           vreg_ofs(s, a->rs2), cpu_env,               \
+                           s->cfg_ptr->vlen / 8,                       \
+                           s->cfg_ptr->vlen / 8, data,                 \
+                           gen_helper_##NAME);                         \
+                                                                       \
+        mark_vs_dirty(s);                                              \
+        gen_set_label(over);                                           \
+        return true;                                                   \
+    }                                                                  \
+    return false;                                                      \
+}
+
+static bool vsha_check_sew(DisasContext *s)
+{
+    return (s->cfg_ptr->ext_zvknha == true && s->sew == MO_32) ||
+           (s->cfg_ptr->ext_zvknhb == true &&
+               (s->sew == MO_32 || s->sew == MO_64));
+}
+
+static bool vsha_check(DisasContext *s, arg_rmrr *a)
+{
+    return opivv_check(s, a) &&
+           vsha_check_sew(s) &&
+           s->vstart % 4 == 0 &&
+           s->lmul >= 0;
+}
+
+GEN_VV_UNMASKED_TRANS(vsha2ms_vv, vsha_check)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 8d39743de2..924a89de9f 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -1065,6 +1065,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
 #include "insn_trans/trans_rvk.c.inc"
 #include "insn_trans/trans_rvzvkb.c.inc"
 #include "insn_trans/trans_rvzvkns.c.inc"
+#include "insn_trans/trans_rvzvknh.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
 #include "insn_trans/trans_svinval.c.inc"
 #include "insn_trans/trans_xventanacondops.c.inc"
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 50207b4ff0..af596cce09 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -461,3 +461,83 @@ void HELPER(vaeskf2_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
     /* set tail elements to 1s */
     vext_set_elems_1s(vd, vta, vl * 4, total_elems * 4);
 }
+
+static inline uint32_t sig0_sha256(uint32_t x)
+{
+    return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3);
+}
+
+static inline uint32_t sig1_sha256(uint32_t x)
+{
+    return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10);
+}
+
+static inline uint64_t sig0_sha512(uint64_t x)
+{
+    return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7);
+}
+
+static inline uint64_t sig1_sha512(uint64_t x)
+{
+    return ror64(x, 19) ^ ror64(x, 61) ^ (x >> 6);
+}
+
+static inline void vsha2ms_e32(uint32_t *vd, uint32_t *vs1, uint32_t *vs2)
+{
+    uint32_t res[4];
+    res[0] = sig1_sha256(vs2[H4(2)]) + vs1[H4(1)] + sig0_sha256(vd[H4(1)]) +
+        vd[H4(0)];
+    res[1] = sig1_sha256(vs2[H4(3)]) + vs1[H4(2)] + sig0_sha256(vd[H4(2)]) +
+        vd[H4(1)];
+    res[2] = sig1_sha256(res[0]) + vs1[H4(3)] + sig0_sha256(vd[H4(3)]) +
+        vd[H4(2)];
+    res[3] = sig1_sha256(res[1]) + vs2[H4(0)] + sig0_sha256(vs1[H4(0)]) +
+        vd[H4(3)];
+    vd[H4(3)] = res[3];
+    vd[H4(2)] = res[2];
+    vd[H4(1)] = res[1];
+    vd[H4(0)] = res[0];
+}
+
+static inline void vsha2ms_e64(uint64_t *vd, uint64_t *vs1, uint64_t *vs2)
+{
+    uint64_t res[4];
+    res[0] = sig1_sha512(vs2[2]) + vs1[1] + sig0_sha512(vd[1]) + vd[0];
+    res[1] = sig1_sha512(vs2[3]) + vs1[2] + sig0_sha512(vd[2]) + vd[1];
+    res[2] = sig1_sha512(res[0]) + vs1[3] + sig0_sha512(vd[3]) + vd[2];
+    res[3] = sig1_sha512(res[1]) + vs2[0] + sig0_sha512(vs1[0]) + vd[3];
+    vd[3] = res[3];
+    vd[2] = res[2];
+    vd[1] = res[1];
+    vd[0] = res[0];
+}
+
+void HELPER(vsha2ms_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
+                        uint32_t desc)
+{
+    uint32_t sew = FIELD_EX64(env->vtype, VTYPE, VSEW);
+    uint32_t esz = 0;
+    uint32_t total_elems;
+    uint32_t vta = vext_vta(desc);
+
+    if (env->vl % 4 != 0) {
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+    }
+
+    for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
+        if (sew == MO_32) {
+            esz = 4;
+            vsha2ms_e32(((uint32_t *)vd) + i * 4, ((uint32_t *)vs1) + i * 4,
+                        ((uint32_t *)vs2) + i * 4);
+        } else {
+            /* If not 32 then SEW should be 64 */
+            esz = 8;
+            vsha2ms_e64(((uint64_t *)vd) + i * 4, ((uint64_t *)vs1) + i * 4,
+                        ((uint64_t *)vs2) + i * 4);
+        }
+    }
+    /* set tail elements to 1s */
+    total_elems = vext_get_total_elems(env, desc, esz);
+    vext_set_elems_1s(vd, vta, env->vl * esz, total_elems * esz);
+    env->vstart = 0;
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 26/39] target/riscv: Add vsha2c[hl].vv decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (24 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 25/39] target/riscv: Add vsha2ms.vv decoding, translation and execution support Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 27/39] target/riscv: expose zvknh cpu properties Lawrence Hunter
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Co-authored-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                       |   2 +
 target/riscv/insn32.decode                  |   2 +
 target/riscv/insn_trans/trans_rvzvknh.c.inc |   2 +
 target/riscv/vcrypto_helper.c               | 153 ++++++++++++++++++++
 4 files changed, 159 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 6e7777c879..911270c387 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1194,3 +1194,5 @@ DEF_HELPER_5(vaeskf1_vi, void, ptr, ptr, i32, env, i32)
 DEF_HELPER_5(vaeskf2_vi, void, ptr, ptr, i32, env, i32)
 
 DEF_HELPER_5(vsha2ms_vv, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vsha2ch_vv, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vsha2cl_vv, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 57fbd63d91..2387bc179c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -924,3 +924,5 @@ vaeskf2_vi      101010 1 ..... ..... 010 ..... 1110111 @r_vm_1
 
 # *** RV64 Zvknh vector crypto extension ***
 vsha2ms_vv      101101 1 ..... ..... 010 ..... 1110111 @r_vm_1
+vsha2ch_vv      101110 1 ..... ..... 010 ..... 1110111 @r_vm_1
+vsha2cl_vv      101111 1 ..... ..... 010 ..... 1110111 @r_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvknh.c.inc b/target/riscv/insn_trans/trans_rvzvknh.c.inc
index ff30400100..4d4d26eae5 100644
--- a/target/riscv/insn_trans/trans_rvzvknh.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvknh.c.inc
@@ -43,3 +43,5 @@ static bool vsha_check(DisasContext *s, arg_rmrr *a)
 }
 
 GEN_VV_UNMASKED_TRANS(vsha2ms_vv, vsha_check)
+GEN_VV_UNMASKED_TRANS(vsha2cl_vv, vsha_check)
+GEN_VV_UNMASKED_TRANS(vsha2ch_vv, vsha_check)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index af596cce09..b73581641a 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -541,3 +541,156 @@ void HELPER(vsha2ms_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
     vext_set_elems_1s(vd, vta, env->vl * esz, total_elems * esz);
     env->vstart = 0;
 }
+
+static inline uint64_t sum0_64(uint64_t x)
+{
+    return ror64(x, 28) ^ ror64(x, 34) ^ ror64(x, 39);
+}
+
+static inline uint32_t sum0_32(uint32_t x)
+{
+    return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22);
+}
+
+static inline uint64_t sum1_64(uint64_t x)
+{
+    return ror64(x, 14) ^ ror64(x, 18) ^ ror64(x, 41);
+}
+
+static inline uint32_t sum1_32(uint32_t x)
+{
+    return ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25);
+}
+
+#define ch(x, y, z) ((x & y) ^ ((~x) & z))
+
+#define maj(x, y, z) ((x & y) ^ (x & z) ^ (y & z))
+
+static void vsha2c_64(uint64_t *vs2, uint64_t *vd, uint64_t *vs1)
+{
+    uint64_t a = vs2[3], b = vs2[2], e = vs2[1], f = vs2[0];
+    uint64_t c = vd[3], d = vd[2], g = vd[1], h = vd[0];
+    uint64_t W0 = vs1[0], W1 = vs1[1];
+    uint64_t T1 = h + sum1_64(e) + ch(e, f, g) + W0;
+    uint64_t T2 = sum0_64(a) + maj(a, b, c);
+
+    h = g;
+    g = f;
+    f = e;
+    e = d + T1;
+    d = c;
+    c = b;
+    b = a;
+    a = T1 + T2;
+
+    T1 = h + sum1_64(e) + ch(e, f, g) + W1;
+    T2 = sum0_64(a) + maj(a, b, c);
+    h = g;
+    g = f;
+    f = e;
+    e = d + T1;
+    d = c;
+    c = b;
+    b = a;
+    a = T1 + T2;
+
+    vd[0] = f;
+    vd[1] = e;
+    vd[2] = b;
+    vd[3] = a;
+}
+
+static void vsha2c_32(uint32_t *vs2, uint32_t *vd, uint32_t *vs1)
+{
+    uint32_t a = vs2[H4(3)], b = vs2[H4(2)], e = vs2[H4(1)], f = vs2[H4(0)];
+    uint32_t c = vd[H4(3)], d = vd[H4(2)], g = vd[H4(1)], h = vd[H4(0)];
+    uint32_t W0 = vs1[H4(0)], W1 = vs1[H4(1)];
+    uint32_t T1 = h + sum1_32(e) + ch(e, f, g) + W0;
+    uint32_t T2 = sum0_32(a) + maj(a, b, c);
+
+    h = g;
+    g = f;
+    f = e;
+    e = d + T1;
+    d = c;
+    c = b;
+    b = a;
+    a = T1 + T2;
+
+    T1 = h + sum1_32(e) + ch(e, f, g) + W1;
+    T2 = sum0_32(a) + maj(a, b, c);
+    h = g;
+    g = f;
+    f = e;
+    e = d + T1;
+    d = c;
+    c = b;
+    b = a;
+    a = T1 + T2;
+
+    vd[H4(0)] = f;
+    vd[H4(1)] = e;
+    vd[H4(2)] = b;
+    vd[H4(3)] = a;
+
+}
+
+void HELPER(vsha2ch_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
+                        uint32_t desc)
+{
+    uint32_t sew = FIELD_EX64(env->vtype, VTYPE, VSEW);
+    uint32_t esz = 0;
+    uint32_t total_elems;
+    uint32_t vta = vext_vta(desc);
+
+    if (env->vl % 4 != 0) {
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+    }
+
+    for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
+        if (sew == MO_64) {
+            esz = 8;
+            vsha2c_64(((uint64_t *)vs2) + 4 * i, ((uint64_t *)vd) + 4 * i,
+                        ((uint64_t *)vs1) + 4 * i + 2);
+        } else {
+            esz = 4;
+            vsha2c_32(((uint32_t *)vs2) + 4 * i, ((uint32_t *)vd) + 4 * i,
+                        ((uint32_t *)vs1) + 4 * i + 2);
+        }
+    }
+
+    /* set tail elements to 1s */
+    total_elems = vext_get_total_elems(env, desc, esz);
+    vext_set_elems_1s(vd, vta, env->vl * esz, total_elems * esz);
+    env->vstart = 0;
+}
+
+void HELPER(vsha2cl_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
+                        uint32_t desc)
+{
+    uint32_t sew = FIELD_EX64(env->vtype, VTYPE, VSEW);
+    uint32_t esz = 0;
+    uint32_t total_elems;
+    uint32_t vta = vext_vta(desc);
+
+    if (env->vl % 4 != 0) {
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+    }
+
+    for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
+        if (sew == MO_64) {
+            esz = 8;
+            vsha2c_64(((uint64_t *)vs2) + 4 * i, ((uint64_t *)vd) + 4 * i,
+                        (((uint64_t *)vs1) + 4 * i));
+        } else {
+            esz = 4;
+            vsha2c_32(((uint32_t *)vs2) + 4 * i, ((uint32_t *)vd) + 4 * i,
+                        (((uint32_t *)vs1) + 4 * i));
+        }
+    }
+
+    /* set tail elements to 1s */
+    total_elems = vext_get_total_elems(env, desc, esz);
+    vext_set_elems_1s(vd, vta, env->vl * esz, total_elems * esz);
+    env->vstart = 0;
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 27/39] target/riscv: expose zvknh cpu properties
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (25 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 26/39] target/riscv: Add vsha2c[hl].vv " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 28/39] target/riscv: add zvksh cpu property Lawrence Hunter
                   ` (11 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
---
 target/riscv/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a78d9ae120..5076699226 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1092,6 +1092,8 @@ static Property riscv_cpu_extensions[] = {
     DEFINE_PROP_BOOL("zmmul", RISCVCPU, cfg.ext_zmmul, false),
 
     DEFINE_PROP_BOOL("zvkb", RISCVCPU, cfg.ext_zvkb, false),
+    DEFINE_PROP_BOOL("zvknha", RISCVCPU, cfg.ext_zvknha, false),
+    DEFINE_PROP_BOOL("zvknhb", RISCVCPU, cfg.ext_zvknhb, false),
     DEFINE_PROP_BOOL("zvkns", RISCVCPU, cfg.ext_zvkns, false),
 
     /* Vendor-specific custom extensions */
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 28/39]  target/riscv: add zvksh cpu property
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (26 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 27/39] target/riscv: expose zvknh cpu properties Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 29/39] target/riscv: Add vsm3me.vv decoding, translation and execution support Lawrence Hunter
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>

Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
---
 target/riscv/cpu.c | 4 +++-
 target/riscv/cpu.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 5076699226..9a412d9d53 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -104,6 +104,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
     ISA_EXT_DATA_ENTRY(zvknha, true, PRIV_VERSION_1_12_0, ext_zvknha),
     ISA_EXT_DATA_ENTRY(zvknhb, true, PRIV_VERSION_1_12_0, ext_zvknhb),
     ISA_EXT_DATA_ENTRY(zvkns, true, PRIV_VERSION_1_12_0, ext_zvkns),
+    ISA_EXT_DATA_ENTRY(zvksh, true, PRIV_VERSION_1_12_0, ext_zvksh),
     ISA_EXT_DATA_ENTRY(zhinx, true, PRIV_VERSION_1_12_0, ext_zhinx),
     ISA_EXT_DATA_ENTRY(zhinxmin, true, PRIV_VERSION_1_12_0, ext_zhinxmin),
     ISA_EXT_DATA_ENTRY(smaia, true, PRIV_VERSION_1_12_0, ext_smaia),
@@ -800,7 +801,8 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
          * In principle zve*{x,d} would also suffice here, were they supported
          * in qemu
          */
-        if ((cpu->cfg.ext_zvkb || cpu->cfg.ext_zvkns || cpu->cfg.ext_zvknha) &&
+        if ((cpu->cfg.ext_zvkb || cpu->cfg.ext_zvkns || cpu->cfg.ext_zvknha ||
+             cpu->cfg.ext_zvksh) &&
             !(cpu->cfg.ext_zve32f || cpu->cfg.ext_zve64f || cpu->cfg.ext_v)) {
             error_setg(
                 errp, "Vector crypto extensions require V or Zve* extensions");
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index ebee902806..92624bfc57 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -465,6 +465,7 @@ struct RISCVCPUConfig {
     bool ext_zvknha;
     bool ext_zvknhb;
     bool ext_zvkns;
+    bool ext_zvksh;
     bool ext_zmmul;
     bool ext_smaia;
     bool ext_ssaia;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 29/39]  target/riscv: Add vsm3me.vv decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (27 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 28/39] target/riscv: add zvksh cpu property Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 30/39] target/riscv: Add vsm3c.vi " Lawrence Hunter
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
---
 target/riscv/helper.h                       |  2 +
 target/riscv/insn32.decode                  |  3 ++
 target/riscv/insn_trans/trans_rvzvksh.c.inc | 12 ++++++
 target/riscv/translate.c                    |  1 +
 target/riscv/vcrypto_helper.c               | 43 +++++++++++++++++++++
 5 files changed, 61 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvzvksh.c.inc

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 911270c387..36e0d8eff3 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1196,3 +1196,5 @@ DEF_HELPER_5(vaeskf2_vi, void, ptr, ptr, i32, env, i32)
 DEF_HELPER_5(vsha2ms_vv, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vsha2ch_vv, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vsha2cl_vv, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_5(vsm3me_vv, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2387bc179c..671614e354 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -926,3 +926,6 @@ vaeskf2_vi      101010 1 ..... ..... 010 ..... 1110111 @r_vm_1
 vsha2ms_vv      101101 1 ..... ..... 010 ..... 1110111 @r_vm_1
 vsha2ch_vv      101110 1 ..... ..... 010 ..... 1110111 @r_vm_1
 vsha2cl_vv      101111 1 ..... ..... 010 ..... 1110111 @r_vm_1
+
+# *** RV64 Zvksh vector crypto extensions ***
+vsm3me_vv       100000 1 ..... ..... 010 ..... 1110111 @r_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvksh.c.inc b/target/riscv/insn_trans/trans_rvzvksh.c.inc
new file mode 100644
index 0000000000..ad7105b3ed
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvzvksh.c.inc
@@ -0,0 +1,12 @@
+static inline bool vsm3_check(DisasContext *s)
+{
+    return s->cfg_ptr->ext_zvksh == true && vext_check_isa_ill(s) &&
+           s->vstart % 8 == 0 && s->sew == MO_32;
+}
+
+static inline bool vsm3me_check(DisasContext *s, arg_rmrr *a)
+{
+    return vsm3_check(s) && vext_check_sss(s, a->rd, a->rs1, a->rs2, a->vm);
+}
+
+GEN_VV_UNMASKED_TRANS(vsm3me_vv, vsm3me_check)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 924a89de9f..9ca2cec23a 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -1066,6 +1066,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
 #include "insn_trans/trans_rvzvkb.c.inc"
 #include "insn_trans/trans_rvzvkns.c.inc"
 #include "insn_trans/trans_rvzvknh.c.inc"
+#include "insn_trans/trans_rvzvksh.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
 #include "insn_trans/trans_svinval.c.inc"
 #include "insn_trans/trans_xventanacondops.c.inc"
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index b73581641a..4dd5920aa4 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -694,3 +694,46 @@ void HELPER(vsha2cl_vv)(void *vd, void *vs1, void *vs2, CPURISCVState *env,
     vext_set_elems_1s(vd, vta, env->vl * esz, total_elems * esz);
     env->vstart = 0;
 }
+
+static inline uint32_t p1(uint32_t x)
+{
+    return (x) ^ rol32((x), 15) ^ rol32((x), 23);
+}
+
+static inline uint32_t zvksh_w(uint32_t m16, uint32_t m9, uint32_t m3,
+                               uint32_t m13, uint32_t m6)
+{
+    return p1(m16 ^ m9 ^ rol32(m3, 15)) ^ rol32(m13, 7) ^ m6;
+}
+
+void HELPER(vsm3me_vv)(void *vd_vptr, void *vs1_vptr, void *vs2_vptr,
+                       CPURISCVState *env, uint32_t desc)
+{
+    uint32_t esz = memop_size(FIELD_EX64(env->vtype, VTYPE, VSEW));
+    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+    uint32_t vta = vext_vta(desc);
+    uint32_t *vd = vd_vptr;
+    uint32_t *vs1 = vs1_vptr;
+    uint32_t *vs2 = vs2_vptr;
+
+    if (env->vl % 8 != 0) {
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+    }
+
+    for (int i = env->vstart / 8; i < env->vl / 8; i++) {
+        uint32_t w[24];
+        for (int j = 0; j < 8; j++) {
+            w[j] = bswap32(vs1[H4((i * 8) + j)]);
+            w[j + 8] = bswap32(vs2[H4((i * 8) + j)]);
+        }
+        for (int j = 0; j < 8; j++) {
+            w[j + 16] =
+                zvksh_w(w[j], w[j + 7], w[j + 13], w[j + 3], w[j + 10]);
+        }
+        for (int j = 0; j < 8; j++) {
+            vd[(i * 8) + j] = bswap32(w[H4(j + 16)]);
+        }
+    }
+    vext_set_elems_1s(vd_vptr, vta, env->vl * esz, total_elems * esz);
+    env->vstart = 0;
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 30/39] target/riscv: Add vsm3c.vi decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (28 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 29/39] target/riscv: Add vsm3me.vv decoding, translation and execution support Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 31/39] target/riscv: expose zvksh cpu property Lawrence Hunter
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>

Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
---
 target/riscv/helper.h                       |  1 +
 target/riscv/insn32.decode                  |  1 +
 target/riscv/insn_trans/trans_rvzvksh.c.inc |  8 ++
 target/riscv/vcrypto_helper.c               | 90 +++++++++++++++++++++
 4 files changed, 100 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 36e0d8eff3..a82103ead9 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1198,3 +1198,4 @@ DEF_HELPER_5(vsha2ch_vv, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vsha2cl_vv, void, ptr, ptr, ptr, env, i32)
 
 DEF_HELPER_5(vsm3me_vv, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vsm3c_vi, void, ptr, ptr, i32, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 671614e354..4a50114e92 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -929,3 +929,4 @@ vsha2cl_vv      101111 1 ..... ..... 010 ..... 1110111 @r_vm_1
 
 # *** RV64 Zvksh vector crypto extensions ***
 vsm3me_vv       100000 1 ..... ..... 010 ..... 1110111 @r_vm_1
+vsm3c_vi        101011 1 ..... ..... 010 ..... 1110111 @r_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvksh.c.inc b/target/riscv/insn_trans/trans_rvzvksh.c.inc
index ad7105b3ed..f11dfb24ff 100644
--- a/target/riscv/insn_trans/trans_rvzvksh.c.inc
+++ b/target/riscv/insn_trans/trans_rvzvksh.c.inc
@@ -9,4 +9,12 @@ static inline bool vsm3me_check(DisasContext *s, arg_rmrr *a)
     return vsm3_check(s) && vext_check_sss(s, a->rd, a->rs1, a->rs2, a->vm);
 }
 
+
+static inline bool vsm3c_check(DisasContext *s, arg_rmrr *a)
+{
+    return vsm3_check(s) && a->rs1 < 32 &&
+           vext_check_ss(s, a->rd, a->rs2, a->vm);
+}
+
 GEN_VV_UNMASKED_TRANS(vsm3me_vv, vsm3me_check)
+GEN_VI_UNMASKED_TRANS(vsm3c_vi, vsm3c_check)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 4dd5920aa4..478e652c9b 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -737,3 +737,93 @@ void HELPER(vsm3me_vv)(void *vd_vptr, void *vs1_vptr, void *vs2_vptr,
     vext_set_elems_1s(vd_vptr, vta, env->vl * esz, total_elems * esz);
     env->vstart = 0;
 }
+
+static inline uint32_t ff1(uint32_t x, uint32_t y, uint32_t z)
+{
+    return x ^ y ^ z;
+};
+static inline uint32_t ff2(uint32_t x, uint32_t y, uint32_t z)
+{
+    return (x & y) | (x & z) | (y & z);
+};
+static inline uint32_t ff_j(uint32_t x, uint32_t y, uint32_t z, uint32_t j)
+{
+    return (j <= 15) ? ff1(x, y, z) : ff2(x, y, z);
+};
+static inline uint32_t gg1(uint32_t x, uint32_t y, uint32_t z)
+{
+    return x ^ y ^ z;
+};
+static inline uint32_t gg2(uint32_t x, uint32_t y, uint32_t z)
+{
+    return (x & y) | (~x & z);
+};
+static inline uint32_t gg_j(uint32_t x, uint32_t y, uint32_t z, uint32_t j)
+{
+    return (j <= 15) ? gg1(x, y, z) : gg2(x, y, z);
+};
+static inline uint32_t t_j(uint32_t j)
+{
+    return (j <= 15) ? 0x79cc4519 : 0x7a879d8a;
+};
+static inline uint32_t p_0(uint32_t x)
+{
+    return x ^ rol32(x, 9) ^ rol32(x, 17);
+};
+static void sm3c(uint32_t *vd, uint32_t *vs1, uint32_t *vs2, uint32_t uimm)
+{
+    uint32_t x0, x1;
+    uint32_t j;
+    uint32_t ss1, ss2, tt1, tt2;
+    x0 = vs2[0] ^ vs2[4];
+    x1 = vs2[1] ^ vs2[5];
+    j = 2 * uimm;
+    ss1 = rol32(rol32(vs1[0], 12) + vs1[4] + rol32(t_j(j), j % 32), 7);
+    ss2 = ss1 ^ rol32(vs1[0], 12);
+    tt1 = ff_j(vs1[0], vs1[1], vs1[2], j) + vs1[3] + ss2 + x0;
+    tt2 = gg_j(vs1[4], vs1[5], vs1[6], j) + vs1[7] + ss1 + vs2[0];
+    vs1[3] = vs1[2];
+    vd[3] = rol32(vs1[1], 9);
+    vs1[1] = vs1[0];
+    vd[1] = tt1;
+    vs1[7] = vs1[6];
+    vd[7] = rol32(vs1[5], 19);
+    vs1[5] = vs1[4];
+    vd[5] = p_0(tt2);
+    j = 2 * uimm + 1;
+    ss1 = rol32(rol32(vd[1], 12) + vd[5] + rol32(t_j(j), j % 32), 7);
+    ss2 = ss1 ^ rol32(vd[1], 12);
+    tt1 = ff_j(vd[1], vs1[1], vd[3], j) + vs1[3] + ss2 + x1;
+    tt2 = gg_j(vd[5], vs1[5], vd[7], j) + vs1[7] + ss1 + vs2[1];
+    vd[2] = rol32(vs1[1], 9);
+    vd[0] = tt1;
+    vd[6] = rol32(vs1[5], 19);
+    vd[4] = p_0(tt2);
+}
+void HELPER(vsm3c_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
+                      CPURISCVState *env, uint32_t desc)
+{
+    uint32_t esz = memop_size(FIELD_EX64(env->vtype, VTYPE, VSEW));
+    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+    uint32_t vta = vext_vta(desc);
+    uint32_t *vd = vd_vptr;
+    uint32_t *vs2 = vs2_vptr;
+    uint32_t v1[8], v2[8], v3[8];
+
+    if (env->vl % 8 != 0) {
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+    }
+
+    for (int i = env->vstart / 8; i < env->vl / 8; i++) {
+        for (int k = 0; k < 8; k++) {
+            v2[k] = bswap32(vd[H4(i * 8 + k)]);
+            v3[k] = bswap32(vs2[H4(i * 8 + k)]);
+        }
+        sm3c(v1, v2, v3, uimm);
+        for (int k = 0; k < 8; k++) {
+            vd[i * 8 + k] = bswap32(v1[H4(k)]);
+        }
+    }
+    vext_set_elems_1s(vd_vptr, vta, env->vl * esz, total_elems * esz);
+    env->vstart = 0;
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 31/39]  target/riscv: expose zvksh cpu property
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (29 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 30/39] target/riscv: Add vsm3c.vi " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 32/39] target/riscv: add zvkg " Lawrence Hunter
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

From: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>

Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
---
 target/riscv/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 9a412d9d53..a3b08e9d27 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1097,6 +1097,7 @@ static Property riscv_cpu_extensions[] = {
     DEFINE_PROP_BOOL("zvknha", RISCVCPU, cfg.ext_zvknha, false),
     DEFINE_PROP_BOOL("zvknhb", RISCVCPU, cfg.ext_zvknhb, false),
     DEFINE_PROP_BOOL("zvkns", RISCVCPU, cfg.ext_zvkns, false),
+    DEFINE_PROP_BOOL("zvksh", RISCVCPU, cfg.ext_zvksh, false),
 
     /* Vendor-specific custom extensions */
     DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, false),
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 32/39] target/riscv: add zvkg cpu property
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (30 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 31/39] target/riscv: expose zvksh cpu property Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 33/39] target/riscv: Add vghmac.vv decoding, translation and execution support Lawrence Hunter
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/cpu.c | 3 ++-
 target/riscv/cpu.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a3b08e9d27..6fded328f8 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -101,6 +101,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
     ISA_EXT_DATA_ENTRY(zve32f, true, PRIV_VERSION_1_12_0, ext_zve32f),
     ISA_EXT_DATA_ENTRY(zve64f, true, PRIV_VERSION_1_12_0, ext_zve64f),
     ISA_EXT_DATA_ENTRY(zvkb, true, PRIV_VERSION_1_12_0, ext_zvkb),
+    ISA_EXT_DATA_ENTRY(zvkg, true, PRIV_VERSION_1_12_0, ext_zvkg),
     ISA_EXT_DATA_ENTRY(zvknha, true, PRIV_VERSION_1_12_0, ext_zvknha),
     ISA_EXT_DATA_ENTRY(zvknhb, true, PRIV_VERSION_1_12_0, ext_zvknhb),
     ISA_EXT_DATA_ENTRY(zvkns, true, PRIV_VERSION_1_12_0, ext_zvkns),
@@ -802,7 +803,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
          * in qemu
          */
         if ((cpu->cfg.ext_zvkb || cpu->cfg.ext_zvkns || cpu->cfg.ext_zvknha ||
-             cpu->cfg.ext_zvksh) &&
+             cpu->cfg.ext_zvksh || cpu->cfg.ext_zvkg) &&
             !(cpu->cfg.ext_zve32f || cpu->cfg.ext_zve64f || cpu->cfg.ext_v)) {
             error_setg(
                 errp, "Vector crypto extensions require V or Zve* extensions");
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 92624bfc57..b3b1174d74 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -462,6 +462,7 @@ struct RISCVCPUConfig {
     bool ext_zve32f;
     bool ext_zve64f;
     bool ext_zvkb;
+    bool ext_zvkg;
     bool ext_zvknha;
     bool ext_zvknhb;
     bool ext_zvkns;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 33/39] target/riscv: Add vghmac.vv decoding, translation and execution support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (31 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 32/39] target/riscv: add zvkg " Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 34/39] target/riscv: expose zvkg cpu property Lawrence Hunter
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Co-authored-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/helper.h                      |  2 +
 target/riscv/insn32.decode                 |  3 ++
 target/riscv/insn_trans/trans_rvzvkg.c.inc |  9 +++++
 target/riscv/translate.c                   |  1 +
 target/riscv/vcrypto_helper.c              | 45 ++++++++++++++++++++++
 5 files changed, 60 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvzvkg.c.inc

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index a82103ead9..6272294d50 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1199,3 +1199,5 @@ DEF_HELPER_5(vsha2cl_vv, void, ptr, ptr, ptr, env, i32)
 
 DEF_HELPER_5(vsm3me_vv, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vsm3c_vi, void, ptr, ptr, i32, env, i32)
+
+DEF_HELPER_5(vghmac_vv, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 4a50114e92..ff044f8288 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -930,3 +930,6 @@ vsha2cl_vv      101111 1 ..... ..... 010 ..... 1110111 @r_vm_1
 # *** RV64 Zvksh vector crypto extensions ***
 vsm3me_vv       100000 1 ..... ..... 010 ..... 1110111 @r_vm_1
 vsm3c_vi        101011 1 ..... ..... 010 ..... 1110111 @r_vm_1
+
+# *** RV64 Zvkg vector crypto extension ***
+vghmac_vv       101100 1 ..... ..... 010 ..... 1110111 @r_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvkg.c.inc b/target/riscv/insn_trans/trans_rvzvkg.c.inc
new file mode 100644
index 0000000000..aa2004fb44
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvzvkg.c.inc
@@ -0,0 +1,9 @@
+static bool vghmac_check(DisasContext *s, arg_rmrr *a)
+{
+    return opivv_check(s, a) &&
+           s->cfg_ptr->ext_zvkg == true &&
+           s->vstart % 4 == 0 &&
+           s->sew == MO_32;
+}
+
+GEN_VV_UNMASKED_TRANS(vghmac_vv, vghmac_check)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 9ca2cec23a..0bc1c9db65 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -1067,6 +1067,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
 #include "insn_trans/trans_rvzvkns.c.inc"
 #include "insn_trans/trans_rvzvknh.c.inc"
 #include "insn_trans/trans_rvzvksh.c.inc"
+#include "insn_trans/trans_rvzvkg.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
 #include "insn_trans/trans_svinval.c.inc"
 #include "insn_trans/trans_xventanacondops.c.inc"
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 478e652c9b..a309ac3f03 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -827,3 +827,48 @@ void HELPER(vsm3c_vi)(void *vd_vptr, void *vs2_vptr, uint32_t uimm,
     vext_set_elems_1s(vd_vptr, vta, env->vl * esz, total_elems * esz);
     env->vstart = 0;
 }
+
+void HELPER(vghmac_vv)(void *vd_vptr, void *vs1_vptr, void *vs2_vptr,
+                       CPURISCVState *env, uint32_t desc)
+{
+    uint64_t *vd = vd_vptr;
+    uint64_t *vs1 = vs1_vptr;
+    uint64_t *vs2 = vs2_vptr;
+    uint32_t vta = vext_vta(desc);
+    uint32_t total_elems = vext_get_total_elems(env, desc, 4);
+
+    if (env->vl % 4 != 0) {
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+    }
+
+    for (uint32_t i = env->vstart / 4; i < env->vl / 4; i++) {
+        __uint128_t Y = reverse_bits_byte_8(vd[i * 2 + 0]) |
+                        ((__uint128_t)reverse_bits_byte_8(vd[i * 2 + 1]) << 64);
+        __uint128_t H = reverse_bits_byte_8(vs1[i * 2 + 0]) |
+                        ((__uint128_t)reverse_bits_byte_8(vs1[i * 2 + 1]) << 64);
+        __uint128_t X = vs2[i * 2 + 0] | ((__uint128_t)vs2[i * 2 + 1] << 64);
+        __uint128_t Z = 0;
+
+        for (uint j = 0; j < 128; j++) {
+            if ((Y >> j) & 1) {
+                Z ^= H;
+            }
+            bool reduce = ((H >> 127) & 1);
+            H = H << 1;
+            if (reduce) {
+                H ^= 0x87;
+            }
+        }
+
+        Z = reverse_bits_byte_8(Z & UINT64_MAX) |
+            ((__uint128_t)reverse_bits_byte_8((Z >> 64) & UINT64_MAX) << 64);
+
+        Z = Z ^ X;
+
+        vd[i * 2 + 0] = Z & UINT64_MAX;
+        vd[i * 2 + 1] = (Z >> 64) & UINT64_MAX;
+    }
+    /* set tail elements to 1s */
+    vext_set_elems_1s(vd, vta, env->vl * 4, total_elems * 4);
+    env->vstart = 0;
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 34/39] target/riscv: expose zvkg cpu property
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (32 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 33/39] target/riscv: Add vghmac.vv decoding, translation and execution support Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 35/39] crypto: Move SM4_SBOXWORD from target/riscv Lawrence Hunter
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Lawrence Hunter

Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
---
 target/riscv/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 6fded328f8..48701e118f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1095,6 +1095,7 @@ static Property riscv_cpu_extensions[] = {
     DEFINE_PROP_BOOL("zmmul", RISCVCPU, cfg.ext_zmmul, false),
 
     DEFINE_PROP_BOOL("zvkb", RISCVCPU, cfg.ext_zvkb, false),
+    DEFINE_PROP_BOOL("zvkg", RISCVCPU, cfg.ext_zvkg, false),
     DEFINE_PROP_BOOL("zvknha", RISCVCPU, cfg.ext_zvknha, false),
     DEFINE_PROP_BOOL("zvknhb", RISCVCPU, cfg.ext_zvknhb, false),
     DEFINE_PROP_BOOL("zvkns", RISCVCPU, cfg.ext_zvkns, false),
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 35/39] crypto: Move SM4_SBOXWORD from target/riscv
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (33 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 34/39] target/riscv: expose zvkg cpu property Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 17:02   ` Richard Henderson
  2023-02-02 12:42 ` [PATCH 36/39] crypto: Add SM4 constant parameter CK Lawrence Hunter
                   ` (3 subsequent siblings)
  38 siblings, 1 reply; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Max Chou

From: Max Chou <max.chou@sifive.com>

    - Share SM4_SBOXWORD between target/riscv and target/arm.

Signed-off-by: Max Chou <max.chou@sifive.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
---
 include/crypto/sm4.h       |  7 +++++++
 target/arm/crypto_helper.c | 10 ++--------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/crypto/sm4.h b/include/crypto/sm4.h
index 9bd3ebc62e..33478562a4 100644
--- a/include/crypto/sm4.h
+++ b/include/crypto/sm4.h
@@ -1,6 +1,13 @@
 #ifndef QEMU_SM4_H
 #define QEMU_SM4_H
 
+#define SM4_SBOXWORD(WORD) ( \
+    sm4_sbox[((WORD) >> 24) & 0xff] << 24 | \
+    sm4_sbox[((WORD) >> 16) & 0xff] << 16 | \
+    sm4_sbox[((WORD) >>  8) & 0xff] <<  8 | \
+    sm4_sbox[((WORD) >>  0) & 0xff] <<  0   \
+)
+
 extern const uint8_t sm4_sbox[256];
 
 #endif
diff --git a/target/arm/crypto_helper.c b/target/arm/crypto_helper.c
index d28690321f..4e97af9879 100644
--- a/target/arm/crypto_helper.c
+++ b/target/arm/crypto_helper.c
@@ -707,10 +707,7 @@ static void do_crypto_sm4e(uint64_t *rd, uint64_t *rn, uint64_t *rm)
             CR_ST_WORD(d, (i + 3) % 4) ^
             CR_ST_WORD(n, i);
 
-        t = sm4_sbox[t & 0xff] |
-            sm4_sbox[(t >> 8) & 0xff] << 8 |
-            sm4_sbox[(t >> 16) & 0xff] << 16 |
-            sm4_sbox[(t >> 24) & 0xff] << 24;
+        t = SM4_SBOXWORD(t);
 
         CR_ST_WORD(d, i) ^= t ^ rol32(t, 2) ^ rol32(t, 10) ^ rol32(t, 18) ^
                             rol32(t, 24);
@@ -744,10 +741,7 @@ static void do_crypto_sm4ekey(uint64_t *rd, uint64_t *rn, uint64_t *rm)
             CR_ST_WORD(d, (i + 3) % 4) ^
             CR_ST_WORD(m, i);
 
-        t = sm4_sbox[t & 0xff] |
-            sm4_sbox[(t >> 8) & 0xff] << 8 |
-            sm4_sbox[(t >> 16) & 0xff] << 16 |
-            sm4_sbox[(t >> 24) & 0xff] << 24;
+        t = SM4_SBOXWORD(t);
 
         CR_ST_WORD(d, i) ^= t ^ rol32(t, 13) ^ rol32(t, 23);
     }
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 36/39] crypto: Add SM4 constant parameter CK.
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (34 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 35/39] crypto: Move SM4_SBOXWORD from target/riscv Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 37/39] target/riscv: Add zvksed cfg property Lawrence Hunter
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Max Chou

From: Max Chou <max.chou@sifive.com>

Signed-off-by: Max Chou <max.chou@sifive.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
---
 crypto/sm4.c         | 10 ++++++++++
 include/crypto/sm4.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/crypto/sm4.c b/crypto/sm4.c
index 9f0cd452c7..2987306cf7 100644
--- a/crypto/sm4.c
+++ b/crypto/sm4.c
@@ -47,3 +47,13 @@ uint8_t const sm4_sbox[] = {
     0x79, 0xee, 0x5f, 0x3e, 0xd7, 0xcb, 0x39, 0x48,
 };
 
+uint32_t const sm4_ck[] = {
+    0x00070e15, 0x1c232a31, 0x383f464d, 0x545b6269,
+    0x70777e85, 0x8c939aa1, 0xa8afb6bd, 0xc4cbd2d9,
+    0xe0e7eef5, 0xfc030a11, 0x181f262d, 0x343b4249,
+    0x50575e65, 0x6c737a81, 0x888f969d, 0xa4abb2b9,
+    0xc0c7ced5, 0xdce3eaf1, 0xf8ff060d, 0x141b2229,
+    0x30373e45, 0x4c535a61, 0x686f767d, 0x848b9299,
+    0xa0a7aeb5, 0xbcc3cad1, 0xd8dfe6ed, 0xf4fb0209,
+    0x10171e25, 0x2c333a41, 0x484f565d, 0x646b7279
+};
diff --git a/include/crypto/sm4.h b/include/crypto/sm4.h
index 33478562a4..d0df6e473c 100644
--- a/include/crypto/sm4.h
+++ b/include/crypto/sm4.h
@@ -9,5 +9,6 @@
 )
 
 extern const uint8_t sm4_sbox[256];
+extern const uint32_t sm4_ck[32];
 
 #endif
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 37/39] target/riscv: Add zvksed cfg property
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (35 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 36/39] crypto: Add SM4 constant parameter CK Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 38/39] target/riscv: Add Zvksed support Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 39/39] target/riscv: Expose Zvksed property Lawrence Hunter
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Max Chou

From: Max Chou <max.chou@sifive.com>

Signed-off-by: Max Chou <max.chou@sifive.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
---
 target/riscv/cpu.c | 3 ++-
 target/riscv/cpu.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 48701e118f..0fa7049c3b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -105,6 +105,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
     ISA_EXT_DATA_ENTRY(zvknha, true, PRIV_VERSION_1_12_0, ext_zvknha),
     ISA_EXT_DATA_ENTRY(zvknhb, true, PRIV_VERSION_1_12_0, ext_zvknhb),
     ISA_EXT_DATA_ENTRY(zvkns, true, PRIV_VERSION_1_12_0, ext_zvkns),
+    ISA_EXT_DATA_ENTRY(zvksed, true, PRIV_VERSION_1_12_0, ext_zvksed),
     ISA_EXT_DATA_ENTRY(zvksh, true, PRIV_VERSION_1_12_0, ext_zvksh),
     ISA_EXT_DATA_ENTRY(zhinx, true, PRIV_VERSION_1_12_0, ext_zhinx),
     ISA_EXT_DATA_ENTRY(zhinxmin, true, PRIV_VERSION_1_12_0, ext_zhinxmin),
@@ -803,7 +804,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
          * in qemu
          */
         if ((cpu->cfg.ext_zvkb || cpu->cfg.ext_zvkns || cpu->cfg.ext_zvknha ||
-             cpu->cfg.ext_zvksh || cpu->cfg.ext_zvkg) &&
+             cpu->cfg.ext_zvksh || cpu->cfg.ext_zvkg || cpu->cfg.ext_zvksed) &&
             !(cpu->cfg.ext_zve32f || cpu->cfg.ext_zve64f || cpu->cfg.ext_v)) {
             error_setg(
                 errp, "Vector crypto extensions require V or Zve* extensions");
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index b3b1174d74..89e9fd61da 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -466,6 +466,7 @@ struct RISCVCPUConfig {
     bool ext_zvknha;
     bool ext_zvknhb;
     bool ext_zvkns;
+    bool ext_zvksed;
     bool ext_zvksh;
     bool ext_zmmul;
     bool ext_smaia;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 38/39] target/riscv: Add Zvksed support
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (36 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 37/39] target/riscv: Add zvksed cfg property Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  2023-02-02 12:42 ` [PATCH 39/39] target/riscv: Expose Zvksed property Lawrence Hunter
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Max Chou

From: Max Chou <max.chou@sifive.com>

    - add vsm4k, vsm4r instructions

Signed-off-by: Max Chou <max.chou@sifive.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
[lawrence.hunter@codethink.co.uk: Moved SM4 functions from
crypto_helper.c to vcrypto_helper.c]
[nazar.kazakov@codethink.co.uk: Added alignment checks, refactored code to
use macros, and minor style changes]
---
 target/riscv/crypto_helper.c                 |   1 +
 target/riscv/helper.h                        |   4 +
 target/riscv/insn32.decode                   |   5 +
 target/riscv/insn_trans/trans_rvzvksed.c.inc |  35 +++++
 target/riscv/translate.c                     |   1 +
 target/riscv/vcrypto_helper.c                | 139 +++++++++++++++++++
 6 files changed, 185 insertions(+)
 create mode 100644 target/riscv/insn_trans/trans_rvzvksed.c.inc

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index 2ef30281b1..760ce22570 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -23,6 +23,7 @@
 #include "exec/helper-proto.h"
 #include "crypto/aes.h"
 #include "crypto/sm4.h"
+#include "vector_internals.h"
 
 #define AES_XTIME(a) \
     ((a << 1) ^ ((a & 0x80) ? 0x1b : 0))
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 6272294d50..07fad2568c 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1201,3 +1201,7 @@ DEF_HELPER_5(vsm3me_vv, void, ptr, ptr, ptr, env, i32)
 DEF_HELPER_5(vsm3c_vi, void, ptr, ptr, i32, env, i32)
 
 DEF_HELPER_5(vghmac_vv, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_5(vsm4k_vi, void, ptr, ptr, i32, env, i32)
+DEF_HELPER_4(vsm4r_vv, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vsm4r_vs, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ff044f8288..3e83884f43 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -933,3 +933,8 @@ vsm3c_vi        101011 1 ..... ..... 010 ..... 1110111 @r_vm_1
 
 # *** RV64 Zvkg vector crypto extension ***
 vghmac_vv       101100 1 ..... ..... 010 ..... 1110111 @r_vm_1
+
+# *** RV64 Zvksed Standart Extension ***
+vsm4k_vi        100001 1 ..... ..... 010 ..... 1110111 @r_vm_1
+vsm4r_vv        101000 1 ..... 10000 010 ..... 1110111 @r2_vm_1
+vsm4r_vs        101001 1 ..... 10000 010 ..... 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvzvksed.c.inc b/target/riscv/insn_trans/trans_rvzvksed.c.inc
new file mode 100644
index 0000000000..a30e0862e0
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvzvksed.c.inc
@@ -0,0 +1,35 @@
+#define ZVKSED_EGS 4
+
+static bool zvksed_check(DisasContext *s)
+{
+    return s->cfg_ptr->ext_zvksed == true && vext_check_isa_ill(s) &&
+           s->vstart % ZVKSED_EGS == 0 && s->sew == MO_32;
+}
+
+static bool vsm4k_vi_check(DisasContext *s, arg_rmrr *a)
+{
+    return zvksed_check(s) &&
+           require_align(a->rd, s->lmul) &&
+           require_align(a->rs2, s->lmul) &&
+           a->rs1 >= 0 && a->rs1 <= 7;
+}
+
+GEN_VI_UNMASKED_TRANS(vsm4k_vi, vsm4k_vi_check)
+
+static bool vsm4r_vv_check(DisasContext *s, arg_rmr *a)
+{
+    return zvksed_check(s) &&
+           require_align(a->rd, s->lmul) &&
+           require_align(a->rs2, s->lmul);
+}
+
+GEN_V_UNMASKED_TRANS(vsm4r_vv, vsm4r_vv_check)
+
+static bool vsm4r_vs_check(DisasContext *s, arg_rmr *a)
+{
+    return zvksed_check(s) &&
+           !is_overlapped(a->rd, 1 << MAX(s->lmul, 0), a->rs2, 1) &&
+           require_align(a->rd, s->lmul);
+}
+
+GEN_V_UNMASKED_TRANS(vsm4r_vs, vsm4r_vs_check)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0bc1c9db65..2ffb1827c5 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -1068,6 +1068,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
 #include "insn_trans/trans_rvzvknh.c.inc"
 #include "insn_trans/trans_rvzvksh.c.inc"
 #include "insn_trans/trans_rvzvkg.c.inc"
+#include "insn_trans/trans_rvzvksed.c.inc"
 #include "insn_trans/trans_privileged.c.inc"
 #include "insn_trans/trans_svinval.c.inc"
 #include "insn_trans/trans_xventanacondops.c.inc"
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index a309ac3f03..fd9fc3c6d7 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -4,6 +4,7 @@
 #include "qemu/bswap.h"
 #include "cpu.h"
 #include "crypto/aes.h"
+#include "crypto/sm4.h"
 #include "exec/memop.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
@@ -872,3 +873,141 @@ void HELPER(vghmac_vv)(void *vd_vptr, void *vs1_vptr, void *vs2_vptr,
     vext_set_elems_1s(vd, vta, env->vl * 4, total_elems * 4);
     env->vstart = 0;
 }
+
+void HELPER(vsm4k_vi)(void *vd, void *vs2, uint32_t uimm5,
+                      CPURISCVState *env, uint32_t desc)
+{
+    const uint32_t egs = 4;
+    uint32_t rnd = uimm5;
+    uint32_t group_start = env->vstart / egs;
+    uint32_t group_end = env->vl / egs;
+    uint32_t esz = sizeof(uint32_t);
+    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+
+    if (env->vl % egs != 0) {
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+    }
+
+    for (uint32_t i = group_start; i < group_end; ++i) {
+        uint32_t vstart = i * egs;
+        uint32_t vend = (i + 1) * egs;
+        uint32_t rk[4] = {0};
+        uint32_t tmp[8] = {0};
+
+        for (uint32_t j = vstart; j < vend; ++j) {
+            rk[j - vstart] = *((uint32_t *)vs2 + H4(j));
+        }
+
+        for (uint32_t j = 0; j < egs; ++j) {
+            tmp[j] = rk[j];
+        }
+
+        for (uint32_t j = 0; j < egs; ++j) {
+            uint32_t b, s;
+            b = tmp[j + 1] ^ tmp[j + 2] ^ tmp[j + 3] ^ sm4_ck[rnd * 4 + j];
+
+            s = SM4_SBOXWORD(b);
+
+            tmp[j + 4] = tmp[j] ^ (s ^ rol32(s, 13) ^ rol32(s, 23));
+        }
+
+        for (uint32_t j = vstart; j < vend; ++j) {
+            *((uint32_t *)vd + H4(j)) = tmp[egs + (j - vstart)];
+        }
+    }
+
+    env->vstart = 0;
+    /* set tail elements to 1s */
+    vext_set_elems_1s(vd, vext_vta(desc), env->vl * esz, total_elems * esz);
+}
+
+static void do_sm4_round(uint32_t *rk, uint32_t *buf)
+{
+    const uint32_t egs = 4;
+    uint32_t s, b;
+
+    for (uint32_t j = egs; j < egs * 2; ++j) {
+        b = buf[j - 3] ^ buf[j - 2] ^ buf[j - 1] ^ rk[j - 4];
+
+        s = SM4_SBOXWORD(b);
+
+        buf[j] = buf[j - 4] ^ (s ^ rol32(s, 2) ^ rol32(s, 10) ^
+                rol32(s, 18) ^ rol32(s, 24));
+    }
+}
+
+void HELPER(vsm4r_vv)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
+{
+    const uint32_t egs = 4;
+    uint32_t group_start = env->vstart / egs;
+    uint32_t group_end = env->vl / egs;
+    uint32_t esz = sizeof(uint32_t);
+    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+
+    if (env->vl % egs != 0) {
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+    }
+
+    for (uint32_t i = group_start; i < group_end; ++i) {
+        uint32_t vstart = i * egs;
+        uint32_t vend = (i + 1) * egs;
+        uint32_t rk[4] = {0};
+        uint32_t tmp[8] = {0};
+
+        for (uint32_t j = vstart; j < vend; ++j) {
+            rk[j - vstart] = *((uint32_t *)vs2 + H4(j));
+        }
+
+        for (uint32_t j = vstart; j < vend; ++j) {
+            tmp[j - vstart] = *((uint32_t *)vd + H4(j));
+        }
+
+        do_sm4_round(rk, tmp);
+
+        for (uint32_t j = vstart; j < vend; ++j) {
+            *((uint32_t *)vd + H4(j)) = tmp[egs + (j - vstart)];
+        }
+    }
+
+    env->vstart = 0;
+    /* set tail elements to 1s */
+    vext_set_elems_1s(vd, vext_vta(desc), env->vl * esz, total_elems * esz);
+}
+
+void HELPER(vsm4r_vs)(void *vd, void *vs2, CPURISCVState *env, uint32_t desc)
+{
+    const uint32_t egs = 4;
+    uint32_t group_start = env->vstart / egs;
+    uint32_t group_end = env->vl / egs;
+    uint32_t esz = sizeof(uint32_t);
+    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+
+    if (env->vl % egs != 0) {
+        riscv_raise_exception(env, RISCV_EXCP_ILLEGAL_INST, GETPC());
+    }
+
+    for (uint32_t i = group_start; i < group_end; ++i) {
+        uint32_t vstart = i * egs;
+        uint32_t vend = (i + 1) * egs;
+        uint32_t rk[4] = {0};
+        uint32_t tmp[8] = {0};
+
+        for (uint32_t j = 0; j < egs; ++j) {
+            rk[j] = *((uint32_t *)vs2 + H4(j));
+        }
+
+        for (uint32_t j = vstart; j < vend; ++j) {
+            tmp[j - vstart] = *((uint32_t *)vd + H4(j));
+        }
+
+        do_sm4_round(rk, tmp);
+
+        for (uint32_t j = vstart; j < vend; ++j) {
+            *((uint32_t *)vd + H4(j)) = tmp[egs + (j - vstart)];
+        }
+    }
+
+    env->vstart = 0;
+    /* set tail elements to 1s */
+    vext_set_elems_1s(vd, vext_vta(desc), env->vl * esz, total_elems * esz);
+}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 39/39] target/riscv: Expose Zvksed property
  2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
                   ` (37 preceding siblings ...)
  2023-02-02 12:42 ` [PATCH 38/39] target/riscv: Add Zvksed support Lawrence Hunter
@ 2023-02-02 12:42 ` Lawrence Hunter
  38 siblings, 0 replies; 59+ messages in thread
From: Lawrence Hunter @ 2023-02-02 12:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Max Chou

From: Max Chou <max.chou@sifive.com>

Signed-off-by: Max Chou <max.chou@sifive.com>
Reviewed-by: Frank Chang <frank.chang@sifive.com>
---
 target/riscv/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 0fa7049c3b..a4e8347d5f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1100,6 +1100,7 @@ static Property riscv_cpu_extensions[] = {
     DEFINE_PROP_BOOL("zvknha", RISCVCPU, cfg.ext_zvknha, false),
     DEFINE_PROP_BOOL("zvknhb", RISCVCPU, cfg.ext_zvknhb, false),
     DEFINE_PROP_BOOL("zvkns", RISCVCPU, cfg.ext_zvkns, false),
+    DEFINE_PROP_BOOL("zvksed", RISCVCPU, cfg.ext_zvksed, false),
     DEFINE_PROP_BOOL("zvksh", RISCVCPU, cfg.ext_zvksh, false),
 
     /* Vendor-specific custom extensions */
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 02/39] target/riscv: Add vclmul.vv decoding, translation and execution support
  2023-02-02 12:41 ` [PATCH 02/39] target/riscv: Add vclmul.vv decoding, translation and execution support Lawrence Hunter
@ 2023-02-02 13:53   ` Philipp Tomsich
  0 siblings, 0 replies; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 13:53 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm,
	Max Chou

On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
<lawrence.hunter@codethink.co.uk> wrote:
>

Given that this is a non-trivial change, the commit message seems a bit brief?

> Co-authored-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
> Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> Co-authored-by: Max Chou <max.chou@sifive.com>
> Signed-off-by: Max Chou <max.chou@sifive.com>
> Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
> Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
> ---
>  target/riscv/helper.h                      |   3 +
>  target/riscv/insn32.decode                 |   3 +
>  target/riscv/insn_trans/trans_rvzvkb.c.inc |  41 +++++++
>  target/riscv/meson.build                   |   4 +-
>  target/riscv/translate.c                   |   1 +
>  target/riscv/vcrypto_helper.c              |  23 ++++
>  target/riscv/vector_helper.c               | 120 +--------------------
>  target/riscv/vector_internals.c            |  39 +++++++
>  target/riscv/vector_internals.h            | 116 ++++++++++++++++++++
>  9 files changed, 230 insertions(+), 120 deletions(-)
>  create mode 100644 target/riscv/insn_trans/trans_rvzvkb.c.inc
>  create mode 100644 target/riscv/vcrypto_helper.c
>  create mode 100644 target/riscv/vector_internals.c
>  create mode 100644 target/riscv/vector_internals.h
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 227c7122ef..e9127c9ccb 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1136,3 +1136,6 @@ DEF_HELPER_FLAGS_1(aes64im, TCG_CALL_NO_RWG_SE, tl, tl)
>
>  DEF_HELPER_FLAGS_3(sm4ed, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
>  DEF_HELPER_FLAGS_3(sm4ks, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
> +
> +/* Vector crypto functions */
> +DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index b7e7613ea2..5ddee69d60 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -890,3 +890,6 @@ sm3p1       00 01000 01001 ..... 001 ..... 0010011 @r2
>  # *** RV32 Zksed Standard Extension ***
>  sm4ed       .. 11000 ..... ..... 000 ..... 0110011 @k_aes
>  sm4ks       .. 11010 ..... ..... 000 ..... 0110011 @k_aes
> +
> +# *** RV64 Zvkb vector crypto extension ***
> +vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
> diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> new file mode 100644
> index 0000000000..fb1995f737
> --- /dev/null
> +++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> @@ -0,0 +1,41 @@
> +#define GEN_VV_MASKED_TRANS(NAME, CHECK)                               \
> +static bool trans_##NAME(DisasContext *s, arg_rmrr * a)                \
> +{                                                                      \
> +    if (CHECK(s, a)) {                                                 \
> +        uint32_t data = 0;                                             \
> +        TCGLabel *over = gen_new_label();                              \
> +        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);              \
> +        tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);     \
> +                                                                       \
> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                     \
> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                 \
> +        data = FIELD_DP32(data, VDATA, VTA, s->vta);                   \
> +        data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); \
> +        data = FIELD_DP32(data, VDATA, VMA, s->vma);                   \
> +                                                                       \
> +        tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),         \
> +                           vreg_ofs(s, a->rs1),                        \
> +                           vreg_ofs(s, a->rs2), cpu_env,               \
> +                           s->cfg_ptr->vlen / 8,                       \
> +                           s->cfg_ptr->vlen / 8, data,                 \
> +                           gen_helper_##NAME);                         \
> +                                                                       \
> +        mark_vs_dirty(s);                                              \
> +        gen_set_label(over);                                           \
> +        return true;                                                   \
> +    }                                                                  \
> +    return false;                                                      \
> +}

This largely duplicates GEN_OPIVV_TRANS(NAME, CHECK).
Please refactor and share the common part into an 'opivv_trans' that
can be reused.

I would expect the common part to have the following signature:
   static bool opivv_trans(DisasContext *s, arg_rmrr *a,
gen_helper_gvec_4_ptr *fn)

> +
> +static bool zvkb_vv_check(DisasContext *s, arg_rmrr *a)
> +{
> +    return opivv_check(s, a) &&
> +           s->cfg_ptr->ext_zvkb == true;
> +}
> +
> +static bool vclmul_vv_check(DisasContext *s, arg_rmrr *a)
> +{
> +    return zvkb_vv_check(s, a) && s->sew == MO_64;
> +}
> +
> +GEN_VV_MASKED_TRANS(vclmul_vv, vclmul_vv_check)
> diff --git a/target/riscv/meson.build b/target/riscv/meson.build
> index ba25164d74..5313b01e5f 100644
> --- a/target/riscv/meson.build
> +++ b/target/riscv/meson.build
> @@ -15,10 +15,12 @@ riscv_ss.add(files(
>    'gdbstub.c',
>    'op_helper.c',
>    'vector_helper.c',
> +  'vector_internals.c',
>    'bitmanip_helper.c',
>    'translate.c',
>    'm128_helper.c',
> -  'crypto_helper.c'
> +  'crypto_helper.c',
> +  'vcrypto_helper.c'
>  ))
>  riscv_ss.add(when: 'CONFIG_KVM', if_true: files('kvm.c'), if_false: files('kvm-stub.c'))
>
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index df38db7553..71684c10f3 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -1063,6 +1063,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
>  #include "insn_trans/trans_rvzawrs.c.inc"
>  #include "insn_trans/trans_rvzfh.c.inc"
>  #include "insn_trans/trans_rvk.c.inc"
> +#include "insn_trans/trans_rvzvkb.c.inc"
>  #include "insn_trans/trans_privileged.c.inc"
>  #include "insn_trans/trans_svinval.c.inc"
>  #include "insn_trans/trans_xventanacondops.c.inc"
> diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
> new file mode 100644
> index 0000000000..8a11e56754
> --- /dev/null
> +++ b/target/riscv/vcrypto_helper.c
> @@ -0,0 +1,23 @@
> +#include "qemu/osdep.h"
> +#include "qemu/host-utils.h"
> +#include "qemu/bitops.h"
> +#include "cpu.h"
> +#include "exec/memop.h"
> +#include "exec/exec-all.h"
> +#include "exec/helper-proto.h"
> +#include "tcg/tcg-gvec-desc.h"
> +#include "internals.h"
> +#include "vector_internals.h"
> +
> +static void do_vclmul_vv(void *vd, void *vs1, void *vs2, int i)
> +{
> +    uint64_t result = 0;
> +    for (int j = 63; j >= 0; j--) {
> +        if ((((uint64_t *)vs1)[i] >> j) & 1) {

Why reverse the order we evaluate the bits here?
The spec has:

foreach (i from 0 to (width - 1)) {
      if y[i] == 1 then result = result ^ (x << i);
}

> +            result ^= (((uint64_t *)vs2)[i] << j);
> +        }
> +    }
> +    ((uint64_t *)vd)[i] = result;
> +}
> +
> +GEN_VEXT_VV(vclmul_vv, 8)

This should go through the RVVCALL macro (may need refactoring into a
different header file) from vector_helper.c.
You could then easily (and without the gratuitous casting and repeated
array-accesses) express this as:

/* vclmul.vv */
static uint64_t clmul64(uint64_t x, uint64_t y)
{
    target_ulong result = 0;
    const unsigned int elem_width = 64;

    for (unsigned int i = 0; i < elem_width; ++i)
        if ((y >> i) & 1)
            result ^= (x << i);

    return result;
}

RVVCALL(OPIVV2, vclmul_vv_d, OP_UUU_D, H8, H8, H8, clmul64)
GEN_VEXT_VV(vclmul_vv_d, 8)

> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 00de879787..def1b21414 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -26,6 +26,7 @@
>  #include "fpu/softfloat.h"
>  #include "tcg/tcg-gvec-desc.h"
>  #include "internals.h"
> +#include "vector_internals.h"
>  #include <math.h>
>
>  target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
> @@ -95,48 +96,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>  #define H8(x)   (x)
>  #endif
>
> -static inline uint32_t vext_nf(uint32_t desc)
> -{
> -    return FIELD_EX32(simd_data(desc), VDATA, NF);
> -}
> -
> -static inline uint32_t vext_vm(uint32_t desc)
> -{
> -    return FIELD_EX32(simd_data(desc), VDATA, VM);
> -}
> -
> -/*
> - * Encode LMUL to lmul as following:
> - *     LMUL    vlmul    lmul
> - *      1       000       0
> - *      2       001       1
> - *      4       010       2
> - *      8       011       3
> - *      -       100       -
> - *     1/8      101      -3
> - *     1/4      110      -2
> - *     1/2      111      -1
> - */
> -static inline int32_t vext_lmul(uint32_t desc)
> -{
> -    return sextract32(FIELD_EX32(simd_data(desc), VDATA, LMUL), 0, 3);
> -}
> -
> -static inline uint32_t vext_vta(uint32_t desc)
> -{
> -    return FIELD_EX32(simd_data(desc), VDATA, VTA);
> -}
> -
> -static inline uint32_t vext_vma(uint32_t desc)
> -{
> -    return FIELD_EX32(simd_data(desc), VDATA, VMA);
> -}
> -
> -static inline uint32_t vext_vta_all_1s(uint32_t desc)
> -{
> -    return FIELD_EX32(simd_data(desc), VDATA, VTA_ALL_1S);
> -}
> -

Please refactor in a standalone patch in the series.

>  /*
>   * Get the maximum number of elements can be operated.
>   *
> @@ -155,21 +114,6 @@ static inline uint32_t vext_max_elems(uint32_t desc, uint32_t log2_esz)
>      return scale < 0 ? vlenb >> -scale : vlenb << scale;
>  }
>
> -/*
> - * Get number of total elements, including prestart, body and tail elements.
> - * Note that when LMUL < 1, the tail includes the elements past VLMAX that
> - * are held in the same vector register.
> - */
> -static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
> -                                            uint32_t esz)
> -{
> -    uint32_t vlenb = simd_maxsz(desc);
> -    uint32_t sew = 1 << FIELD_EX64(env->vtype, VTYPE, VSEW);
> -    int8_t emul = ctzl(esz) - ctzl(sew) + vext_lmul(desc) < 0 ? 0 :
> -                  ctzl(esz) - ctzl(sew) + vext_lmul(desc);
> -    return (vlenb << emul) / esz;
> -}
> -
>  static inline target_ulong adjust_addr(CPURISCVState *env, target_ulong addr)
>  {
>      return (addr & env->cur_pmmask) | env->cur_pmbase;
> @@ -202,20 +146,6 @@ static void probe_pages(CPURISCVState *env, target_ulong addr,
>      }
>  }
>
> -/* set agnostic elements to 1s */
> -static void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
> -                              uint32_t tot)
> -{
> -    if (is_agnostic == 0) {
> -        /* policy undisturbed */
> -        return;
> -    }
> -    if (tot - cnt == 0) {
> -        return;
> -    }
> -    memset(base + cnt, -1, tot - cnt);
> -}
> -
>  static inline void vext_set_elem_mask(void *v0, int index,
>                                        uint8_t value)
>  {
> @@ -225,18 +155,6 @@ static inline void vext_set_elem_mask(void *v0, int index,
>      ((uint64_t *)v0)[idx] = deposit64(old, pos, 1, value);
>  }
>
> -/*
> - * Earlier designs (pre-0.9) had a varying number of bits
> - * per mask value (MLEN). In the 0.9 design, MLEN=1.
> - * (Section 4.5)
> - */
> -static inline int vext_elem_mask(void *v0, int index)
> -{
> -    int idx = index / 64;
> -    int pos = index  % 64;
> -    return (((uint64_t *)v0)[idx] >> pos) & 1;
> -}
> -
>  /* elements operations for load and store */
>  typedef void vext_ldst_elem_fn(CPURISCVState *env, target_ulong addr,
>                                 uint32_t idx, void *vd, uintptr_t retaddr);
> @@ -800,8 +718,6 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
>  #define NOP_UUU_H uint16_t, uint16_t, uint32_t, uint16_t, uint32_t
>  #define NOP_UUU_W uint32_t, uint32_t, uint64_t, uint32_t, uint64_t
>
> -/* operation of two vector elements */
> -typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
>
>  #define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
>  static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
> @@ -822,40 +738,6 @@ RVVCALL(OPIVV2, vsub_vv_h, OP_SSS_H, H2, H2, H2, DO_SUB)
>  RVVCALL(OPIVV2, vsub_vv_w, OP_SSS_W, H4, H4, H4, DO_SUB)
>  RVVCALL(OPIVV2, vsub_vv_d, OP_SSS_D, H8, H8, H8, DO_SUB)
>
> -static void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
> -                       CPURISCVState *env, uint32_t desc,
> -                       opivv2_fn *fn, uint32_t esz)
> -{
> -    uint32_t vm = vext_vm(desc);
> -    uint32_t vl = env->vl;
> -    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
> -    uint32_t vta = vext_vta(desc);
> -    uint32_t vma = vext_vma(desc);
> -    uint32_t i;
> -
> -    for (i = env->vstart; i < vl; i++) {
> -        if (!vm && !vext_elem_mask(v0, i)) {
> -            /* set masked-off elements to 1s */
> -            vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz);
> -            continue;
> -        }
> -        fn(vd, vs1, vs2, i);
> -    }
> -    env->vstart = 0;
> -    /* set tail elements to 1s */
> -    vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
> -}
> -
> -/* generate the helpers for OPIVV */
> -#define GEN_VEXT_VV(NAME, ESZ)                            \
> -void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
> -                  void *vs2, CPURISCVState *env,          \
> -                  uint32_t desc)                          \
> -{                                                         \
> -    do_vext_vv(vd, v0, vs1, vs2, env, desc,               \
> -               do_##NAME, ESZ);                           \
> -}
> -
>  GEN_VEXT_VV(vadd_vv_b, 1)
>  GEN_VEXT_VV(vadd_vv_h, 2)
>  GEN_VEXT_VV(vadd_vv_w, 4)
> diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
> new file mode 100644
> index 0000000000..a264797882
> --- /dev/null
> +++ b/target/riscv/vector_internals.c
> @@ -0,0 +1,39 @@
> +#include "vector_internals.h"
> +
> +/* set agnostic elements to 1s */
> +void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
> +                       uint32_t tot)
> +{
> +    if (is_agnostic == 0) {
> +        /* policy undisturbed */
> +        return;
> +    }
> +    if (tot - cnt == 0) {
> +        return ;
> +    }
> +    memset(base + cnt, -1, tot - cnt);
> +}
> +
> +void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
> +                CPURISCVState *env, uint32_t desc,
> +                opivv2_fn *fn, uint32_t esz)
> +{
> +    uint32_t vm = vext_vm(desc);
> +    uint32_t vl = env->vl;
> +    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
> +    uint32_t vta = vext_vta(desc);
> +    uint32_t vma = vext_vma(desc);
> +    uint32_t i;
> +
> +    for (i = env->vstart; i < vl; i++) {
> +        if (!vm && !vext_elem_mask(v0, i)) {
> +            /* set masked-off elements to 1s */
> +            vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz);
> +            continue;
> +        }
> +        fn(vd, vs1, vs2, i);
> +    }
> +    env->vstart = 0;
> +    /* set tail elements to 1s */
> +    vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
> +}
> diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
> new file mode 100644
> index 0000000000..f61803acc0
> --- /dev/null
> +++ b/target/riscv/vector_internals.h
> @@ -0,0 +1,116 @@
> +/*
> + * RISC-V Vector Extension Internals
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef TARGET_RISCV_VECTOR_INTERNAL_H
> +#define TARGET_RISCV_VECTOR_INTERNAL_H
> +
> +#include "qemu/osdep.h"
> +#include "qemu/bitops.h"
> +#include "cpu.h"
> +#include "tcg/tcg-gvec-desc.h"
> +#include "internals.h"
> +
> +static inline uint32_t vext_nf(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, NF);
> +}
> +
> +/*
> + * Encode LMUL to lmul as following:
> + *     LMUL    vlmul    lmul
> + *      1       000       0
> + *      2       001       1
> + *      4       010       2
> + *      8       011       3
> + *      -       100       -
> + *     1/8      101      -3
> + *     1/4      110      -2
> + *     1/2      111      -1
> + */
> +static inline int32_t vext_lmul(uint32_t desc)
> +{
> +    return sextract32(FIELD_EX32(simd_data(desc), VDATA, LMUL), 0, 3);
> +}
> +
> +static inline uint32_t vext_vm(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, VM);
> +}
> +
> +static inline uint32_t vext_vma(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, VMA);
> +}
> +
> +static inline uint32_t vext_vta(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, VTA);
> +}
> +
> +static inline uint32_t vext_vta_all_1s(uint32_t desc)
> +{
> +    return FIELD_EX32(simd_data(desc), VDATA, VTA_ALL_1S);
> +}
> +
> +/*
> + * Earlier designs (pre-0.9) had a varying number of bits
> + * per mask value (MLEN). In the 0.9 design, MLEN=1.
> + * (Section 4.5)
> + */
> +static inline int vext_elem_mask(void *v0, int index)
> +{
> +    int idx = index / 64;
> +    int pos = index  % 64;
> +    return (((uint64_t *)v0)[idx] >> pos) & 1;
> +}
> +
> +/*
> + * Get number of total elements, including prestart, body and tail elements.
> + * Note that when LMUL < 1, the tail includes the elements past VLMAX that
> + * are held in the same vector register.
> + */
> +static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
> +                                            uint32_t esz)
> +{
> +    uint32_t vlenb = simd_maxsz(desc);
> +    uint32_t sew = 1 << FIELD_EX64(env->vtype, VTYPE, VSEW);
> +    int8_t emul = ctzl(esz) - ctzl(sew) + vext_lmul(desc) < 0 ? 0 :
> +                  ctzl(esz) - ctzl(sew) + vext_lmul(desc);
> +    return (vlenb << emul) / esz;
> +}
> +
> +/* set agnostic elements to 1s */
> +void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
> +                       uint32_t tot);
> +
> +/* operation of two vector elements */
> +typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
> +
> +void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
> +                CPURISCVState *env, uint32_t desc,
> +                opivv2_fn *fn, uint32_t esz);
> +
> +/* generate the helpers for OPIVV */
> +#define GEN_VEXT_VV(NAME, ESZ)                            \
> +void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
> +                  void *vs2, CPURISCVState *env,          \
> +                  uint32_t desc)                          \
> +{                                                         \
> +    do_vext_vv(vd, v0, vs1, vs2, env, desc,               \
> +               do_##NAME, ESZ);                           \
> +}
> +
> +#endif /* TARGET_RISCV_VECTOR_INTERNAL_H */

Again: please split the refactoring off into a separate patch.


> --
> 2.39.1
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/39] target/riscv: Add vclmul.vx decoding, translation and execution support
  2023-02-02 12:41 ` [PATCH 03/39] target/riscv: Add vclmul.vx " Lawrence Hunter
@ 2023-02-02 13:59   ` Philipp Tomsich
  0 siblings, 0 replies; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 13:59 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm

On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
<lawrence.hunter@codethink.co.uk> wrote:
>

Please split off the refactoring.
See below for more comments.

> Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> Co-authored-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
> Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
> Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
> ---
>  target/riscv/helper.h                      |  1 +
>  target/riscv/insn32.decode                 |  1 +
>  target/riscv/insn_trans/trans_rvzvkb.c.inc | 54 ++++++++++++++++++++++
>  target/riscv/vcrypto_helper.c              | 12 +++++
>  target/riscv/vector_helper.c               | 36 ---------------
>  target/riscv/vector_internals.c            | 24 ++++++++++
>  target/riscv/vector_internals.h            | 16 +++++++
>  7 files changed, 108 insertions(+), 36 deletions(-)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index e9127c9ccb..6c786ef6f3 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1139,3 +1139,4 @@ DEF_HELPER_FLAGS_3(sm4ks, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
>
>  /* Vector crypto functions */
>  DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 5ddee69d60..4a7421354d 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -893,3 +893,4 @@ sm4ks       .. 11010 ..... ..... 000 ..... 0110011 @k_aes
>
>  # *** RV64 Zvkb vector crypto extension ***
>  vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
> +vclmul_vx       001100 . ..... ..... 110 ..... 1010111 @r_vm
> diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> index fb1995f737..6e8b81136c 100644
> --- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
> +++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> @@ -39,3 +39,57 @@ static bool vclmul_vv_check(DisasContext *s, arg_rmrr *a)
>  }
>
>  GEN_VV_MASKED_TRANS(vclmul_vv, vclmul_vv_check)
> +
> +#define GEN_VX_MASKED_TRANS(NAME, CHECK)                                \
> +static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                  \
> +{                                                                       \
> +    if (CHECK(s, a)) {                                                  \
> +        TCGv_ptr rd_v, v0_v, rs2_v;                                     \
> +        TCGv rs1;                                                       \
> +        TCGv_i32 desc;                                                  \
> +        uint32_t data = 0;                                              \
> +                                                                        \
> +        TCGLabel *over = gen_new_label();                               \
> +        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);               \
> +        tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);      \
> +                                                                        \
> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                      \
> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                  \
> +        data = FIELD_DP32(data, VDATA, VTA, s->vta);                    \
> +        data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s);  \
> +        data = FIELD_DP32(data, VDATA, VMA, s->vma);                    \
> +                                                                        \
> +        rd_v = tcg_temp_new_ptr();                                      \
> +        v0_v = tcg_temp_new_ptr();                                      \
> +        rs1 = get_gpr(s, a->rs1, EXT_ZERO);                             \
> +        rs2_v = tcg_temp_new_ptr();                                     \
> +        desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlen / 8,         \
> +                                          s->cfg_ptr->vlen / 8, data)); \
> +        tcg_gen_addi_ptr(rd_v, cpu_env, vreg_ofs(s, a->rd));            \
> +        tcg_gen_addi_ptr(v0_v, cpu_env, vreg_ofs(s, 0));                \
> +        tcg_gen_addi_ptr(rs2_v, cpu_env, vreg_ofs(s, a->rs2));          \
> +        gen_helper_##NAME(rd_v, v0_v, rs1, rs2_v, cpu_env, desc);       \
> +        tcg_temp_free_ptr(rd_v);                                        \
> +        tcg_temp_free_ptr(v0_v);                                        \
> +        tcg_temp_free_ptr(rs2_v);                                       \
> +                                                                        \
> +        mark_vs_dirty(s);                                               \
> +        gen_set_label(over);                                            \
> +        return true;                                                    \
> +    }                                                                   \
> +    return false;                                                       \
> +}

Why not reuse the opivx_trans() function that is already present and
call that from this macro?

> +
> +static bool zvkb_vx_check(DisasContext *s, arg_rmrr *a)
> +{
> +    return opivx_check(s, a) &&
> +           s->cfg_ptr->ext_zvkb == true;
> +}
> +
> +static bool vclmul_vx_check(DisasContext *s, arg_rmrr *a)
> +{
> +    return zvkb_vx_check(s, a) &&
> +           s->sew == MO_64;
> +}
> +
> +GEN_VX_MASKED_TRANS(vclmul_vx, vclmul_vx_check)
> diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
> index 8a11e56754..c453d348ad 100644
> --- a/target/riscv/vcrypto_helper.c
> +++ b/target/riscv/vcrypto_helper.c
> @@ -20,4 +20,16 @@ static void do_vclmul_vv(void *vd, void *vs1, void *vs2, int i)
>      ((uint64_t *)vd)[i] = result;
>  }
>
> +static void do_vclmul_vx(void *vd, target_long rs1, void *vs2, int i)
> +{
> +    uint64_t result = 0;
> +    for (int j = 63; j >= 0; j--) {
> +        if ((rs1 >> j) & 1) {
> +            result ^= (((uint64_t *)vs2)[i] << j);
> +        }
> +    }
> +    ((uint64_t *)vd)[i] = result;
> +}

This can be dropped, if you use the clmul64() I had proposed in the
previous patch review.
You can then use the existing generator macros:

RVVCALL(OPIVV2, vclmul_vv_d, OP_UUU_D, H8, H8, H8, clmul64)
RVVCALL(OPIVX2, vclmul_vx_d, OP_UUU_D, H8, H8, clmul64)
GEN_VEXT_VV(vclmul_vv_d, 8)
GEN_VEXT_VX(vclmul_vx_d, 8)

> +
>  GEN_VEXT_VV(vclmul_vv, 8)
> +GEN_VEXT_VX(vclmul_vx, 8)
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index def1b21414..ab470092f6 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -747,8 +747,6 @@ GEN_VEXT_VV(vsub_vv_h, 2)
>  GEN_VEXT_VV(vsub_vv_w, 4)
>  GEN_VEXT_VV(vsub_vv_d, 8)
>
> -typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i);
> -
>  /*
>   * (T1)s1 gives the real operator type.
>   * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
> @@ -773,40 +771,6 @@ RVVCALL(OPIVX2, vrsub_vx_h, OP_SSS_H, H2, H2, DO_RSUB)
>  RVVCALL(OPIVX2, vrsub_vx_w, OP_SSS_W, H4, H4, DO_RSUB)
>  RVVCALL(OPIVX2, vrsub_vx_d, OP_SSS_D, H8, H8, DO_RSUB)
>
> -static void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
> -                       CPURISCVState *env, uint32_t desc,
> -                       opivx2_fn fn, uint32_t esz)
> -{
> -    uint32_t vm = vext_vm(desc);
> -    uint32_t vl = env->vl;
> -    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
> -    uint32_t vta = vext_vta(desc);
> -    uint32_t vma = vext_vma(desc);
> -    uint32_t i;
> -
> -    for (i = env->vstart; i < vl; i++) {
> -        if (!vm && !vext_elem_mask(v0, i)) {
> -            /* set masked-off elements to 1s */
> -            vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz);
> -            continue;
> -        }
> -        fn(vd, s1, vs2, i);
> -    }
> -    env->vstart = 0;
> -    /* set tail elements to 1s */
> -    vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
> -}
> -
> -/* generate the helpers for OPIVX */
> -#define GEN_VEXT_VX(NAME, ESZ)                            \
> -void HELPER(NAME)(void *vd, void *v0, target_ulong s1,    \
> -                  void *vs2, CPURISCVState *env,          \
> -                  uint32_t desc)                          \
> -{                                                         \
> -    do_vext_vx(vd, v0, s1, vs2, env, desc,                \
> -               do_##NAME, ESZ);                           \
> -}
> -
>  GEN_VEXT_VX(vadd_vx_b, 1)
>  GEN_VEXT_VX(vadd_vx_h, 2)
>  GEN_VEXT_VX(vadd_vx_w, 4)
> diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
> index a264797882..b23fa4dd74 100644
> --- a/target/riscv/vector_internals.c
> +++ b/target/riscv/vector_internals.c
> @@ -37,3 +37,27 @@ void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
>      /* set tail elements to 1s */
>      vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
>  }
> +
> +void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
> +                CPURISCVState *env, uint32_t desc,
> +                opivx2_fn fn, uint32_t esz)
> +{
> +    uint32_t vm = vext_vm(desc);
> +    uint32_t vl = env->vl;
> +    uint32_t total_elems = vext_get_total_elems(env, desc, esz);
> +    uint32_t vta = vext_vta(desc);
> +    uint32_t vma = vext_vma(desc);
> +    uint32_t i;
> +
> +    for (i = env->vstart; i < vl; i++) {
> +        if (!vm && !vext_elem_mask(v0, i)) {
> +            /* set masked-off elements to 1s */
> +            vext_set_elems_1s(vd, vma, i * esz, (i + 1) * esz);
> +            continue;
> +        }
> +        fn(vd, s1, vs2, i);
> +    }
> +    env->vstart = 0;
> +    /* set tail elements to 1s */
> +    vext_set_elems_1s(vd, vta, vl * esz, total_elems * esz);
> +}
> diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
> index f61803acc0..49529d2379 100644
> --- a/target/riscv/vector_internals.h
> +++ b/target/riscv/vector_internals.h
> @@ -113,4 +113,20 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
>                 do_##NAME, ESZ);                           \
>  }
>
> +typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i);
> +
> +void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
> +                CPURISCVState *env, uint32_t desc,
> +                opivx2_fn fn, uint32_t esz);
> +
> +/* generate the helpers for OPIVX */
> +#define GEN_VEXT_VX(NAME, ESZ)                            \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1,    \
> +                  void *vs2, CPURISCVState *env,          \
> +                  uint32_t desc)                          \
> +{                                                         \
> +    do_vext_vx(vd, v0, s1, vs2, env, desc,                \
> +               do_##NAME, ESZ);                           \
> +}
> +
>  #endif /* TARGET_RISCV_VECTOR_INTERNAL_H */
> --
> 2.39.1
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/39] target/riscv: Add vclmulh.vv decoding, translation and execution support
  2023-02-02 12:41 ` [PATCH 04/39] target/riscv: Add vclmulh.vv " Lawrence Hunter
@ 2023-02-02 14:03   ` Philipp Tomsich
  2023-02-02 16:54   ` Richard Henderson
  1 sibling, 0 replies; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 14:03 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm

On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
<lawrence.hunter@codethink.co.uk> wrote:
>
> Signed-off-by: Lawrence Hunter <lawrence.hunter@codethink.co.uk>
> ---
>  target/riscv/helper.h                      |  1 +
>  target/riscv/insn32.decode                 |  1 +
>  target/riscv/insn_trans/trans_rvzvkb.c.inc |  1 +
>  target/riscv/vcrypto_helper.c              | 12 ++++++++++++
>  4 files changed, 15 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 6c786ef6f3..a155272701 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1140,3 +1140,4 @@ DEF_HELPER_FLAGS_3(sm4ks, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl)
>  /* Vector crypto functions */
>  DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 4a7421354d..e26ea1df08 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -894,3 +894,4 @@ sm4ks       .. 11010 ..... ..... 000 ..... 0110011 @k_aes
>  # *** RV64 Zvkb vector crypto extension ***
>  vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
>  vclmul_vx       001100 . ..... ..... 110 ..... 1010111 @r_vm
> +vclmulh_vv      001101 . ..... ..... 010 ..... 1010111 @r_vm
> diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> index 6e8b81136c..19ce4c7431 100644
> --- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
> +++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> @@ -39,6 +39,7 @@ static bool vclmul_vv_check(DisasContext *s, arg_rmrr *a)
>  }
>
>  GEN_VV_MASKED_TRANS(vclmul_vv, vclmul_vv_check)
> +GEN_VV_MASKED_TRANS(vclmulh_vv, vclmul_vv_check)
>
>  #define GEN_VX_MASKED_TRANS(NAME, CHECK)                                \
>  static bool trans_##NAME(DisasContext *s, arg_rmrr *a)                  \
> diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
> index c453d348ad..022b941131 100644
> --- a/target/riscv/vcrypto_helper.c
> +++ b/target/riscv/vcrypto_helper.c
> @@ -31,5 +31,17 @@ static void do_vclmul_vx(void *vd, target_long rs1, void *vs2, int i)
>      ((uint64_t *)vd)[i] = result;
>  }
>
> +static void do_vclmulh_vv(void *vd, void *vs1, void *vs2, int i)
> +{
> +    __uint128_t result = 0;
> +    for (int j = 63; j >= 0; j--) {
> +        if ((((uint64_t *)vs1)[i] >> j) & 1) {
> +            result ^= (((__uint128_t)(((uint64_t *)vs2)[i])) << j);

Why are we computing a 128 bit result, if we reduce it to 64b bits anyway?

> +        }
> +    }
> +    ((uint64_t *)vd)[i] = (result >> 64);
> +}

Please simplify in the same way as for clmul (i.e. a single function
that computes uint64 x uint64 -> uint64 ... and 2 calls to the RVVCALL
generator for OPIVV2 and OPIVX2):

static uint64_t clmulh64(uint64_t x, uint64_t y)
{
    target_ulong result = 0;
    const unsigned int elem_width = 64;

    for (unsigned int i = 1; i < elem_width; ++i)
        if ((y >> i) & 1)
            result ^= (x >> (elem_width - i));

    return result;
}

/* vclmulh.vv */
RVVCALL(OPIVV2, vclmulh_vv_d, OP_UUU_D, H8, H8, H8, clmulh64)
GEN_VEXT_VV(vclmulh_vv_d, 8)

/* vclmulh.vx */
RVVCALL(OPIVX2, vclmulh_vx_d, OP_UUU_D, H8, H8, clmulh64)
GEN_VEXT_VX(vclmulh_vx_d, 8)


> +
>  GEN_VEXT_VV(vclmul_vv, 8)
>  GEN_VEXT_VX(vclmul_vx, 8)
> +GEN_VEXT_VV(vclmulh_vv, 8)
> --
> 2.39.1
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/39] target/riscv: Add vrol.[vv,vx] and vror.[vv,vx,vi] decoding, translation and execution support
  2023-02-02 12:41   ` [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] " Lawrence Hunter
@ 2023-02-02 14:13     ` Philipp Tomsich
  -1 siblings, 0 replies; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 14:13 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm

On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
<lawrence.hunter@codethink.co.uk> wrote:
>
> From: Dickon Hood <dickon.hood@codethink.co.uk>
>
> Add an initial implementation of the vrol.* and vror.* instructions,
> with mappings between the RISC-V instructions and their internal TCG
> accelerated implmentations.
>
> There are some missing ror helpers, so I've bodged it by converting them
> to rols.
>
> Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> Signed-off-by: Dickon Hood <dickon.hood@codethink.co.uk>
> ---
>  target/riscv/helper.h                      | 20 ++++++++
>  target/riscv/insn32.decode                 |  6 +++
>  target/riscv/insn_trans/trans_rvzvkb.c.inc | 20 ++++++++
>  target/riscv/vcrypto_helper.c              | 58 ++++++++++++++++++++++
>  target/riscv/vector_helper.c               | 45 -----------------
>  target/riscv/vector_internals.h            | 52 +++++++++++++++++++
>  6 files changed, 156 insertions(+), 45 deletions(-)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 32f1179e29..e5b6b3360f 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1142,3 +1142,23 @@ DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_6(vclmulh_vx, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vror_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vror_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vror_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vror_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +
> +DEF_HELPER_6(vror_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vror_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vror_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vror_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vrol_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vrol_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vrol_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vrol_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +
> +DEF_HELPER_6(vrol_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index b4d88dd1cb..725f907ad1 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -896,3 +896,9 @@ vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
>  vclmul_vx       001100 . ..... ..... 110 ..... 1010111 @r_vm
>  vclmulh_vv      001101 . ..... ..... 010 ..... 1010111 @r_vm
>  vclmulh_vx      001101 . ..... ..... 110 ..... 1010111 @r_vm
> +vrol_vv         010101 . ..... ..... 000 ..... 1010111 @r_vm
> +vrol_vx         010101 . ..... ..... 100 ..... 1010111 @r_vm
> +vror_vv         010100 . ..... ..... 000 ..... 1010111 @r_vm
> +vror_vx         010100 . ..... ..... 100 ..... 1010111 @r_vm
> +vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
> +vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm

We have only a single vror_vi instruction... and it has a 6bit immediate.
There's no need to deviate from the spec. Just write it as follows:
%imm_z6   26:1 15:5
@r2_zimm6  ..... . vm:1 ..... ..... ... ..... .......  &rmrr %rs2
rs1=%imm_z6 %rd
vror_vi         01010. . ..... ..... 011 ..... 1010111 @r2_zimm6

> diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> index 533141e559..d2a7a92d42 100644
> --- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
> +++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> @@ -95,3 +95,23 @@ static bool vclmul_vx_check(DisasContext *s, arg_rmrr *a)
>
>  GEN_VX_MASKED_TRANS(vclmul_vx, vclmul_vx_check)
>  GEN_VX_MASKED_TRANS(vclmulh_vx, vclmul_vx_check)
> +
> +GEN_OPIVV_TRANS(vror_vv, zvkb_vv_check)
> +GEN_OPIVX_TRANS(vror_vx, zvkb_vx_check)
> +GEN_OPIVV_TRANS(vrol_vv, zvkb_vv_check)
> +GEN_OPIVX_TRANS(vrol_vx, zvkb_vx_check)
> +
> +GEN_OPIVI_TRANS(vror_vi, IMM_TRUNC_SEW, vror_vx, zvkb_vx_check)

Please introduce a IMM_ZIMM6 that integrates with the extract_imm as follows:
    case IMM_ZIMM6:
        return extract64(imm, 0, 6);

> +
> +/*
> + * Immediates are 5b long, and we need six for the rotate-immediate.  The
> + * decision has been taken to remove the vrol.vi instruction -- you can
> + * emulate it with a ror, after all -- and use the bottom bit of the funct6
> + * part of the opcode to encode the extra bit.  I've chosen to implement it
> + * like this because it's easy and reasonably clean.
> + */
> +static bool trans_vror_vi2(DisasContext *s, arg_rmrr *a)
> +{
> +    a->rs1 += 32;
> +    return trans_vror_vi(s, a);
> +}

As discussed above, please handle this in a single trans_vror_vi:
   GEN_OPIVI_GVEC_TRANS_CHECK(vror_vi, IMM_ZIMM6, vror_vx, rotri, zvkb_check_vi)

> diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
> index 46e2e510c5..7ec75c5589 100644
> --- a/target/riscv/vcrypto_helper.c
> +++ b/target/riscv/vcrypto_helper.c
> @@ -57,3 +57,61 @@ GEN_VEXT_VV(vclmul_vv, 8)
>  GEN_VEXT_VX(vclmul_vx, 8)
>  GEN_VEXT_VV(vclmulh_vv, 8)
>  GEN_VEXT_VX(vclmulh_vx, 8)
> +
> +/*
> + *  Looks a mess, but produces reasonable (aarch32) code on clang:
> + * https://godbolt.org/z/jchjsTda8
> + */
> +#define DO_ROR(x, n)                       \
> +    ((x >> (n & ((sizeof(x) << 3) - 1))) | \
> +     (x << ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
> +#define DO_ROL(x, n)                       \
> +    ((x << (n & ((sizeof(x) << 3) - 1))) | \
> +     (x >> ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
> +
> +RVVCALL(OPIVV2, vror_vv_b, OP_UUU_B, H1, H1, H1, DO_ROR)
> +RVVCALL(OPIVV2, vror_vv_h, OP_UUU_H, H2, H2, H2, DO_ROR)
> +RVVCALL(OPIVV2, vror_vv_w, OP_UUU_W, H4, H4, H4, DO_ROR)
> +RVVCALL(OPIVV2, vror_vv_d, OP_UUU_D, H8, H8, H8, DO_ROR)

Indeed, this is a mess: we already have ror8/16/32/64 available.
Why not the following?

/* vror.vv */
GEN_VEXT_SHIFT_VV(vror_vv_b, uint8_t,  uint8_t,  H1, H1, ror8,  0x7)
GEN_VEXT_SHIFT_VV(vror_vv_h, uint16_t, uint16_t, H2, H2, ror16, 0xf)
GEN_VEXT_SHIFT_VV(vror_vv_w, uint32_t, uint32_t, H4, H4, ror32, 0x1f)
GEN_VEXT_SHIFT_VV(vror_vv_d, uint64_t, uint64_t, H8, H8, ror64, 0x3f)

> +GEN_VEXT_VV(vror_vv_b, 1)
> +GEN_VEXT_VV(vror_vv_h, 2)
> +GEN_VEXT_VV(vror_vv_w, 4)
> +GEN_VEXT_VV(vror_vv_d, 8)
> +
> +/*
> + * There's a missing tcg_gen_gvec_rotrs() helper function.
> + */
> +#define GEN_VEXT_VX_RTOL(NAME, ESZ)                                      \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,        \
> +                  CPURISCVState *env, uint32_t desc)                     \
> +{                                                                        \
> +    do_vext_vx(vd, v0, (ESZ << 3) - s1, vs2, env, desc, do_##NAME, ESZ); \
> +}
> +
> +/* DO_ROL because GEN_VEXT_VX_RTOL() converts from R to L */
> +RVVCALL(OPIVX2, vror_vx_b, OP_UUU_B, H1, H1, DO_ROL)
> +RVVCALL(OPIVX2, vror_vx_h, OP_UUU_H, H2, H2, DO_ROL)
> +RVVCALL(OPIVX2, vror_vx_w, OP_UUU_W, H4, H4, DO_ROL)
> +RVVCALL(OPIVX2, vror_vx_d, OP_UUU_D, H8, H8, DO_ROL)

Same applies as above:
/* vror.vx */
GEN_VEXT_SHIFT_VX(vror_vx_b, uint8_t,  uint8_t,  H1, H1, ror8,  0x7)
GEN_VEXT_SHIFT_VX(vror_vx_h, uint16_t, uint16_t, H2, H2, ror16, 0xf)
GEN_VEXT_SHIFT_VX(vror_vx_w, uint32_t, uint32_t, H4, H4, ror32, 0x1f)
GEN_VEXT_SHIFT_VX(vror_vx_d, uint64_t, uint64_t, H8, H8, ror64, 0x3f)

> +GEN_VEXT_VX_RTOL(vror_vx_b, 1)
> +GEN_VEXT_VX_RTOL(vror_vx_h, 2)
> +GEN_VEXT_VX_RTOL(vror_vx_w, 4)
> +GEN_VEXT_VX_RTOL(vror_vx_d, 8)
> +
> +RVVCALL(OPIVV2, vrol_vv_b, OP_UUU_B, H1, H1, H1, DO_ROL)
> +RVVCALL(OPIVV2, vrol_vv_h, OP_UUU_H, H2, H2, H2, DO_ROL)
> +RVVCALL(OPIVV2, vrol_vv_w, OP_UUU_W, H4, H4, H4, DO_ROL)
> +RVVCALL(OPIVV2, vrol_vv_d, OP_UUU_D, H8, H8, H8, DO_ROL)
> +GEN_VEXT_VV(vrol_vv_b, 1)
> +GEN_VEXT_VV(vrol_vv_h, 2)
> +GEN_VEXT_VV(vrol_vv_w, 4)
> +GEN_VEXT_VV(vrol_vv_d, 8)
> +
> +RVVCALL(OPIVX2, vrol_vx_b, OP_UUU_B, H1, H1, DO_ROL)
> +RVVCALL(OPIVX2, vrol_vx_h, OP_UUU_H, H2, H2, DO_ROL)
> +RVVCALL(OPIVX2, vrol_vx_w, OP_UUU_W, H4, H4, DO_ROL)
> +RVVCALL(OPIVX2, vrol_vx_d, OP_UUU_D, H8, H8, DO_ROL)
> +GEN_VEXT_VX(vrol_vx_b, 1)
> +GEN_VEXT_VX(vrol_vx_h, 2)
> +GEN_VEXT_VX(vrol_vx_w, 4)
> +GEN_VEXT_VX(vrol_vx_d, 8)
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index ab470092f6..ff7b03cbe3 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -76,26 +76,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>      return vl;
>  }
>
> -/*
> - * Note that vector data is stored in host-endian 64-bit chunks,
> - * so addressing units smaller than that needs a host-endian fixup.
> - */
> -#if HOST_BIG_ENDIAN
> -#define H1(x)   ((x) ^ 7)
> -#define H1_2(x) ((x) ^ 6)
> -#define H1_4(x) ((x) ^ 4)
> -#define H2(x)   ((x) ^ 3)
> -#define H4(x)   ((x) ^ 1)
> -#define H8(x)   ((x))
> -#else
> -#define H1(x)   (x)
> -#define H1_2(x) (x)
> -#define H1_4(x) (x)
> -#define H2(x)   (x)
> -#define H4(x)   (x)
> -#define H8(x)   (x)
> -#endif
> -
>  /*
>   * Get the maximum number of elements can be operated.
>   *
> @@ -683,18 +663,11 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
>   *** Vector Integer Arithmetic Instructions
>   */
>
> -/* expand macro args before macro */
> -#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
> -
>  /* (TD, T1, T2, TX1, TX2) */
>  #define OP_SSS_B int8_t, int8_t, int8_t, int8_t, int8_t
>  #define OP_SSS_H int16_t, int16_t, int16_t, int16_t, int16_t
>  #define OP_SSS_W int32_t, int32_t, int32_t, int32_t, int32_t
>  #define OP_SSS_D int64_t, int64_t, int64_t, int64_t, int64_t
> -#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
> -#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
> -#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
> -#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
>  #define OP_SUS_B int8_t, uint8_t, int8_t, uint8_t, int8_t
>  #define OP_SUS_H int16_t, uint16_t, int16_t, uint16_t, int16_t
>  #define OP_SUS_W int32_t, uint32_t, int32_t, uint32_t, int32_t
> @@ -718,14 +691,6 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
>  #define NOP_UUU_H uint16_t, uint16_t, uint32_t, uint16_t, uint32_t
>  #define NOP_UUU_W uint32_t, uint32_t, uint64_t, uint32_t, uint64_t
>
> -
> -#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
> -static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
> -{                                                               \
> -    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
> -    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
> -    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
> -}
>  #define DO_SUB(N, M) (N - M)
>  #define DO_RSUB(N, M) (M - N)
>
> @@ -747,16 +712,6 @@ GEN_VEXT_VV(vsub_vv_h, 2)
>  GEN_VEXT_VV(vsub_vv_w, 4)
>  GEN_VEXT_VV(vsub_vv_d, 8)
>
> -/*
> - * (T1)s1 gives the real operator type.
> - * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
> - */
> -#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
> -static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
> -{                                                                   \
> -    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> -    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
> -}
>
>  RVVCALL(OPIVX2, vadd_vx_b, OP_SSS_B, H1, H1, DO_ADD)
>  RVVCALL(OPIVX2, vadd_vx_h, OP_SSS_H, H2, H2, DO_ADD)
> diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
> index 49529d2379..a0fbac7bf3 100644
> --- a/target/riscv/vector_internals.h
> +++ b/target/riscv/vector_internals.h
> @@ -28,6 +28,26 @@ static inline uint32_t vext_nf(uint32_t desc)
>      return FIELD_EX32(simd_data(desc), VDATA, NF);
>  }
>
> +/*
> + * Note that vector data is stored in host-endian 64-bit chunks,
> + * so addressing units smaller than that needs a host-endian fixup.
> + */
> +#if HOST_BIG_ENDIAN
> +#define H1(x)   ((x) ^ 7)
> +#define H1_2(x) ((x) ^ 6)
> +#define H1_4(x) ((x) ^ 4)
> +#define H2(x)   ((x) ^ 3)
> +#define H4(x)   ((x) ^ 1)
> +#define H8(x)   ((x))
> +#else
> +#define H1(x)   (x)
> +#define H1_2(x) (x)
> +#define H1_4(x) (x)
> +#define H2(x)   (x)
> +#define H4(x)   (x)
> +#define H8(x)   (x)
> +#endif
> +
>  /*
>   * Encode LMUL to lmul as following:
>   *     LMUL    vlmul    lmul
> @@ -96,9 +116,30 @@ static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
>  void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
>                         uint32_t tot);
>
> +/*
> + *** Vector Integer Arithmetic Instructions
> + */
> +
> +/* expand macro args before macro */
> +#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
> +
> +/* (TD, T1, T2, TX1, TX2) */
> +#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
> +#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
> +#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
> +#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
> +
>  /* operation of two vector elements */
>  typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
>
> +#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
> +static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
> +{                                                               \
> +    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
> +    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
> +}
> +
>  void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
>                  CPURISCVState *env, uint32_t desc,
>                  opivv2_fn *fn, uint32_t esz);
> @@ -115,6 +156,17 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
>
>  typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i);
>
> +/*
> + * (T1)s1 gives the real operator type.
> + * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
> + */
> +#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
> +static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
> +{                                                                   \
> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> +    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
> +}
> +
>  void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
>                  CPURISCVState *env, uint32_t desc,
>                  opivx2_fn fn, uint32_t esz);

Again: refactoring needs to go into a separate patch.

> --
> 2.39.1
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] decoding, translation and execution support
@ 2023-02-02 14:13     ` Philipp Tomsich
  0 siblings, 0 replies; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 14:13 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm

On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
<lawrence.hunter@codethink.co.uk> wrote:
>
> From: Dickon Hood <dickon.hood@codethink.co.uk>
>
> Add an initial implementation of the vrol.* and vror.* instructions,
> with mappings between the RISC-V instructions and their internal TCG
> accelerated implmentations.
>
> There are some missing ror helpers, so I've bodged it by converting them
> to rols.
>
> Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> Signed-off-by: Dickon Hood <dickon.hood@codethink.co.uk>
> ---
>  target/riscv/helper.h                      | 20 ++++++++
>  target/riscv/insn32.decode                 |  6 +++
>  target/riscv/insn_trans/trans_rvzvkb.c.inc | 20 ++++++++
>  target/riscv/vcrypto_helper.c              | 58 ++++++++++++++++++++++
>  target/riscv/vector_helper.c               | 45 -----------------
>  target/riscv/vector_internals.h            | 52 +++++++++++++++++++
>  6 files changed, 156 insertions(+), 45 deletions(-)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 32f1179e29..e5b6b3360f 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1142,3 +1142,23 @@ DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_6(vclmulh_vx, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vror_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vror_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vror_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vror_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +
> +DEF_HELPER_6(vror_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vror_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vror_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vror_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_6(vrol_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vrol_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vrol_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vrol_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +
> +DEF_HELPER_6(vrol_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index b4d88dd1cb..725f907ad1 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -896,3 +896,9 @@ vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
>  vclmul_vx       001100 . ..... ..... 110 ..... 1010111 @r_vm
>  vclmulh_vv      001101 . ..... ..... 010 ..... 1010111 @r_vm
>  vclmulh_vx      001101 . ..... ..... 110 ..... 1010111 @r_vm
> +vrol_vv         010101 . ..... ..... 000 ..... 1010111 @r_vm
> +vrol_vx         010101 . ..... ..... 100 ..... 1010111 @r_vm
> +vror_vv         010100 . ..... ..... 000 ..... 1010111 @r_vm
> +vror_vx         010100 . ..... ..... 100 ..... 1010111 @r_vm
> +vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
> +vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm

We have only a single vror_vi instruction... and it has a 6bit immediate.
There's no need to deviate from the spec. Just write it as follows:
%imm_z6   26:1 15:5
@r2_zimm6  ..... . vm:1 ..... ..... ... ..... .......  &rmrr %rs2
rs1=%imm_z6 %rd
vror_vi         01010. . ..... ..... 011 ..... 1010111 @r2_zimm6

> diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> index 533141e559..d2a7a92d42 100644
> --- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
> +++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> @@ -95,3 +95,23 @@ static bool vclmul_vx_check(DisasContext *s, arg_rmrr *a)
>
>  GEN_VX_MASKED_TRANS(vclmul_vx, vclmul_vx_check)
>  GEN_VX_MASKED_TRANS(vclmulh_vx, vclmul_vx_check)
> +
> +GEN_OPIVV_TRANS(vror_vv, zvkb_vv_check)
> +GEN_OPIVX_TRANS(vror_vx, zvkb_vx_check)
> +GEN_OPIVV_TRANS(vrol_vv, zvkb_vv_check)
> +GEN_OPIVX_TRANS(vrol_vx, zvkb_vx_check)
> +
> +GEN_OPIVI_TRANS(vror_vi, IMM_TRUNC_SEW, vror_vx, zvkb_vx_check)

Please introduce a IMM_ZIMM6 that integrates with the extract_imm as follows:
    case IMM_ZIMM6:
        return extract64(imm, 0, 6);

> +
> +/*
> + * Immediates are 5b long, and we need six for the rotate-immediate.  The
> + * decision has been taken to remove the vrol.vi instruction -- you can
> + * emulate it with a ror, after all -- and use the bottom bit of the funct6
> + * part of the opcode to encode the extra bit.  I've chosen to implement it
> + * like this because it's easy and reasonably clean.
> + */
> +static bool trans_vror_vi2(DisasContext *s, arg_rmrr *a)
> +{
> +    a->rs1 += 32;
> +    return trans_vror_vi(s, a);
> +}

As discussed above, please handle this in a single trans_vror_vi:
   GEN_OPIVI_GVEC_TRANS_CHECK(vror_vi, IMM_ZIMM6, vror_vx, rotri, zvkb_check_vi)

> diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
> index 46e2e510c5..7ec75c5589 100644
> --- a/target/riscv/vcrypto_helper.c
> +++ b/target/riscv/vcrypto_helper.c
> @@ -57,3 +57,61 @@ GEN_VEXT_VV(vclmul_vv, 8)
>  GEN_VEXT_VX(vclmul_vx, 8)
>  GEN_VEXT_VV(vclmulh_vv, 8)
>  GEN_VEXT_VX(vclmulh_vx, 8)
> +
> +/*
> + *  Looks a mess, but produces reasonable (aarch32) code on clang:
> + * https://godbolt.org/z/jchjsTda8
> + */
> +#define DO_ROR(x, n)                       \
> +    ((x >> (n & ((sizeof(x) << 3) - 1))) | \
> +     (x << ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
> +#define DO_ROL(x, n)                       \
> +    ((x << (n & ((sizeof(x) << 3) - 1))) | \
> +     (x >> ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
> +
> +RVVCALL(OPIVV2, vror_vv_b, OP_UUU_B, H1, H1, H1, DO_ROR)
> +RVVCALL(OPIVV2, vror_vv_h, OP_UUU_H, H2, H2, H2, DO_ROR)
> +RVVCALL(OPIVV2, vror_vv_w, OP_UUU_W, H4, H4, H4, DO_ROR)
> +RVVCALL(OPIVV2, vror_vv_d, OP_UUU_D, H8, H8, H8, DO_ROR)

Indeed, this is a mess: we already have ror8/16/32/64 available.
Why not the following?

/* vror.vv */
GEN_VEXT_SHIFT_VV(vror_vv_b, uint8_t,  uint8_t,  H1, H1, ror8,  0x7)
GEN_VEXT_SHIFT_VV(vror_vv_h, uint16_t, uint16_t, H2, H2, ror16, 0xf)
GEN_VEXT_SHIFT_VV(vror_vv_w, uint32_t, uint32_t, H4, H4, ror32, 0x1f)
GEN_VEXT_SHIFT_VV(vror_vv_d, uint64_t, uint64_t, H8, H8, ror64, 0x3f)

> +GEN_VEXT_VV(vror_vv_b, 1)
> +GEN_VEXT_VV(vror_vv_h, 2)
> +GEN_VEXT_VV(vror_vv_w, 4)
> +GEN_VEXT_VV(vror_vv_d, 8)
> +
> +/*
> + * There's a missing tcg_gen_gvec_rotrs() helper function.
> + */
> +#define GEN_VEXT_VX_RTOL(NAME, ESZ)                                      \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,        \
> +                  CPURISCVState *env, uint32_t desc)                     \
> +{                                                                        \
> +    do_vext_vx(vd, v0, (ESZ << 3) - s1, vs2, env, desc, do_##NAME, ESZ); \
> +}
> +
> +/* DO_ROL because GEN_VEXT_VX_RTOL() converts from R to L */
> +RVVCALL(OPIVX2, vror_vx_b, OP_UUU_B, H1, H1, DO_ROL)
> +RVVCALL(OPIVX2, vror_vx_h, OP_UUU_H, H2, H2, DO_ROL)
> +RVVCALL(OPIVX2, vror_vx_w, OP_UUU_W, H4, H4, DO_ROL)
> +RVVCALL(OPIVX2, vror_vx_d, OP_UUU_D, H8, H8, DO_ROL)

Same applies as above:
/* vror.vx */
GEN_VEXT_SHIFT_VX(vror_vx_b, uint8_t,  uint8_t,  H1, H1, ror8,  0x7)
GEN_VEXT_SHIFT_VX(vror_vx_h, uint16_t, uint16_t, H2, H2, ror16, 0xf)
GEN_VEXT_SHIFT_VX(vror_vx_w, uint32_t, uint32_t, H4, H4, ror32, 0x1f)
GEN_VEXT_SHIFT_VX(vror_vx_d, uint64_t, uint64_t, H8, H8, ror64, 0x3f)

> +GEN_VEXT_VX_RTOL(vror_vx_b, 1)
> +GEN_VEXT_VX_RTOL(vror_vx_h, 2)
> +GEN_VEXT_VX_RTOL(vror_vx_w, 4)
> +GEN_VEXT_VX_RTOL(vror_vx_d, 8)
> +
> +RVVCALL(OPIVV2, vrol_vv_b, OP_UUU_B, H1, H1, H1, DO_ROL)
> +RVVCALL(OPIVV2, vrol_vv_h, OP_UUU_H, H2, H2, H2, DO_ROL)
> +RVVCALL(OPIVV2, vrol_vv_w, OP_UUU_W, H4, H4, H4, DO_ROL)
> +RVVCALL(OPIVV2, vrol_vv_d, OP_UUU_D, H8, H8, H8, DO_ROL)
> +GEN_VEXT_VV(vrol_vv_b, 1)
> +GEN_VEXT_VV(vrol_vv_h, 2)
> +GEN_VEXT_VV(vrol_vv_w, 4)
> +GEN_VEXT_VV(vrol_vv_d, 8)
> +
> +RVVCALL(OPIVX2, vrol_vx_b, OP_UUU_B, H1, H1, DO_ROL)
> +RVVCALL(OPIVX2, vrol_vx_h, OP_UUU_H, H2, H2, DO_ROL)
> +RVVCALL(OPIVX2, vrol_vx_w, OP_UUU_W, H4, H4, DO_ROL)
> +RVVCALL(OPIVX2, vrol_vx_d, OP_UUU_D, H8, H8, DO_ROL)
> +GEN_VEXT_VX(vrol_vx_b, 1)
> +GEN_VEXT_VX(vrol_vx_h, 2)
> +GEN_VEXT_VX(vrol_vx_w, 4)
> +GEN_VEXT_VX(vrol_vx_d, 8)
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index ab470092f6..ff7b03cbe3 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -76,26 +76,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
>      return vl;
>  }
>
> -/*
> - * Note that vector data is stored in host-endian 64-bit chunks,
> - * so addressing units smaller than that needs a host-endian fixup.
> - */
> -#if HOST_BIG_ENDIAN
> -#define H1(x)   ((x) ^ 7)
> -#define H1_2(x) ((x) ^ 6)
> -#define H1_4(x) ((x) ^ 4)
> -#define H2(x)   ((x) ^ 3)
> -#define H4(x)   ((x) ^ 1)
> -#define H8(x)   ((x))
> -#else
> -#define H1(x)   (x)
> -#define H1_2(x) (x)
> -#define H1_4(x) (x)
> -#define H2(x)   (x)
> -#define H4(x)   (x)
> -#define H8(x)   (x)
> -#endif
> -
>  /*
>   * Get the maximum number of elements can be operated.
>   *
> @@ -683,18 +663,11 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
>   *** Vector Integer Arithmetic Instructions
>   */
>
> -/* expand macro args before macro */
> -#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
> -
>  /* (TD, T1, T2, TX1, TX2) */
>  #define OP_SSS_B int8_t, int8_t, int8_t, int8_t, int8_t
>  #define OP_SSS_H int16_t, int16_t, int16_t, int16_t, int16_t
>  #define OP_SSS_W int32_t, int32_t, int32_t, int32_t, int32_t
>  #define OP_SSS_D int64_t, int64_t, int64_t, int64_t, int64_t
> -#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
> -#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
> -#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
> -#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
>  #define OP_SUS_B int8_t, uint8_t, int8_t, uint8_t, int8_t
>  #define OP_SUS_H int16_t, uint16_t, int16_t, uint16_t, int16_t
>  #define OP_SUS_W int32_t, uint32_t, int32_t, uint32_t, int32_t
> @@ -718,14 +691,6 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
>  #define NOP_UUU_H uint16_t, uint16_t, uint32_t, uint16_t, uint32_t
>  #define NOP_UUU_W uint32_t, uint32_t, uint64_t, uint32_t, uint64_t
>
> -
> -#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
> -static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
> -{                                                               \
> -    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
> -    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
> -    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
> -}
>  #define DO_SUB(N, M) (N - M)
>  #define DO_RSUB(N, M) (M - N)
>
> @@ -747,16 +712,6 @@ GEN_VEXT_VV(vsub_vv_h, 2)
>  GEN_VEXT_VV(vsub_vv_w, 4)
>  GEN_VEXT_VV(vsub_vv_d, 8)
>
> -/*
> - * (T1)s1 gives the real operator type.
> - * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
> - */
> -#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
> -static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
> -{                                                                   \
> -    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> -    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
> -}
>
>  RVVCALL(OPIVX2, vadd_vx_b, OP_SSS_B, H1, H1, DO_ADD)
>  RVVCALL(OPIVX2, vadd_vx_h, OP_SSS_H, H2, H2, DO_ADD)
> diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
> index 49529d2379..a0fbac7bf3 100644
> --- a/target/riscv/vector_internals.h
> +++ b/target/riscv/vector_internals.h
> @@ -28,6 +28,26 @@ static inline uint32_t vext_nf(uint32_t desc)
>      return FIELD_EX32(simd_data(desc), VDATA, NF);
>  }
>
> +/*
> + * Note that vector data is stored in host-endian 64-bit chunks,
> + * so addressing units smaller than that needs a host-endian fixup.
> + */
> +#if HOST_BIG_ENDIAN
> +#define H1(x)   ((x) ^ 7)
> +#define H1_2(x) ((x) ^ 6)
> +#define H1_4(x) ((x) ^ 4)
> +#define H2(x)   ((x) ^ 3)
> +#define H4(x)   ((x) ^ 1)
> +#define H8(x)   ((x))
> +#else
> +#define H1(x)   (x)
> +#define H1_2(x) (x)
> +#define H1_4(x) (x)
> +#define H2(x)   (x)
> +#define H4(x)   (x)
> +#define H8(x)   (x)
> +#endif
> +
>  /*
>   * Encode LMUL to lmul as following:
>   *     LMUL    vlmul    lmul
> @@ -96,9 +116,30 @@ static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
>  void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
>                         uint32_t tot);
>
> +/*
> + *** Vector Integer Arithmetic Instructions
> + */
> +
> +/* expand macro args before macro */
> +#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
> +
> +/* (TD, T1, T2, TX1, TX2) */
> +#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
> +#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
> +#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
> +#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
> +
>  /* operation of two vector elements */
>  typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
>
> +#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
> +static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
> +{                                                               \
> +    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
> +    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
> +}
> +
>  void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
>                  CPURISCVState *env, uint32_t desc,
>                  opivv2_fn *fn, uint32_t esz);
> @@ -115,6 +156,17 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
>
>  typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i);
>
> +/*
> + * (T1)s1 gives the real operator type.
> + * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
> + */
> +#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
> +static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
> +{                                                                   \
> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> +    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
> +}
> +
>  void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
>                  CPURISCVState *env, uint32_t desc,
>                  opivx2_fn fn, uint32_t esz);

Again: refactoring needs to go into a separate patch.

> --
> 2.39.1
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 07/39] target/riscv: Add vbrev8.v decoding, translation and execution support
  2023-02-02 12:41 ` [PATCH 07/39] target/riscv: Add vbrev8.v " Lawrence Hunter
@ 2023-02-02 14:21   ` Philipp Tomsich
  0 siblings, 0 replies; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 14:21 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm,
	William Salmon

On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
<lawrence.hunter@codethink.co.uk> wrote:
>
> From: William Salmon <will.salmon@codethink.co.uk>
>
> Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> Signed-off-by: William Salmon <will.salmon@codethink.co.uk>
> ---
>  include/qemu/bitops.h                      | 32 +++++++++++++++++
>  target/riscv/helper.h                      |  5 +++
>  target/riscv/insn32.decode                 |  1 +
>  target/riscv/insn_trans/trans_rvzvkb.c.inc | 39 ++++++++++++++++++++
>  target/riscv/vcrypto_helper.c              | 10 ++++++
>  target/riscv/vector_helper.c               | 41 ---------------------
>  target/riscv/vector_internals.h            | 42 ++++++++++++++++++++++
>  7 files changed, 129 insertions(+), 41 deletions(-)
>
> diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
> index 03213ce952..dfce1cb10c 100644
> --- a/include/qemu/bitops.h
> +++ b/include/qemu/bitops.h
> @@ -618,4 +618,36 @@ static inline uint64_t half_unshuffle64(uint64_t x)
>      return x;
>  }
>
> +static inline uint8_t reverse_bits_byte(uint8_t x)
> +{
> +    x = (((x & 0b10101010) >> 1) | ((x & 0b01010101) << 1));
> +    x = (((x & 0b11001100) >> 2) | ((x & 0b00110011) << 2));
> +    return ((x & 0b11110000) >> 4) | ((x & 0b00001111) << 4);
> +}
> +
> +static inline uint16_t reverse_bits_byte_2(uint16_t x)
> +{
> +    return (uint16_t)reverse_bits_byte(x & 0xFF) | \
> +       (uint16_t)reverse_bits_byte((x >> 8) & 0xFF) << 8;
> +}
> +
> +static inline uint32_t reverse_bits_byte_4(uint32_t x)
> +{
> +    return (uint32_t)reverse_bits_byte(x & 0xFF) | \
> +       (uint32_t)reverse_bits_byte((x >> 8) & 0xFF) << 8 | \
> +       (uint32_t)reverse_bits_byte((x >> 16) & 0xFF) << 16 | \
> +       (uint32_t)reverse_bits_byte((x >> 24) & 0xFF) << 24;
> +}
> +
> +static inline uint64_t reverse_bits_byte_8(uint64_t x)
> +{
> +    return (uint64_t)reverse_bits_byte(x & 0xFF) | \
> +       (uint64_t)reverse_bits_byte((x >> 8) & 0xFF) << 8 | \
> +       (uint64_t)reverse_bits_byte((x >> 16) & 0xFF) << 16 | \
> +       (uint64_t)reverse_bits_byte((x >> 24) & 0xFF) << 24 | \
> +       (uint64_t)reverse_bits_byte((x >> 32) & 0xFF) << 32 | \
> +       (uint64_t)reverse_bits_byte((x >> 40) & 0xFF) << 40 | \
> +       (uint64_t)reverse_bits_byte((x >> 48) & 0xFF) << 48 | \
> +       (uint64_t)reverse_bits_byte((x >> 56) & 0xFF) << 56;
> +}


Why split this up into individual functions?
You can do the following (OPIVV1 will take care of extending this to a
uint64_t and then truncating the result again):

/* vbrev8.v */
static uint64_t brev8(uint64_t val)
{
    val = ((val & 0x5555555555555555ull) << 1)
        | ((val & 0xAAAAAAAAAAAAAAAAull) >> 1);
    val = ((val & 0x3333333333333333ull) << 2)
        | ((val & 0xCCCCCCCCCCCCCCCCull) >> 2);
    val = ((val & 0x0F0F0F0F0F0F0F0Full) << 4)
        | ((val & 0xF0F0F0F0F0F0F0F0ull) >> 4);

    return val;
}

RVVCALL(OPIVV1, vbrev8_v_b, OP_UU_B, H1, H1, brev8)
RVVCALL(OPIVV1, vbrev8_v_h, OP_UU_H, H2, H2, brev8)
RVVCALL(OPIVV1, vbrev8_v_w, OP_UU_W, H4, H4, brev8)
RVVCALL(OPIVV1, vbrev8_v_d, OP_UU_D, H8, H8, brev8)

>
>  #endif
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index e5b6b3360f..c94627d8a4 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1162,3 +1162,8 @@ DEF_HELPER_6(vrol_vx_b, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> +
> +DEF_HELPER_5(vbrev8_v_b, void, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_5(vbrev8_v_h, void, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_5(vbrev8_v_w, void, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_5(vbrev8_v_d, void, ptr, ptr, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 725f907ad1..782632a165 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -902,3 +902,4 @@ vror_vv         010100 . ..... ..... 000 ..... 1010111 @r_vm
>  vror_vx         010100 . ..... ..... 100 ..... 1010111 @r_vm
>  vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
>  vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm
> +vbrev8_v        010010 . ..... 01000 010 ..... 1010111 @r2_vm
> diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> index d2a7a92d42..591980459a 100644
> --- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
> +++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> @@ -115,3 +115,42 @@ static bool trans_vror_vi2(DisasContext *s, arg_rmrr *a)
>      a->rs1 += 32;
>      return trans_vror_vi(s, a);
>  }
> +
> +#define GEN_OPIV_TRANS(NAME, CHECK)                                    \
> +static bool trans_##NAME(DisasContext *s, arg_rmr * a)                 \
> +{                                                                      \
> +    if (CHECK(s, a)) {                                                 \
> +        uint32_t data = 0;                                             \
> +        static gen_helper_gvec_3_ptr * const fns[4] = {                \
> +            gen_helper_##NAME##_b,                                     \
> +            gen_helper_##NAME##_h,                                     \
> +            gen_helper_##NAME##_w,                                     \
> +            gen_helper_##NAME##_d,                                     \
> +        };                                                             \
> +        TCGLabel *over = gen_new_label();                              \
> +        tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);              \
> +        tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);     \
> +                                                                       \
> +        data = FIELD_DP32(data, VDATA, VM, a->vm);                     \
> +        data = FIELD_DP32(data, VDATA, LMUL, s->lmul);                 \
> +        data = FIELD_DP32(data, VDATA, VTA, s->vta);                   \
> +        data = FIELD_DP32(data, VDATA, VTA_ALL_1S, s->cfg_vta_all_1s); \
> +        data = FIELD_DP32(data, VDATA, VMA, s->vma);                   \
> +        tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),         \
> +                           vreg_ofs(s, a->rs2), cpu_env,               \
> +                           s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8, \
> +                           data, fns[s->sew]);                         \
> +        mark_vs_dirty(s);                                              \
> +        gen_set_label(over);                                           \
> +        return true;                                                   \
> +    }                                                                  \
> +    return false;                                                      \
> +}
> +
> +static bool vxrev8_check(DisasContext *s, arg_rmr *a)
> +{
> +    return s->cfg_ptr->ext_zvkb == true && vext_check_isa_ill(s) &&
> +           vext_check_ss(s, a->rd, a->rs2, a->vm);
> +}
> +
> +GEN_OPIV_TRANS(vbrev8_v, vxrev8_check)
> diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
> index 7ec75c5589..303a656141 100644
> --- a/target/riscv/vcrypto_helper.c
> +++ b/target/riscv/vcrypto_helper.c
> @@ -1,6 +1,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/host-utils.h"
>  #include "qemu/bitops.h"
> +#include "qemu/bswap.h"
>  #include "cpu.h"
>  #include "exec/memop.h"
>  #include "exec/exec-all.h"
> @@ -115,3 +116,12 @@ GEN_VEXT_VX(vrol_vx_b, 1)
>  GEN_VEXT_VX(vrol_vx_h, 2)
>  GEN_VEXT_VX(vrol_vx_w, 4)
>  GEN_VEXT_VX(vrol_vx_d, 8)
> +
> +RVVCALL(OPIVV1, vbrev8_v_b, OP_UU_B, H1, H1, reverse_bits_byte)
> +RVVCALL(OPIVV1, vbrev8_v_h, OP_UU_H, H2, H2, reverse_bits_byte_2)
> +RVVCALL(OPIVV1, vbrev8_v_w, OP_UU_W, H4, H4, reverse_bits_byte_4)
> +RVVCALL(OPIVV1, vbrev8_v_d, OP_UU_D, H8, H8, reverse_bits_byte_8)

See above how a single brev8 function can be used here.

> +GEN_VEXT_V(vbrev8_v_b, 1)
> +GEN_VEXT_V(vbrev8_v_h, 2)
> +GEN_VEXT_V(vbrev8_v_w, 4)
> +GEN_VEXT_V(vbrev8_v_d, 8)
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index ff7b03cbe3..07da4b5e16 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -3437,12 +3437,6 @@ RVVCALL(OPFVF3, vfwnmsac_vf_w, WOP_UUU_W, H8, H4, fwnmsac32)
>  GEN_VEXT_VF(vfwnmsac_vf_h, 4)
>  GEN_VEXT_VF(vfwnmsac_vf_w, 8)
>
> -/* Vector Floating-Point Square-Root Instruction */
> -/* (TD, T2, TX2) */
> -#define OP_UU_H uint16_t, uint16_t, uint16_t
> -#define OP_UU_W uint32_t, uint32_t, uint32_t
> -#define OP_UU_D uint64_t, uint64_t, uint64_t
> -
>  #define OPFVV1(NAME, TD, T2, TX2, HD, HS2, OP)        \
>  static void do_##NAME(void *vd, void *vs2, int i,      \
>          CPURISCVState *env)                            \
> @@ -4134,41 +4128,6 @@ GEN_VEXT_CMP_VF(vmfge_vf_h, uint16_t, H2, vmfge16)
>  GEN_VEXT_CMP_VF(vmfge_vf_w, uint32_t, H4, vmfge32)
>  GEN_VEXT_CMP_VF(vmfge_vf_d, uint64_t, H8, vmfge64)
>
> -/* Vector Floating-Point Classify Instruction */
> -#define OPIVV1(NAME, TD, T2, TX2, HD, HS2, OP)         \
> -static void do_##NAME(void *vd, void *vs2, int i)      \
> -{                                                      \
> -    TX2 s2 = *((T2 *)vs2 + HS2(i));                    \
> -    *((TD *)vd + HD(i)) = OP(s2);                      \
> -}
> -
> -#define GEN_VEXT_V(NAME, ESZ)                          \
> -void HELPER(NAME)(void *vd, void *v0, void *vs2,       \
> -                  CPURISCVState *env, uint32_t desc)   \
> -{                                                      \
> -    uint32_t vm = vext_vm(desc);                       \
> -    uint32_t vl = env->vl;                             \
> -    uint32_t total_elems =                             \
> -        vext_get_total_elems(env, desc, ESZ);          \
> -    uint32_t vta = vext_vta(desc);                     \
> -    uint32_t vma = vext_vma(desc);                     \
> -    uint32_t i;                                        \
> -                                                       \
> -    for (i = env->vstart; i < vl; i++) {               \
> -        if (!vm && !vext_elem_mask(v0, i)) {           \
> -            /* set masked-off elements to 1s */        \
> -            vext_set_elems_1s(vd, vma, i * ESZ,        \
> -                              (i + 1) * ESZ);          \
> -            continue;                                  \
> -        }                                              \
> -        do_##NAME(vd, vs2, i);                         \
> -    }                                                  \
> -    env->vstart = 0;                                   \
> -    /* set tail elements to 1s */                      \
> -    vext_set_elems_1s(vd, vta, vl * ESZ,               \
> -                      total_elems * ESZ);              \
> -}
> -
>  target_ulong fclass_h(uint64_t frs1)
>  {
>      float16 f = frs1;
> diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
> index a0fbac7bf3..f1f16453dc 100644
> --- a/target/riscv/vector_internals.h
> +++ b/target/riscv/vector_internals.h
> @@ -123,12 +123,54 @@ void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
>  /* expand macro args before macro */
>  #define RVVCALL(macro, ...)  macro(__VA_ARGS__)
>
> +/* Vector Floating-Point Square-Root Instruction */
> +/* (TD, T2, TX2) */
> +#define OP_UU_B uint8_t, uint8_t, uint8_t
> +#define OP_UU_H uint16_t, uint16_t, uint16_t
> +#define OP_UU_W uint32_t, uint32_t, uint32_t
> +#define OP_UU_D uint64_t, uint64_t, uint64_t
> +
>  /* (TD, T1, T2, TX1, TX2) */
>  #define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
>  #define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
>  #define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
>  #define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
>
> +/* Vector Floating-Point Classify Instruction */
> +#define OPIVV1(NAME, TD, T2, TX2, HD, HS2, OP)         \
> +static void do_##NAME(void *vd, void *vs2, int i)      \
> +{                                                      \
> +    TX2 s2 = *((T2 *)vs2 + HS2(i));                    \
> +    *((TD *)vd + HD(i)) = OP(s2);                      \
> +}
> +
> +#define GEN_VEXT_V(NAME, ESZ)                          \
> +void HELPER(NAME)(void *vd, void *v0, void *vs2,       \
> +                  CPURISCVState *env, uint32_t desc)   \
> +{                                                      \
> +    uint32_t vm = vext_vm(desc);                       \
> +    uint32_t vl = env->vl;                             \
> +    uint32_t total_elems =                             \
> +        vext_get_total_elems(env, desc, ESZ);          \
> +    uint32_t vta = vext_vta(desc);                     \
> +    uint32_t vma = vext_vma(desc);                     \
> +    uint32_t i;                                        \
> +                                                       \
> +    for (i = env->vstart; i < vl; i++) {               \
> +        if (!vm && !vext_elem_mask(v0, i)) {           \
> +            /* set masked-off elements to 1s */        \
> +            vext_set_elems_1s(vd, vma, i * ESZ,        \
> +                              (i + 1) * ESZ);          \
> +            continue;                                  \
> +        }                                              \
> +        do_##NAME(vd, vs2, i);                         \
> +    }                                                  \
> +    env->vstart = 0;                                   \
> +    /* set tail elements to 1s */                      \
> +    vext_set_elems_1s(vd, vta, vl * ESZ,               \
> +                      total_elems * ESZ);              \
> +}
> +

My usual gripe about not mixing the refactoring in with the feature
patches applies...

>  /* operation of two vector elements */
>  typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
>
> --
> 2.39.1
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/39] target/riscv: Add vrev8.v decoding, translation and execution support
  2023-02-02 12:41 ` [PATCH 08/39] target/riscv: Add vrev8.v " Lawrence Hunter
@ 2023-02-02 14:22   ` Philipp Tomsich
  0 siblings, 0 replies; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 14:22 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm

On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
<lawrence.hunter@codethink.co.uk> wrote:
>
> From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
>
> Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
> ---
>  target/riscv/helper.h                      |  4 ++++
>  target/riscv/insn32.decode                 |  1 +
>  target/riscv/insn_trans/trans_rvzvkb.c.inc |  1 +
>  target/riscv/vcrypto_helper.c              | 10 ++++++++++
>  4 files changed, 16 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index c94627d8a4..c980d52828 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1163,6 +1163,10 @@ DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32)
>  DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32)
>
> +DEF_HELPER_5(vrev8_v_b, void, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_5(vrev8_v_h, void, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_5(vrev8_v_w, void, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_5(vrev8_v_d, void, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_5(vbrev8_v_b, void, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_5(vbrev8_v_h, void, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_5(vbrev8_v_w, void, ptr, ptr, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 782632a165..342199abc0 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -903,3 +903,4 @@ vror_vx         010100 . ..... ..... 100 ..... 1010111 @r_vm
>  vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
>  vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm
>  vbrev8_v        010010 . ..... 01000 010 ..... 1010111 @r2_vm
> +vrev8_v         010010 . ..... 01001 010 ..... 1010111 @r2_vm
> diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> index 591980459a..18b362db92 100644
> --- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
> +++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> @@ -154,3 +154,4 @@ static bool vxrev8_check(DisasContext *s, arg_rmr *a)
>  }
>
>  GEN_OPIV_TRANS(vbrev8_v, vxrev8_check)
> +GEN_OPIV_TRANS(vrev8_v, vxrev8_check)
> diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
> index 303a656141..b09fe5fa2a 100644
> --- a/target/riscv/vcrypto_helper.c
> +++ b/target/riscv/vcrypto_helper.c
> @@ -125,3 +125,13 @@ GEN_VEXT_V(vbrev8_v_b, 1)
>  GEN_VEXT_V(vbrev8_v_h, 2)
>  GEN_VEXT_V(vbrev8_v_w, 4)
>  GEN_VEXT_V(vbrev8_v_d, 8)
> +
> +#define DO_VREV8_B(a) (a)
> +RVVCALL(OPIVV1, vrev8_v_b, OP_UU_B, H1, H1, DO_VREV8_B)

Let's call it what it is: "DO_COPY" or "DO_IDENTITY" ...

> +RVVCALL(OPIVV1, vrev8_v_h, OP_UU_H, H2, H2, bswap16)
> +RVVCALL(OPIVV1, vrev8_v_w, OP_UU_W, H4, H4, bswap32)
> +RVVCALL(OPIVV1, vrev8_v_d, OP_UU_D, H8, H8, bswap64)
> +GEN_VEXT_V(vrev8_v_b, 1)
> +GEN_VEXT_V(vrev8_v_h, 2)
> +GEN_VEXT_V(vrev8_v_w, 4)
> +GEN_VEXT_V(vrev8_v_d, 8)
> --
> 2.39.1
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 10/39] target/riscv: expose zvkb cpu property
  2023-02-02 12:42 ` [PATCH 10/39] target/riscv: expose zvkb cpu property Lawrence Hunter
@ 2023-02-02 14:23   ` Philipp Tomsich
  2023-02-02 14:24     ` Philipp Tomsich
  0 siblings, 1 reply; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 14:23 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm

On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
<lawrence.hunter@codethink.co.uk> wrote:
>
> From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
>
> Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>

You might want to squash this onto the patch that first introduces the property.

Reviewed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>

> ---
>  target/riscv/cpu.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index bd34119c75..35790befc0 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -1082,6 +1082,8 @@ static Property riscv_cpu_extensions[] = {
>
>      DEFINE_PROP_BOOL("zmmul", RISCVCPU, cfg.ext_zmmul, false),
>
> +    DEFINE_PROP_BOOL("zvkb", RISCVCPU, cfg.ext_zvkb, false),
> +
>      /* Vendor-specific custom extensions */
>      DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, false),
>
> --
> 2.39.1
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 10/39] target/riscv: expose zvkb cpu property
  2023-02-02 14:23   ` Philipp Tomsich
@ 2023-02-02 14:24     ` Philipp Tomsich
  0 siblings, 0 replies; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 14:24 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm

On Thu, 2 Feb 2023 at 15:23, Philipp Tomsich <philipp.tomsich@vrull.eu> wrote:
>
> On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
> <lawrence.hunter@codethink.co.uk> wrote:
> >
> > From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
> >
> > Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
>
> You might want to squash this onto the patch that first introduces the property.
>
> Reviewed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
>
> > ---
> >  target/riscv/cpu.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index bd34119c75..35790befc0 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -1082,6 +1082,8 @@ static Property riscv_cpu_extensions[] = {
> >
> >      DEFINE_PROP_BOOL("zmmul", RISCVCPU, cfg.ext_zmmul, false),
> >
> > +    DEFINE_PROP_BOOL("zvkb", RISCVCPU, cfg.ext_zvkb, false),

I missed this earlier: the extension is not ratified. So please: "x-zvkb".
And it needs to go under the comment:
  /* These are experimental so mark with 'x-' */

> > +
> >      /* Vendor-specific custom extensions */
> >      DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, false),
> >
> > --
> > 2.39.1
> >

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 09/39] target/riscv: Add vandn.[vv,vx,vi] decoding, translation and execution support
  2023-02-02 12:42   ` [PATCH 09/39] target/riscv: Add vandn.[vv, vx, vi] " Lawrence Hunter
  (?)
@ 2023-02-02 14:29   ` Philipp Tomsich
  -1 siblings, 0 replies; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 14:29 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm

On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
<lawrence.hunter@codethink.co.uk> wrote:
>
> From: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
>
> Signed-off-by: Nazar Kazakov <nazar.kazakov@codethink.co.uk>
> ---
>  target/riscv/helper.h                      |  9 +++++++++
>  target/riscv/insn32.decode                 |  3 +++
>  target/riscv/insn_trans/trans_rvzvkb.c.inc |  5 +++++
>  target/riscv/vcrypto_helper.c              | 19 +++++++++++++++++++
>  4 files changed, 36 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index c980d52828..5de615ea78 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1171,3 +1171,12 @@ DEF_HELPER_5(vbrev8_v_b, void, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_5(vbrev8_v_h, void, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_5(vbrev8_v_w, void, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_5(vbrev8_v_d, void, ptr, ptr, ptr, env, i32)
> +
> +DEF_HELPER_6(vandn_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vandn_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vandn_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vandn_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> +DEF_HELPER_6(vandn_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vandn_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vandn_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> +DEF_HELPER_6(vandn_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 342199abc0..d6f5e4d198 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -904,3 +904,6 @@ vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
>  vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm
>  vbrev8_v        010010 . ..... 01000 010 ..... 1010111 @r2_vm
>  vrev8_v         010010 . ..... 01001 010 ..... 1010111 @r2_vm
> +vandn_vi        000001 . ..... ..... 011 ..... 1010111 @r_vm
> +vandn_vv        000001 . ..... ..... 000 ..... 1010111 @r_vm
> +vandn_vx        000001 . ..... ..... 100 ..... 1010111 @r_vm
> diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> index 18b362db92..a973b27bdd 100644
> --- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
> +++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> @@ -147,6 +147,11 @@ static bool trans_##NAME(DisasContext *s, arg_rmr * a)                 \
>      return false;                                                      \
>  }
>
> +
> +GEN_OPIVV_TRANS(vandn_vv, zvkb_vv_check)
> +GEN_OPIVX_TRANS(vandn_vx, zvkb_vx_check)
> +GEN_OPIVI_TRANS(vandn_vi, IMM_SX, vandn_vx, zvkb_vx_check)

I don't see any reason why this shouldn't have gvec support (after
all, it is a andc with the arguments inverted) with something like
this:

static void gen_andn_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
{
    tcg_gen_andc_i64(ret, arg1, arg2);
}

static void gen_andn_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
{
    tcg_gen_andc_vec(vece, r, b, a);
}

static void tcg_gen_gvec_andn(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
{
    static const GVecGen3 g = {
        .fni8 = gen_andn_i64,
        .fniv = gen_andn_vec,
        .fno = gen_helper_vec_andn,
        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
    };

    if (aofs == bofs) {
        tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, 0);
    } else {
        tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
    }
}

static void tcg_gen_gvec_andns(unsigned vece, uint32_t dofs, uint32_t aofs,
                               TCGv_i64 c, uint32_t oprsz, uint32_t maxsz)
{
    static const GVecGen2s g = {
        .fni8 = gen_andn_i64,
        .fniv = gen_andn_vec,
        .fno = gen_helper_vec_andns,
        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
    };

    tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, c, &g);
}

static void tcg_gen_gvec_andni(unsigned vece, uint32_t dofs, uint32_t aofs,
                               int64_t c, uint32_t oprsz, uint32_t maxsz)
{
    TCGv_i64 tmp = tcg_constant_i64(c);
    tcg_gen_gvec_andns(vece, dofs, aofs, tmp, oprsz, maxsz);
}

/* vandn.v[vxi] */
GEN_OPIVV_GVEC_TRANS_CHECK(vandn_vv, andn, zvkb_check_vv)
GEN_OPIVX_GVEC_TRANS_CHECK(vandn_vx, andns, zvkb_check_vx)
GEN_OPIVI_GVEC_TRANS_CHECK(vandn_vi, IMM_SX, vandn_vx, andni, zvkb_check_vi)

> +
>  static bool vxrev8_check(DisasContext *s, arg_rmr *a)
>  {
>      return s->cfg_ptr->ext_zvkb == true && vext_check_isa_ill(s) &&
> diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
> index b09fe5fa2a..900e68dfb0 100644
> --- a/target/riscv/vcrypto_helper.c
> +++ b/target/riscv/vcrypto_helper.c
> @@ -135,3 +135,22 @@ GEN_VEXT_V(vrev8_v_b, 1)
>  GEN_VEXT_V(vrev8_v_h, 2)
>  GEN_VEXT_V(vrev8_v_w, 4)
>  GEN_VEXT_V(vrev8_v_d, 8)
> +
> +#define DO_ANDN(a, b) ((b) & ~(a))
> +RVVCALL(OPIVV2, vandn_vv_b, OP_UUU_B, H1, H1, H1, DO_ANDN)
> +RVVCALL(OPIVV2, vandn_vv_h, OP_UUU_H, H2, H2, H2, DO_ANDN)
> +RVVCALL(OPIVV2, vandn_vv_w, OP_UUU_W, H4, H4, H4, DO_ANDN)
> +RVVCALL(OPIVV2, vandn_vv_d, OP_UUU_D, H8, H8, H8, DO_ANDN)
> +GEN_VEXT_VV(vandn_vv_b, 1)
> +GEN_VEXT_VV(vandn_vv_h, 2)
> +GEN_VEXT_VV(vandn_vv_w, 4)
> +GEN_VEXT_VV(vandn_vv_d, 8)
> +
> +RVVCALL(OPIVX2, vandn_vx_b, OP_UUU_B, H1, H1, DO_ANDN)
> +RVVCALL(OPIVX2, vandn_vx_h, OP_UUU_H, H2, H2, DO_ANDN)
> +RVVCALL(OPIVX2, vandn_vx_w, OP_UUU_W, H4, H4, DO_ANDN)
> +RVVCALL(OPIVX2, vandn_vx_d, OP_UUU_D, H8, H8, DO_ANDN)
> +GEN_VEXT_VX(vandn_vx_b, 1)
> +GEN_VEXT_VX(vandn_vx_h, 2)
> +GEN_VEXT_VX(vandn_vx_w, 4)
> +GEN_VEXT_VX(vandn_vx_d, 8)
> --
> 2.39.1
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/39] target/riscv: Add vrol.[vv,vx] and vror.[vv,vx,vi] decoding, translation and execution support
  2023-02-02 14:13     ` [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] " Philipp Tomsich
@ 2023-02-02 14:30       ` Philipp Tomsich
  -1 siblings, 0 replies; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 14:30 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm

On Thu, 2 Feb 2023 at 15:13, Philipp Tomsich <philipp.tomsich@vrull.eu> wrote:
>
> On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
> <lawrence.hunter@codethink.co.uk> wrote:
> >
> > From: Dickon Hood <dickon.hood@codethink.co.uk>
> >
> > Add an initial implementation of the vrol.* and vror.* instructions,
> > with mappings between the RISC-V instructions and their internal TCG
> > accelerated implmentations.
> >
> > There are some missing ror helpers, so I've bodged it by converting them
> > to rols.
> >
> > Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> > Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> > Signed-off-by: Dickon Hood <dickon.hood@codethink.co.uk>
> > ---
> >  target/riscv/helper.h                      | 20 ++++++++
> >  target/riscv/insn32.decode                 |  6 +++
> >  target/riscv/insn_trans/trans_rvzvkb.c.inc | 20 ++++++++
> >  target/riscv/vcrypto_helper.c              | 58 ++++++++++++++++++++++
> >  target/riscv/vector_helper.c               | 45 -----------------
> >  target/riscv/vector_internals.h            | 52 +++++++++++++++++++
> >  6 files changed, 156 insertions(+), 45 deletions(-)
> >
> > diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> > index 32f1179e29..e5b6b3360f 100644
> > --- a/target/riscv/helper.h
> > +++ b/target/riscv/helper.h
> > @@ -1142,3 +1142,23 @@ DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
> >  DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
> >  DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32)
> >  DEF_HELPER_6(vclmulh_vx, void, ptr, ptr, tl, ptr, env, i32)
> > +
> > +DEF_HELPER_6(vror_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vror_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vror_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vror_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> > +
> > +DEF_HELPER_6(vror_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vror_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vror_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vror_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> > +
> > +DEF_HELPER_6(vrol_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> > +
> > +DEF_HELPER_6(vrol_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> > diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> > index b4d88dd1cb..725f907ad1 100644
> > --- a/target/riscv/insn32.decode
> > +++ b/target/riscv/insn32.decode
> > @@ -896,3 +896,9 @@ vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
> >  vclmul_vx       001100 . ..... ..... 110 ..... 1010111 @r_vm
> >  vclmulh_vv      001101 . ..... ..... 010 ..... 1010111 @r_vm
> >  vclmulh_vx      001101 . ..... ..... 110 ..... 1010111 @r_vm
> > +vrol_vv         010101 . ..... ..... 000 ..... 1010111 @r_vm
> > +vrol_vx         010101 . ..... ..... 100 ..... 1010111 @r_vm
> > +vror_vv         010100 . ..... ..... 000 ..... 1010111 @r_vm
> > +vror_vx         010100 . ..... ..... 100 ..... 1010111 @r_vm
> > +vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
> > +vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm
>
> We have only a single vror_vi instruction... and it has a 6bit immediate.
> There's no need to deviate from the spec. Just write it as follows:
> %imm_z6   26:1 15:5
> @r2_zimm6  ..... . vm:1 ..... ..... ... ..... .......  &rmrr %rs2
> rs1=%imm_z6 %rd
> vror_vi         01010. . ..... ..... 011 ..... 1010111 @r2_zimm6
>
> > diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> > index 533141e559..d2a7a92d42 100644
> > --- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
> > +++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> > @@ -95,3 +95,23 @@ static bool vclmul_vx_check(DisasContext *s, arg_rmrr *a)
> >
> >  GEN_VX_MASKED_TRANS(vclmul_vx, vclmul_vx_check)
> >  GEN_VX_MASKED_TRANS(vclmulh_vx, vclmul_vx_check)
> > +
> > +GEN_OPIVV_TRANS(vror_vv, zvkb_vv_check)
> > +GEN_OPIVX_TRANS(vror_vx, zvkb_vx_check)
> > +GEN_OPIVV_TRANS(vrol_vv, zvkb_vv_check)
> > +GEN_OPIVX_TRANS(vrol_vx, zvkb_vx_check)
> > +
> > +GEN_OPIVI_TRANS(vror_vi, IMM_TRUNC_SEW, vror_vx, zvkb_vx_check)
>
> Please introduce a IMM_ZIMM6 that integrates with the extract_imm as follows:
>     case IMM_ZIMM6:
>         return extract64(imm, 0, 6);
>
> > +
> > +/*
> > + * Immediates are 5b long, and we need six for the rotate-immediate.  The
> > + * decision has been taken to remove the vrol.vi instruction -- you can
> > + * emulate it with a ror, after all -- and use the bottom bit of the funct6
> > + * part of the opcode to encode the extra bit.  I've chosen to implement it
> > + * like this because it's easy and reasonably clean.
> > + */
> > +static bool trans_vror_vi2(DisasContext *s, arg_rmrr *a)
> > +{
> > +    a->rs1 += 32;
> > +    return trans_vror_vi(s, a);
> > +}
>
> As discussed above, please handle this in a single trans_vror_vi:
>    GEN_OPIVI_GVEC_TRANS_CHECK(vror_vi, IMM_ZIMM6, vror_vx, rotri, zvkb_check_vi)

On the second pass over these patches, here's how we can use gvec
support for both vror and vrol:

/* Synthesize a rotate-right from a negate(shift-amount) + rotate-left */
static void tcg_gen_gvec_rotrs(unsigned vece, uint32_t dofs, uint32_t aofs,
                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
{
    TCGv_i32 tmp = tcg_temp_new_i32();
    tcg_gen_neg_i32(tmp, shift);
    tcg_gen_gvec_rotls(vece, dofs, aofs, tmp, oprsz, maxsz);
    tcg_temp_free_i32(tmp);
}

/* vrol.v[vx] */
GEN_OPIVV_GVEC_TRANS_CHECK(vrol_vv, rotlv, zvkb_check_vv)
GEN_OPIVX_GVEC_SHIFT_TRANS_CHECK(vrol_vx, rotls, zvkb_check_vx)

/* vror.v[vxi] */
GEN_OPIVV_GVEC_TRANS_CHECK(vror_vv, rotrv, zvkb_check_vv)
GEN_OPIVX_GVEC_SHIFT_TRANS_CHECK(vror_vx, rotrs, zvkb_check_vx)
GEN_OPIVI_GVEC_TRANS_CHECK(vror_vi, IMM_ZIMM6, vror_vx, rotri, zvkb_check_vi)


> > diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
> > index 46e2e510c5..7ec75c5589 100644
> > --- a/target/riscv/vcrypto_helper.c
> > +++ b/target/riscv/vcrypto_helper.c
> > @@ -57,3 +57,61 @@ GEN_VEXT_VV(vclmul_vv, 8)
> >  GEN_VEXT_VX(vclmul_vx, 8)
> >  GEN_VEXT_VV(vclmulh_vv, 8)
> >  GEN_VEXT_VX(vclmulh_vx, 8)
> > +
> > +/*
> > + *  Looks a mess, but produces reasonable (aarch32) code on clang:
> > + * https://godbolt.org/z/jchjsTda8
> > + */
> > +#define DO_ROR(x, n)                       \
> > +    ((x >> (n & ((sizeof(x) << 3) - 1))) | \
> > +     (x << ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
> > +#define DO_ROL(x, n)                       \
> > +    ((x << (n & ((sizeof(x) << 3) - 1))) | \
> > +     (x >> ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
> > +
> > +RVVCALL(OPIVV2, vror_vv_b, OP_UUU_B, H1, H1, H1, DO_ROR)
> > +RVVCALL(OPIVV2, vror_vv_h, OP_UUU_H, H2, H2, H2, DO_ROR)
> > +RVVCALL(OPIVV2, vror_vv_w, OP_UUU_W, H4, H4, H4, DO_ROR)
> > +RVVCALL(OPIVV2, vror_vv_d, OP_UUU_D, H8, H8, H8, DO_ROR)
>
> Indeed, this is a mess: we already have ror8/16/32/64 available.
> Why not the following?
>
> /* vror.vv */
> GEN_VEXT_SHIFT_VV(vror_vv_b, uint8_t,  uint8_t,  H1, H1, ror8,  0x7)
> GEN_VEXT_SHIFT_VV(vror_vv_h, uint16_t, uint16_t, H2, H2, ror16, 0xf)
> GEN_VEXT_SHIFT_VV(vror_vv_w, uint32_t, uint32_t, H4, H4, ror32, 0x1f)
> GEN_VEXT_SHIFT_VV(vror_vv_d, uint64_t, uint64_t, H8, H8, ror64, 0x3f)
>
> > +GEN_VEXT_VV(vror_vv_b, 1)
> > +GEN_VEXT_VV(vror_vv_h, 2)
> > +GEN_VEXT_VV(vror_vv_w, 4)
> > +GEN_VEXT_VV(vror_vv_d, 8)
> > +
> > +/*
> > + * There's a missing tcg_gen_gvec_rotrs() helper function.
> > + */
> > +#define GEN_VEXT_VX_RTOL(NAME, ESZ)                                      \
> > +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,        \
> > +                  CPURISCVState *env, uint32_t desc)                     \
> > +{                                                                        \
> > +    do_vext_vx(vd, v0, (ESZ << 3) - s1, vs2, env, desc, do_##NAME, ESZ); \
> > +}
> > +
> > +/* DO_ROL because GEN_VEXT_VX_RTOL() converts from R to L */
> > +RVVCALL(OPIVX2, vror_vx_b, OP_UUU_B, H1, H1, DO_ROL)
> > +RVVCALL(OPIVX2, vror_vx_h, OP_UUU_H, H2, H2, DO_ROL)
> > +RVVCALL(OPIVX2, vror_vx_w, OP_UUU_W, H4, H4, DO_ROL)
> > +RVVCALL(OPIVX2, vror_vx_d, OP_UUU_D, H8, H8, DO_ROL)
>
> Same applies as above:
> /* vror.vx */
> GEN_VEXT_SHIFT_VX(vror_vx_b, uint8_t,  uint8_t,  H1, H1, ror8,  0x7)
> GEN_VEXT_SHIFT_VX(vror_vx_h, uint16_t, uint16_t, H2, H2, ror16, 0xf)
> GEN_VEXT_SHIFT_VX(vror_vx_w, uint32_t, uint32_t, H4, H4, ror32, 0x1f)
> GEN_VEXT_SHIFT_VX(vror_vx_d, uint64_t, uint64_t, H8, H8, ror64, 0x3f)
>
> > +GEN_VEXT_VX_RTOL(vror_vx_b, 1)
> > +GEN_VEXT_VX_RTOL(vror_vx_h, 2)
> > +GEN_VEXT_VX_RTOL(vror_vx_w, 4)
> > +GEN_VEXT_VX_RTOL(vror_vx_d, 8)
> > +
> > +RVVCALL(OPIVV2, vrol_vv_b, OP_UUU_B, H1, H1, H1, DO_ROL)
> > +RVVCALL(OPIVV2, vrol_vv_h, OP_UUU_H, H2, H2, H2, DO_ROL)
> > +RVVCALL(OPIVV2, vrol_vv_w, OP_UUU_W, H4, H4, H4, DO_ROL)
> > +RVVCALL(OPIVV2, vrol_vv_d, OP_UUU_D, H8, H8, H8, DO_ROL)
> > +GEN_VEXT_VV(vrol_vv_b, 1)
> > +GEN_VEXT_VV(vrol_vv_h, 2)
> > +GEN_VEXT_VV(vrol_vv_w, 4)
> > +GEN_VEXT_VV(vrol_vv_d, 8)
> > +
> > +RVVCALL(OPIVX2, vrol_vx_b, OP_UUU_B, H1, H1, DO_ROL)
> > +RVVCALL(OPIVX2, vrol_vx_h, OP_UUU_H, H2, H2, DO_ROL)
> > +RVVCALL(OPIVX2, vrol_vx_w, OP_UUU_W, H4, H4, DO_ROL)
> > +RVVCALL(OPIVX2, vrol_vx_d, OP_UUU_D, H8, H8, DO_ROL)
> > +GEN_VEXT_VX(vrol_vx_b, 1)
> > +GEN_VEXT_VX(vrol_vx_h, 2)
> > +GEN_VEXT_VX(vrol_vx_w, 4)
> > +GEN_VEXT_VX(vrol_vx_d, 8)
> > diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> > index ab470092f6..ff7b03cbe3 100644
> > --- a/target/riscv/vector_helper.c
> > +++ b/target/riscv/vector_helper.c
> > @@ -76,26 +76,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
> >      return vl;
> >  }
> >
> > -/*
> > - * Note that vector data is stored in host-endian 64-bit chunks,
> > - * so addressing units smaller than that needs a host-endian fixup.
> > - */
> > -#if HOST_BIG_ENDIAN
> > -#define H1(x)   ((x) ^ 7)
> > -#define H1_2(x) ((x) ^ 6)
> > -#define H1_4(x) ((x) ^ 4)
> > -#define H2(x)   ((x) ^ 3)
> > -#define H4(x)   ((x) ^ 1)
> > -#define H8(x)   ((x))
> > -#else
> > -#define H1(x)   (x)
> > -#define H1_2(x) (x)
> > -#define H1_4(x) (x)
> > -#define H2(x)   (x)
> > -#define H4(x)   (x)
> > -#define H8(x)   (x)
> > -#endif
> > -
> >  /*
> >   * Get the maximum number of elements can be operated.
> >   *
> > @@ -683,18 +663,11 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
> >   *** Vector Integer Arithmetic Instructions
> >   */
> >
> > -/* expand macro args before macro */
> > -#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
> > -
> >  /* (TD, T1, T2, TX1, TX2) */
> >  #define OP_SSS_B int8_t, int8_t, int8_t, int8_t, int8_t
> >  #define OP_SSS_H int16_t, int16_t, int16_t, int16_t, int16_t
> >  #define OP_SSS_W int32_t, int32_t, int32_t, int32_t, int32_t
> >  #define OP_SSS_D int64_t, int64_t, int64_t, int64_t, int64_t
> > -#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
> > -#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
> > -#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
> > -#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
> >  #define OP_SUS_B int8_t, uint8_t, int8_t, uint8_t, int8_t
> >  #define OP_SUS_H int16_t, uint16_t, int16_t, uint16_t, int16_t
> >  #define OP_SUS_W int32_t, uint32_t, int32_t, uint32_t, int32_t
> > @@ -718,14 +691,6 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
> >  #define NOP_UUU_H uint16_t, uint16_t, uint32_t, uint16_t, uint32_t
> >  #define NOP_UUU_W uint32_t, uint32_t, uint64_t, uint32_t, uint64_t
> >
> > -
> > -#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
> > -static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
> > -{                                                               \
> > -    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
> > -    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
> > -    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
> > -}
> >  #define DO_SUB(N, M) (N - M)
> >  #define DO_RSUB(N, M) (M - N)
> >
> > @@ -747,16 +712,6 @@ GEN_VEXT_VV(vsub_vv_h, 2)
> >  GEN_VEXT_VV(vsub_vv_w, 4)
> >  GEN_VEXT_VV(vsub_vv_d, 8)
> >
> > -/*
> > - * (T1)s1 gives the real operator type.
> > - * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
> > - */
> > -#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
> > -static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
> > -{                                                                   \
> > -    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> > -    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
> > -}
> >
> >  RVVCALL(OPIVX2, vadd_vx_b, OP_SSS_B, H1, H1, DO_ADD)
> >  RVVCALL(OPIVX2, vadd_vx_h, OP_SSS_H, H2, H2, DO_ADD)
> > diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
> > index 49529d2379..a0fbac7bf3 100644
> > --- a/target/riscv/vector_internals.h
> > +++ b/target/riscv/vector_internals.h
> > @@ -28,6 +28,26 @@ static inline uint32_t vext_nf(uint32_t desc)
> >      return FIELD_EX32(simd_data(desc), VDATA, NF);
> >  }
> >
> > +/*
> > + * Note that vector data is stored in host-endian 64-bit chunks,
> > + * so addressing units smaller than that needs a host-endian fixup.
> > + */
> > +#if HOST_BIG_ENDIAN
> > +#define H1(x)   ((x) ^ 7)
> > +#define H1_2(x) ((x) ^ 6)
> > +#define H1_4(x) ((x) ^ 4)
> > +#define H2(x)   ((x) ^ 3)
> > +#define H4(x)   ((x) ^ 1)
> > +#define H8(x)   ((x))
> > +#else
> > +#define H1(x)   (x)
> > +#define H1_2(x) (x)
> > +#define H1_4(x) (x)
> > +#define H2(x)   (x)
> > +#define H4(x)   (x)
> > +#define H8(x)   (x)
> > +#endif
> > +
> >  /*
> >   * Encode LMUL to lmul as following:
> >   *     LMUL    vlmul    lmul
> > @@ -96,9 +116,30 @@ static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
> >  void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
> >                         uint32_t tot);
> >
> > +/*
> > + *** Vector Integer Arithmetic Instructions
> > + */
> > +
> > +/* expand macro args before macro */
> > +#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
> > +
> > +/* (TD, T1, T2, TX1, TX2) */
> > +#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
> > +#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
> > +#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
> > +#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
> > +
> >  /* operation of two vector elements */
> >  typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
> >
> > +#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
> > +static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
> > +{                                                               \
> > +    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
> > +    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
> > +    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
> > +}
> > +
> >  void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
> >                  CPURISCVState *env, uint32_t desc,
> >                  opivv2_fn *fn, uint32_t esz);
> > @@ -115,6 +156,17 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
> >
> >  typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i);
> >
> > +/*
> > + * (T1)s1 gives the real operator type.
> > + * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
> > + */
> > +#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
> > +static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
> > +{                                                                   \
> > +    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> > +    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
> > +}
> > +
> >  void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
> >                  CPURISCVState *env, uint32_t desc,
> >                  opivx2_fn fn, uint32_t esz);
>
> Again: refactoring needs to go into a separate patch.
>
> > --
> > 2.39.1
> >

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] decoding, translation and execution support
@ 2023-02-02 14:30       ` Philipp Tomsich
  0 siblings, 0 replies; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 14:30 UTC (permalink / raw)
  To: Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm

On Thu, 2 Feb 2023 at 15:13, Philipp Tomsich <philipp.tomsich@vrull.eu> wrote:
>
> On Thu, 2 Feb 2023 at 13:42, Lawrence Hunter
> <lawrence.hunter@codethink.co.uk> wrote:
> >
> > From: Dickon Hood <dickon.hood@codethink.co.uk>
> >
> > Add an initial implementation of the vrol.* and vror.* instructions,
> > with mappings between the RISC-V instructions and their internal TCG
> > accelerated implmentations.
> >
> > There are some missing ror helpers, so I've bodged it by converting them
> > to rols.
> >
> > Co-authored-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> > Signed-off-by: Kiran Ostrolenk <kiran.ostrolenk@codethink.co.uk>
> > Signed-off-by: Dickon Hood <dickon.hood@codethink.co.uk>
> > ---
> >  target/riscv/helper.h                      | 20 ++++++++
> >  target/riscv/insn32.decode                 |  6 +++
> >  target/riscv/insn_trans/trans_rvzvkb.c.inc | 20 ++++++++
> >  target/riscv/vcrypto_helper.c              | 58 ++++++++++++++++++++++
> >  target/riscv/vector_helper.c               | 45 -----------------
> >  target/riscv/vector_internals.h            | 52 +++++++++++++++++++
> >  6 files changed, 156 insertions(+), 45 deletions(-)
> >
> > diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> > index 32f1179e29..e5b6b3360f 100644
> > --- a/target/riscv/helper.h
> > +++ b/target/riscv/helper.h
> > @@ -1142,3 +1142,23 @@ DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
> >  DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
> >  DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32)
> >  DEF_HELPER_6(vclmulh_vx, void, ptr, ptr, tl, ptr, env, i32)
> > +
> > +DEF_HELPER_6(vror_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vror_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vror_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vror_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> > +
> > +DEF_HELPER_6(vror_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vror_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vror_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vror_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> > +
> > +DEF_HELPER_6(vrol_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
> > +
> > +DEF_HELPER_6(vrol_vx_b, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32)
> > +DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32)
> > diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> > index b4d88dd1cb..725f907ad1 100644
> > --- a/target/riscv/insn32.decode
> > +++ b/target/riscv/insn32.decode
> > @@ -896,3 +896,9 @@ vclmul_vv       001100 . ..... ..... 010 ..... 1010111 @r_vm
> >  vclmul_vx       001100 . ..... ..... 110 ..... 1010111 @r_vm
> >  vclmulh_vv      001101 . ..... ..... 010 ..... 1010111 @r_vm
> >  vclmulh_vx      001101 . ..... ..... 110 ..... 1010111 @r_vm
> > +vrol_vv         010101 . ..... ..... 000 ..... 1010111 @r_vm
> > +vrol_vx         010101 . ..... ..... 100 ..... 1010111 @r_vm
> > +vror_vv         010100 . ..... ..... 000 ..... 1010111 @r_vm
> > +vror_vx         010100 . ..... ..... 100 ..... 1010111 @r_vm
> > +vror_vi         010100 . ..... ..... 011 ..... 1010111 @r_vm
> > +vror_vi2        010101 . ..... ..... 011 ..... 1010111 @r_vm
>
> We have only a single vror_vi instruction... and it has a 6bit immediate.
> There's no need to deviate from the spec. Just write it as follows:
> %imm_z6   26:1 15:5
> @r2_zimm6  ..... . vm:1 ..... ..... ... ..... .......  &rmrr %rs2
> rs1=%imm_z6 %rd
> vror_vi         01010. . ..... ..... 011 ..... 1010111 @r2_zimm6
>
> > diff --git a/target/riscv/insn_trans/trans_rvzvkb.c.inc b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> > index 533141e559..d2a7a92d42 100644
> > --- a/target/riscv/insn_trans/trans_rvzvkb.c.inc
> > +++ b/target/riscv/insn_trans/trans_rvzvkb.c.inc
> > @@ -95,3 +95,23 @@ static bool vclmul_vx_check(DisasContext *s, arg_rmrr *a)
> >
> >  GEN_VX_MASKED_TRANS(vclmul_vx, vclmul_vx_check)
> >  GEN_VX_MASKED_TRANS(vclmulh_vx, vclmul_vx_check)
> > +
> > +GEN_OPIVV_TRANS(vror_vv, zvkb_vv_check)
> > +GEN_OPIVX_TRANS(vror_vx, zvkb_vx_check)
> > +GEN_OPIVV_TRANS(vrol_vv, zvkb_vv_check)
> > +GEN_OPIVX_TRANS(vrol_vx, zvkb_vx_check)
> > +
> > +GEN_OPIVI_TRANS(vror_vi, IMM_TRUNC_SEW, vror_vx, zvkb_vx_check)
>
> Please introduce a IMM_ZIMM6 that integrates with the extract_imm as follows:
>     case IMM_ZIMM6:
>         return extract64(imm, 0, 6);
>
> > +
> > +/*
> > + * Immediates are 5b long, and we need six for the rotate-immediate.  The
> > + * decision has been taken to remove the vrol.vi instruction -- you can
> > + * emulate it with a ror, after all -- and use the bottom bit of the funct6
> > + * part of the opcode to encode the extra bit.  I've chosen to implement it
> > + * like this because it's easy and reasonably clean.
> > + */
> > +static bool trans_vror_vi2(DisasContext *s, arg_rmrr *a)
> > +{
> > +    a->rs1 += 32;
> > +    return trans_vror_vi(s, a);
> > +}
>
> As discussed above, please handle this in a single trans_vror_vi:
>    GEN_OPIVI_GVEC_TRANS_CHECK(vror_vi, IMM_ZIMM6, vror_vx, rotri, zvkb_check_vi)

On the second pass over these patches, here's how we can use gvec
support for both vror and vrol:

/* Synthesize a rotate-right from a negate(shift-amount) + rotate-left */
static void tcg_gen_gvec_rotrs(unsigned vece, uint32_t dofs, uint32_t aofs,
                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
{
    TCGv_i32 tmp = tcg_temp_new_i32();
    tcg_gen_neg_i32(tmp, shift);
    tcg_gen_gvec_rotls(vece, dofs, aofs, tmp, oprsz, maxsz);
    tcg_temp_free_i32(tmp);
}

/* vrol.v[vx] */
GEN_OPIVV_GVEC_TRANS_CHECK(vrol_vv, rotlv, zvkb_check_vv)
GEN_OPIVX_GVEC_SHIFT_TRANS_CHECK(vrol_vx, rotls, zvkb_check_vx)

/* vror.v[vxi] */
GEN_OPIVV_GVEC_TRANS_CHECK(vror_vv, rotrv, zvkb_check_vv)
GEN_OPIVX_GVEC_SHIFT_TRANS_CHECK(vror_vx, rotrs, zvkb_check_vx)
GEN_OPIVI_GVEC_TRANS_CHECK(vror_vi, IMM_ZIMM6, vror_vx, rotri, zvkb_check_vi)


> > diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
> > index 46e2e510c5..7ec75c5589 100644
> > --- a/target/riscv/vcrypto_helper.c
> > +++ b/target/riscv/vcrypto_helper.c
> > @@ -57,3 +57,61 @@ GEN_VEXT_VV(vclmul_vv, 8)
> >  GEN_VEXT_VX(vclmul_vx, 8)
> >  GEN_VEXT_VV(vclmulh_vv, 8)
> >  GEN_VEXT_VX(vclmulh_vx, 8)
> > +
> > +/*
> > + *  Looks a mess, but produces reasonable (aarch32) code on clang:
> > + * https://godbolt.org/z/jchjsTda8
> > + */
> > +#define DO_ROR(x, n)                       \
> > +    ((x >> (n & ((sizeof(x) << 3) - 1))) | \
> > +     (x << ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
> > +#define DO_ROL(x, n)                       \
> > +    ((x << (n & ((sizeof(x) << 3) - 1))) | \
> > +     (x >> ((sizeof(x) << 3) - (n & ((sizeof(x) << 3) - 1)))))
> > +
> > +RVVCALL(OPIVV2, vror_vv_b, OP_UUU_B, H1, H1, H1, DO_ROR)
> > +RVVCALL(OPIVV2, vror_vv_h, OP_UUU_H, H2, H2, H2, DO_ROR)
> > +RVVCALL(OPIVV2, vror_vv_w, OP_UUU_W, H4, H4, H4, DO_ROR)
> > +RVVCALL(OPIVV2, vror_vv_d, OP_UUU_D, H8, H8, H8, DO_ROR)
>
> Indeed, this is a mess: we already have ror8/16/32/64 available.
> Why not the following?
>
> /* vror.vv */
> GEN_VEXT_SHIFT_VV(vror_vv_b, uint8_t,  uint8_t,  H1, H1, ror8,  0x7)
> GEN_VEXT_SHIFT_VV(vror_vv_h, uint16_t, uint16_t, H2, H2, ror16, 0xf)
> GEN_VEXT_SHIFT_VV(vror_vv_w, uint32_t, uint32_t, H4, H4, ror32, 0x1f)
> GEN_VEXT_SHIFT_VV(vror_vv_d, uint64_t, uint64_t, H8, H8, ror64, 0x3f)
>
> > +GEN_VEXT_VV(vror_vv_b, 1)
> > +GEN_VEXT_VV(vror_vv_h, 2)
> > +GEN_VEXT_VV(vror_vv_w, 4)
> > +GEN_VEXT_VV(vror_vv_d, 8)
> > +
> > +/*
> > + * There's a missing tcg_gen_gvec_rotrs() helper function.
> > + */
> > +#define GEN_VEXT_VX_RTOL(NAME, ESZ)                                      \
> > +void HELPER(NAME)(void *vd, void *v0, target_ulong s1, void *vs2,        \
> > +                  CPURISCVState *env, uint32_t desc)                     \
> > +{                                                                        \
> > +    do_vext_vx(vd, v0, (ESZ << 3) - s1, vs2, env, desc, do_##NAME, ESZ); \
> > +}
> > +
> > +/* DO_ROL because GEN_VEXT_VX_RTOL() converts from R to L */
> > +RVVCALL(OPIVX2, vror_vx_b, OP_UUU_B, H1, H1, DO_ROL)
> > +RVVCALL(OPIVX2, vror_vx_h, OP_UUU_H, H2, H2, DO_ROL)
> > +RVVCALL(OPIVX2, vror_vx_w, OP_UUU_W, H4, H4, DO_ROL)
> > +RVVCALL(OPIVX2, vror_vx_d, OP_UUU_D, H8, H8, DO_ROL)
>
> Same applies as above:
> /* vror.vx */
> GEN_VEXT_SHIFT_VX(vror_vx_b, uint8_t,  uint8_t,  H1, H1, ror8,  0x7)
> GEN_VEXT_SHIFT_VX(vror_vx_h, uint16_t, uint16_t, H2, H2, ror16, 0xf)
> GEN_VEXT_SHIFT_VX(vror_vx_w, uint32_t, uint32_t, H4, H4, ror32, 0x1f)
> GEN_VEXT_SHIFT_VX(vror_vx_d, uint64_t, uint64_t, H8, H8, ror64, 0x3f)
>
> > +GEN_VEXT_VX_RTOL(vror_vx_b, 1)
> > +GEN_VEXT_VX_RTOL(vror_vx_h, 2)
> > +GEN_VEXT_VX_RTOL(vror_vx_w, 4)
> > +GEN_VEXT_VX_RTOL(vror_vx_d, 8)
> > +
> > +RVVCALL(OPIVV2, vrol_vv_b, OP_UUU_B, H1, H1, H1, DO_ROL)
> > +RVVCALL(OPIVV2, vrol_vv_h, OP_UUU_H, H2, H2, H2, DO_ROL)
> > +RVVCALL(OPIVV2, vrol_vv_w, OP_UUU_W, H4, H4, H4, DO_ROL)
> > +RVVCALL(OPIVV2, vrol_vv_d, OP_UUU_D, H8, H8, H8, DO_ROL)
> > +GEN_VEXT_VV(vrol_vv_b, 1)
> > +GEN_VEXT_VV(vrol_vv_h, 2)
> > +GEN_VEXT_VV(vrol_vv_w, 4)
> > +GEN_VEXT_VV(vrol_vv_d, 8)
> > +
> > +RVVCALL(OPIVX2, vrol_vx_b, OP_UUU_B, H1, H1, DO_ROL)
> > +RVVCALL(OPIVX2, vrol_vx_h, OP_UUU_H, H2, H2, DO_ROL)
> > +RVVCALL(OPIVX2, vrol_vx_w, OP_UUU_W, H4, H4, DO_ROL)
> > +RVVCALL(OPIVX2, vrol_vx_d, OP_UUU_D, H8, H8, DO_ROL)
> > +GEN_VEXT_VX(vrol_vx_b, 1)
> > +GEN_VEXT_VX(vrol_vx_h, 2)
> > +GEN_VEXT_VX(vrol_vx_w, 4)
> > +GEN_VEXT_VX(vrol_vx_d, 8)
> > diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> > index ab470092f6..ff7b03cbe3 100644
> > --- a/target/riscv/vector_helper.c
> > +++ b/target/riscv/vector_helper.c
> > @@ -76,26 +76,6 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
> >      return vl;
> >  }
> >
> > -/*
> > - * Note that vector data is stored in host-endian 64-bit chunks,
> > - * so addressing units smaller than that needs a host-endian fixup.
> > - */
> > -#if HOST_BIG_ENDIAN
> > -#define H1(x)   ((x) ^ 7)
> > -#define H1_2(x) ((x) ^ 6)
> > -#define H1_4(x) ((x) ^ 4)
> > -#define H2(x)   ((x) ^ 3)
> > -#define H4(x)   ((x) ^ 1)
> > -#define H8(x)   ((x))
> > -#else
> > -#define H1(x)   (x)
> > -#define H1_2(x) (x)
> > -#define H1_4(x) (x)
> > -#define H2(x)   (x)
> > -#define H4(x)   (x)
> > -#define H8(x)   (x)
> > -#endif
> > -
> >  /*
> >   * Get the maximum number of elements can be operated.
> >   *
> > @@ -683,18 +663,11 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
> >   *** Vector Integer Arithmetic Instructions
> >   */
> >
> > -/* expand macro args before macro */
> > -#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
> > -
> >  /* (TD, T1, T2, TX1, TX2) */
> >  #define OP_SSS_B int8_t, int8_t, int8_t, int8_t, int8_t
> >  #define OP_SSS_H int16_t, int16_t, int16_t, int16_t, int16_t
> >  #define OP_SSS_W int32_t, int32_t, int32_t, int32_t, int32_t
> >  #define OP_SSS_D int64_t, int64_t, int64_t, int64_t, int64_t
> > -#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
> > -#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
> > -#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
> > -#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
> >  #define OP_SUS_B int8_t, uint8_t, int8_t, uint8_t, int8_t
> >  #define OP_SUS_H int16_t, uint16_t, int16_t, uint16_t, int16_t
> >  #define OP_SUS_W int32_t, uint32_t, int32_t, uint32_t, int32_t
> > @@ -718,14 +691,6 @@ GEN_VEXT_ST_WHOLE(vs8r_v, int8_t, ste_b)
> >  #define NOP_UUU_H uint16_t, uint16_t, uint32_t, uint16_t, uint32_t
> >  #define NOP_UUU_W uint32_t, uint32_t, uint64_t, uint32_t, uint64_t
> >
> > -
> > -#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
> > -static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
> > -{                                                               \
> > -    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
> > -    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
> > -    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
> > -}
> >  #define DO_SUB(N, M) (N - M)
> >  #define DO_RSUB(N, M) (M - N)
> >
> > @@ -747,16 +712,6 @@ GEN_VEXT_VV(vsub_vv_h, 2)
> >  GEN_VEXT_VV(vsub_vv_w, 4)
> >  GEN_VEXT_VV(vsub_vv_d, 8)
> >
> > -/*
> > - * (T1)s1 gives the real operator type.
> > - * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
> > - */
> > -#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
> > -static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
> > -{                                                                   \
> > -    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> > -    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
> > -}
> >
> >  RVVCALL(OPIVX2, vadd_vx_b, OP_SSS_B, H1, H1, DO_ADD)
> >  RVVCALL(OPIVX2, vadd_vx_h, OP_SSS_H, H2, H2, DO_ADD)
> > diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
> > index 49529d2379..a0fbac7bf3 100644
> > --- a/target/riscv/vector_internals.h
> > +++ b/target/riscv/vector_internals.h
> > @@ -28,6 +28,26 @@ static inline uint32_t vext_nf(uint32_t desc)
> >      return FIELD_EX32(simd_data(desc), VDATA, NF);
> >  }
> >
> > +/*
> > + * Note that vector data is stored in host-endian 64-bit chunks,
> > + * so addressing units smaller than that needs a host-endian fixup.
> > + */
> > +#if HOST_BIG_ENDIAN
> > +#define H1(x)   ((x) ^ 7)
> > +#define H1_2(x) ((x) ^ 6)
> > +#define H1_4(x) ((x) ^ 4)
> > +#define H2(x)   ((x) ^ 3)
> > +#define H4(x)   ((x) ^ 1)
> > +#define H8(x)   ((x))
> > +#else
> > +#define H1(x)   (x)
> > +#define H1_2(x) (x)
> > +#define H1_4(x) (x)
> > +#define H2(x)   (x)
> > +#define H4(x)   (x)
> > +#define H8(x)   (x)
> > +#endif
> > +
> >  /*
> >   * Encode LMUL to lmul as following:
> >   *     LMUL    vlmul    lmul
> > @@ -96,9 +116,30 @@ static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
> >  void vext_set_elems_1s(void *base, uint32_t is_agnostic, uint32_t cnt,
> >                         uint32_t tot);
> >
> > +/*
> > + *** Vector Integer Arithmetic Instructions
> > + */
> > +
> > +/* expand macro args before macro */
> > +#define RVVCALL(macro, ...)  macro(__VA_ARGS__)
> > +
> > +/* (TD, T1, T2, TX1, TX2) */
> > +#define OP_UUU_B uint8_t, uint8_t, uint8_t, uint8_t, uint8_t
> > +#define OP_UUU_H uint16_t, uint16_t, uint16_t, uint16_t, uint16_t
> > +#define OP_UUU_W uint32_t, uint32_t, uint32_t, uint32_t, uint32_t
> > +#define OP_UUU_D uint64_t, uint64_t, uint64_t, uint64_t, uint64_t
> > +
> >  /* operation of two vector elements */
> >  typedef void opivv2_fn(void *vd, void *vs1, void *vs2, int i);
> >
> > +#define OPIVV2(NAME, TD, T1, T2, TX1, TX2, HD, HS1, HS2, OP)    \
> > +static void do_##NAME(void *vd, void *vs1, void *vs2, int i)    \
> > +{                                                               \
> > +    TX1 s1 = *((T1 *)vs1 + HS1(i));                             \
> > +    TX2 s2 = *((T2 *)vs2 + HS2(i));                             \
> > +    *((TD *)vd + HD(i)) = OP(s2, s1);                           \
> > +}
> > +
> >  void do_vext_vv(void *vd, void *v0, void *vs1, void *vs2,
> >                  CPURISCVState *env, uint32_t desc,
> >                  opivv2_fn *fn, uint32_t esz);
> > @@ -115,6 +156,17 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,          \
> >
> >  typedef void opivx2_fn(void *vd, target_long s1, void *vs2, int i);
> >
> > +/*
> > + * (T1)s1 gives the real operator type.
> > + * (TX1)(T1)s1 expands the operator type of widen or narrow operations.
> > + */
> > +#define OPIVX2(NAME, TD, T1, T2, TX1, TX2, HD, HS2, OP)             \
> > +static void do_##NAME(void *vd, target_long s1, void *vs2, int i)   \
> > +{                                                                   \
> > +    TX2 s2 = *((T2 *)vs2 + HS2(i));                                 \
> > +    *((TD *)vd + HD(i)) = OP(s2, (TX1)(T1)s1);                      \
> > +}
> > +
> >  void do_vext_vx(void *vd, void *v0, target_long s1, void *vs2,
> >                  CPURISCVState *env, uint32_t desc,
> >                  opivx2_fn fn, uint32_t esz);
>
> Again: refactoring needs to go into a separate patch.
>
> > --
> > 2.39.1
> >


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/39] target/riscv: Add vclmulh.vv decoding, translation and execution support
  2023-02-02 12:41 ` [PATCH 04/39] target/riscv: Add vclmulh.vv " Lawrence Hunter
  2023-02-02 14:03   ` Philipp Tomsich
@ 2023-02-02 16:54   ` Richard Henderson
  1 sibling, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2023-02-02 16:54 UTC (permalink / raw)
  To: Lawrence Hunter, qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm

On 2/2/23 02:41, Lawrence Hunter wrote:
> +static void do_vclmulh_vv(void *vd, void *vs1, void *vs2, int i)
> +{
> +    __uint128_t result = 0;

In passing, you may not use __uint128_t directly, as it is not supported on all hosts. 
Philipp has given you good advice on adjusting the computation.


r~

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 35/39] crypto: Move SM4_SBOXWORD from target/riscv
  2023-02-02 12:42 ` [PATCH 35/39] crypto: Move SM4_SBOXWORD from target/riscv Lawrence Hunter
@ 2023-02-02 17:02   ` Richard Henderson
  0 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2023-02-02 17:02 UTC (permalink / raw)
  To: Lawrence Hunter, qemu-devel
  Cc: dickon.hood, nazar.kazakov, kiran.ostrolenk, frank.chang, palmer,
	alistair.francis, bin.meng, pbonzini, philipp.tomsich, kvm,
	Max Chou

On 2/2/23 02:42, Lawrence Hunter wrote:
> From: Max Chou <max.chou@sifive.com>
> 
>      - Share SM4_SBOXWORD between target/riscv and target/arm.
> 
> Signed-off-by: Max Chou <max.chou@sifive.com>
> Reviewed-by: Frank Chang <frank.chang@sifive.com>
> ---
>   include/crypto/sm4.h       |  7 +++++++
>   target/arm/crypto_helper.c | 10 ++--------
>   2 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/include/crypto/sm4.h b/include/crypto/sm4.h
> index 9bd3ebc62e..33478562a4 100644
> --- a/include/crypto/sm4.h
> +++ b/include/crypto/sm4.h
> @@ -1,6 +1,13 @@
>   #ifndef QEMU_SM4_H
>   #define QEMU_SM4_H
>   
> +#define SM4_SBOXWORD(WORD) ( \
> +    sm4_sbox[((WORD) >> 24) & 0xff] << 24 | \
> +    sm4_sbox[((WORD) >> 16) & 0xff] << 16 | \
> +    sm4_sbox[((WORD) >>  8) & 0xff] <<  8 | \
> +    sm4_sbox[((WORD) >>  0) & 0xff] <<  0   \
> +)
> +
>   extern const uint8_t sm4_sbox[256];

I think this would be better as an inline function, so that the types are clear.


r~


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] decoding, translation and execution support
  2023-02-02 14:30       ` [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] " Philipp Tomsich
  (?)
@ 2023-02-02 17:35       ` Richard Henderson
  2023-02-02 18:07         ` Philipp Tomsich
  -1 siblings, 1 reply; 59+ messages in thread
From: Richard Henderson @ 2023-02-02 17:35 UTC (permalink / raw)
  To: Philipp Tomsich, Lawrence Hunter
  Cc: qemu-devel, dickon.hood, nazar.kazakov, kiran.ostrolenk,
	frank.chang, palmer, alistair.francis, bin.meng, pbonzini, kvm

On 2/2/23 04:30, Philipp Tomsich wrote:
> On the second pass over these patches, here's how we can use gvec
> support for both vror and vrol:
> 
> /* Synthesize a rotate-right from a negate(shift-amount) + rotate-left */
> static void tcg_gen_gvec_rotrs(unsigned vece, uint32_t dofs, uint32_t aofs,
>                         TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
> {
>      TCGv_i32 tmp = tcg_temp_new_i32();
>      tcg_gen_neg_i32(tmp, shift);
>      tcg_gen_gvec_rotls(vece, dofs, aofs, tmp, oprsz, maxsz);

We can add rotls generically.
I hadn't done this so far because there were no users.


r~

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] decoding, translation and execution support
  2023-02-02 17:35       ` Richard Henderson
@ 2023-02-02 18:07         ` Philipp Tomsich
  2023-02-02 23:14           ` Richard Henderson
  0 siblings, 1 reply; 59+ messages in thread
From: Philipp Tomsich @ 2023-02-02 18:07 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Lawrence Hunter, qemu-devel, dickon.hood, nazar.kazakov,
	kiran.ostrolenk, frank.chang, palmer, alistair.francis, bin.meng,
	pbonzini, kvm

On Thu, 2 Feb 2023 at 18:35, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/2/23 04:30, Philipp Tomsich wrote:
> > On the second pass over these patches, here's how we can use gvec
> > support for both vror and vrol:
> >
> > /* Synthesize a rotate-right from a negate(shift-amount) + rotate-left */
> > static void tcg_gen_gvec_rotrs(unsigned vece, uint32_t dofs, uint32_t aofs,
> >                         TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
> > {
> >      TCGv_i32 tmp = tcg_temp_new_i32();
> >      tcg_gen_neg_i32(tmp, shift);
> >      tcg_gen_gvec_rotls(vece, dofs, aofs, tmp, oprsz, maxsz);
>
> We can add rotls generically.
> I hadn't done this so far because there were no users.

I read this such that your preference is to have a generic gvec rotrs?
If this is correct, I can drop a patch to that effect…

Philipp.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] decoding, translation and execution support
  2023-02-02 18:07         ` Philipp Tomsich
@ 2023-02-02 23:14           ` Richard Henderson
  0 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2023-02-02 23:14 UTC (permalink / raw)
  To: Philipp Tomsich
  Cc: Lawrence Hunter, qemu-devel, dickon.hood, nazar.kazakov,
	kiran.ostrolenk, frank.chang, palmer, alistair.francis, bin.meng,
	pbonzini, kvm

On 2/2/23 08:07, Philipp Tomsich wrote:
>>>       tcg_gen_gvec_rotls(vece, dofs, aofs, tmp, oprsz, maxsz);
>>
>> We can add rotls generically.
>> I hadn't done this so far because there were no users.
> 
> I read this such that your preference is to have a generic gvec rotrs?
> If this is correct, I can drop a patch to that effect…

Yes, please.


r~

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2023-02-02 23:14 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-02 12:41 [PATCH 00/39] Add RISC-V vector cryptography extensions Lawrence Hunter
2023-02-02 12:41 ` [PATCH 01/39] target/riscv: add zvkb cpu property Lawrence Hunter
2023-02-02 12:41 ` [PATCH 02/39] target/riscv: Add vclmul.vv decoding, translation and execution support Lawrence Hunter
2023-02-02 13:53   ` Philipp Tomsich
2023-02-02 12:41 ` [PATCH 03/39] target/riscv: Add vclmul.vx " Lawrence Hunter
2023-02-02 13:59   ` Philipp Tomsich
2023-02-02 12:41 ` [PATCH 04/39] target/riscv: Add vclmulh.vv " Lawrence Hunter
2023-02-02 14:03   ` Philipp Tomsich
2023-02-02 16:54   ` Richard Henderson
2023-02-02 12:41 ` [PATCH 05/39] target/riscv: Add vclmulh.vx " Lawrence Hunter
2023-02-02 12:41 ` [PATCH 06/39] target/riscv: Add vrol.[vv,vx] and vror.[vv,vx,vi] " Lawrence Hunter
2023-02-02 12:41   ` [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] " Lawrence Hunter
2023-02-02 14:13   ` [PATCH 06/39] target/riscv: Add vrol.[vv,vx] and vror.[vv,vx,vi] " Philipp Tomsich
2023-02-02 14:13     ` [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] " Philipp Tomsich
2023-02-02 14:30     ` [PATCH 06/39] target/riscv: Add vrol.[vv,vx] and vror.[vv,vx,vi] " Philipp Tomsich
2023-02-02 14:30       ` [PATCH 06/39] target/riscv: Add vrol.[vv, vx] and vror.[vv, vx, vi] " Philipp Tomsich
2023-02-02 17:35       ` Richard Henderson
2023-02-02 18:07         ` Philipp Tomsich
2023-02-02 23:14           ` Richard Henderson
2023-02-02 12:41 ` [PATCH 07/39] target/riscv: Add vbrev8.v " Lawrence Hunter
2023-02-02 14:21   ` Philipp Tomsich
2023-02-02 12:41 ` [PATCH 08/39] target/riscv: Add vrev8.v " Lawrence Hunter
2023-02-02 14:22   ` Philipp Tomsich
2023-02-02 12:42 ` [PATCH 09/39] target/riscv: Add vandn.[vv,vx,vi] " Lawrence Hunter
2023-02-02 12:42   ` [PATCH 09/39] target/riscv: Add vandn.[vv, vx, vi] " Lawrence Hunter
2023-02-02 14:29   ` [PATCH 09/39] target/riscv: Add vandn.[vv,vx,vi] " Philipp Tomsich
2023-02-02 12:42 ` [PATCH 10/39] target/riscv: expose zvkb cpu property Lawrence Hunter
2023-02-02 14:23   ` Philipp Tomsich
2023-02-02 14:24     ` Philipp Tomsich
2023-02-02 12:42 ` [PATCH 11/39] target/riscv: add zvkns " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 12/39] target/riscv: Add vaesef.vv decoding, translation and execution support Lawrence Hunter
2023-02-02 12:42 ` [PATCH 13/39] target/riscv: Add vaesef.vs " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 14/39] target/riscv: Add vaesdf.vv " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 15/39] target/riscv: Add vaesdf.vs " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 16/39] target/riscv: Add vaesdm.vv " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 17/39] target/riscv: Add vaesdm.vs " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 18/39] target/riscv: Add vaesz.vs " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 19/39] target/riscv: Add vaesem.vv " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 20/39] target/riscv: Add vaesem.vs " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 21/39] target/riscv: Add vaeskf1.vi " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 22/39] target/riscv: Add vaeskf2.vi " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 23/39] target/riscv: expose zvkns cpu property Lawrence Hunter
2023-02-02 12:42 ` [PATCH 24/39] target/riscv: add zvknh cpu properties Lawrence Hunter
2023-02-02 12:42 ` [PATCH 25/39] target/riscv: Add vsha2ms.vv decoding, translation and execution support Lawrence Hunter
2023-02-02 12:42 ` [PATCH 26/39] target/riscv: Add vsha2c[hl].vv " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 27/39] target/riscv: expose zvknh cpu properties Lawrence Hunter
2023-02-02 12:42 ` [PATCH 28/39] target/riscv: add zvksh cpu property Lawrence Hunter
2023-02-02 12:42 ` [PATCH 29/39] target/riscv: Add vsm3me.vv decoding, translation and execution support Lawrence Hunter
2023-02-02 12:42 ` [PATCH 30/39] target/riscv: Add vsm3c.vi " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 31/39] target/riscv: expose zvksh cpu property Lawrence Hunter
2023-02-02 12:42 ` [PATCH 32/39] target/riscv: add zvkg " Lawrence Hunter
2023-02-02 12:42 ` [PATCH 33/39] target/riscv: Add vghmac.vv decoding, translation and execution support Lawrence Hunter
2023-02-02 12:42 ` [PATCH 34/39] target/riscv: expose zvkg cpu property Lawrence Hunter
2023-02-02 12:42 ` [PATCH 35/39] crypto: Move SM4_SBOXWORD from target/riscv Lawrence Hunter
2023-02-02 17:02   ` Richard Henderson
2023-02-02 12:42 ` [PATCH 36/39] crypto: Add SM4 constant parameter CK Lawrence Hunter
2023-02-02 12:42 ` [PATCH 37/39] target/riscv: Add zvksed cfg property Lawrence Hunter
2023-02-02 12:42 ` [PATCH 38/39] target/riscv: Add Zvksed support Lawrence Hunter
2023-02-02 12:42 ` [PATCH 39/39] target/riscv: Expose Zvksed property Lawrence Hunter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.