* [PATCH v3 0/5] target/riscv: support vector extension part 2
@ 2020-02-10  7:42 ` LIU Zhiwei
  0 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-10  7:42 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768, LIU Zhiwei

This is the second part of the v3 patchset. The v3 changelog below only
covers part 2.

Features:
  * support specification riscv-v-spec-0.7.1.
  * support basic vector extension.
  * support Zvlsseg.
  * support Zvamo.
  * do not support Zvediv, as it is still being changed.
  * fix SLEN at 128 bits.
  * support element widths of 8, 16, 32 and 64 bits.

Changelog:
v3
  * move check code from execution time to translation time.
  * probe pages before real load or store access.
  * use probe_page_check for no-fault operations in Linux user mode.
  * add atomic and non-atomic operations for vector amo instructions.
v2
  * use float16_compare{_quiet}
  * only use GETPC() in the outermost helper
  * add a ctx.ext_v property

LIU Zhiwei (5):
  target/riscv: add vector unit stride load and store instructions
  target/riscv: add vector stride load and store instructions
  target/riscv: add vector index load and store instructions
  target/riscv: add fault-only-first unit stride load
  target/riscv: add vector amo operations

 target/riscv/helper.h                   |  219 ++++
 target/riscv/insn32-64.decode           |   11 +
 target/riscv/insn32.decode              |   67 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  851 +++++++++++++++
 target/riscv/translate.c                |    2 +
 target/riscv/vector_helper.c            | 1251 +++++++++++++++++++++++
 6 files changed, 2401 insertions(+)

-- 
2.23.0




* [PATCH v3 1/5] target/riscv: add vector unit stride load and store instructions
  2020-02-10  7:42 ` LIU Zhiwei
@ 2020-02-10  7:42   ` LIU Zhiwei
  -1 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-10  7:42 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768, LIU Zhiwei

Vector unit-stride operations access elements stored contiguously in memory
starting from the base effective address.

The Zvlsseg extension adds vector load/store segment instructions, which move
multiple contiguous fields in memory to and from consecutively numbered
vector registers.
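
As an illustration only (not part of the patch), the addressing used by the
helpers below for an unmasked unit-stride segment load can be sketched in
plain C as follows, ignoring masking, sign/zero extension and tail clearing;
the vd[] array of per-field register groups is hypothetical:

    #include <stdint.h>
    #include <string.h>

    /* copy field k of element i from guest memory into register group k */
    static void unit_stride_segment_load(uint8_t *vd[], const uint8_t *base,
                                         uint32_t vl, uint32_t nf, uint32_t msz)
    {
        for (uint32_t i = 0; i < vl; i++) {        /* element index */
            for (uint32_t k = 0; k < nf; k++) {    /* field within segment */
                /* field k of element i sits at base + (i * nf + k) * msz */
                memcpy(&vd[k][i * msz], base + (i * nf + k) * msz, msz);
            }
        }
    }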

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  70 ++++
 target/riscv/insn32.decode              |  17 +
 target/riscv/insn_trans/trans_rvv.inc.c | 294 ++++++++++++++++
 target/riscv/translate.c                |   2 +
 target/riscv/vector_helper.c            | 438 ++++++++++++++++++++++++
 5 files changed, 821 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3c28c7e407..74c483ef9e 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -78,3 +78,73 @@ DEF_HELPER_1(tlb_flush, void, env)
 #endif
 /* Vector functions */
 DEF_HELPER_3(vsetvl, tl, env, tl, tl)
+DEF_HELPER_5(vlb_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_d_mask, void, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5dc009c3cd..dad3ed91c7 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -43,6 +43,7 @@
 &u    imm rd
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
+&r2nfvm    vm rd rs1 nf
 
 # Formats 32:
 @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
@@ -62,6 +63,7 @@
 @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
+@r2_nfvm nf:3 ... vm:1 ..... ..... ... ..... ....... &r2nfvm %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
 
 @sfence_vma ....... ..... .....   ... ..... ....... %rs2 %rs1
@@ -206,5 +208,20 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
 fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
 
 # *** RV32V Extension ***
+
+# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
+vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
+vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
+vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
+vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
+vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
+vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
+vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
+vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
+vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
+vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
+vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
+
+# *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index da82c72bbf..d93eb00651 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -15,6 +15,8 @@
  * You should have received a copy of the GNU General Public License along with
  * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
+#include "tcg/tcg-op-gvec.h"
+#include "tcg/tcg-gvec-desc.h"
 
 static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
 {
@@ -67,3 +69,295 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
     tcg_temp_free(dst);
     return true;
 }
+
+/* helper functions for translation */
+/* vector register offset from env */
+static uint32_t vreg_ofs(DisasContext *s, int reg)
+{
+    return offsetof(CPURISCVState, vext.vreg) + reg * s->vlen / 8;
+}
+
+/*
+ * As simd_desc supports at most 256 bytes, and in this implementation
+ * the max vector group length is 2048 bytes, split maxsz into two parts.
+ *
+ * The first part is maxsz rounded down to a multiple of 64 bytes,
+ * encoded in the maxsz of simd_desc.
+ * The second part is (maxsz % 64) >> 3, encoded in the data of simd_desc.
+ */
+static uint32_t maxsz_part1(uint32_t maxsz)
+{
+    return ((maxsz & ~(0x3f)) >> 3) + 0x8; /* add offset 8 to avoid return 0 */
+}
+
+static uint32_t maxsz_part2(uint32_t maxsz)
+{
+    return (maxsz & 0x3f) >> 3;
+}
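+
+/*
+ * Illustrative example: for maxsz = 192 bytes, maxsz_part1() returns
+ * (192 >> 3) + 8 = 32 and maxsz_part2() returns 0; vext_maxsz() in
+ * vector_helper.c then reconstructs (32 - 8) * 8 + 0 * 8 = 192 bytes.
+ */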
+
+/* define concrete check functions */
+static bool vext_check_vill(bool vill)
+{
+    if (vill) {
+        return false;
+    }
+    return true;
+}
+
+static bool vext_check_reg(uint32_t lmul, uint32_t reg, bool widen)
+{
+    int legal = widen ? (lmul * 2) : lmul;
+
+    if ((lmul != 1 && lmul != 2 && lmul != 4 && lmul != 8) ||
+        (lmul == 8 && widen)) {
+        return false;
+    }
+
+    if (reg % legal != 0) {
+        return false;
+    }
+    return true;
+}
+
+static bool vext_check_overlap_mask(uint32_t lmul, uint32_t vd, bool vm)
+{
+    if (lmul > 1 && vm == 0 && vd == 0) {
+        return false;
+    }
+    return true;
+}
+
+static bool vext_check_nf(uint32_t lmul, uint32_t nf)
+{
+    if (lmul * (nf + 1) > 8) {
+        return false;
+    }
+    return true;
+}
+
+/* define check conditions data structure */
+struct vext_check_ctx {
+
+    struct vext_reg {
+        uint8_t reg;
+        bool widen;
+        bool need_check;
+    } check_reg[6];
+
+    struct vext_overlap_mask {
+        uint8_t reg;
+        uint8_t vm;
+        bool need_check;
+    } check_overlap_mask;
+
+    struct vext_nf {
+        uint8_t nf;
+        bool need_check;
+    } check_nf;
+    target_ulong check_misa;
+
+} vchkctx;
+
+/* define general function */
+static bool vext_check(DisasContext *s)
+{
+    int i;
+    bool ret;
+
+    /* check the required ISA extension */
+    ret = ((s->misa & vchkctx.check_misa) == vchkctx.check_misa);
+    if (!ret) {
+        return false;
+    }
+    /* check vill */
+    ret = vext_check_vill(s->vill);
+    if (!ret) {
+        return false;
+    }
+    /* check register number is legal */
+    for (i = 0; i < 6; i++) {
+        if (vchkctx.check_reg[i].need_check) {
+            ret = vext_check_reg((1 << s->lmul), vchkctx.check_reg[i].reg,
+                    vchkctx.check_reg[i].widen);
+            if (!ret) {
+                return false;
+            }
+        }
+    }
+    /* check if mask register will be overlapped */
+    if (vchkctx.check_overlap_mask.need_check) {
+        ret = vext_check_overlap_mask((1 << s->lmul),
+                vchkctx.check_overlap_mask.reg, vchkctx.check_overlap_mask.vm);
+        if (!ret) {
+            return false;
+        }
+
+    }
+    /* check nf for Zvlsseg */
+    if (vchkctx.check_nf.need_check) {
+        ret = vext_check_nf((1 << s->lmul), vchkctx.check_nf.nf);
+        if (!ret) {
+            return false;
+        }
+
+    }
+    return true;
+}
+
+/* unit stride load and store */
+typedef void gen_helper_vext_ldst_us(TCGv_ptr, TCGv, TCGv_ptr,
+        TCGv_env, TCGv_i32);
+
+static bool do_vext_ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
+        gen_helper_vext_ldst_us *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, maxsz_part1(s->maxsz), data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, base, mask, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool vext_ld_us_trans(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+    uint8_t nf = a->nf + 1;
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (nf << 12);
+    gen_helper_vext_ldst_us *fn;
+    static gen_helper_vext_ldst_us * const fns[2][7][4] = {
+        /* masked unit stride load */
+        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
+            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
+          { NULL,                     gen_helper_vlh_v_h_mask,
+            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
+          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
+            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
+          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
+            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
+          { NULL,                     gen_helper_vlhu_v_h_mask,
+            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
+        /* unmasked unit stride load */
+        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
+            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
+          { NULL,                gen_helper_vlh_v_h,
+            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
+          { NULL,                NULL,
+            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
+          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
+            gen_helper_vle_v_w,  gen_helper_vle_v_d },
+          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
+            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
+          { NULL,                gen_helper_vlhu_v_h,
+            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
+          { NULL,                NULL,
+            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
+    };
+
+    fn =  fns[a->vm][seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_ldst_us_trans(a->rd, a->rs1, data, fn, s);
+}
+
+#define GEN_VEXT_LD_US_TRANS(NAME, DO_OP, SEQ)                            \
+static bool trans_##NAME(DisasContext *s, arg_r2nfvm* a)                  \
+{                                                                         \
+    vchkctx.check_misa = RVV;                                             \
+    vchkctx.check_overlap_mask.need_check = true;                         \
+    vchkctx.check_overlap_mask.reg = a->rd;                               \
+    vchkctx.check_overlap_mask.vm = a->vm;                                \
+    vchkctx.check_reg[0].need_check = true;                               \
+    vchkctx.check_reg[0].reg = a->rd;                                     \
+    vchkctx.check_reg[0].widen = false;                                   \
+    vchkctx.check_nf.need_check = true;                                   \
+    vchkctx.check_nf.nf = a->nf;                                          \
+                                                                          \
+    if (!vext_check(s)) {                                                 \
+        return false;                                                     \
+    }                                                                     \
+    return DO_OP(s, a, SEQ);                                              \
+}
+
+GEN_VEXT_LD_US_TRANS(vlb_v, vext_ld_us_trans, 0)
+GEN_VEXT_LD_US_TRANS(vlh_v, vext_ld_us_trans, 1)
+GEN_VEXT_LD_US_TRANS(vlw_v, vext_ld_us_trans, 2)
+GEN_VEXT_LD_US_TRANS(vle_v, vext_ld_us_trans, 3)
+GEN_VEXT_LD_US_TRANS(vlbu_v, vext_ld_us_trans, 4)
+GEN_VEXT_LD_US_TRANS(vlhu_v, vext_ld_us_trans, 5)
+GEN_VEXT_LD_US_TRANS(vlwu_v, vext_ld_us_trans, 6)
+
+static bool vext_st_us_trans(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+    uint8_t nf = a->nf + 1;
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (nf << 12);
+    gen_helper_vext_ldst_us *fn;
+    static gen_helper_vext_ldst_us * const fns[2][4][4] = {
+        /* masked unit stride store */
+        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
+            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
+          { NULL,                     gen_helper_vsh_v_h_mask,
+            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
+          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
+            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
+        /* unmasked unit stride store */
+        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
+            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
+          { NULL,                gen_helper_vsh_v_h,
+            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
+          { NULL,                NULL,
+            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
+          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
+            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
+    };
+
+    fn =  fns[a->vm][seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_ldst_us_trans(a->rd, a->rs1, data, fn, s);
+}
+
+#define GEN_VEXT_ST_US_TRANS(NAME, DO_OP, SEQ)                            \
+static bool trans_##NAME(DisasContext *s, arg_r2nfvm* a)                  \
+{                                                                         \
+    vchkctx.check_misa = RVV;                                             \
+    vchkctx.check_reg[0].need_check = true;                               \
+    vchkctx.check_reg[0].reg = a->rd;                                     \
+    vchkctx.check_reg[0].widen = false;                                   \
+    vchkctx.check_nf.need_check = true;                                   \
+    vchkctx.check_nf.nf = a->nf;                                          \
+                                                                          \
+    if (!vext_check(s)) {                                                 \
+        return false;                                                     \
+    }                                                                     \
+    return DO_OP(s, a, SEQ);                                              \
+}
+
+GEN_VEXT_ST_US_TRANS(vsb_v, vext_st_us_trans, 0)
+GEN_VEXT_ST_US_TRANS(vsh_v, vext_st_us_trans, 1)
+GEN_VEXT_ST_US_TRANS(vsw_v, vext_st_us_trans, 2)
+GEN_VEXT_ST_US_TRANS(vse_v, vext_st_us_trans, 3)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index cc356aabd8..7eaaf172cf 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -60,6 +60,8 @@ typedef struct DisasContext {
     uint8_t lmul;
     uint8_t sew;
     uint16_t vlen;
+    uint32_t maxsz;
+    uint16_t mlen;
     bool vl_eq_vlmax;
 } DisasContext;
 
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index e0f2415345..406fcd1dfe 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -20,6 +20,7 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
 #include <math.h>
 
 target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
@@ -47,3 +48,440 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
     env->vext.vstart = 0;
     return vl;
 }
+
+/*
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing units smaller than that needs a host-endian fixup.
+ */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
+#define H8(x)   ((x))
+#else
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
+#define H8(x)   (x)
+#endif
+
+#ifdef CONFIG_USER_ONLY
+#define MO_SB 0
+#define MO_LESW 0
+#define MO_LESL 0
+#define MO_LEQ 0
+#define MO_UB 0
+#define MO_LEUW 0
+#define MO_LEUL 0
+#endif
+
+static inline int vext_elem_mask(void *v0, int mlen, int index)
+{
+    int idx = (index * mlen) / 8;
+    int pos = (index * mlen) % 8;
+
+    return (*((uint8_t *)v0 + idx) >> pos) & 0x1;
+}
+
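+/*
+ * The translator packs the data field of simd_desc as:
+ *   bits [7:0]   mlen
+ *   bit  8       vm
+ *   bits [11:9]  the low bits of maxsz (see vext_maxsz() below)
+ *   bits [15:12] nf
+ */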
+static uint32_t vext_nf(uint32_t desc)
+{
+    return (simd_data(desc) >> 12) & 0xf;
+}
+
+static uint32_t vext_mlen(uint32_t desc)
+{
+    return simd_data(desc) & 0xff;
+}
+
+static uint32_t vext_vm(uint32_t desc)
+{
+    return (simd_data(desc) >> 8) & 0x1;
+}
+
+/*
+ * Get the vector group length in bytes. Its range is [64, 2048].
+ *
+ * As simd_desc supports at most 256 bytes, maxsz is split into two parts.
+ * The first part is maxsz rounded down to a multiple of 64 bytes,
+ * encoded in the maxsz of simd_desc.
+ * The second part is (maxsz % 64) >> 3, encoded in the data of simd_desc.
+ */
+static uint32_t vext_maxsz(uint32_t desc)
+{
+    return (simd_maxsz(desc) - 0x8) * 8 + ((simd_data(desc) >> 9) & 0x7) * 8;
+}
+
+/*
+ * This function probes pages (and checks watchpoints) before the real
+ * load operation.
+ *
+ * In softmmu mode, the TLB API probe_access is enough for the watchpoint
+ * check. In user mode, there is no watchpoint support for now.
+ *
+ * It will trigger an exception if there is no mapping in the TLB
+ * and the page table walk can't fill the TLB entry. The guest software
+ * can then return here after handling the exception, or never return.
+ */
+static void probe_read_access(CPURISCVState *env, target_ulong addr,
+        target_ulong len, uintptr_t ra)
+{
+    while (len) {
+        const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
+        const target_ulong curlen = MIN(pagelen, len);
+
+        probe_read(env, addr, curlen, cpu_mmu_index(env, false), ra);
+        addr += curlen;
+        len -= curlen;
+    }
+}
+
+static void probe_write_access(CPURISCVState *env, target_ulong addr,
+        target_ulong len, uintptr_t ra)
+{
+    while (len) {
+        const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
+        const target_ulong curlen = MIN(pagelen, len);
+
+        probe_write(env, addr, curlen, cpu_mmu_index(env, false), ra);
+        addr += curlen;
+        len -= curlen;
+    }
+}
+
+#ifdef HOST_WORDS_BIGENDIAN
+static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
+{
+    /*
+     * Split the remaining range into two parts.
+     * The first part is in the last uint64_t unit.
+     * The second part starts from the next uint64_t unit.
+     */
+    int part1 = 0, part2 = tot - cnt;
+    if (cnt % 64) {
+        part1 = 64 - (cnt % 64);
+        part2 = tot - cnt - part1;
+        memset((void *)((uintptr_t)tail & ~63ULL), 0, part1);
+        memset((void *)(((uintptr_t)tail + 64) & ~63ULL), 0, part2);
+    } else {
+        memset(tail, 0, part2);
+    }
+}
+#else
+static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
+{
+    memset(tail, 0, tot - cnt);
+}
+#endif
+/* common structure for all vector instructions */
+struct vext_common_ctx {
+    uint32_t vlmax;
+    uint32_t mlen;
+    uint32_t vl;
+    uint32_t msz;
+    uint32_t esz;
+    uint32_t vm;
+};
+
+static void vext_common_ctx_init(struct vext_common_ctx *ctx, uint32_t esz,
+        uint32_t msz, uint32_t vl, uint32_t desc)
+{
+    ctx->vlmax = vext_maxsz(desc) / esz;
+    ctx->mlen = vext_mlen(desc);
+    ctx->vm = vext_vm(desc);
+    ctx->vl = vl;
+    ctx->msz = msz;
+    ctx->esz = esz;
+}
+
+/* data structure and common functions for load and store */
+typedef void vext_ld_elem_fn(CPURISCVState *env, target_ulong addr,
+        uint32_t idx, void *vd, uintptr_t retaddr);
+typedef void vext_st_elem_fn(CPURISCVState *env, target_ulong addr,
+        uint32_t idx, void *vd, uintptr_t retaddr);
+typedef target_ulong vext_get_index_addr(target_ulong base,
+        uint32_t idx, void *vs2);
+typedef void vext_ld_clear_elem(void *vd, uint32_t idx,
+        uint32_t cnt, uint32_t tot);
+
+struct vext_ldst_ctx {
+    struct vext_common_ctx vcc;
+    uint32_t nf;
+    target_ulong base;
+    target_ulong stride;
+    int mmuidx;
+
+    vext_ld_elem_fn *ld_elem;
+    vext_st_elem_fn *st_elem;
+    vext_get_index_addr *get_index_addr;
+    vext_ld_clear_elem *clear_elem;
+};
+
+#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)              \
+static void vext_##NAME##_ld_elem(CPURISCVState *env, abi_ptr addr, \
+        uint32_t idx, void *vd, uintptr_t retaddr)                  \
+{                                                                   \
+    int mmu_idx = cpu_mmu_index(env, false);                        \
+    MTYPE data;                                                     \
+    ETYPE *cur = ((ETYPE *)vd + H(idx));                            \
+    data = cpu_##LDSUF##_mmuidx_ra(env, addr, mmu_idx, retaddr);    \
+    *cur = data;                                                    \
+}                                                                   \
+static void vext_##NAME##_clear_elem(void *vd, uint32_t idx,        \
+        uint32_t cnt, uint32_t tot)                                 \
+{                                                                   \
+    ETYPE *cur = ((ETYPE *)vd + H(idx));                            \
+    vext_clear(cur, cnt, tot);                                      \
+}
+
+GEN_VEXT_LD_ELEM(vlb_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlb_v_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(vlb_v_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(vlb_v_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(vlh_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlh_v_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(vlh_v_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(vlw_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlw_v_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vle_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vle_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vle_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vle_v_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(vlbu_v_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(vlbu_v_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(vlbu_v_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(vlbu_v_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(vlhu_v_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(vlhu_v_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(vlhu_v_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(vlwu_v_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlwu_v_d, uint32_t, uint64_t, H8, ldl)
+
+#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)                       \
+static void vext_##NAME##_st_elem(CPURISCVState *env, abi_ptr addr,   \
+        uint32_t idx, void *vd, uintptr_t retaddr)                    \
+{                                                                     \
+    int mmu_idx = cpu_mmu_index(env, false);                          \
+    ETYPE data = *((ETYPE *)vd + H(idx));                             \
+    cpu_##STSUF##_mmuidx_ra(env, addr, data, mmu_idx, retaddr);       \
+}
+
+GEN_VEXT_ST_ELEM(vsb_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vsb_v_h, int16_t, H2, stb)
+GEN_VEXT_ST_ELEM(vsb_v_w, int32_t, H4, stb)
+GEN_VEXT_ST_ELEM(vsb_v_d, int64_t, H8, stb)
+GEN_VEXT_ST_ELEM(vsh_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vsh_v_w, int32_t, H4, stw)
+GEN_VEXT_ST_ELEM(vsh_v_d, int64_t, H8, stw)
+GEN_VEXT_ST_ELEM(vsw_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vsw_v_d, int64_t, H8, stl)
+GEN_VEXT_ST_ELEM(vse_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vse_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vse_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vse_v_d, int64_t, H8, stq)
+
+/* unit-stride: load vector elements from contiguous guest memory */
+static void vext_ld_unit_stride_mask(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    if (s->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_read_access(env, ctx->base + ctx->nf * i * s->msz,
+                ctx->nf * s->msz, ra);
+    }
+    /* load bytes from guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + (i * ctx->nf + k) * s->msz;
+            ctx->ld_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    for (k = 0; k < ctx->nf; k++) {
+        ctx->clear_elem(vd, s->vl + k * s->vlmax, s->vl * s->esz,
+                s->vlmax * s->esz);
+    }
+}
+
+static void vext_ld_unit_stride(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    if (s->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    probe_read_access(env, ctx->base, s->vl * ctx->nf * s->msz, ra);
+    /* load bytes from guest memory */
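+    /* field k of element i is stored at index i + k * vlmax of vd */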
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + (i * ctx->nf + k) * s->msz;
+            ctx->ld_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    for (k = 0; k < ctx->nf; k++) {
+        ctx->clear_elem(vd, s->vl + k * s->vlmax, s->vl * s->esz,
+                s->vlmax * s->esz);
+    }
+}
+
+#define GEN_VEXT_LD_UNIT_STRIDE(NAME, MTYPE, ETYPE)                \
+void HELPER(NAME##_mask)(void *vd, target_ulong base, void *v0,    \
+        CPURISCVState *env, uint32_t desc)                         \
+{                                                                  \
+    static struct vext_ldst_ctx ctx;                               \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                  \
+        sizeof(MTYPE), env->vext.vl, desc);                        \
+    ctx.nf = vext_nf(desc);                                        \
+    ctx.base = base;                                               \
+    ctx.ld_elem = vext_##NAME##_ld_elem;                           \
+    ctx.clear_elem = vext_##NAME##_clear_elem;                     \
+                                                                   \
+    vext_ld_unit_stride_mask(vd, v0, env, &ctx, GETPC());          \
+}                                                                  \
+                                                                   \
+void HELPER(NAME)(void *vd, target_ulong base, void *v0,           \
+        CPURISCVState *env, uint32_t desc)                         \
+{                                                                  \
+    static struct vext_ldst_ctx ctx;                               \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                  \
+        sizeof(MTYPE), env->vext.vl, desc);                        \
+    ctx.nf = vext_nf(desc);                                        \
+    ctx.base = base;                                               \
+    ctx.ld_elem = vext_##NAME##_ld_elem;                           \
+    ctx.clear_elem = vext_##NAME##_clear_elem;                     \
+                                                                   \
+    vext_ld_unit_stride(vd, v0, env, &ctx, GETPC());               \
+}
+
+GEN_VEXT_LD_UNIT_STRIDE(vlb_v_b, int8_t,  int8_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlb_v_h, int8_t,  int16_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlb_v_w, int8_t,  int32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlb_v_d, int8_t,  int64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlh_v_h, int16_t, int16_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlh_v_w, int16_t, int32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlh_v_d, int16_t, int64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlw_v_w, int32_t, int32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlw_v_d, int32_t, int64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vle_v_b, int8_t,  int8_t)
+GEN_VEXT_LD_UNIT_STRIDE(vle_v_h, int16_t, int16_t)
+GEN_VEXT_LD_UNIT_STRIDE(vle_v_w, int32_t, int32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vle_v_d, int64_t, int64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlbu_v_b, uint8_t,  uint8_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlbu_v_h, uint8_t,  uint16_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlbu_v_w, uint8_t,  uint32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlbu_v_d, uint8_t,  uint64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlhu_v_h, uint16_t, uint16_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlhu_v_w, uint16_t, uint32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlhu_v_d, uint16_t, uint64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlwu_v_w, uint32_t, uint32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlwu_v_d, uint32_t, uint64_t)
+
+/* unit-stride: store vector element to guest memory */
+static void vext_st_unit_stride_mask(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    /* probe every access */
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_write_access(env, ctx->base + ctx->nf * i * s->msz,
+                ctx->nf * s->msz, ra);
+    }
+    /* store bytes to guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + (i * ctx->nf + k) * s->msz;
+            ctx->st_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+}
+
+static void vext_st_unit_stride(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    /* probe every access */
+    probe_write_access(env, ctx->base, s->vl * ctx->nf * s->msz, ra);
+    /* store bytes to guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + (i * ctx->nf + k) * s->msz;
+            ctx->st_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+}
+
+#define GEN_VEXT_ST_UNIT_STRIDE(NAME, MTYPE, ETYPE)                \
+void HELPER(NAME##_mask)(void *vd, target_ulong base, void *v0,    \
+        CPURISCVState *env, uint32_t desc)                         \
+{                                                                  \
+    static struct vext_ldst_ctx ctx;                               \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                  \
+        sizeof(MTYPE), env->vext.vl, desc);                        \
+    ctx.nf = vext_nf(desc);                                        \
+    ctx.base = base;                                               \
+    ctx.st_elem = vext_##NAME##_st_elem;                           \
+                                                                   \
+    vext_st_unit_stride_mask(vd, v0, env, &ctx, GETPC());          \
+}                                                                  \
+                                                                   \
+void HELPER(NAME)(void *vd, target_ulong base, void *v0,           \
+        CPURISCVState *env, uint32_t desc)                         \
+{                                                                  \
+    static struct vext_ldst_ctx ctx;                               \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                  \
+        sizeof(MTYPE), env->vext.vl, desc);                        \
+    ctx.nf = vext_nf(desc);                                        \
+    ctx.base = base;                                               \
+    ctx.st_elem = vext_##NAME##_st_elem;                           \
+                                                                   \
+    vext_st_unit_stride(vd, v0, env, &ctx, GETPC());               \
+}
+
+GEN_VEXT_ST_UNIT_STRIDE(vsb_v_b, int8_t,  int8_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsb_v_h, int8_t,  int16_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsb_v_w, int8_t,  int32_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsb_v_d, int8_t,  int64_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsh_v_h, int16_t, int16_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsh_v_w, int16_t, int32_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsh_v_d, int16_t, int64_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsw_v_w, int32_t, int32_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsw_v_d, int32_t, int64_t)
+GEN_VEXT_ST_UNIT_STRIDE(vse_v_b, int8_t,  int8_t)
+GEN_VEXT_ST_UNIT_STRIDE(vse_v_h, int16_t, int16_t)
+GEN_VEXT_ST_UNIT_STRIDE(vse_v_w, int32_t, int32_t)
+GEN_VEXT_ST_UNIT_STRIDE(vse_v_d, int64_t, int64_t)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 1/5] target/riscv: add vector unit stride load and store instructions
@ 2020-02-10  7:42   ` LIU Zhiwei
  0 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-10  7:42 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, qemu-devel, qemu-riscv, LIU Zhiwei

Vector unit-stride operations access elements stored contiguously in memory
starting from the base effective address.

The Zvlsseg expands some vector load/store segment instructions, which move
multiple contiguous fields in memory to and from consecutively numbered
vector register

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  70 ++++
 target/riscv/insn32.decode              |  17 +
 target/riscv/insn_trans/trans_rvv.inc.c | 294 ++++++++++++++++
 target/riscv/translate.c                |   2 +
 target/riscv/vector_helper.c            | 438 ++++++++++++++++++++++++
 5 files changed, 821 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 3c28c7e407..74c483ef9e 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -78,3 +78,73 @@ DEF_HELPER_1(tlb_flush, void, env)
 #endif
 /* Vector functions */
 DEF_HELPER_3(vsetvl, tl, env, tl, tl)
+DEF_HELPER_5(vlb_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlb_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlh_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlw_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vle_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbu_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhu_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwu_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsb_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsh_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vsw_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_b, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_h, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_w, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_d, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vse_v_d_mask, void, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5dc009c3cd..dad3ed91c7 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -43,6 +43,7 @@
 &u    imm rd
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
+&r2nfvm    vm rd rs1 nf
 
 # Formats 32:
 @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
@@ -62,6 +63,7 @@
 @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
+@r2_nfvm nf:3 ... vm:1 ..... ..... ... ..... ....... &r2nfvm %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
 
 @sfence_vma ....... ..... .....   ... ..... ....... %rs2 %rs1
@@ -206,5 +208,20 @@ fcvt_d_w   1101001  00000 ..... ... ..... 1010011 @r2_rm
 fcvt_d_wu  1101001  00001 ..... ... ..... 1010011 @r2_rm
 
 # *** RV32V Extension ***
+
+# *** Vector loads and stores are encoded within LOADFP/STORE-FP ***
+vlb_v      ... 100 . 00000 ..... 000 ..... 0000111 @r2_nfvm
+vlh_v      ... 100 . 00000 ..... 101 ..... 0000111 @r2_nfvm
+vlw_v      ... 100 . 00000 ..... 110 ..... 0000111 @r2_nfvm
+vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
+vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
+vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
+vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
+vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
+vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
+vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
+vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
+
+# *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index da82c72bbf..d93eb00651 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -15,6 +15,8 @@
  * You should have received a copy of the GNU General Public License along with
  * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
+#include "tcg/tcg-op-gvec.h"
+#include "tcg/tcg-gvec-desc.h"
 
 static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl * a)
 {
@@ -67,3 +69,295 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli * a)
     tcg_temp_free(dst);
     return true;
 }
+
+/* define aidding fucntions */
+/* vector register offset from env */
+static uint32_t vreg_ofs(DisasContext *s, int reg)
+{
+    return offsetof(CPURISCVState, vext.vreg) + reg * s->vlen / 8;
+}
+
+/*
+ * As simd_desc supports at most 256 bytes, and in this implementation,
+ * the max vector group length is 2048 bytes. So split it into two parts.
+ *
+ * The first part is floor(maxsz, 64), encoded in maxsz of simd_desc.
+ * The second part is (maxsz % 64) >> 3, encoded in data of simd_desc.
+ */
+static uint32_t maxsz_part1(uint32_t maxsz)
+{
+    return ((maxsz & ~(0x3f)) >> 3) + 0x8; /* add offset 8 to avoid return 0 */
+}
+
+static uint32_t maxsz_part2(uint32_t maxsz)
+{
+    return (maxsz & 0x3f) >> 3;
+}
+
+/* define concrete check functions */
+static bool vext_check_vill(bool vill)
+{
+    if (vill) {
+        return false;
+    }
+    return true;
+}
+
+static bool vext_check_reg(uint32_t lmul, uint32_t reg, bool widen)
+{
+    int legal = widen ? (lmul * 2) : lmul;
+
+    if ((lmul != 1 && lmul != 2 && lmul != 4 && lmul != 8) ||
+        (lmul == 8 && widen)) {
+        return false;
+    }
+
+    if (reg % legal != 0) {
+        return false;
+    }
+    return true;
+}
+
+static bool vext_check_overlap_mask(uint32_t lmul, uint32_t vd, bool vm)
+{
+    if (lmul > 1 && vm == 0 && vd == 0) {
+        return false;
+    }
+    return true;
+}
+
+static bool vext_check_nf(uint32_t lmul, uint32_t nf)
+{
+    if (lmul * (nf + 1) > 8) {
+        return false;
+    }
+    return true;
+}
+
+/* define check conditions data structure */
+struct vext_check_ctx {
+
+    struct vext_reg {
+        uint8_t reg;
+        bool widen;
+        bool need_check;
+    } check_reg[6];
+
+    struct vext_overlap_mask {
+        uint8_t reg;
+        uint8_t vm;
+        bool need_check;
+    } check_overlap_mask;
+
+    struct vext_nf {
+        uint8_t nf;
+        bool need_check;
+    } check_nf;
+    target_ulong check_misa;
+
+} vchkctx;
+
+/* define general function */
+static bool vext_check(DisasContext *s)
+{
+    int i;
+    bool ret;
+
+    /* check ISA extend */
+    ret = ((s->misa & vchkctx.check_misa) == vchkctx.check_misa);
+    if (!ret) {
+        return false;
+    }
+    /* check vill */
+    ret = vext_check_vill(s->vill);
+    if (!ret) {
+        return false;
+    }
+    /* check register number is legal */
+    for (i = 0; i < 6; i++) {
+        if (vchkctx.check_reg[i].need_check) {
+            ret = vext_check_reg((1 << s->lmul), vchkctx.check_reg[i].reg,
+                    vchkctx.check_reg[i].widen);
+            if (!ret) {
+                return false;
+            }
+        }
+    }
+    /* check if mask register will be overlapped */
+    if (vchkctx.check_overlap_mask.need_check) {
+        ret = vext_check_overlap_mask((1 << s->lmul),
+                vchkctx.check_overlap_mask.reg, vchkctx.check_overlap_mask.vm);
+        if (!ret) {
+            return false;
+        }
+
+    }
+    /* check nf for Zvlsseg */
+    if (vchkctx.check_nf.need_check) {
+        ret = vext_check_nf((1 << s->lmul), vchkctx.check_nf.nf);
+        if (!ret) {
+            return false;
+        }
+
+    }
+    return true;
+}
+
+/* unit stride load and store */
+typedef void gen_helper_vext_ldst_us(TCGv_ptr, TCGv, TCGv_ptr,
+        TCGv_env, TCGv_i32);
+
+static bool do_vext_ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
+        gen_helper_vext_ldst_us *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, maxsz_part1(s->maxsz), data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, base, mask, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool vext_ld_us_trans(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+    uint8_t nf = a->nf + 1;
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (nf << 12);
+    gen_helper_vext_ldst_us *fn;
+    static gen_helper_vext_ldst_us * const fns[2][7][4] = {
+        /* masked unit stride load */
+        { { gen_helper_vlb_v_b_mask,  gen_helper_vlb_v_h_mask,
+            gen_helper_vlb_v_w_mask,  gen_helper_vlb_v_d_mask },
+          { NULL,                     gen_helper_vlh_v_h_mask,
+            gen_helper_vlh_v_w_mask,  gen_helper_vlh_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vlw_v_w_mask,  gen_helper_vlw_v_d_mask },
+          { gen_helper_vle_v_b_mask,  gen_helper_vle_v_h_mask,
+            gen_helper_vle_v_w_mask,  gen_helper_vle_v_d_mask },
+          { gen_helper_vlbu_v_b_mask, gen_helper_vlbu_v_h_mask,
+            gen_helper_vlbu_v_w_mask, gen_helper_vlbu_v_d_mask },
+          { NULL,                     gen_helper_vlhu_v_h_mask,
+            gen_helper_vlhu_v_w_mask, gen_helper_vlhu_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vlwu_v_w_mask, gen_helper_vlwu_v_d_mask } },
+        /* unmasked unit stride load */
+        { { gen_helper_vlb_v_b,  gen_helper_vlb_v_h,
+            gen_helper_vlb_v_w,  gen_helper_vlb_v_d },
+          { NULL,                gen_helper_vlh_v_h,
+            gen_helper_vlh_v_w,  gen_helper_vlh_v_d },
+          { NULL,                NULL,
+            gen_helper_vlw_v_w,  gen_helper_vlw_v_d },
+          { gen_helper_vle_v_b,  gen_helper_vle_v_h,
+            gen_helper_vle_v_w,  gen_helper_vle_v_d },
+          { gen_helper_vlbu_v_b, gen_helper_vlbu_v_h,
+            gen_helper_vlbu_v_w, gen_helper_vlbu_v_d },
+          { NULL,                gen_helper_vlhu_v_h,
+            gen_helper_vlhu_v_w, gen_helper_vlhu_v_d },
+          { NULL,                NULL,
+            gen_helper_vlwu_v_w, gen_helper_vlwu_v_d } }
+    };
+
+    fn =  fns[a->vm][seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_ldst_us_trans(a->rd, a->rs1, data, fn, s);
+}
+
+#define GEN_VEXT_LD_US_TRANS(NAME, DO_OP, SEQ)                            \
+static bool trans_##NAME(DisasContext *s, arg_r2nfvm* a)                  \
+{                                                                         \
+    vchkctx.check_misa = RVV;                                             \
+    vchkctx.check_overlap_mask.need_check = true;                         \
+    vchkctx.check_overlap_mask.reg = a->rd;                               \
+    vchkctx.check_overlap_mask.vm = a->vm;                                \
+    vchkctx.check_reg[0].need_check = true;                               \
+    vchkctx.check_reg[0].reg = a->rd;                                     \
+    vchkctx.check_reg[0].widen = false;                                   \
+    vchkctx.check_nf.need_check = true;                                   \
+    vchkctx.check_nf.nf = a->nf;                                          \
+                                                                          \
+    if (!vext_check(s)) {                                                 \
+        return false;                                                     \
+    }                                                                     \
+    return DO_OP(s, a, SEQ);                                              \
+}
+
+GEN_VEXT_LD_US_TRANS(vlb_v, vext_ld_us_trans, 0)
+GEN_VEXT_LD_US_TRANS(vlh_v, vext_ld_us_trans, 1)
+GEN_VEXT_LD_US_TRANS(vlw_v, vext_ld_us_trans, 2)
+GEN_VEXT_LD_US_TRANS(vle_v, vext_ld_us_trans, 3)
+GEN_VEXT_LD_US_TRANS(vlbu_v, vext_ld_us_trans, 4)
+GEN_VEXT_LD_US_TRANS(vlhu_v, vext_ld_us_trans, 5)
+GEN_VEXT_LD_US_TRANS(vlwu_v, vext_ld_us_trans, 6)
+
+static bool vext_st_us_trans(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+    uint8_t nf = a->nf + 1;
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (nf << 12);
+    gen_helper_vext_ldst_us *fn;
+    static gen_helper_vext_ldst_us * const fns[2][4][4] = {
+        /* masked unit stride load and store */
+        { { gen_helper_vsb_v_b_mask,  gen_helper_vsb_v_h_mask,
+            gen_helper_vsb_v_w_mask,  gen_helper_vsb_v_d_mask },
+          { NULL,                     gen_helper_vsh_v_h_mask,
+            gen_helper_vsh_v_w_mask,  gen_helper_vsh_v_d_mask },
+          { NULL,                     NULL,
+            gen_helper_vsw_v_w_mask,  gen_helper_vsw_v_d_mask },
+          { gen_helper_vse_v_b_mask,  gen_helper_vse_v_h_mask,
+            gen_helper_vse_v_w_mask,  gen_helper_vse_v_d_mask } },
+        /* unmasked unit stride store */
+        { { gen_helper_vsb_v_b,  gen_helper_vsb_v_h,
+            gen_helper_vsb_v_w,  gen_helper_vsb_v_d },
+          { NULL,                gen_helper_vsh_v_h,
+            gen_helper_vsh_v_w,  gen_helper_vsh_v_d },
+          { NULL,                NULL,
+            gen_helper_vsw_v_w,  gen_helper_vsw_v_d },
+          { gen_helper_vse_v_b,  gen_helper_vse_v_h,
+            gen_helper_vse_v_w,  gen_helper_vse_v_d } }
+    };
+
+    fn =  fns[a->vm][seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_ldst_us_trans(a->rd, a->rs1, data, fn, s);
+}
+
+#define GEN_VEXT_ST_US_TRANS(NAME, DO_OP, SEQ)                            \
+static bool trans_##NAME(DisasContext *s, arg_r2nfvm* a)                  \
+{                                                                         \
+    vchkctx.check_misa = RVV;                                             \
+    vchkctx.check_reg[0].need_check = true;                               \
+    vchkctx.check_reg[0].reg = a->rd;                                     \
+    vchkctx.check_reg[0].widen = false;                                   \
+    vchkctx.check_nf.need_check = true;                                   \
+    vchkctx.check_nf.nf = a->nf;                                          \
+                                                                          \
+    if (!vext_check(s)) {                                                 \
+        return false;                                                     \
+    }                                                                     \
+    return DO_OP(s, a, SEQ);                                              \
+}
+
+GEN_VEXT_ST_US_TRANS(vsb_v, vext_st_us_trans, 0)
+GEN_VEXT_ST_US_TRANS(vsh_v, vext_st_us_trans, 1)
+GEN_VEXT_ST_US_TRANS(vsw_v, vext_st_us_trans, 2)
+GEN_VEXT_ST_US_TRANS(vse_v, vext_st_us_trans, 3)
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index cc356aabd8..7eaaf172cf 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -60,6 +60,8 @@ typedef struct DisasContext {
     uint8_t lmul;
     uint8_t sew;
     uint16_t vlen;
+    uint32_t maxsz;
+    uint16_t mlen;
     bool vl_eq_vlmax;
 } DisasContext;
 
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index e0f2415345..406fcd1dfe 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -20,6 +20,7 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
 #include <math.h>
 
 target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
@@ -47,3 +48,440 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong s1,
     env->vext.vstart = 0;
     return vl;
 }
+
+/*
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing units smaller than that needs a host-endian fixup.
+ */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x)   ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x)   ((x) ^ 3)
+#define H4(x)   ((x) ^ 1)
+#define H8(x)   ((x))
+#else
+#define H1(x)   (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x)   (x)
+#define H4(x)   (x)
+#define H8(x)   (x)
+#endif
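+
+/*
+ * For example, on a big-endian host H1(0) == 7, so byte element 0 is
+ * addressed at host byte 7 of the first 64-bit chunk, i.e. its least
+ * significant byte, matching the little-endian guest layout. On a
+ * little-endian host the macros are the identity.
+ */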
+
+#ifdef CONFIG_USER_ONLY
+#define MO_SB 0
+#define MO_LESW 0
+#define MO_LESL 0
+#define MO_LEQ 0
+#define MO_UB 0
+#define MO_LEUW 0
+#define MO_LEUL 0
+#endif
+
+static inline int vext_elem_mask(void *v0, int mlen, int index)
+{
+    int idx = (index * mlen) / 8;
+    int pos = (index * mlen) % 8;
+
+    return (*((uint8_t *)v0 + idx) >> pos) & 0x1;
+}
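+
+/*
+ * For example, with mlen == 8 the mask bit for element i is bit 0 of
+ * byte i of v0; with mlen == 1 it is bit (i % 8) of byte (i / 8).
+ */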
+
+static uint32_t vext_nf(uint32_t desc)
+{
+    return (simd_data(desc) >> 12) & 0xf;
+}
+
+static uint32_t vext_mlen(uint32_t desc)
+{
+    return simd_data(desc) & 0xff;
+}
+
+static uint32_t vext_vm(uint32_t desc)
+{
+    return (simd_data(desc) >> 8) & 0x1;
+}
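+
+/*
+ * To summarize the decode helpers above and below: the data field of
+ * simd_desc packs mlen in bits [7:0], vm in bit 8, the low part of
+ * maxsz in bits [11:9] and nf in bits [15:12], mirroring the value
+ * built at translation time.
+ */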
+
+/*
+ * Get the vector group length in bytes. Its range is [64, 2048].
+ *
+ * As simd_desc supports at most 256 bytes, the value is split into two
+ * parts. The first part, maxsz rounded down to a multiple of 64, is
+ * encoded in the maxsz field of simd_desc. The second part,
+ * (maxsz % 64) >> 3, is encoded in the data field of simd_desc.
+ */
+static uint32_t vext_maxsz(uint32_t desc)
+{
+    return (simd_maxsz(desc) - 0x8) * 8 + ((simd_data(desc) >> 9) & 0x7) * 8;
+}
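+
+/*
+ * For example, a (hypothetical) group length of 136 bytes is split into
+ * a 128-byte part recovered from the maxsz field and a remainder of
+ * (136 % 64) >> 3 == 1 in bits [11:9] of the data field, reassembled
+ * here as 128 + 1 * 8 == 136 bytes.
+ */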
+
+/*
+ * This function checks watchpoints before the actual load operation.
+ *
+ * In softmmu mode, the TLB API probe_access is enough for the
+ * watchpoint check. In user mode, there is no watchpoint support
+ * for now.
+ *
+ * It will trigger an exception if there is no mapping in the TLB
+ * and the page table walk cannot fill the TLB entry. Then the guest
+ * software can return here after processing the exception, or never
+ * return.
+ */
+static void probe_read_access(CPURISCVState *env, target_ulong addr,
+        target_ulong len, uintptr_t ra)
+{
+    while (len) {
+        const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
+        const target_ulong curlen = MIN(pagelen, len);
+
+        probe_read(env, addr, curlen, cpu_mmu_index(env, false), ra);
+        addr += curlen;
+        len -= curlen;
+    }
+}
+
+static void probe_write_access(CPURISCVState *env, target_ulong addr,
+        target_ulong len, uintptr_t ra)
+{
+    while (len) {
+        const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
+        const target_ulong curlen = MIN(pagelen, len);
+
+        probe_write(env, addr, curlen, cpu_mmu_index(env, false), ra);
+        addr += curlen;
+        len -= curlen;
+    }
+}
+
+#ifdef HOST_WORDS_BIGENDIAN
+static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
+{
+    /*
+     * Split the remaining range into two parts.
+     * The first part is in the last uint64_t unit.
+     * The second part starts from the next uint64_t unit.
+     */
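+    /*
+     * For example, with cnt == 3 and tot == 16 for byte elements,
+     * part1 clears the 5 remaining bytes of the 64-bit chunk that
+     * holds the tail pointer and part2 clears the whole following
+     * chunk.
+     */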
+    int part1 = 0, part2 = tot - cnt;
+    if (cnt % 8) {
+        part1 = 8 - (cnt % 8);
+        part2 = tot - cnt - part1;
+        memset(QEMU_ALIGN_PTR_DOWN(tail, 8), 0, part1);
+        memset(QEMU_ALIGN_PTR_UP(tail, 8), 0, part2);
+    } else {
+        memset(tail, 0, part2);
+    }
+}
+#else
+static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
+{
+    memset(tail, 0, tot - cnt);
+}
+#endif
+
+/* common structure for all vector instructions */
+struct vext_common_ctx {
+    uint32_t vlmax;
+    uint32_t mlen;
+    uint32_t vl;
+    uint32_t msz;
+    uint32_t esz;
+    uint32_t vm;
+};
+
+static void vext_common_ctx_init(struct vext_common_ctx *ctx, uint32_t esz,
+        uint32_t msz, uint32_t vl, uint32_t desc)
+{
+    ctx->vlmax = vext_maxsz(desc) / esz;
+    ctx->mlen = vext_mlen(desc);
+    ctx->vm = vext_vm(desc);
+    ctx->vl = vl;
+    ctx->msz = msz;
+    ctx->esz = esz;
+}
+
+/* data structure and common functions for load and store */
+typedef void vext_ld_elem_fn(CPURISCVState *env, target_ulong addr,
+        uint32_t idx, void *vd, uintptr_t retaddr);
+typedef void vext_st_elem_fn(CPURISCVState *env, target_ulong addr,
+        uint32_t idx, void *vd, uintptr_t retaddr);
+typedef target_ulong vext_get_index_addr(target_ulong base,
+        uint32_t idx, void *vs2);
+typedef void vext_ld_clear_elem(void *vd, uint32_t idx,
+        uint32_t cnt, uint32_t tot);
+
+struct vext_ldst_ctx {
+    struct vext_common_ctx vcc;
+    uint32_t nf;
+    target_ulong base;
+    target_ulong stride;
+    int mmuidx;
+
+    vext_ld_elem_fn *ld_elem;
+    vext_st_elem_fn *st_elem;
+    vext_get_index_addr *get_index_addr;
+    vext_ld_clear_elem *clear_elem;
+};
+
+#define GEN_VEXT_LD_ELEM(NAME, MTYPE, ETYPE, H, LDSUF)              \
+static void vext_##NAME##_ld_elem(CPURISCVState *env, abi_ptr addr, \
+        uint32_t idx, void *vd, uintptr_t retaddr)                  \
+{                                                                   \
+    int mmu_idx = cpu_mmu_index(env, false);                        \
+    MTYPE data;                                                     \
+    ETYPE *cur = ((ETYPE *)vd + H(idx));                            \
+    data = cpu_##LDSUF##_mmuidx_ra(env, addr, mmu_idx, retaddr);    \
+    *cur = data;                                                    \
+}                                                                   \
+static void vext_##NAME##_clear_elem(void *vd, uint32_t idx,        \
+        uint32_t cnt, uint32_t tot)                                 \
+{                                                                   \
+    ETYPE *cur = ((ETYPE *)vd + H(idx));                            \
+    vext_clear(cur, cnt, tot);                                      \
+}
+
+GEN_VEXT_LD_ELEM(vlb_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlb_v_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(vlb_v_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(vlb_v_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(vlh_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlh_v_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(vlh_v_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(vlw_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlw_v_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vle_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vle_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vle_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vle_v_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(vlbu_v_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(vlbu_v_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(vlbu_v_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(vlbu_v_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(vlhu_v_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(vlhu_v_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(vlhu_v_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(vlwu_v_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlwu_v_d, uint32_t, uint64_t, H8, ldl)
+
+#define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)                       \
+static void vext_##NAME##_st_elem(CPURISCVState *env, abi_ptr addr,   \
+        uint32_t idx, void *vd, uintptr_t retaddr)                    \
+{                                                                     \
+    int mmu_idx = cpu_mmu_index(env, false);                          \
+    ETYPE data = *((ETYPE *)vd + H(idx));                             \
+    cpu_##STSUF##_mmuidx_ra(env, addr, data, mmu_idx, retaddr);       \
+}
+
+GEN_VEXT_ST_ELEM(vsb_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vsb_v_h, int16_t, H2, stb)
+GEN_VEXT_ST_ELEM(vsb_v_w, int32_t, H4, stb)
+GEN_VEXT_ST_ELEM(vsb_v_d, int64_t, H8, stb)
+GEN_VEXT_ST_ELEM(vsh_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vsh_v_w, int32_t, H4, stw)
+GEN_VEXT_ST_ELEM(vsh_v_d, int64_t, H8, stw)
+GEN_VEXT_ST_ELEM(vsw_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vsw_v_d, int64_t, H8, stl)
+GEN_VEXT_ST_ELEM(vse_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vse_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vse_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vse_v_d, int64_t, H8, stq)
+
+/* unit-stride: load vector element from contiguous guest memory */
+static void vext_ld_unit_stride_mask(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    if (s->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_read_access(env, ctx->base + ctx->nf * i * s->msz,
+                ctx->nf * s->msz, ra);
+    }
+    /* load bytes from guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + (i * ctx->nf + k) * s->msz;
+            ctx->ld_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    for (k = 0; k < ctx->nf; k++) {
+        ctx->clear_elem(vd, s->vl + k * s->vlmax, s->vl * s->esz,
+                s->vlmax * s->esz);
+    }
+}
+
+static void vext_ld_unit_stride(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    if (s->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    probe_read_access(env, ctx->base, s->vl * ctx->nf * s->msz, ra);
+    /* load bytes from guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + (i * ctx->nf + k) * s->msz;
+            ctx->ld_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    for (k = 0; k < ctx->nf; k++) {
+        ctx->clear_elem(vd, s->vl + k * s->vlmax, s->vl * s->esz,
+                s->vlmax * s->esz);
+    }
+}
+
+#define GEN_VEXT_LD_UNIT_STRIDE(NAME, MTYPE, ETYPE)                \
+void HELPER(NAME##_mask)(void *vd, target_ulong base, void *v0,    \
+        CPURISCVState *env, uint32_t desc)                         \
+{                                                                  \
+    static struct vext_ldst_ctx ctx;                               \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                  \
+        sizeof(MTYPE), env->vext.vl, desc);                        \
+    ctx.nf = vext_nf(desc);                                        \
+    ctx.base = base;                                               \
+    ctx.ld_elem = vext_##NAME##_ld_elem;                           \
+    ctx.clear_elem = vext_##NAME##_clear_elem;                     \
+                                                                   \
+    vext_ld_unit_stride_mask(vd, v0, env, &ctx, GETPC());          \
+}                                                                  \
+                                                                   \
+void HELPER(NAME)(void *vd, target_ulong base, void *v0,           \
+        CPURISCVState *env, uint32_t desc)                         \
+{                                                                  \
+    static struct vext_ldst_ctx ctx;                               \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                  \
+        sizeof(MTYPE), env->vext.vl, desc);                        \
+    ctx.nf = vext_nf(desc);                                        \
+    ctx.base = base;                                               \
+    ctx.ld_elem = vext_##NAME##_ld_elem;                           \
+    ctx.clear_elem = vext_##NAME##_clear_elem;                     \
+                                                                   \
+    vext_ld_unit_stride(vd, v0, env, &ctx, GETPC());               \
+}
+
+GEN_VEXT_LD_UNIT_STRIDE(vlb_v_b, int8_t,  int8_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlb_v_h, int8_t,  int16_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlb_v_w, int8_t,  int32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlb_v_d, int8_t,  int64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlh_v_h, int16_t, int16_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlh_v_w, int16_t, int32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlh_v_d, int16_t, int64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlw_v_w, int32_t, int32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlw_v_d, int32_t, int64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vle_v_b, int8_t,  int8_t)
+GEN_VEXT_LD_UNIT_STRIDE(vle_v_h, int16_t, int16_t)
+GEN_VEXT_LD_UNIT_STRIDE(vle_v_w, int32_t, int32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vle_v_d, int64_t, int64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlbu_v_b, uint8_t,  uint8_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlbu_v_h, uint8_t,  uint16_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlbu_v_w, uint8_t,  uint32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlbu_v_d, uint8_t,  uint64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlhu_v_h, uint16_t, uint16_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlhu_v_w, uint16_t, uint32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlhu_v_d, uint16_t, uint64_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlwu_v_w, uint32_t, uint32_t)
+GEN_VEXT_LD_UNIT_STRIDE(vlwu_v_d, uint32_t, uint64_t)
+
+/* unit-stride: store vector element to guest memory */
+static void vext_st_unit_stride_mask(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    /* probe every access */
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_write_access(env, ctx->base + ctx->nf * i * s->msz,
+                ctx->nf * s->msz, ra);
+    }
+    /* store bytes to guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + (i * ctx->nf + k) * s->msz;
+            ctx->st_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+}
+
+static void vext_st_unit_stride(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    /* probe every access */
+    probe_write_access(env, ctx->base, s->vl * ctx->nf * s->msz, ra);
+    /* store bytes to guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + (i * ctx->nf + k) * s->msz;
+            ctx->st_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+}
+
+#define GEN_VEXT_ST_UNIT_STRIDE(NAME, MTYPE, ETYPE)                \
+void HELPER(NAME##_mask)(void *vd, target_ulong base, void *v0,    \
+        CPURISCVState *env, uint32_t desc)                         \
+{                                                                  \
+    static struct vext_ldst_ctx ctx;                               \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                  \
+        sizeof(MTYPE), env->vext.vl, desc);                        \
+    ctx.nf = vext_nf(desc);                                        \
+    ctx.base = base;                                               \
+    ctx.st_elem = vext_##NAME##_st_elem;                           \
+                                                                   \
+    vext_st_unit_stride_mask(vd, v0, env, &ctx, GETPC());          \
+}                                                                  \
+                                                                   \
+void HELPER(NAME)(void *vd, target_ulong base, void *v0,           \
+        CPURISCVState *env, uint32_t desc)                         \
+{                                                                  \
+    static struct vext_ldst_ctx ctx;                               \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                  \
+        sizeof(MTYPE), env->vext.vl, desc);                        \
+    ctx.nf = vext_nf(desc);                                        \
+    ctx.base = base;                                               \
+    ctx.st_elem = vext_##NAME##_st_elem;                           \
+                                                                   \
+    vext_st_unit_stride(vd, v0, env, &ctx, GETPC());               \
+}
+
+GEN_VEXT_ST_UNIT_STRIDE(vsb_v_b, int8_t,  int8_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsb_v_h, int8_t,  int16_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsb_v_w, int8_t,  int32_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsb_v_d, int8_t,  int64_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsh_v_h, int16_t, int16_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsh_v_w, int16_t, int32_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsh_v_d, int16_t, int64_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsw_v_w, int32_t, int32_t)
+GEN_VEXT_ST_UNIT_STRIDE(vsw_v_d, int32_t, int64_t)
+GEN_VEXT_ST_UNIT_STRIDE(vse_v_b, int8_t,  int8_t)
+GEN_VEXT_ST_UNIT_STRIDE(vse_v_h, int16_t, int16_t)
+GEN_VEXT_ST_UNIT_STRIDE(vse_v_w, int32_t, int32_t)
+GEN_VEXT_ST_UNIT_STRIDE(vse_v_d, int64_t, int64_t)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 2/5] target/riscv: add vector stride load and store instructions
  2020-02-10  7:42 ` LIU Zhiwei
@ 2020-02-10  7:42   ` LIU Zhiwei
  -1 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-10  7:42 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768, LIU Zhiwei

Vector strided operations access the first memory element at the base address,
and then access subsequent elements at address increments given by the byte
offset contained in the x register specified by rs2.
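
In other words, for field k of element i the helpers compute the
effective address as base + stride * i + k * msz, where msz is the
size in bytes of one memory element. A minimal stand-alone sketch of
that computation (the function name is only illustrative and is not
part of the patch):

    #include <stdint.h>

    /* Effective address of field k of element i of a strided access,
     * with msz bytes per memory element. */
    static inline uint64_t strided_elem_addr(uint64_t base, uint64_t stride,
                                             uint32_t i, uint32_t k,
                                             uint32_t msz)
    {
        return base + stride * i + k * msz;
    }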

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  35 +++++
 target/riscv/insn32.decode              |  14 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 138 +++++++++++++++++++
 target/riscv/vector_helper.c            | 169 ++++++++++++++++++++++++
 4 files changed, 356 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 74c483ef9e..19c1bfc317 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -148,3 +148,38 @@ DEF_HELPER_5(vse_v_w, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vse_v_w_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vse_v_d, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vse_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsh_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsh_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsh_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsw_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsw_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlshu_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlshu_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlshu_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlswu_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlswu_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssh_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssh_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssh_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssw_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssw_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index dad3ed91c7..2f2d3d13b3 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -44,6 +44,7 @@
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
 &r2nfvm    vm rd rs1 nf
+&rnfvm     vm rd rs1 rs2 nf
 
 # Formats 32:
 @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
@@ -64,6 +65,7 @@
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
 @r2_nfvm nf:3 ... vm:1 ..... ..... ... ..... ....... &r2nfvm %rs1 %rd
+@r_nfvm  nf:3 ... vm:1 ..... ..... ... ..... ....... &rnfvm %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
 
 @sfence_vma ....... ..... .....   ... ..... ....... %rs2 %rs1
@@ -222,6 +224,18 @@ vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
 vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
 vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
 
+vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
+vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
+vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
+vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
+vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
+vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
+vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
+
 # *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index d93eb00651..5a7ea94c2d 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -361,3 +361,141 @@ GEN_VEXT_ST_US_TRANS(vsb_v, vext_st_us_trans, 0)
 GEN_VEXT_ST_US_TRANS(vsh_v, vext_st_us_trans, 1)
 GEN_VEXT_ST_US_TRANS(vsw_v, vext_st_us_trans, 2)
 GEN_VEXT_ST_US_TRANS(vse_v, vext_st_us_trans, 3)
+
+/* stride load and store */
+typedef void gen_helper_vext_ldst_stride(TCGv_ptr, TCGv, TCGv,
+        TCGv_ptr, TCGv_env, TCGv_i32);
+
+static bool do_vext_ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
+        uint32_t data, gen_helper_vext_ldst_stride *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask;
+    TCGv base, stride;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    stride = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, maxsz_part1(s->maxsz), data));
+
+    gen_get_gpr(base, rs1);
+    gen_get_gpr(stride, rs2);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, base, stride, mask, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free(base);
+    tcg_temp_free(stride);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool vext_ld_stride_trans(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint8_t nf = a->nf + 1;
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (nf << 12);
+    gen_helper_vext_ldst_stride *fn;
+    static gen_helper_vext_ldst_stride * const fns[7][4] = {
+        /* masked stride load */
+        { gen_helper_vlsb_v_b_mask,  gen_helper_vlsb_v_h_mask,
+          gen_helper_vlsb_v_w_mask,  gen_helper_vlsb_v_d_mask },
+        { NULL,                      gen_helper_vlsh_v_h_mask,
+          gen_helper_vlsh_v_w_mask,  gen_helper_vlsh_v_d_mask },
+        { NULL,                      NULL,
+          gen_helper_vlsw_v_w_mask,  gen_helper_vlsw_v_d_mask },
+        { gen_helper_vlse_v_b_mask,  gen_helper_vlse_v_h_mask,
+          gen_helper_vlse_v_w_mask,  gen_helper_vlse_v_d_mask },
+        { gen_helper_vlsbu_v_b_mask, gen_helper_vlsbu_v_h_mask,
+          gen_helper_vlsbu_v_w_mask, gen_helper_vlsbu_v_d_mask },
+        { NULL,                      gen_helper_vlshu_v_h_mask,
+          gen_helper_vlshu_v_w_mask, gen_helper_vlshu_v_d_mask },
+        { NULL,                      NULL,
+          gen_helper_vlswu_v_w_mask, gen_helper_vlswu_v_d_mask },
+    };
+
+    fn = fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+#define GEN_VEXT_LD_STRIDE_TRANS(NAME, DO_OP, SEQ)           \
+static bool trans_##NAME(DisasContext *s, arg_rnfvm* a)      \
+{                                                            \
+    vchkctx.check_misa = RVV;                                \
+    vchkctx.check_overlap_mask.need_check = true;            \
+    vchkctx.check_overlap_mask.reg = a->rd;                  \
+    vchkctx.check_overlap_mask.vm = a->vm;                   \
+    vchkctx.check_reg[0].need_check = true;                  \
+    vchkctx.check_reg[0].reg = a->rd;                        \
+    vchkctx.check_reg[0].widen = false;                      \
+    vchkctx.check_nf.need_check = true;                      \
+    vchkctx.check_nf.nf = a->nf;                             \
+                                                             \
+    if (!vext_check(s)) {                                    \
+        return false;                                        \
+    }                                                        \
+    return DO_OP(s, a, SEQ);                                 \
+}
+
+GEN_VEXT_LD_STRIDE_TRANS(vlsb_v, vext_ld_stride_trans, 0)
+GEN_VEXT_LD_STRIDE_TRANS(vlsh_v, vext_ld_stride_trans, 1)
+GEN_VEXT_LD_STRIDE_TRANS(vlsw_v, vext_ld_stride_trans, 2)
+GEN_VEXT_LD_STRIDE_TRANS(vlse_v, vext_ld_stride_trans, 3)
+GEN_VEXT_LD_STRIDE_TRANS(vlsbu_v, vext_ld_stride_trans, 4)
+GEN_VEXT_LD_STRIDE_TRANS(vlshu_v, vext_ld_stride_trans, 5)
+GEN_VEXT_LD_STRIDE_TRANS(vlswu_v, vext_ld_stride_trans, 6)
+
+static bool vext_st_stride_trans(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint8_t nf = a->nf + 1;
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (nf << 12);
+    gen_helper_vext_ldst_stride *fn;
+    static gen_helper_vext_ldst_stride * const fns[4][4] = {
+        /* masked stride store */
+        { gen_helper_vssb_v_b_mask,  gen_helper_vssb_v_h_mask,
+          gen_helper_vssb_v_w_mask,  gen_helper_vssb_v_d_mask },
+        { NULL,                      gen_helper_vssh_v_h_mask,
+          gen_helper_vssh_v_w_mask,  gen_helper_vssh_v_d_mask },
+        { NULL,                      NULL,
+          gen_helper_vssw_v_w_mask,  gen_helper_vssw_v_d_mask },
+        { gen_helper_vsse_v_b_mask,  gen_helper_vsse_v_h_mask,
+          gen_helper_vsse_v_w_mask,  gen_helper_vsse_v_d_mask }
+    };
+
+    fn = fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+#define GEN_VEXT_ST_STRIDE_TRANS(NAME, DO_OP, SEQ)           \
+static bool trans_##NAME(DisasContext *s, arg_rnfvm* a)      \
+{                                                            \
+    vchkctx.check_misa = RVV;                                \
+    vchkctx.check_reg[0].need_check = true;                  \
+    vchkctx.check_reg[0].reg = a->rd;                        \
+    vchkctx.check_reg[0].widen = false;                      \
+    vchkctx.check_nf.need_check = true;                      \
+    vchkctx.check_nf.nf = a->nf;                             \
+                                                             \
+    if (!vext_check(s)) {                                    \
+        return false;                                        \
+    }                                                        \
+    return DO_OP(s, a, SEQ);                                 \
+}
+
+GEN_VEXT_ST_STRIDE_TRANS(vssb_v, vext_st_stride_trans, 0)
+GEN_VEXT_ST_STRIDE_TRANS(vssh_v, vext_st_stride_trans, 1)
+GEN_VEXT_ST_STRIDE_TRANS(vssw_v, vext_st_stride_trans, 2)
+GEN_VEXT_ST_STRIDE_TRANS(vsse_v, vext_st_stride_trans, 3)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 406fcd1dfe..345945d19c 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -257,6 +257,28 @@ GEN_VEXT_LD_ELEM(vlhu_v_w, uint16_t, uint32_t, H4, lduw)
 GEN_VEXT_LD_ELEM(vlhu_v_d, uint16_t, uint64_t, H8, lduw)
 GEN_VEXT_LD_ELEM(vlwu_v_w, uint32_t, uint32_t, H4, ldl)
 GEN_VEXT_LD_ELEM(vlwu_v_d, uint32_t, uint64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vlsb_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(vlsh_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlsh_v_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(vlsh_v_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(vlsw_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlsw_v_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vlse_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlse_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlse_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlse_v_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(vlsbu_v_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(vlshu_v_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(vlshu_v_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(vlshu_v_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(vlswu_v_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlswu_v_d, uint32_t, uint64_t, H8, ldl)
 
 #define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)                       \
 static void vext_##NAME##_st_elem(CPURISCVState *env, abi_ptr addr,   \
@@ -280,6 +302,19 @@ GEN_VEXT_ST_ELEM(vse_v_b, int8_t,  H1, stb)
 GEN_VEXT_ST_ELEM(vse_v_h, int16_t, H2, stw)
 GEN_VEXT_ST_ELEM(vse_v_w, int32_t, H4, stl)
 GEN_VEXT_ST_ELEM(vse_v_d, int64_t, H8, stq)
+GEN_VEXT_ST_ELEM(vssb_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vssb_v_h, int16_t, H2, stb)
+GEN_VEXT_ST_ELEM(vssb_v_w, int32_t, H4, stb)
+GEN_VEXT_ST_ELEM(vssb_v_d, int64_t, H8, stb)
+GEN_VEXT_ST_ELEM(vssh_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vssh_v_w, int32_t, H4, stw)
+GEN_VEXT_ST_ELEM(vssh_v_d, int64_t, H8, stw)
+GEN_VEXT_ST_ELEM(vssw_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vssw_v_d, int64_t, H8, stl)
+GEN_VEXT_ST_ELEM(vsse_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vsse_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vsse_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vsse_v_d, int64_t, H8, stq)
 
 /* unit-stride: load vector element from contiguous guest memory */
 static void vext_ld_unit_stride_mask(void *vd, void *v0, CPURISCVState *env,
@@ -485,3 +520,137 @@ GEN_VEXT_ST_UNIT_STRIDE(vse_v_b, int8_t,  int8_t)
 GEN_VEXT_ST_UNIT_STRIDE(vse_v_h, int16_t, int16_t)
 GEN_VEXT_ST_UNIT_STRIDE(vse_v_w, int32_t, int32_t)
 GEN_VEXT_ST_UNIT_STRIDE(vse_v_d, int64_t, int64_t)
+
+/* stride: load strided vector element from guest memory */
+static void vext_ld_stride_mask(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    if (s->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_read_access(env, ctx->base + ctx->stride * i,
+                ctx->nf * s->msz, ra);
+    }
+    /* load bytes from guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + ctx->stride * i + k * s->msz;
+            ctx->ld_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    for (k = 0; k < ctx->nf; k++) {
+        ctx->clear_elem(vd, s->vl + k * s->vlmax, s->vl * s->esz,
+                s->vlmax * s->esz);
+    }
+}
+
+#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE)                                 \
+void HELPER(NAME##_mask)(void *vd, target_ulong base, target_ulong stride,     \
+        void *v0, CPURISCVState *env, uint32_t desc)                           \
+{                                                                              \
+    static struct vext_ldst_ctx ctx;                                           \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                              \
+        sizeof(MTYPE), env->vext.vl, desc);                                    \
+    ctx.nf = vext_nf(desc);                                                    \
+    ctx.base = base;                                                           \
+    ctx.stride = stride;                                                       \
+    ctx.ld_elem = vext_##NAME##_ld_elem;                                       \
+    ctx.clear_elem = vext_##NAME##_clear_elem;                                 \
+                                                                               \
+    vext_ld_stride_mask(vd, v0, env, &ctx, GETPC());                           \
+}
+
+GEN_VEXT_LD_STRIDE(vlsb_v_b, int8_t,  int8_t)
+GEN_VEXT_LD_STRIDE(vlsb_v_h, int8_t,  int16_t)
+GEN_VEXT_LD_STRIDE(vlsb_v_w, int8_t,  int32_t)
+GEN_VEXT_LD_STRIDE(vlsb_v_d, int8_t,  int64_t)
+GEN_VEXT_LD_STRIDE(vlsh_v_h, int16_t, int16_t)
+GEN_VEXT_LD_STRIDE(vlsh_v_w, int16_t, int32_t)
+GEN_VEXT_LD_STRIDE(vlsh_v_d, int16_t, int64_t)
+GEN_VEXT_LD_STRIDE(vlsw_v_w, int32_t, int32_t)
+GEN_VEXT_LD_STRIDE(vlsw_v_d, int32_t, int64_t)
+GEN_VEXT_LD_STRIDE(vlse_v_b, int8_t,  int8_t)
+GEN_VEXT_LD_STRIDE(vlse_v_h, int16_t, int16_t)
+GEN_VEXT_LD_STRIDE(vlse_v_w, int32_t, int32_t)
+GEN_VEXT_LD_STRIDE(vlse_v_d, int64_t, int64_t)
+GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t)
+GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t)
+GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t)
+GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t)
+GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t)
+GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t)
+GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t)
+GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t)
+GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t)
+
+/* stride: store strided vector element to guest memory */
+static void vext_st_stride_mask(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    /* probe every access */
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_write_access(env, ctx->base + ctx->stride * i,
+                ctx->nf * s->msz, ra);
+    }
+    /* store bytes to guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + ctx->stride * i + k * s->msz;
+            ctx->st_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+}
+
+#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE)                                 \
+void HELPER(NAME##_mask)(void *vd, target_ulong base, target_ulong stride,     \
+        void *v0, CPURISCVState *env, uint32_t desc)                           \
+{                                                                              \
+    static struct vext_ldst_ctx ctx;                                           \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                              \
+        sizeof(MTYPE), env->vext.vl, desc);                                    \
+    ctx.nf = vext_nf(desc);                                                    \
+    ctx.base = base;                                                           \
+    ctx.stride = stride;                                                       \
+    ctx.st_elem = vext_##NAME##_st_elem;                                       \
+                                                                               \
+    vext_st_stride_mask(vd, v0, env, &ctx, GETPC());                           \
+}
+
+GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t)
+GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t)
+GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t)
+GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t)
+GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t)
+GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t)
+GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t)
+GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t)
+GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t)
+GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t)
+GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t)
+GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t)
+GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 2/5] target/riscv: add vector stride load and store instructions
@ 2020-02-10  7:42   ` LIU Zhiwei
  0 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-10  7:42 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, qemu-devel, qemu-riscv, LIU Zhiwei

Vector strided operations access the first memory element at the base address,
and then access subsequent elements at address increments given by the byte
offset contained in the x register specified by rs2.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  35 +++++
 target/riscv/insn32.decode              |  14 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 138 +++++++++++++++++++
 target/riscv/vector_helper.c            | 169 ++++++++++++++++++++++++
 4 files changed, 356 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 74c483ef9e..19c1bfc317 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -148,3 +148,38 @@ DEF_HELPER_5(vse_v_w, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vse_v_w_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vse_v_d, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vse_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsb_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsh_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsh_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsh_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsw_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsw_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlse_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlsbu_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlshu_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlshu_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlshu_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlswu_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlswu_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssb_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssh_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssh_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssh_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssw_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vssw_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vsse_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index dad3ed91c7..2f2d3d13b3 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -44,6 +44,7 @@
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
 &r2nfvm    vm rd rs1 nf
+&rnfvm     vm rd rs1 rs2 nf
 
 # Formats 32:
 @r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
@@ -64,6 +65,7 @@
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
 @r2_nfvm nf:3 ... vm:1 ..... ..... ... ..... ....... &r2nfvm %rs1 %rd
+@r_nfvm  nf:3 ... vm:1 ..... ..... ... ..... ....... &rnfvm %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
 
 @sfence_vma ....... ..... .....   ... ..... ....... %rs2 %rs1
@@ -222,6 +224,18 @@ vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
 vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
 vse_v      ... 000 . 00000 ..... 111 ..... 0100111 @r2_nfvm
 
+vlsb_v     ... 110 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlsh_v     ... 110 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlsw_v     ... 110 . ..... ..... 110 ..... 0000111 @r_nfvm
+vlse_v     ... 010 . ..... ..... 111 ..... 0000111 @r_nfvm
+vlsbu_v    ... 010 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlshu_v    ... 010 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlswu_v    ... 010 . ..... ..... 110 ..... 0000111 @r_nfvm
+vssb_v     ... 010 . ..... ..... 000 ..... 0100111 @r_nfvm
+vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
+vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
+vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
+
 # *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index d93eb00651..5a7ea94c2d 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -361,3 +361,141 @@ GEN_VEXT_ST_US_TRANS(vsb_v, vext_st_us_trans, 0)
 GEN_VEXT_ST_US_TRANS(vsh_v, vext_st_us_trans, 1)
 GEN_VEXT_ST_US_TRANS(vsw_v, vext_st_us_trans, 2)
 GEN_VEXT_ST_US_TRANS(vse_v, vext_st_us_trans, 3)
+
+/* stride load and store */
+typedef void gen_helper_vext_ldst_stride(TCGv_ptr, TCGv, TCGv,
+        TCGv_ptr, TCGv_env, TCGv_i32);
+
+static bool do_vext_ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
+        uint32_t data, gen_helper_vext_ldst_stride *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask;
+    TCGv base, stride;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    stride = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, maxsz_part1(s->maxsz), data));
+
+    gen_get_gpr(base, rs1);
+    gen_get_gpr(stride, rs2);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, base, stride, mask, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free(base);
+    tcg_temp_free(stride);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool vext_ld_stride_trans(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint8_t nf = a->nf + 1;
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (nf << 12);
+    gen_helper_vext_ldst_stride *fn;
+    static gen_helper_vext_ldst_stride * const fns[7][4] = {
+        /* masked stride load */
+        { gen_helper_vlsb_v_b_mask,  gen_helper_vlsb_v_h_mask,
+          gen_helper_vlsb_v_w_mask,  gen_helper_vlsb_v_d_mask },
+        { NULL,                      gen_helper_vlsh_v_h_mask,
+          gen_helper_vlsh_v_w_mask,  gen_helper_vlsh_v_d_mask },
+        { NULL,                      NULL,
+          gen_helper_vlsw_v_w_mask,  gen_helper_vlsw_v_d_mask },
+        { gen_helper_vlse_v_b_mask,  gen_helper_vlse_v_h_mask,
+          gen_helper_vlse_v_w_mask,  gen_helper_vlse_v_d_mask },
+        { gen_helper_vlsbu_v_b_mask, gen_helper_vlsbu_v_h_mask,
+          gen_helper_vlsbu_v_w_mask, gen_helper_vlsbu_v_d_mask },
+        { NULL,                      gen_helper_vlshu_v_h_mask,
+          gen_helper_vlshu_v_w_mask, gen_helper_vlshu_v_d_mask },
+        { NULL,                      NULL,
+          gen_helper_vlswu_v_w_mask, gen_helper_vlswu_v_d_mask },
+    };
+
+    fn =  fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+#define GEN_VEXT_LD_STRIDE_TRANS(NAME, DO_OP, SEQ)           \
+static bool trans_##NAME(DisasContext *s, arg_rnfvm* a)      \
+{                                                            \
+    vchkctx.check_misa = RVV;                                \
+    vchkctx.check_overlap_mask.need_check = true;            \
+    vchkctx.check_overlap_mask.reg = a->rd;                  \
+    vchkctx.check_overlap_mask.vm = a->vm;                   \
+    vchkctx.check_reg[0].need_check = true;                  \
+    vchkctx.check_reg[0].reg = a->rd;                        \
+    vchkctx.check_reg[0].widen = false;                      \
+    vchkctx.check_nf.need_check = true;                      \
+    vchkctx.check_nf.nf = a->nf;                             \
+                                                             \
+    if (!vext_check(s)) {                                    \
+        return false;                                        \
+    }                                                        \
+    return DO_OP(s, a, SEQ);                                 \
+}
+
+GEN_VEXT_LD_STRIDE_TRANS(vlsb_v, vext_ld_stride_trans, 0)
+GEN_VEXT_LD_STRIDE_TRANS(vlsh_v, vext_ld_stride_trans, 1)
+GEN_VEXT_LD_STRIDE_TRANS(vlsw_v, vext_ld_stride_trans, 2)
+GEN_VEXT_LD_STRIDE_TRANS(vlse_v, vext_ld_stride_trans, 3)
+GEN_VEXT_LD_STRIDE_TRANS(vlsbu_v, vext_ld_stride_trans, 4)
+GEN_VEXT_LD_STRIDE_TRANS(vlshu_v, vext_ld_stride_trans, 5)
+GEN_VEXT_LD_STRIDE_TRANS(vlswu_v, vext_ld_stride_trans, 6)
+
+static bool vext_st_stride_trans(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint8_t nf = a->nf + 1;
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (nf << 12);
+    gen_helper_vext_ldst_stride *fn;
+    static gen_helper_vext_ldst_stride * const fns[4][4] = {
+        /* masked stride store */
+        { gen_helper_vssb_v_b_mask,  gen_helper_vssb_v_h_mask,
+          gen_helper_vssb_v_w_mask,  gen_helper_vssb_v_d_mask },
+        { NULL,                      gen_helper_vssh_v_h_mask,
+          gen_helper_vssh_v_w_mask,  gen_helper_vssh_v_d_mask },
+        { NULL,                      NULL,
+          gen_helper_vssw_v_w_mask,  gen_helper_vssw_v_d_mask },
+        { gen_helper_vsse_v_b_mask,  gen_helper_vsse_v_h_mask,
+          gen_helper_vsse_v_w_mask,  gen_helper_vsse_v_d_mask }
+    };
+
+    fn =  fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_ldst_stride_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+#define GEN_VEXT_ST_STRIDE_TRANS(NAME, DO_OP, SEQ)           \
+static bool trans_##NAME(DisasContext *s, arg_rnfvm* a)      \
+{                                                            \
+    vchkctx.check_misa = RVV;                                \
+    vchkctx.check_reg[0].need_check = true;                  \
+    vchkctx.check_reg[0].reg = a->rd;                        \
+    vchkctx.check_reg[0].widen = false;                      \
+    vchkctx.check_nf.need_check = true;                      \
+    vchkctx.check_nf.nf = a->nf;                             \
+                                                             \
+    if (!vext_check(s)) {                                    \
+        return false;                                        \
+    }                                                        \
+    return DO_OP(s, a, SEQ);                                 \
+}
+
+GEN_VEXT_ST_STRIDE_TRANS(vssb_v, vext_st_stride_trans, 0)
+GEN_VEXT_ST_STRIDE_TRANS(vssh_v, vext_st_stride_trans, 1)
+GEN_VEXT_ST_STRIDE_TRANS(vssw_v, vext_st_stride_trans, 2)
+GEN_VEXT_ST_STRIDE_TRANS(vsse_v, vext_st_stride_trans, 3)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 406fcd1dfe..345945d19c 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -257,6 +257,28 @@ GEN_VEXT_LD_ELEM(vlhu_v_w, uint16_t, uint32_t, H4, lduw)
 GEN_VEXT_LD_ELEM(vlhu_v_d, uint16_t, uint64_t, H8, lduw)
 GEN_VEXT_LD_ELEM(vlwu_v_w, uint32_t, uint32_t, H4, ldl)
 GEN_VEXT_LD_ELEM(vlwu_v_d, uint32_t, uint64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vlsb_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(vlsb_v_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(vlsh_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlsh_v_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(vlsh_v_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(vlsw_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlsw_v_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vlse_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlse_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlse_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlse_v_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(vlsbu_v_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(vlsbu_v_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(vlshu_v_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(vlshu_v_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(vlshu_v_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(vlswu_v_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlswu_v_d, uint32_t, uint64_t, H8, ldl)
 
 #define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)                       \
 static void vext_##NAME##_st_elem(CPURISCVState *env, abi_ptr addr,   \
@@ -280,6 +302,19 @@ GEN_VEXT_ST_ELEM(vse_v_b, int8_t,  H1, stb)
 GEN_VEXT_ST_ELEM(vse_v_h, int16_t, H2, stw)
 GEN_VEXT_ST_ELEM(vse_v_w, int32_t, H4, stl)
 GEN_VEXT_ST_ELEM(vse_v_d, int64_t, H8, stq)
+GEN_VEXT_ST_ELEM(vssb_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vssb_v_h, int16_t, H2, stb)
+GEN_VEXT_ST_ELEM(vssb_v_w, int32_t, H4, stb)
+GEN_VEXT_ST_ELEM(vssb_v_d, int64_t, H8, stb)
+GEN_VEXT_ST_ELEM(vssh_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vssh_v_w, int32_t, H4, stw)
+GEN_VEXT_ST_ELEM(vssh_v_d, int64_t, H8, stw)
+GEN_VEXT_ST_ELEM(vssw_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vssw_v_d, int64_t, H8, stl)
+GEN_VEXT_ST_ELEM(vsse_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vsse_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vsse_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vsse_v_d, int64_t, H8, stq)
 
 /* unit-stride: load vector element from continuous guest memory */
 static void vext_ld_unit_stride_mask(void *vd, void *v0, CPURISCVState *env,
@@ -485,3 +520,137 @@ GEN_VEXT_ST_UNIT_STRIDE(vse_v_b, int8_t,  int8_t)
 GEN_VEXT_ST_UNIT_STRIDE(vse_v_h, int16_t, int16_t)
 GEN_VEXT_ST_UNIT_STRIDE(vse_v_w, int32_t, int32_t)
 GEN_VEXT_ST_UNIT_STRIDE(vse_v_d, int64_t, int64_t)
+
+/* stride: load strided vector element from guest memory */
+static void vext_ld_stride_mask(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    if (s->vl == 0) {
+        return;
+    }
+    /* probe every access*/
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_read_access(env, ctx->base + ctx->stride * i,
+                ctx->nf * s->msz, ra);
+    }
+    /* load bytes from guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + ctx->stride * i + k * s->msz;
+            ctx->ld_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    for (k = 0; k < ctx->nf; k++) {
+        ctx->clear_elem(vd, s->vl + k * s->vlmax, s->vl * s->esz,
+                s->vlmax * s->esz);
+    }
+}
+
+#define GEN_VEXT_LD_STRIDE(NAME, MTYPE, ETYPE)                                 \
+void HELPER(NAME##_mask)(void *vd, target_ulong base, target_ulong stride,     \
+        void *v0, CPURISCVState *env, uint32_t desc)                           \
+{                                                                              \
+    static struct vext_ldst_ctx ctx;                                           \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                              \
+        sizeof(MTYPE), env->vext.vl, desc);                                    \
+    ctx.nf = vext_nf(desc);                                                    \
+    ctx.base = base;                                                           \
+    ctx.stride = stride;                                                       \
+    ctx.ld_elem = vext_##NAME##_ld_elem;                                       \
+    ctx.clear_elem = vext_##NAME##_clear_elem;                                 \
+                                                                               \
+    vext_ld_stride_mask(vd, v0, env, &ctx, GETPC());                           \
+}
+
+GEN_VEXT_LD_STRIDE(vlsb_v_b, int8_t,  int8_t)
+GEN_VEXT_LD_STRIDE(vlsb_v_h, int8_t,  int16_t)
+GEN_VEXT_LD_STRIDE(vlsb_v_w, int8_t,  int32_t)
+GEN_VEXT_LD_STRIDE(vlsb_v_d, int8_t,  int64_t)
+GEN_VEXT_LD_STRIDE(vlsh_v_h, int16_t, int16_t)
+GEN_VEXT_LD_STRIDE(vlsh_v_w, int16_t, int32_t)
+GEN_VEXT_LD_STRIDE(vlsh_v_d, int16_t, int64_t)
+GEN_VEXT_LD_STRIDE(vlsw_v_w, int32_t, int32_t)
+GEN_VEXT_LD_STRIDE(vlsw_v_d, int32_t, int64_t)
+GEN_VEXT_LD_STRIDE(vlse_v_b, int8_t,  int8_t)
+GEN_VEXT_LD_STRIDE(vlse_v_h, int16_t, int16_t)
+GEN_VEXT_LD_STRIDE(vlse_v_w, int32_t, int32_t)
+GEN_VEXT_LD_STRIDE(vlse_v_d, int64_t, int64_t)
+GEN_VEXT_LD_STRIDE(vlsbu_v_b, uint8_t,  uint8_t)
+GEN_VEXT_LD_STRIDE(vlsbu_v_h, uint8_t,  uint16_t)
+GEN_VEXT_LD_STRIDE(vlsbu_v_w, uint8_t,  uint32_t)
+GEN_VEXT_LD_STRIDE(vlsbu_v_d, uint8_t,  uint64_t)
+GEN_VEXT_LD_STRIDE(vlshu_v_h, uint16_t, uint16_t)
+GEN_VEXT_LD_STRIDE(vlshu_v_w, uint16_t, uint32_t)
+GEN_VEXT_LD_STRIDE(vlshu_v_d, uint16_t, uint64_t)
+GEN_VEXT_LD_STRIDE(vlswu_v_w, uint32_t, uint32_t)
+GEN_VEXT_LD_STRIDE(vlswu_v_d, uint32_t, uint64_t)
+
+/* stride: store strided vector element to guest memory */
+static void vext_st_stride_mask(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    /* probe every access */
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_write_access(env, ctx->base + ctx->stride * i,
+                ctx->nf * s->msz, ra);
+    }
+    /* store bytes to guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + ctx->stride * i + k * s->msz;
+            ctx->st_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+}
+
+#define GEN_VEXT_ST_STRIDE(NAME, MTYPE, ETYPE)                                 \
+void HELPER(NAME##_mask)(void *vd, target_ulong base, target_ulong stride,     \
+        void *v0, CPURISCVState *env, uint32_t desc)                           \
+{                                                                              \
+    static struct vext_ldst_ctx ctx;                                           \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                              \
+        sizeof(MTYPE), env->vext.vl, desc);                                    \
+    ctx.nf = vext_nf(desc);                                                    \
+    ctx.base = base;                                                           \
+    ctx.stride = stride;                                                       \
+    ctx.st_elem = vext_##NAME##_st_elem;                                       \
+                                                                               \
+    vext_st_stride_mask(vd, v0, env, &ctx, GETPC());                           \
+}
+
+GEN_VEXT_ST_STRIDE(vssb_v_b, int8_t,  int8_t)
+GEN_VEXT_ST_STRIDE(vssb_v_h, int8_t,  int16_t)
+GEN_VEXT_ST_STRIDE(vssb_v_w, int8_t,  int32_t)
+GEN_VEXT_ST_STRIDE(vssb_v_d, int8_t,  int64_t)
+GEN_VEXT_ST_STRIDE(vssh_v_h, int16_t, int16_t)
+GEN_VEXT_ST_STRIDE(vssh_v_w, int16_t, int32_t)
+GEN_VEXT_ST_STRIDE(vssh_v_d, int16_t, int64_t)
+GEN_VEXT_ST_STRIDE(vssw_v_w, int32_t, int32_t)
+GEN_VEXT_ST_STRIDE(vssw_v_d, int32_t, int64_t)
+GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t)
+GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t)
+GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t)
+GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 3/5] target/riscv: add vector index load and store instructions
  2020-02-10  7:42 ` LIU Zhiwei
@ 2020-02-10  7:42   ` LIU Zhiwei
  -1 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-10  7:42 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768, LIU Zhiwei

Vector indexed operations add the contents of each element of the
vector offset operand specified by vs2 to the base effective address
to give the effective address of each element.
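
For illustration only (not part of the patch): a minimal host-side C model of
that addressing rule, assuming unmasked elements, SEW=32, and treating each
element of the offset vector as a byte offset added to the base, as the
vext_*_get_addr helpers below do. The names here are hypothetical.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* dst[i] = 32-bit value at (base + off[i]); off[] stands in for vs2,
     * vl for the active vector length. */
    static void indexed_load_sketch(uint32_t *dst, const uint8_t *base,
                                    const int32_t *off, size_t vl)
    {
        for (size_t i = 0; i < vl; i++) {
            uint32_t v;
            /* effective address of element i = base + off[i] */
            memcpy(&v, base + off[i], sizeof(v));
            dst[i] = v;
        }
    }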

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  35 ++++
 target/riscv/insn32.decode              |  16 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 164 ++++++++++++++++++
 target/riscv/vector_helper.c            | 214 ++++++++++++++++++++++++
 4 files changed, 429 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 19c1bfc317..5ebd3d6ccd 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -183,3 +183,38 @@ DEF_HELPER_6(vsse_v_b_mask, void, ptr, tl, tl, ptr, env, i32)
 DEF_HELPER_6(vsse_v_h_mask, void, ptr, tl, tl, ptr, env, i32)
 DEF_HELPER_6(vsse_v_w_mask, void, ptr, tl, tl, ptr, env, i32)
 DEF_HELPER_6(vsse_v_d_mask, void, ptr, tl, tl, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_b_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxb_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxh_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_b_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxe_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_b_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxbu_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxhu_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxwu_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vlxwu_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_b_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxb_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxh_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_b_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vsxe_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 2f2d3d13b3..6a363a6b7e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -236,6 +236,22 @@ vssh_v     ... 010 . ..... ..... 101 ..... 0100111 @r_nfvm
 vssw_v     ... 010 . ..... ..... 110 ..... 0100111 @r_nfvm
 vsse_v     ... 010 . ..... ..... 111 ..... 0100111 @r_nfvm
 
+vlxb_v     ... 111 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlxh_v     ... 111 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlxw_v     ... 111 . ..... ..... 110 ..... 0000111 @r_nfvm
+vlxe_v     ... 011 . ..... ..... 111 ..... 0000111 @r_nfvm
+vlxbu_v    ... 011 . ..... ..... 000 ..... 0000111 @r_nfvm
+vlxhu_v    ... 011 . ..... ..... 101 ..... 0000111 @r_nfvm
+vlxwu_v    ... 011 . ..... ..... 110 ..... 0000111 @r_nfvm
+vsxb_v     ... 011 . ..... ..... 000 ..... 0100111 @r_nfvm
+vsxh_v     ... 011 . ..... ..... 101 ..... 0100111 @r_nfvm
+vsxw_v     ... 011 . ..... ..... 110 ..... 0100111 @r_nfvm
+vsxe_v     ... 011 . ..... ..... 111 ..... 0100111 @r_nfvm
+vsuxb_v    ... 111 . ..... ..... 000 ..... 0100111 @r_nfvm
+vsuxh_v    ... 111 . ..... ..... 101 ..... 0100111 @r_nfvm
+vsuxw_v    ... 111 . ..... ..... 110 ..... 0100111 @r_nfvm
+vsuxe_v    ... 111 . ..... ..... 111 ..... 0100111 @r_nfvm
+
 # *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 5a7ea94c2d..13033b3906 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -499,3 +499,167 @@ GEN_VEXT_ST_STRIDE_TRANS(vssb_v, vext_st_stride_trans, 0)
 GEN_VEXT_ST_STRIDE_TRANS(vssh_v, vext_st_stride_trans, 1)
 GEN_VEXT_ST_STRIDE_TRANS(vssw_v, vext_st_stride_trans, 2)
 GEN_VEXT_ST_STRIDE_TRANS(vsse_v, vext_st_stride_trans, 3)
+
+/* index load and store */
+typedef void gen_helper_vext_ldst_index(TCGv_ptr, TCGv, TCGv_ptr,
+        TCGv_ptr, TCGv_env, TCGv_i32);
+
+static bool do_vext_ldst_index_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
+        uint32_t data, gen_helper_vext_ldst_index *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask, index;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    index = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, maxsz_part1(s->maxsz), data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(index, cpu_env, vreg_ofs(s, vs2));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, base, mask, index, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free_ptr(index);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool vext_ld_index_trans(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint8_t nf = a->nf + 1;
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (nf << 12);
+    gen_helper_vext_ldst_index *fn;
+    static gen_helper_vext_ldst_index * const fns[7][4] = {
+        /* masked index load */
+        { gen_helper_vlxb_v_b_mask,  gen_helper_vlxb_v_h_mask,
+          gen_helper_vlxb_v_w_mask,  gen_helper_vlxb_v_d_mask },
+        { NULL,                      gen_helper_vlxh_v_h_mask,
+          gen_helper_vlxh_v_w_mask,  gen_helper_vlxh_v_d_mask },
+        { NULL,                      NULL,
+          gen_helper_vlxw_v_w_mask,  gen_helper_vlxw_v_d_mask },
+        { gen_helper_vlxe_v_b_mask,  gen_helper_vlxe_v_h_mask,
+          gen_helper_vlxe_v_w_mask,  gen_helper_vlxe_v_d_mask },
+        { gen_helper_vlxbu_v_b_mask, gen_helper_vlxbu_v_h_mask,
+          gen_helper_vlxbu_v_w_mask, gen_helper_vlxbu_v_d_mask },
+        { NULL,                      gen_helper_vlxhu_v_h_mask,
+          gen_helper_vlxhu_v_w_mask, gen_helper_vlxhu_v_d_mask },
+        { NULL,                      NULL,
+          gen_helper_vlxwu_v_w_mask, gen_helper_vlxwu_v_d_mask },
+    };
+
+    fn =  fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_ldst_index_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+#define GEN_VEXT_LD_INDEX_TRANS(NAME, DO_OP, SEQ)                         \
+static bool trans_##NAME(DisasContext *s, arg_rnfvm* a)                   \
+{                                                                         \
+    vchkctx.check_misa = RVV;                                             \
+    vchkctx.check_overlap_mask.need_check = true;                         \
+    vchkctx.check_overlap_mask.reg = a->rd;                               \
+    vchkctx.check_overlap_mask.vm = a->vm;                                \
+    vchkctx.check_reg[0].need_check = true;                               \
+    vchkctx.check_reg[0].reg = a->rd;                                     \
+    vchkctx.check_reg[0].widen = false;                                   \
+    vchkctx.check_reg[1].need_check = true;                               \
+    vchkctx.check_reg[1].reg = a->rs2;                                    \
+    vchkctx.check_reg[1].widen = false;                                   \
+    vchkctx.check_nf.need_check = true;                                   \
+    vchkctx.check_nf.nf = a->nf;                                          \
+                                                                          \
+    if (!vext_check(s)) {                                                 \
+        return false;                                                     \
+    }                                                                     \
+    return DO_OP(s, a, SEQ);                                              \
+}
+
+GEN_VEXT_LD_INDEX_TRANS(vlxb_v, vext_ld_index_trans, 0)
+GEN_VEXT_LD_INDEX_TRANS(vlxh_v, vext_ld_index_trans, 1)
+GEN_VEXT_LD_INDEX_TRANS(vlxw_v, vext_ld_index_trans, 2)
+GEN_VEXT_LD_INDEX_TRANS(vlxe_v, vext_ld_index_trans, 3)
+GEN_VEXT_LD_INDEX_TRANS(vlxbu_v, vext_ld_index_trans, 4)
+GEN_VEXT_LD_INDEX_TRANS(vlxhu_v, vext_ld_index_trans, 5)
+GEN_VEXT_LD_INDEX_TRANS(vlxwu_v, vext_ld_index_trans, 6)
+
+static bool vext_st_index_trans(DisasContext *s, arg_rnfvm *a, uint8_t seq)
+{
+    uint8_t nf = a->nf + 1;
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (nf << 12);
+    gen_helper_vext_ldst_index *fn;
+    static gen_helper_vext_ldst_index * const fns[4][4] = {
+        /* masked index store */
+        { gen_helper_vsxb_v_b_mask,  gen_helper_vsxb_v_h_mask,
+          gen_helper_vsxb_v_w_mask,  gen_helper_vsxb_v_d_mask },
+        { NULL,                      gen_helper_vsxh_v_h_mask,
+          gen_helper_vsxh_v_w_mask,  gen_helper_vsxh_v_d_mask },
+        { NULL,                      NULL,
+          gen_helper_vsxw_v_w_mask,  gen_helper_vsxw_v_d_mask },
+        { gen_helper_vsxe_v_b_mask,  gen_helper_vsxe_v_h_mask,
+          gen_helper_vsxe_v_w_mask,  gen_helper_vsxe_v_d_mask }
+    };
+
+    fn =  fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_ldst_index_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+#define GEN_VEXT_ST_INDEX_TRANS(NAME, DO_OP, SEQ)                         \
+static bool trans_##NAME(DisasContext *s, arg_rnfvm* a)                   \
+{                                                                         \
+    vchkctx.check_misa = RVV;                                             \
+    vchkctx.check_reg[0].need_check = true;                               \
+    vchkctx.check_reg[0].reg = a->rd;                                     \
+    vchkctx.check_reg[0].widen = false;                                   \
+    vchkctx.check_reg[1].need_check = true;                               \
+    vchkctx.check_reg[1].reg = a->rs2;                                    \
+    vchkctx.check_reg[1].widen = false;                                   \
+    vchkctx.check_nf.need_check = true;                                   \
+    vchkctx.check_nf.nf = a->nf;                                          \
+                                                                          \
+    if (!vext_check(s)) {                                                 \
+        return false;                                                     \
+    }                                                                     \
+    return DO_OP(s, a, SEQ);                                              \
+}
+
+GEN_VEXT_ST_INDEX_TRANS(vsxb_v, vext_st_index_trans, 0)
+GEN_VEXT_ST_INDEX_TRANS(vsxh_v, vext_st_index_trans, 1)
+GEN_VEXT_ST_INDEX_TRANS(vsxw_v, vext_st_index_trans, 2)
+GEN_VEXT_ST_INDEX_TRANS(vsxe_v, vext_st_index_trans, 3)
+
+static bool trans_vsuxb_v(DisasContext *s, arg_rnfvm* a)
+{
+    return trans_vsxb_v(s, a);
+}
+
+static bool trans_vsuxh_v(DisasContext *s, arg_rnfvm* a)
+{
+    return trans_vsxh_v(s, a);
+}
+
+static bool trans_vsuxw_v(DisasContext *s, arg_rnfvm* a)
+{
+    return trans_vsxw_v(s, a);
+}
+
+static bool trans_vsuxe_v(DisasContext *s, arg_rnfvm* a)
+{
+    return trans_vsxe_v(s, a);
+}
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 345945d19c..0404394588 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -279,6 +279,28 @@ GEN_VEXT_LD_ELEM(vlshu_v_w, uint16_t, uint32_t, H4, lduw)
 GEN_VEXT_LD_ELEM(vlshu_v_d, uint16_t, uint64_t, H8, lduw)
 GEN_VEXT_LD_ELEM(vlswu_v_w, uint32_t, uint32_t, H4, ldl)
 GEN_VEXT_LD_ELEM(vlswu_v_d, uint32_t, uint64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vlxb_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlxb_v_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(vlxb_v_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(vlxb_v_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(vlxh_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlxh_v_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(vlxh_v_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(vlxw_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlxw_v_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vlxe_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlxe_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlxe_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlxe_v_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(vlxbu_v_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(vlxbu_v_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(vlxbu_v_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(vlxbu_v_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(vlxhu_v_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(vlxhu_v_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(vlxhu_v_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(vlxwu_v_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlxwu_v_d, uint32_t, uint64_t, H8, ldl)
 
 #define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)                       \
 static void vext_##NAME##_st_elem(CPURISCVState *env, abi_ptr addr,   \
@@ -315,6 +337,19 @@ GEN_VEXT_ST_ELEM(vsse_v_b, int8_t,  H1, stb)
 GEN_VEXT_ST_ELEM(vsse_v_h, int16_t, H2, stw)
 GEN_VEXT_ST_ELEM(vsse_v_w, int32_t, H4, stl)
 GEN_VEXT_ST_ELEM(vsse_v_d, int64_t, H8, stq)
+GEN_VEXT_ST_ELEM(vsxb_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vsxb_v_h, int16_t, H2, stb)
+GEN_VEXT_ST_ELEM(vsxb_v_w, int32_t, H4, stb)
+GEN_VEXT_ST_ELEM(vsxb_v_d, int64_t, H8, stb)
+GEN_VEXT_ST_ELEM(vsxh_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vsxh_v_w, int32_t, H4, stw)
+GEN_VEXT_ST_ELEM(vsxh_v_d, int64_t, H8, stw)
+GEN_VEXT_ST_ELEM(vsxw_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vsxw_v_d, int64_t, H8, stl)
+GEN_VEXT_ST_ELEM(vsxe_v_b, int8_t,  H1, stb)
+GEN_VEXT_ST_ELEM(vsxe_v_h, int16_t, H2, stw)
+GEN_VEXT_ST_ELEM(vsxe_v_w, int32_t, H4, stl)
+GEN_VEXT_ST_ELEM(vsxe_v_d, int64_t, H8, stq)
 
 /* unit-stride: load vector element from continuous guest memory */
 static void vext_ld_unit_stride_mask(void *vd, void *v0, CPURISCVState *env,
@@ -654,3 +689,182 @@ GEN_VEXT_ST_STRIDE(vsse_v_b, int8_t,  int8_t)
 GEN_VEXT_ST_STRIDE(vsse_v_h, int16_t, int16_t)
 GEN_VEXT_ST_STRIDE(vsse_v_w, int32_t, int32_t)
 GEN_VEXT_ST_STRIDE(vsse_v_d, int64_t, int64_t)
+
+/* index: load indexed vector element from guest memory */
+#define GEN_VEXT_GET_INDEX_ADDR(NAME, ETYPE, H)                   \
+static target_ulong vext_##NAME##_get_addr(target_ulong base,     \
+        uint32_t idx, void *vs2)                                  \
+{                                                                 \
+    return (base + *((ETYPE *)vs2 + H(idx)));                     \
+}
+
+GEN_VEXT_GET_INDEX_ADDR(vlxb_v_b, int8_t,  H1)
+GEN_VEXT_GET_INDEX_ADDR(vlxb_v_h, int16_t, H2)
+GEN_VEXT_GET_INDEX_ADDR(vlxb_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vlxb_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vlxh_v_h, int16_t, H2)
+GEN_VEXT_GET_INDEX_ADDR(vlxh_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vlxh_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vlxw_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vlxw_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vlxe_v_b, int8_t,  H1)
+GEN_VEXT_GET_INDEX_ADDR(vlxe_v_h, int16_t, H2)
+GEN_VEXT_GET_INDEX_ADDR(vlxe_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vlxe_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vlxbu_v_b, uint8_t,  H1)
+GEN_VEXT_GET_INDEX_ADDR(vlxbu_v_h, uint16_t, H2)
+GEN_VEXT_GET_INDEX_ADDR(vlxbu_v_w, uint32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vlxbu_v_d, uint64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vlxhu_v_h, uint16_t, H2)
+GEN_VEXT_GET_INDEX_ADDR(vlxhu_v_w, uint32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vlxhu_v_d, uint64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vlxwu_v_w, uint32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vlxwu_v_d, uint64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vsxb_v_b, int8_t,  H1)
+GEN_VEXT_GET_INDEX_ADDR(vsxb_v_h, int16_t, H2)
+GEN_VEXT_GET_INDEX_ADDR(vsxb_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vsxb_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vsxh_v_h, int16_t, H2)
+GEN_VEXT_GET_INDEX_ADDR(vsxh_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vsxh_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vsxw_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vsxw_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vsxe_v_b, int8_t,  H1)
+GEN_VEXT_GET_INDEX_ADDR(vsxe_v_h, int16_t, H2)
+GEN_VEXT_GET_INDEX_ADDR(vsxe_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vsxe_v_d, int64_t, H8)
+
+static void vext_ld_index_mask(void *vd, void *vs2, void *v0,
+        CPURISCVState *env, struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    if (s->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_read_access(env, ctx->get_index_addr(ctx->base, i, vs2),
+                ctx->nf * s->msz, ra);
+    }
+    /* load bytes from guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        while (k < ctx->nf) {
+            abi_ptr addr = ctx->get_index_addr(ctx->base, i, vs2) +
+                k * s->msz;
+            ctx->ld_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    for (k = 0; k < ctx->nf; k++) {
+        ctx->clear_elem(vd, s->vl + k * s->vlmax, s->vl * s->esz,
+                s->vlmax * s->esz);
+    }
+}
+
+#define GEN_VEXT_LD_INDEX(NAME, MTYPE, ETYPE)                                  \
+void HELPER(NAME##_mask)(void *vd, target_ulong base, void *v0, void *vs2,     \
+        CPURISCVState *env, uint32_t desc)                                     \
+{                                                                              \
+    static struct vext_ldst_ctx ctx;                                           \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                              \
+        sizeof(MTYPE), env->vext.vl, desc);                                    \
+    ctx.nf = vext_nf(desc);                                                    \
+    ctx.base = base;                                                           \
+    ctx.ld_elem = vext_##NAME##_ld_elem;                                       \
+    ctx.clear_elem = vext_##NAME##_clear_elem;                                 \
+    ctx.get_index_addr = vext_##NAME##_get_addr;                               \
+                                                                               \
+    vext_ld_index_mask(vd, vs2, v0, env, &ctx, GETPC());                       \
+}
+
+GEN_VEXT_LD_INDEX(vlxb_v_b, int8_t,  int8_t)
+GEN_VEXT_LD_INDEX(vlxb_v_h, int8_t,  int16_t)
+GEN_VEXT_LD_INDEX(vlxb_v_w, int8_t,  int32_t)
+GEN_VEXT_LD_INDEX(vlxb_v_d, int8_t,  int64_t)
+GEN_VEXT_LD_INDEX(vlxh_v_h, int16_t, int16_t)
+GEN_VEXT_LD_INDEX(vlxh_v_w, int16_t, int32_t)
+GEN_VEXT_LD_INDEX(vlxh_v_d, int16_t, int64_t)
+GEN_VEXT_LD_INDEX(vlxw_v_w, int32_t, int32_t)
+GEN_VEXT_LD_INDEX(vlxw_v_d, int32_t, int64_t)
+GEN_VEXT_LD_INDEX(vlxe_v_b, int8_t,  int8_t)
+GEN_VEXT_LD_INDEX(vlxe_v_h, int16_t, int16_t)
+GEN_VEXT_LD_INDEX(vlxe_v_w, int32_t, int32_t)
+GEN_VEXT_LD_INDEX(vlxe_v_d, int64_t, int64_t)
+GEN_VEXT_LD_INDEX(vlxbu_v_b, uint8_t,  uint8_t)
+GEN_VEXT_LD_INDEX(vlxbu_v_h, uint8_t,  uint16_t)
+GEN_VEXT_LD_INDEX(vlxbu_v_w, uint8_t,  uint32_t)
+GEN_VEXT_LD_INDEX(vlxbu_v_d, uint8_t,  uint64_t)
+GEN_VEXT_LD_INDEX(vlxhu_v_h, uint16_t, uint16_t)
+GEN_VEXT_LD_INDEX(vlxhu_v_w, uint16_t, uint32_t)
+GEN_VEXT_LD_INDEX(vlxhu_v_d, uint16_t, uint64_t)
+GEN_VEXT_LD_INDEX(vlxwu_v_w, uint32_t, uint32_t)
+GEN_VEXT_LD_INDEX(vlxwu_v_d, uint32_t, uint64_t)
+
+/* index: store indexed vector element to guest memory */
+static void vext_st_index_mask(void *vd, void *vs2, void *v0,
+        CPURISCVState *env, struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i, k;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    /* probe every access */
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_write_access(env, ctx->get_index_addr(ctx->base, i, vs2),
+                ctx->nf * s->msz, ra);
+    }
+    /* store bytes to guest memory */
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->get_index_addr(ctx->base, i, vs2) +
+                k * s->msz;
+            ctx->st_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+}
+
+#define GEN_VEXT_ST_INDEX(NAME, MTYPE, ETYPE)                       \
+void HELPER(NAME##_mask)(void *vd, target_ulong base, void *v0,     \
+        void *vs2, CPURISCVState *env, uint32_t desc)               \
+{                                                                   \
+    static struct vext_ldst_ctx ctx;                                \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                   \
+        sizeof(MTYPE), env->vext.vl, desc);                         \
+    ctx.nf = vext_nf(desc);                                         \
+    ctx.base = base;                                                \
+    ctx.st_elem = vext_##NAME##_st_elem;                            \
+    ctx.get_index_addr = vext_##NAME##_get_addr;                    \
+                                                                    \
+    vext_st_index_mask(vd, vs2, v0, env, &ctx, GETPC());            \
+}
+
+GEN_VEXT_ST_INDEX(vsxb_v_b, int8_t,  int8_t)
+GEN_VEXT_ST_INDEX(vsxb_v_h, int8_t,  int16_t)
+GEN_VEXT_ST_INDEX(vsxb_v_w, int8_t,  int32_t)
+GEN_VEXT_ST_INDEX(vsxb_v_d, int8_t,  int64_t)
+GEN_VEXT_ST_INDEX(vsxh_v_h, int16_t, int16_t)
+GEN_VEXT_ST_INDEX(vsxh_v_w, int16_t, int32_t)
+GEN_VEXT_ST_INDEX(vsxh_v_d, int16_t, int64_t)
+GEN_VEXT_ST_INDEX(vsxw_v_w, int32_t, int32_t)
+GEN_VEXT_ST_INDEX(vsxw_v_d, int32_t, int64_t)
+GEN_VEXT_ST_INDEX(vsxe_v_b, int8_t,  int8_t)
+GEN_VEXT_ST_INDEX(vsxe_v_h, int16_t, int16_t)
+GEN_VEXT_ST_INDEX(vsxe_v_w, int32_t, int32_t)
+GEN_VEXT_ST_INDEX(vsxe_v_d, int64_t, int64_t)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 4/5] target/riscv: add fault-only-first unit stride load
  2020-02-10  7:42 ` LIU Zhiwei
@ 2020-02-10  7:42   ` LIU Zhiwei
  -1 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-10  7:42 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768, LIU Zhiwei

The unit-stride fault-only-first load instructions are used to
vectorize loops with data-dependent exit conditions (while loops).
These instructions execute as regular loads except that they
will only take a trap on element 0.
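
For illustration only (not part of the patch): the kind of data-dependent-exit
loop this targets is a strlen-style scan, where a vectorized version loads
speculatively past the terminator and therefore must not trap for elements
beyond the first. The scalar sketch below just shows the loop shape.

    #include <stddef.h>

    /* while loop whose exit condition depends on the data just loaded */
    static size_t scan_len(const char *s)
    {
        size_t n = 0;
        while (s[n] != '\0') {
            n++;
        }
        return n;
    }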

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  22 ++++
 target/riscv/insn32.decode              |   7 ++
 target/riscv/insn_trans/trans_rvv.inc.c |  88 +++++++++++++++
 target/riscv/vector_helper.c            | 138 ++++++++++++++++++++++++
 4 files changed, 255 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 5ebd3d6ccd..893dfc0fb8 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -218,3 +218,25 @@ DEF_HELPER_6(vsxe_v_b_mask, void, ptr, tl, ptr, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_h_mask, void, ptr, tl, ptr, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
 DEF_HELPER_6(vsxe_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_5(vlbff_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbff_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhff_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vleff_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vleff_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vleff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vleff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbuff_v_b_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbuff_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbuff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlbuff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhuff_v_h_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhuff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlhuff_v_d_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwuff_v_w_mask, void, ptr, tl, ptr, env, i32)
+DEF_HELPER_5(vlwuff_v_d_mask, void, ptr, tl, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 6a363a6b7e..973ac63fda 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -219,6 +219,13 @@ vle_v      ... 000 . 00000 ..... 111 ..... 0000111 @r2_nfvm
 vlbu_v     ... 000 . 00000 ..... 000 ..... 0000111 @r2_nfvm
 vlhu_v     ... 000 . 00000 ..... 101 ..... 0000111 @r2_nfvm
 vlwu_v     ... 000 . 00000 ..... 110 ..... 0000111 @r2_nfvm
+vlbff_v    ... 100 . 10000 ..... 000 ..... 0000111 @r2_nfvm
+vlhff_v    ... 100 . 10000 ..... 101 ..... 0000111 @r2_nfvm
+vlwff_v    ... 100 . 10000 ..... 110 ..... 0000111 @r2_nfvm
+vleff_v    ... 000 . 10000 ..... 111 ..... 0000111 @r2_nfvm
+vlbuff_v   ... 000 . 10000 ..... 000 ..... 0000111 @r2_nfvm
+vlhuff_v   ... 000 . 10000 ..... 101 ..... 0000111 @r2_nfvm
+vlwuff_v   ... 000 . 10000 ..... 110 ..... 0000111 @r2_nfvm
 vsb_v      ... 000 . 00000 ..... 000 ..... 0100111 @r2_nfvm
 vsh_v      ... 000 . 00000 ..... 101 ..... 0100111 @r2_nfvm
 vsw_v      ... 000 . 00000 ..... 110 ..... 0100111 @r2_nfvm
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 13033b3906..66caa16d18 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -663,3 +663,91 @@ static bool trans_vsuxe_v(DisasContext *s, arg_rnfvm* a)
 {
     return trans_vsxe_v(s, a);
 }
+
+/* unit stride fault-only-first load */
+typedef void gen_helper_vext_ldff(TCGv_ptr, TCGv, TCGv_ptr,
+        TCGv_env, TCGv_i32);
+
+static bool do_vext_ldff_trans(uint32_t vd, uint32_t rs1, uint32_t data,
+        gen_helper_vext_ldff *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, maxsz_part1(s->maxsz), data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, base, mask, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool vext_ldff_trans(DisasContext *s, arg_r2nfvm *a, uint8_t seq)
+{
+    uint8_t nf = a->nf + 1;
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (nf << 12);
+    gen_helper_vext_ldff *fn;
+    static gen_helper_vext_ldff * const fns[7][4] = {
+        /* masked unit stride fault-only-first load */
+        { gen_helper_vlbff_v_b_mask,  gen_helper_vlbff_v_h_mask,
+          gen_helper_vlbff_v_w_mask,  gen_helper_vlbff_v_d_mask },
+        { NULL,                       gen_helper_vlhff_v_h_mask,
+          gen_helper_vlhff_v_w_mask,  gen_helper_vlhff_v_d_mask },
+        { NULL,                       NULL,
+          gen_helper_vlwff_v_w_mask,  gen_helper_vlwff_v_d_mask },
+        { gen_helper_vleff_v_b_mask,  gen_helper_vleff_v_h_mask,
+          gen_helper_vleff_v_w_mask,  gen_helper_vleff_v_d_mask },
+        { gen_helper_vlbuff_v_b_mask, gen_helper_vlbuff_v_h_mask,
+          gen_helper_vlbuff_v_w_mask, gen_helper_vlbuff_v_d_mask },
+        { NULL,                       gen_helper_vlhuff_v_h_mask,
+          gen_helper_vlhuff_v_w_mask, gen_helper_vlhuff_v_d_mask },
+        { NULL,                       NULL,
+          gen_helper_vlwuff_v_w_mask, gen_helper_vlwuff_v_d_mask }
+    };
+
+    fn = fns[seq][s->sew];
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_ldff_trans(a->rd, a->rs1, data, fn, s);
+}
+
+#define GEN_VEXT_LDFF_TRANS(NAME, DO_OP, SEQ)                   \
+static bool trans_##NAME(DisasContext *s, arg_r2nfvm* a)        \
+{                                                               \
+    vchkctx.check_misa = RVV;                                   \
+    vchkctx.check_overlap_mask.need_check = true;               \
+    vchkctx.check_overlap_mask.reg = a->rd;                     \
+    vchkctx.check_overlap_mask.vm = a->vm;                      \
+    vchkctx.check_reg[0].need_check = true;                     \
+    vchkctx.check_reg[0].reg = a->rd;                           \
+    vchkctx.check_reg[0].widen = false;                         \
+    vchkctx.check_nf.need_check = true;                         \
+    vchkctx.check_nf.nf = a->nf;                                \
+                                                                \
+    if (!vext_check(s)) {                                       \
+        return false;                                           \
+    }                                                           \
+    return DO_OP(s, a, SEQ);                                    \
+}
+
+GEN_VEXT_LDFF_TRANS(vlbff_v, vext_ldff_trans, 0)
+GEN_VEXT_LDFF_TRANS(vlhff_v, vext_ldff_trans, 1)
+GEN_VEXT_LDFF_TRANS(vlwff_v, vext_ldff_trans, 2)
+GEN_VEXT_LDFF_TRANS(vleff_v, vext_ldff_trans, 3)
+GEN_VEXT_LDFF_TRANS(vlbuff_v, vext_ldff_trans, 4)
+GEN_VEXT_LDFF_TRANS(vlhuff_v, vext_ldff_trans, 5)
+GEN_VEXT_LDFF_TRANS(vlwuff_v, vext_ldff_trans, 6)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 0404394588..941851ab28 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -301,6 +301,28 @@ GEN_VEXT_LD_ELEM(vlxhu_v_w, uint16_t, uint32_t, H4, lduw)
 GEN_VEXT_LD_ELEM(vlxhu_v_d, uint16_t, uint64_t, H8, lduw)
 GEN_VEXT_LD_ELEM(vlxwu_v_w, uint32_t, uint32_t, H4, ldl)
 GEN_VEXT_LD_ELEM(vlxwu_v_d, uint32_t, uint64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vlbff_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vlbff_v_h, int8_t,  int16_t, H2, ldsb)
+GEN_VEXT_LD_ELEM(vlbff_v_w, int8_t,  int32_t, H4, ldsb)
+GEN_VEXT_LD_ELEM(vlbff_v_d, int8_t,  int64_t, H8, ldsb)
+GEN_VEXT_LD_ELEM(vlhff_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vlhff_v_w, int16_t, int32_t, H4, ldsw)
+GEN_VEXT_LD_ELEM(vlhff_v_d, int16_t, int64_t, H8, ldsw)
+GEN_VEXT_LD_ELEM(vlwff_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlwff_v_d, int32_t, int64_t, H8, ldl)
+GEN_VEXT_LD_ELEM(vleff_v_b, int8_t,  int8_t,  H1, ldsb)
+GEN_VEXT_LD_ELEM(vleff_v_h, int16_t, int16_t, H2, ldsw)
+GEN_VEXT_LD_ELEM(vleff_v_w, int32_t, int32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vleff_v_d, int64_t, int64_t, H8, ldq)
+GEN_VEXT_LD_ELEM(vlbuff_v_b, uint8_t,  uint8_t,  H1, ldub)
+GEN_VEXT_LD_ELEM(vlbuff_v_h, uint8_t,  uint16_t, H2, ldub)
+GEN_VEXT_LD_ELEM(vlbuff_v_w, uint8_t,  uint32_t, H4, ldub)
+GEN_VEXT_LD_ELEM(vlbuff_v_d, uint8_t,  uint64_t, H8, ldub)
+GEN_VEXT_LD_ELEM(vlhuff_v_h, uint16_t, uint16_t, H2, lduw)
+GEN_VEXT_LD_ELEM(vlhuff_v_w, uint16_t, uint32_t, H4, lduw)
+GEN_VEXT_LD_ELEM(vlhuff_v_d, uint16_t, uint64_t, H8, lduw)
+GEN_VEXT_LD_ELEM(vlwuff_v_w, uint32_t, uint32_t, H4, ldl)
+GEN_VEXT_LD_ELEM(vlwuff_v_d, uint32_t, uint64_t, H8, ldl)
 
 #define GEN_VEXT_ST_ELEM(NAME, ETYPE, H, STSUF)                       \
 static void vext_##NAME##_st_elem(CPURISCVState *env, abi_ptr addr,   \
@@ -868,3 +890,119 @@ GEN_VEXT_ST_INDEX(vsxe_v_b, int8_t,  int8_t)
 GEN_VEXT_ST_INDEX(vsxe_v_h, int16_t, int16_t)
 GEN_VEXT_ST_INDEX(vsxe_v_w, int32_t, int32_t)
 GEN_VEXT_ST_INDEX(vsxe_v_d, int64_t, int64_t)
+
+/* unit-stride fault-only-first load instructions */
+static void vext_ldff_mask(void *vd, void *v0, CPURISCVState *env,
+        struct vext_ldst_ctx *ctx, uintptr_t ra)
+{
+    void *host;
+    uint32_t i, k, vl = 0;
+    target_ulong addr, offset, remain;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    if (s->vl == 0) {
+        return;
+    }
+    /* probe every access */
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        addr = ctx->base + ctx->nf * i * s->msz;
+        if (i == 0) {
+            probe_read_access(env, addr, ctx->nf * s->msz, ra);
+        } else {
+            /* if it triggers an exception, no need to check watchpoint */
+            offset = -(addr | TARGET_PAGE_MASK);
+            remain = ctx->nf * s->msz;
+            while (remain > 0) {
+                host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD,
+                        ctx->mmuidx);
+                if (host) {
+#ifdef CONFIG_USER_ONLY
+                    if (page_check_range(addr, ctx->nf * s->msz,
+                                PAGE_READ) < 0) {
+                        vl = i;
+                        goto ProbeSuccess;
+                    }
+#else
+                    probe_read_access(env, addr, ctx->nf * s->msz, ra);
+#endif
+                } else {
+                    vl = i;
+                    goto ProbeSuccess;
+                }
+                if (remain <= offset) {
+                    break;
+                }
+                remain -= offset;
+                addr += offset;
+                offset = -(addr | TARGET_PAGE_MASK);
+            }
+        }
+    }
+ProbeSuccess:
+    /* load bytes from guest memory */
+    if (vl != 0) {
+        s->vl = vl;
+    }
+    for (i = 0; i < s->vl; i++) {
+        k = 0;
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        while (k < ctx->nf) {
+            target_ulong addr = ctx->base + (i * ctx->nf + k) * s->msz;
+            ctx->ld_elem(env, addr, i + k * s->vlmax, vd, ra);
+            k++;
+        }
+    }
+    /* clear tail elements */
+    if (vl != 0) {
+        env->vext.vl = vl;
+        return;
+    }
+    for (k = 0; k < ctx->nf; k++) {
+        ctx->clear_elem(vd, s->vl + k * s->vlmax, s->vl * s->esz,
+                s->vlmax * s->esz);
+    }
+}
+
+#define GEN_VEXT_LDFF(NAME, MTYPE, ETYPE, MMUIDX)                      \
+void HELPER(NAME##_mask)(void *vd, target_ulong base, void *v0,        \
+        CPURISCVState *env, uint32_t desc)                             \
+{                                                                      \
+    static struct vext_ldst_ctx ctx;                                   \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                      \
+        sizeof(MTYPE), env->vext.vl, desc);                            \
+    ctx.nf = vext_nf(desc);                                            \
+    ctx.base = base;                                                   \
+    ctx.mmuidx = MMUIDX;                                               \
+    ctx.ld_elem = vext_##NAME##_ld_elem;                               \
+    ctx.clear_elem = vext_##NAME##_clear_elem;                         \
+                                                                       \
+    vext_ldff_mask(vd, v0, env, &ctx, GETPC());                        \
+}
+
+GEN_VEXT_LDFF(vlbff_v_b, int8_t,  int8_t, MO_SB)
+GEN_VEXT_LDFF(vlbff_v_h, int8_t,  int16_t, MO_SB)
+GEN_VEXT_LDFF(vlbff_v_w, int8_t,  int32_t, MO_SB)
+GEN_VEXT_LDFF(vlbff_v_d, int8_t,  int64_t, MO_SB)
+GEN_VEXT_LDFF(vlhff_v_h, int16_t, int16_t, MO_LESW)
+GEN_VEXT_LDFF(vlhff_v_w, int16_t, int32_t, MO_LESW)
+GEN_VEXT_LDFF(vlhff_v_d, int16_t, int64_t, MO_LESW)
+GEN_VEXT_LDFF(vlwff_v_w, int32_t, int32_t, MO_LESL)
+GEN_VEXT_LDFF(vlwff_v_d, int32_t, int64_t, MO_LESL)
+GEN_VEXT_LDFF(vleff_v_b, int8_t,  int8_t, MO_SB)
+GEN_VEXT_LDFF(vleff_v_h, int16_t, int16_t, MO_LESW)
+GEN_VEXT_LDFF(vleff_v_w, int32_t, int32_t, MO_LESL)
+GEN_VEXT_LDFF(vleff_v_d, int64_t, int64_t, MO_LEQ)
+GEN_VEXT_LDFF(vlbuff_v_b, uint8_t,  uint8_t, MO_UB)
+GEN_VEXT_LDFF(vlbuff_v_h, uint8_t,  uint16_t, MO_UB)
+GEN_VEXT_LDFF(vlbuff_v_w, uint8_t,  uint32_t, MO_UB)
+GEN_VEXT_LDFF(vlbuff_v_d, uint8_t,  uint64_t, MO_UB)
+GEN_VEXT_LDFF(vlhuff_v_h, uint16_t, uint16_t, MO_LEUW)
+GEN_VEXT_LDFF(vlhuff_v_w, uint16_t, uint32_t, MO_LEUW)
+GEN_VEXT_LDFF(vlhuff_v_d, uint16_t, uint64_t, MO_LEUW)
+GEN_VEXT_LDFF(vlwuff_v_w, uint32_t, uint32_t, MO_LEUL)
+GEN_VEXT_LDFF(vlwuff_v_d, uint32_t, uint64_t, MO_LEUL)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 5/5] target/riscv: add vector amo operations
  2020-02-10  7:42 ` LIU Zhiwei
@ 2020-02-10  7:42   ` LIU Zhiwei
  -1 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-10  7:42 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768, LIU Zhiwei

Vector AMOs operate as if aq and rl bits were zero on each element
with regard to ordering relative to other instructions in the same hart.
Vector AMOs provide no ordering guarantee between element operations
in the same vector AMO instruction.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
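As an illustration for review only: a scalar C sketch of the per-element
semantics these helpers implement, using a masked vamoaddw.v (SEW=32) as
the example. The function name, parameter layout and flat-pointer memory
model are invented for the sketch and are not the QEMU helper interface.

    #include <stdint.h>
    #include <stdbool.h>

    /* Reference model only, not the QEMU helper signature. */
    void vamoaddw_ref(int32_t *vs3, const int32_t *vs2, const bool *v0,
                      bool vm, bool wd, uint8_t *base, uint32_t vl)
    {
        for (uint32_t i = 0; i < vl; i++) {
            if (!vm && !v0[i]) {
                continue;                    /* element is masked off      */
            }
            int32_t *addr = (int32_t *)(base + vs2[i]);  /* base + offset */
            int32_t old = *addr;             /* read the old memory value  */
            *addr = old + vs3[i];            /* AMO: memory += vs3[i]      */
            if (wd) {
                vs3[i] = old;                /* wd=1: old value back to vd */
            }
        }
    }

Each element's read-modify-write is independent of the others, so no
ordering is implied between elements; the helpers only need per-element
atomicity (the atomic variants under CF_PARALLEL, the non-atomic
variants otherwise).
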
 target/riscv/helper.h                   |  57 +++++
 target/riscv/insn32-64.decode           |  11 +
 target/riscv/insn32.decode              |  13 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 167 ++++++++++++++
 target/riscv/vector_helper.c            | 292 ++++++++++++++++++++++++
 5 files changed, 540 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 893dfc0fb8..3624a20262 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -240,3 +240,60 @@ DEF_HELPER_5(vlhuff_v_w_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vlhuff_v_d_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vlwuff_v_w_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vlwuff_v_d_mask, void, ptr, tl, ptr, env, i32)
+#ifdef TARGET_RISCV64
+DEF_HELPER_6(vamoswapw_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapd_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddd_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxord_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandd_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_d_a_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoord_v_d_a_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomind_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxd_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominud_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxud_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapd_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddd_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxord_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandd_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_d_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoord_v_d_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomind_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxd_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominud_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxud_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+#endif
+DEF_HELPER_6(vamoswapw_v_w_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_w_a_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_w_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_w_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_w_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 380bf791bc..86153d93fa 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -57,6 +57,17 @@ amomax_d   10100 . . ..... ..... 011 ..... 0101111 @atom_st
 amominu_d  11000 . . ..... ..... 011 ..... 0101111 @atom_st
 amomaxu_d  11100 . . ..... ..... 011 ..... 0101111 @atom_st
 
+# *** Vector AMO operations (in addition to Zvamo) ***
+vamoswapd_v     00001 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoaddd_v      00000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoxord_v      00100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoandd_v      01100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoord_v       01000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomind_v      10000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomaxd_v      10100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamominud_v     11000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomaxud_v     11100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+
 # *** RV64F Standard Extension (in addition to RV32F) ***
 fcvt_l_s   1100000  00010 ..... ... ..... 1010011 @r2_rm
 fcvt_lu_s  1100000  00011 ..... ... ..... 1010011 @r2_rm
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 973ac63fda..077551dd13 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -43,6 +43,7 @@
 &u    imm rd
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
+&rwdvm     vm wd rd rs1 rs2
 &r2nfvm    vm rd rs1 nf
 &rnfvm     vm rd rs1 rs2 nf
 
@@ -64,6 +65,7 @@
 @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
+@r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
 @r2_nfvm nf:3 ... vm:1 ..... ..... ... ..... ....... &r2nfvm %rs1 %rd
 @r_nfvm  nf:3 ... vm:1 ..... ..... ... ..... ....... &rnfvm %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
@@ -259,6 +261,17 @@ vsuxh_v    ... 111 . ..... ..... 101 ..... 0100111 @r_nfvm
 vsuxw_v    ... 111 . ..... ..... 110 ..... 0100111 @r_nfvm
 vsuxe_v    ... 111 . ..... ..... 111 ..... 0100111 @r_nfvm
 
+# *** Vector AMO operations are encoded under the standard AMO major opcode ***
+vamoswapw_v     00001 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoaddw_v      00000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoxorw_v      00100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoandw_v      01100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoorw_v       01000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamominw_v      10000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamomaxw_v      10100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamominuw_v     11000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamomaxuw_v     11100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+
 # *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 66caa16d18..f628e16346 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -751,3 +751,170 @@ GEN_VEXT_LDFF_TRANS(vleff_v, vext_ldff_trans, 3)
 GEN_VEXT_LDFF_TRANS(vlbuff_v, vext_ldff_trans, 4)
 GEN_VEXT_LDFF_TRANS(vlhuff_v, vext_ldff_trans, 5)
 GEN_VEXT_LDFF_TRANS(vlwuff_v, vext_ldff_trans, 6)
+
+/* vector atomic operation */
+typedef void gen_helper_vext_amo(TCGv_ptr, TCGv, TCGv_ptr, TCGv_ptr,
+        TCGv_env, TCGv_i32);
+
+static bool do_vext_amo_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
+        uint32_t data, gen_helper_vext_amo *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask, index;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    index = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, maxsz_part1(s->maxsz), data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(index, cpu_env, vreg_ofs(s, vs2));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, base, mask, index, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free_ptr(index);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool vext_amo_trans(DisasContext *s, arg_rwdvm *a, uint8_t seq)
+{
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (a->wd << 12);
+    gen_helper_vext_amo *fn;
+#ifdef TARGET_RISCV64
+    static gen_helper_vext_amo *const fns[2][18][2] = {
+        /* atomic operation */
+        { { gen_helper_vamoswapw_v_w_a_mask, gen_helper_vamoswapw_v_d_a_mask },
+          { gen_helper_vamoaddw_v_w_a_mask,  gen_helper_vamoaddw_v_d_a_mask },
+          { gen_helper_vamoxorw_v_w_a_mask,  gen_helper_vamoxorw_v_d_a_mask },
+          { gen_helper_vamoandw_v_w_a_mask,  gen_helper_vamoandw_v_d_a_mask },
+          { gen_helper_vamoorw_v_w_a_mask,   gen_helper_vamoorw_v_d_a_mask },
+          { gen_helper_vamominw_v_w_a_mask,  gen_helper_vamominw_v_d_a_mask },
+          { gen_helper_vamomaxw_v_w_a_mask,  gen_helper_vamomaxw_v_d_a_mask },
+          { gen_helper_vamominuw_v_w_a_mask, gen_helper_vamominuw_v_d_a_mask },
+          { gen_helper_vamomaxuw_v_w_a_mask, gen_helper_vamomaxuw_v_d_a_mask },
+          { NULL,                            gen_helper_vamoswapd_v_d_a_mask },
+          { NULL,                            gen_helper_vamoaddd_v_d_a_mask },
+          { NULL,                            gen_helper_vamoxord_v_d_a_mask },
+          { NULL,                            gen_helper_vamoandd_v_d_a_mask },
+          { NULL,                            gen_helper_vamoord_v_d_a_mask },
+          { NULL,                            gen_helper_vamomind_v_d_a_mask },
+          { NULL,                            gen_helper_vamomaxd_v_d_a_mask },
+          { NULL,                            gen_helper_vamominud_v_d_a_mask },
+          { NULL,                           gen_helper_vamomaxud_v_d_a_mask } },
+        /* no atomic operation */
+        { { gen_helper_vamoswapw_v_w_mask, gen_helper_vamoswapw_v_d_mask },
+          { gen_helper_vamoaddw_v_w_mask,  gen_helper_vamoaddw_v_d_mask },
+          { gen_helper_vamoxorw_v_w_mask,  gen_helper_vamoxorw_v_d_mask },
+          { gen_helper_vamoandw_v_w_mask,  gen_helper_vamoandw_v_d_mask },
+          { gen_helper_vamoorw_v_w_mask,   gen_helper_vamoorw_v_d_mask },
+          { gen_helper_vamominw_v_w_mask,  gen_helper_vamominw_v_d_mask },
+          { gen_helper_vamomaxw_v_w_mask,  gen_helper_vamomaxw_v_d_mask },
+          { gen_helper_vamominuw_v_w_mask, gen_helper_vamominuw_v_d_mask },
+          { gen_helper_vamomaxuw_v_w_mask, gen_helper_vamomaxuw_v_d_mask },
+          { NULL,                          gen_helper_vamoswapd_v_d_mask },
+          { NULL,                          gen_helper_vamoaddd_v_d_mask },
+          { NULL,                          gen_helper_vamoxord_v_d_mask },
+          { NULL,                          gen_helper_vamoandd_v_d_mask },
+          { NULL,                          gen_helper_vamoord_v_d_mask },
+          { NULL,                          gen_helper_vamomind_v_d_mask },
+          { NULL,                          gen_helper_vamomaxd_v_d_mask },
+          { NULL,                          gen_helper_vamominud_v_d_mask },
+          { NULL,                          gen_helper_vamomaxud_v_d_mask } }
+    };
+#else
+    static gen_helper_vext_amo *const fns[2][9][2] = {
+        /* atomic operation */
+        { { gen_helper_vamoswapw_v_w_a_mask, NULL },
+          { gen_helper_vamoaddw_v_w_a_mask,  NULL },
+          { gen_helper_vamoxorw_v_w_a_mask,  NULL },
+          { gen_helper_vamoandw_v_w_a_mask,  NULL },
+          { gen_helper_vamoorw_v_w_a_mask,   NULL },
+          { gen_helper_vamominw_v_w_a_mask,  NULL },
+          { gen_helper_vamomaxw_v_w_a_mask,  NULL },
+          { gen_helper_vamominuw_v_w_a_mask, NULL },
+          { gen_helper_vamomaxuw_v_w_a_mask, NULL } },
+        /* no atomic operation */
+        { { gen_helper_vamoswapw_v_w_mask, NULL },
+          { gen_helper_vamoaddw_v_w_mask,  NULL },
+          { gen_helper_vamoxorw_v_w_mask,  NULL },
+          { gen_helper_vamoandw_v_w_mask,  NULL },
+          { gen_helper_vamoorw_v_w_mask,   NULL },
+          { gen_helper_vamominw_v_w_mask,  NULL },
+          { gen_helper_vamomaxw_v_w_mask,  NULL },
+          { gen_helper_vamominuw_v_w_mask, NULL },
+          { gen_helper_vamomaxuw_v_w_mask, NULL } }
+    };
+#endif
+    if (s->sew < 2) {
+        return false;
+    }
+
+    if (tb_cflags(s->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC64
+        fn = fns[0][seq][s->sew - 2];
+#else
+        gen_helper_exit_atomic(cpu_env);
+        s->base.is_jmp = DISAS_NORETURN;
+        return true;
+#endif
+    } else {
+        fn = fns[1][seq][s->sew - 2];
+    }
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_amo_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+#define GEN_VEXT_AMO_TRANS(NAME, DO_OP, SEQ)                              \
+static bool trans_##NAME(DisasContext *s, arg_rwdvm* a)                   \
+{                                                                         \
+    vchkctx.check_misa = RVV | RVA;                                       \
+    if (a->wd) {                                                          \
+        vchkctx.check_overlap_mask.need_check = true;                     \
+        vchkctx.check_overlap_mask.reg = a->rd;                           \
+        vchkctx.check_overlap_mask.vm = a->vm;                            \
+    }                                                                     \
+    vchkctx.check_reg[0].need_check = true;                               \
+    vchkctx.check_reg[0].reg = a->rd;                                     \
+    vchkctx.check_reg[0].widen = false;                                   \
+    vchkctx.check_reg[1].need_check = true;                               \
+    vchkctx.check_reg[1].reg = a->rs2;                                    \
+    vchkctx.check_reg[1].widen = false;                                   \
+                                                                          \
+    if (!vext_check(s)) {                                                 \
+        return false;                                                     \
+    }                                                                     \
+    return DO_OP(s, a, SEQ);                                              \
+}
+
+GEN_VEXT_AMO_TRANS(vamoswapw_v, vext_amo_trans, 0)
+GEN_VEXT_AMO_TRANS(vamoaddw_v, vext_amo_trans, 1)
+GEN_VEXT_AMO_TRANS(vamoxorw_v, vext_amo_trans, 2)
+GEN_VEXT_AMO_TRANS(vamoandw_v, vext_amo_trans, 3)
+GEN_VEXT_AMO_TRANS(vamoorw_v, vext_amo_trans, 4)
+GEN_VEXT_AMO_TRANS(vamominw_v, vext_amo_trans, 5)
+GEN_VEXT_AMO_TRANS(vamomaxw_v, vext_amo_trans, 6)
+GEN_VEXT_AMO_TRANS(vamominuw_v, vext_amo_trans, 7)
+GEN_VEXT_AMO_TRANS(vamomaxuw_v, vext_amo_trans, 8)
+#ifdef TARGET_RISCV64
+GEN_VEXT_AMO_TRANS(vamoswapd_v, vext_amo_trans, 9)
+GEN_VEXT_AMO_TRANS(vamoaddd_v, vext_amo_trans, 10)
+GEN_VEXT_AMO_TRANS(vamoxord_v, vext_amo_trans, 11)
+GEN_VEXT_AMO_TRANS(vamoandd_v, vext_amo_trans, 12)
+GEN_VEXT_AMO_TRANS(vamoord_v, vext_amo_trans, 13)
+GEN_VEXT_AMO_TRANS(vamomind_v, vext_amo_trans, 14)
+GEN_VEXT_AMO_TRANS(vamomaxd_v, vext_amo_trans, 15)
+GEN_VEXT_AMO_TRANS(vamominud_v, vext_amo_trans, 16)
+GEN_VEXT_AMO_TRANS(vamomaxud_v, vext_amo_trans, 17)
+#endif
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 941851ab28..d6f1585c40 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -102,6 +102,11 @@ static uint32_t vext_vm(uint32_t desc)
     return (simd_data(desc) >> 8) & 0x1;
 }
 
+static uint32_t vext_wd(uint32_t desc)
+{
+    return (simd_data(desc) >> 12) & 0x1;
+}
+
 /*
  * Get vector group length [64, 2048] in bytes. Its range is [64, 2048].
  *
@@ -174,6 +179,21 @@ static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
     memset(tail, 0, tot - cnt);
 }
 #endif
+
+static void vext_clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int32_t *cur = ((int32_t *)vd + H4(idx));
+    vext_clear(cur, cnt, tot);
+}
+
+#ifdef TARGET_RISCV64
+static void vext_clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int64_t *cur = (int64_t *)vd + idx;
+    vext_clear(cur, cnt, tot);
+}
+#endif
+
 /* common structure for all vector instructions */
 struct vext_common_ctx {
     uint32_t vlmax;
@@ -1006,3 +1026,275 @@ GEN_VEXT_LDFF(vlhuff_v_w, uint16_t, uint32_t, MO_LEUW)
 GEN_VEXT_LDFF(vlhuff_v_d, uint16_t, uint64_t, MO_LEUW)
 GEN_VEXT_LDFF(vlwuff_v_w, uint32_t, uint32_t, MO_LEUL)
 GEN_VEXT_LDFF(vlwuff_v_d, uint32_t, uint64_t, MO_LEUL)
+
+/* Vector AMO Operations (Zvamo) */
+/* data structure and common functions for vector AMO operations */
+typedef void vext_amo_noatomic_fn(void *vs3, target_ulong addr,
+        uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr);
+typedef void vext_amo_atomic_fn(void *vs3, target_ulong addr,
+        uint32_t wd, uint32_t idx, CPURISCVState *env);
+
+struct vext_amo_ctx {
+    struct vext_common_ctx vcc;
+    uint32_t wd;
+    target_ulong base;
+
+    vext_get_index_addr *get_index_addr;
+    vext_amo_atomic_fn *atomic_op;
+    vext_amo_noatomic_fn *noatomic_op;
+    vext_ld_clear_elem *clear_elem;
+};
+
+#ifdef TARGET_RISCV64
+GEN_VEXT_GET_INDEX_ADDR(vamoswapw_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoswapd_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoaddw_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoaddd_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoxorw_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoxord_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoandw_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoandd_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoorw_v_d,   int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoord_v_d,   int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamominw_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamomind_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxw_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxd_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamominuw_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamominud_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxuw_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxud_v_d, int64_t, H8)
+#endif
+GEN_VEXT_GET_INDEX_ADDR(vamoswapw_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamoaddw_v_w,  int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamoxorw_v_w,  int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamoandw_v_w,  int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamoorw_v_w,   int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamominw_v_w,  int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxw_v_w,  int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamominuw_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxuw_v_w, int32_t, H4)
+
+/* no atomic operation for vector atomic instructions */
+#define DO_SWAP(N, M) (M)
+#define DO_AND(N, M)  (N & M)
+#define DO_XOR(N, M)  (N ^ M)
+#define DO_OR(N, M)   (N | M)
+#define DO_ADD(N, M)  (N + M)
+#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
+#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
+
+#define GEN_VEXT_AMO_NOATOMIC_OP(NAME, ETYPE, MTYPE, H, DO_OP, SUF)      \
+static void vext_##NAME##_noatomic_op(void *vs3, target_ulong addr,      \
+        uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr)\
+{                                                                        \
+    ETYPE ret;                                                           \
+    target_ulong tmp;                                                    \
+    int mmu_idx = cpu_mmu_index(env, false);                             \
+    tmp = cpu_ld##SUF##_mmuidx_ra(env, addr, mmu_idx, retaddr);          \
+    ret = DO_OP((ETYPE)(MTYPE)tmp, *((ETYPE *)vs3 + H(idx)));            \
+    cpu_st##SUF##_mmuidx_ra(env, addr, ret, mmu_idx, retaddr);           \
+    if (wd) {                                                            \
+        *((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;              \
+    }                                                                    \
+}
+
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_w, int32_t,  int32_t, H4, DO_SWAP, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_w,  int32_t,  int32_t, H4, DO_ADD,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_w,  int32_t,  int32_t, H4, DO_XOR,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_w,  int32_t,  int32_t, H4, DO_AND,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_w,   int32_t,  int32_t, H4, DO_OR,   l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_w,  int32_t,  int32_t, H4, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_w,  int32_t,  int32_t, H4, DO_MAX,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_w, uint32_t, int32_t, H4, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_w, uint32_t, int32_t, H4, DO_MAX,  l)
+#ifdef TARGET_RISCV64
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_d, int64_t,  int32_t, H8, DO_SWAP, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapd_v_d, int64_t,  int64_t, H8, DO_SWAP, q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_d,  int64_t,  int32_t, H8, DO_ADD,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddd_v_d,  int64_t,  int64_t, H8, DO_ADD,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_d,  int64_t,  int32_t, H8, DO_XOR,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxord_v_d,  int64_t,  int64_t, H8, DO_XOR,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_d,  int64_t,  int32_t, H8, DO_AND,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandd_v_d,  int64_t,  int64_t, H8, DO_AND,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_d,   int64_t,  int32_t, H8, DO_OR,   l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoord_v_d,   int64_t,  int64_t, H8, DO_OR,   q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_d,  int64_t,  int32_t, H8, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomind_v_d,  int64_t,  int64_t, H8, DO_MIN,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_d,  int64_t,  int32_t, H8, DO_MAX,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxd_v_d,  int64_t,  int64_t, H8, DO_MAX,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_d, uint64_t, int32_t, H8, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominud_v_d, uint64_t, int64_t, H8, DO_MIN,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_d, uint64_t, int32_t, H8, DO_MAX,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxud_v_d, uint64_t, int64_t, H8, DO_MAX,  q)
+#endif
+
+/* atomic operation for vector atomic instructions */
+#ifndef CONFIG_USER_ONLY
+#define GEN_VEXT_ATOMIC_OP(NAME, ETYPE, MTYPE, MOFLAG, H, AMO)           \
+static void vext_##NAME##_atomic_op(void *vs3, target_ulong addr,        \
+        uint32_t wd, uint32_t idx, CPURISCVState *env)                   \
+{                                                                        \
+    target_ulong tmp;                                                    \
+    int mem_idx = cpu_mmu_index(env, false);                             \
+    tmp = helper_atomic_##AMO##_le(env, addr, *((ETYPE *)vs3 + H(idx)),  \
+            make_memop_idx(MO_ALIGN | MOFLAG, mem_idx));                 \
+    if (wd) {                                                            \
+        *((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;              \
+    }                                                                    \
+}
+#else
+#define GEN_VEXT_ATOMIC_OP(NAME, ETYPE, MTYPE, MOFLAG, H, AMO)           \
+static void vext_##NAME##_atomic_op(void *vs3, target_ulong addr,        \
+        uint32_t wd, uint32_t idx, CPURISCVState *env)                   \
+{                                                                        \
+    target_ulong tmp;                                                    \
+    tmp = helper_atomic_##AMO##_le(env, addr, *((ETYPE *)vs3 + H(idx))); \
+    if (wd) {                                                            \
+        *((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;              \
+    }                                                                    \
+}
+#endif
+
+GEN_VEXT_ATOMIC_OP(vamoswapw_v_w, int32_t,  int32_t,  MO_TESL, H4, xchgl)
+GEN_VEXT_ATOMIC_OP(vamoaddw_v_w,  int32_t,  int32_t,  MO_TESL, H4, fetch_addl)
+GEN_VEXT_ATOMIC_OP(vamoxorw_v_w,  int32_t,  int32_t,  MO_TESL, H4, fetch_xorl)
+GEN_VEXT_ATOMIC_OP(vamoandw_v_w,  int32_t,  int32_t,  MO_TESL, H4, fetch_andl)
+GEN_VEXT_ATOMIC_OP(vamoorw_v_w,   int32_t,  int32_t,  MO_TESL, H4, fetch_orl)
+GEN_VEXT_ATOMIC_OP(vamominw_v_w,  int32_t,  int32_t,  MO_TESL, H4, fetch_sminl)
+GEN_VEXT_ATOMIC_OP(vamomaxw_v_w,  int32_t,  int32_t,  MO_TESL, H4, fetch_smaxl)
+GEN_VEXT_ATOMIC_OP(vamominuw_v_w, uint32_t, int32_t,  MO_TEUL, H4, fetch_uminl)
+GEN_VEXT_ATOMIC_OP(vamomaxuw_v_w, uint32_t, int32_t,  MO_TEUL, H4, fetch_umaxl)
+#ifdef TARGET_RISCV64
+GEN_VEXT_ATOMIC_OP(vamoswapw_v_d, int64_t,  int32_t,  MO_TESL, H8, xchgl)
+GEN_VEXT_ATOMIC_OP(vamoswapd_v_d, int64_t,  int64_t,  MO_TEQ,  H8, xchgq)
+GEN_VEXT_ATOMIC_OP(vamoaddw_v_d,  int64_t,  int32_t,  MO_TESL, H8, fetch_addl)
+GEN_VEXT_ATOMIC_OP(vamoaddd_v_d,  int64_t,  int64_t,  MO_TEQ,  H8, fetch_addq)
+GEN_VEXT_ATOMIC_OP(vamoxorw_v_d,  int64_t,  int32_t,  MO_TESL, H8, fetch_xorl)
+GEN_VEXT_ATOMIC_OP(vamoxord_v_d,  int64_t,  int64_t,  MO_TEQ,  H8, fetch_xorq)
+GEN_VEXT_ATOMIC_OP(vamoandw_v_d,  int64_t,  int32_t,  MO_TESL, H8, fetch_andl)
+GEN_VEXT_ATOMIC_OP(vamoandd_v_d,  int64_t,  int64_t,  MO_TEQ,  H8, fetch_andq)
+GEN_VEXT_ATOMIC_OP(vamoorw_v_d,   int64_t,  int32_t,  MO_TESL, H8, fetch_orl)
+GEN_VEXT_ATOMIC_OP(vamoord_v_d,   int64_t,  int64_t,  MO_TEQ,  H8, fetch_orq)
+GEN_VEXT_ATOMIC_OP(vamominw_v_d,  int64_t,  int32_t,  MO_TESL, H8, fetch_sminl)
+GEN_VEXT_ATOMIC_OP(vamomind_v_d,  int64_t,  int64_t,  MO_TEQ,  H8, fetch_sminq)
+GEN_VEXT_ATOMIC_OP(vamomaxw_v_d,  int64_t,  int32_t,  MO_TESL, H8, fetch_smaxl)
+GEN_VEXT_ATOMIC_OP(vamomaxd_v_d,  int64_t,  int64_t,  MO_TEQ,  H8, fetch_smaxq)
+GEN_VEXT_ATOMIC_OP(vamominuw_v_d, uint64_t, int32_t,  MO_TEUL, H8, fetch_uminl)
+GEN_VEXT_ATOMIC_OP(vamominud_v_d, uint64_t, int64_t,  MO_TEQ,  H8, fetch_uminq)
+GEN_VEXT_ATOMIC_OP(vamomaxuw_v_d, uint64_t, int32_t,  MO_TEUL, H8, fetch_umaxl)
+GEN_VEXT_ATOMIC_OP(vamomaxud_v_d, uint64_t, int64_t,  MO_TEQ,  H8, fetch_umaxq)
+#endif
+
+static void vext_amo_atomic_mask(void *vs3, void *vs2, void *v0,
+        CPURISCVState *env, struct vext_amo_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i;
+    target_long addr;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_read_access(env, ctx->get_index_addr(ctx->base, i, vs2),
+                s->msz, ra);
+        probe_write_access(env, ctx->get_index_addr(ctx->base, i, vs2),
+                s->msz, ra);
+    }
+    for (i = 0; i < env->vext.vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        addr = ctx->get_index_addr(ctx->base, i, vs2);
+        ctx->atomic_op(vs3, addr, ctx->wd, i, env);
+    }
+    ctx->clear_elem(vs3, s->vl, s->vl * s->esz, s->vlmax * s->esz);
+}
+
+static void vext_amo_noatomic_mask(void *vs3, void *vs2, void *v0,
+        CPURISCVState *env, struct vext_amo_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i;
+    target_long addr;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_read_access(env, ctx->get_index_addr(ctx->base, i, vs2),
+                s->msz, ra);
+        probe_write_access(env, ctx->get_index_addr(ctx->base, i, vs2),
+                s->msz, ra);
+    }
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        addr = ctx->get_index_addr(ctx->base, i, vs2);
+        ctx->noatomic_op(vs3, addr, ctx->wd, i, env, ra);
+    }
+    ctx->clear_elem(vs3, s->vl, s->vl * s->esz, s->vlmax * s->esz);
+}
+
+#define GEN_VEXT_AMO(NAME, MTYPE, ETYPE, CLEAR_FN)                    \
+void HELPER(NAME##_a_mask)(void *vs3, target_ulong base, void *v0,    \
+        void *vs2, CPURISCVState *env, uint32_t desc)                 \
+{                                                                     \
+    static struct vext_amo_ctx ctx;                                   \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                     \
+        sizeof(MTYPE), env->vext.vl, desc);                           \
+    ctx.wd = vext_wd(desc);                                           \
+    ctx.base = base;                                                  \
+    ctx.atomic_op = vext_##NAME##_atomic_op;                          \
+    ctx.get_index_addr = vext_##NAME##_get_addr;                      \
+    ctx.clear_elem = CLEAR_FN;                                        \
+                                                                      \
+    vext_amo_atomic_mask(vs3, vs2, v0, env, &ctx, GETPC());           \
+}                                                                     \
+                                                                      \
+void HELPER(NAME##_mask)(void *vs3, target_ulong base, void *v0,      \
+        void *vs2, CPURISCVState *env, uint32_t desc)                 \
+{                                                                     \
+    static struct vext_amo_ctx ctx;                                   \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                     \
+        sizeof(MTYPE), env->vext.vl, desc);                           \
+    ctx.wd = vext_wd(desc);                                           \
+    ctx.base = base;                                                  \
+    ctx.noatomic_op = vext_##NAME##_noatomic_op;                      \
+    ctx.get_index_addr = vext_##NAME##_get_addr;                      \
+    ctx.clear_elem = CLEAR_FN;                                        \
+                                                                      \
+    vext_amo_noatomic_mask(vs3, vs2, v0, env, &ctx, GETPC());         \
+}
+
+#ifdef TARGET_RISCV64
+GEN_VEXT_AMO(vamoswapw_v_d, int32_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamoswapd_v_d, int64_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamoaddw_v_d,  int32_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamoaddd_v_d,  int64_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamoxorw_v_d,  int32_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamoxord_v_d,  int64_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamoandw_v_d,  int32_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamoandd_v_d,  int64_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamoorw_v_d,   int32_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamoord_v_d,   int64_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamominw_v_d,  int32_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamomind_v_d,  int64_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamomaxw_v_d,  int32_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamomaxd_v_d,  int64_t,  int64_t,  vext_clearq)
+GEN_VEXT_AMO(vamominuw_v_d, uint32_t, uint64_t, vext_clearq)
+GEN_VEXT_AMO(vamominud_v_d, uint64_t, uint64_t, vext_clearq)
+GEN_VEXT_AMO(vamomaxuw_v_d, uint32_t, uint64_t, vext_clearq)
+GEN_VEXT_AMO(vamomaxud_v_d, uint64_t, uint64_t, vext_clearq)
+#endif
+GEN_VEXT_AMO(vamoswapw_v_w, int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamoaddw_v_w,  int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamoxorw_v_w,  int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamoandw_v_w,  int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamoorw_v_w,   int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamominw_v_w,  int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamomaxw_v_w,  int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamominuw_v_w, uint32_t, uint32_t, vext_clearl)
+GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, vext_clearl)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 5/5] target/riscv: add vector amo operations
@ 2020-02-10  7:42   ` LIU Zhiwei
  0 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-10  7:42 UTC (permalink / raw)
  To: richard.henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, qemu-devel, qemu-riscv, LIU Zhiwei

Vector AMOs operate as if aq and rl bits were zero on each element
with regard to ordering relative to other instructions in the same hart.
Vector AMOs provide no ordering guarantee between element operations
in the same vector AMO instruction.

Signed-off-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
---
 target/riscv/helper.h                   |  57 +++++
 target/riscv/insn32-64.decode           |  11 +
 target/riscv/insn32.decode              |  13 ++
 target/riscv/insn_trans/trans_rvv.inc.c | 167 ++++++++++++++
 target/riscv/vector_helper.c            | 292 ++++++++++++++++++++++++
 5 files changed, 540 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 893dfc0fb8..3624a20262 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -240,3 +240,60 @@ DEF_HELPER_5(vlhuff_v_w_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vlhuff_v_d_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vlwuff_v_w_mask, void, ptr, tl, ptr, env, i32)
 DEF_HELPER_5(vlwuff_v_d_mask, void, ptr, tl, ptr, env, i32)
+#ifdef TARGET_RISCV64
+DEF_HELPER_6(vamoswapw_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapd_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddd_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxord_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandd_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_d_a_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoord_v_d_a_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomind_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxd_v_d_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominud_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxud_v_d_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapd_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddd_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxord_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandd_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_d_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoord_v_d_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomind_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxd_v_d_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominud_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxud_v_d_mask, void, ptr, tl, ptr, ptr, env, i32)
+#endif
+DEF_HELPER_6(vamoswapw_v_w_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_w_a_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_w_a_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_w_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_w_a_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoswapw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoaddw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoxorw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoandw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamoorw_v_w_mask,   void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxw_v_w_mask,  void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamominuw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+DEF_HELPER_6(vamomaxuw_v_w_mask, void, ptr, tl, ptr, ptr, env, i32)
+
diff --git a/target/riscv/insn32-64.decode b/target/riscv/insn32-64.decode
index 380bf791bc..86153d93fa 100644
--- a/target/riscv/insn32-64.decode
+++ b/target/riscv/insn32-64.decode
@@ -57,6 +57,17 @@ amomax_d   10100 . . ..... ..... 011 ..... 0101111 @atom_st
 amominu_d  11000 . . ..... ..... 011 ..... 0101111 @atom_st
 amomaxu_d  11100 . . ..... ..... 011 ..... 0101111 @atom_st
 
+# *** Vector AMO operations (in addition to Zvamo) ***
+vamoswapd_v     00001 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoaddd_v      00000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoxord_v      00100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoandd_v      01100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamoord_v       01000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomind_v      10000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomaxd_v      10100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamominud_v     11000 . . ..... ..... 111 ..... 0101111 @r_wdvm
+vamomaxud_v     11100 . . ..... ..... 111 ..... 0101111 @r_wdvm
+
 # *** RV64F Standard Extension (in addition to RV32F) ***
 fcvt_l_s   1100000  00010 ..... ... ..... 1010011 @r2_rm
 fcvt_lu_s  1100000  00011 ..... ... ..... 1010011 @r2_rm
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 973ac63fda..077551dd13 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -43,6 +43,7 @@
 &u    imm rd
 &shift     shamt rs1 rd
 &atomic    aq rl rs2 rs1 rd
+&rwdvm     vm wd rd rs1 rs2
 &r2nfvm    vm rd rs1 nf
 &rnfvm     vm rd rs1 rs2 nf
 
@@ -64,6 +65,7 @@
 @r_rm    .......   ..... ..... ... ..... ....... %rs2 %rs1 %rm %rd
 @r2_rm   .......   ..... ..... ... ..... ....... %rs1 %rm %rd
 @r2      .......   ..... ..... ... ..... ....... %rs1 %rd
+@r_wdvm  ..... wd:1 vm:1 ..... ..... ... ..... ....... &rwdvm %rs2 %rs1 %rd
 @r2_nfvm nf:3 ... vm:1 ..... ..... ... ..... ....... &r2nfvm %rs1 %rd
 @r_nfvm  nf:3 ... vm:1 ..... ..... ... ..... ....... &rnfvm %rs2 %rs1 %rd
 @r2_zimm . zimm:11  ..... ... ..... ....... %rs1 %rd
@@ -259,6 +261,17 @@ vsuxh_v    ... 111 . ..... ..... 101 ..... 0100111 @r_nfvm
 vsuxw_v    ... 111 . ..... ..... 110 ..... 0100111 @r_nfvm
 vsuxe_v    ... 111 . ..... ..... 111 ..... 0100111 @r_nfvm
 
+# *** Vector AMO operations are encoded under the standard AMO major opcode ***
+vamoswapw_v     00001 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoaddw_v      00000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoxorw_v      00100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoandw_v      01100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamoorw_v       01000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamominw_v      10000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamomaxw_v      10100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamominuw_v     11000 . . ..... ..... 110 ..... 0101111 @r_wdvm
+vamomaxuw_v     11100 . . ..... ..... 110 ..... 0101111 @r_wdvm
+
 # *** new major opcode OP-V ***
 vsetvli         0 ........... ..... 111 ..... 1010111  @r2_zimm
 vsetvl          1000000 ..... ..... 111 ..... 1010111  @r
diff --git a/target/riscv/insn_trans/trans_rvv.inc.c b/target/riscv/insn_trans/trans_rvv.inc.c
index 66caa16d18..f628e16346 100644
--- a/target/riscv/insn_trans/trans_rvv.inc.c
+++ b/target/riscv/insn_trans/trans_rvv.inc.c
@@ -751,3 +751,170 @@ GEN_VEXT_LDFF_TRANS(vleff_v, vext_ldff_trans, 3)
 GEN_VEXT_LDFF_TRANS(vlbuff_v, vext_ldff_trans, 4)
 GEN_VEXT_LDFF_TRANS(vlhuff_v, vext_ldff_trans, 5)
 GEN_VEXT_LDFF_TRANS(vlwuff_v, vext_ldff_trans, 6)
+
+/* vector atomic operation */
+typedef void gen_helper_vext_amo(TCGv_ptr, TCGv, TCGv_ptr, TCGv_ptr,
+        TCGv_env, TCGv_i32);
+
+static bool do_vext_amo_trans(uint32_t vd, uint32_t rs1, uint32_t vs2,
+        uint32_t data, gen_helper_vext_amo *fn, DisasContext *s)
+{
+    TCGv_ptr dest, mask, index;
+    TCGv base;
+    TCGv_i32 desc;
+
+    dest = tcg_temp_new_ptr();
+    mask = tcg_temp_new_ptr();
+    index = tcg_temp_new_ptr();
+    base = tcg_temp_new();
+    desc = tcg_const_i32(simd_desc(0, maxsz_part1(s->maxsz), data));
+
+    gen_get_gpr(base, rs1);
+    tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
+    tcg_gen_addi_ptr(index, cpu_env, vreg_ofs(s, vs2));
+    tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
+
+    fn(dest, base, mask, index, cpu_env, desc);
+
+    tcg_temp_free_ptr(dest);
+    tcg_temp_free_ptr(mask);
+    tcg_temp_free_ptr(index);
+    tcg_temp_free(base);
+    tcg_temp_free_i32(desc);
+    return true;
+}
+
+static bool vext_amo_trans(DisasContext *s, arg_rwdvm *a, uint8_t seq)
+{
+    uint32_t data = s->mlen | (a->vm << 8) | (maxsz_part2(s->maxsz) << 9)
+        | (a->wd << 12);
+    gen_helper_vext_amo *fn;
+#ifdef TARGET_RISCV64
+    static gen_helper_vext_amo *const fns[2][18][2] = {
+        /* atomic operation */
+        { { gen_helper_vamoswapw_v_w_a_mask, gen_helper_vamoswapw_v_d_a_mask },
+          { gen_helper_vamoaddw_v_w_a_mask,  gen_helper_vamoaddw_v_d_a_mask },
+          { gen_helper_vamoxorw_v_w_a_mask,  gen_helper_vamoxorw_v_d_a_mask },
+          { gen_helper_vamoandw_v_w_a_mask,  gen_helper_vamoandw_v_d_a_mask },
+          { gen_helper_vamoorw_v_w_a_mask,   gen_helper_vamoorw_v_d_a_mask },
+          { gen_helper_vamominw_v_w_a_mask,  gen_helper_vamominw_v_d_a_mask },
+          { gen_helper_vamomaxw_v_w_a_mask,  gen_helper_vamomaxw_v_d_a_mask },
+          { gen_helper_vamominuw_v_w_a_mask, gen_helper_vamominuw_v_d_a_mask },
+          { gen_helper_vamomaxuw_v_w_a_mask, gen_helper_vamomaxuw_v_d_a_mask },
+          { NULL,                            gen_helper_vamoswapd_v_d_a_mask },
+          { NULL,                            gen_helper_vamoaddd_v_d_a_mask },
+          { NULL,                            gen_helper_vamoxord_v_d_a_mask },
+          { NULL,                            gen_helper_vamoandd_v_d_a_mask },
+          { NULL,                            gen_helper_vamoord_v_d_a_mask },
+          { NULL,                            gen_helper_vamomind_v_d_a_mask },
+          { NULL,                            gen_helper_vamomaxd_v_d_a_mask },
+          { NULL,                            gen_helper_vamominud_v_d_a_mask },
+          { NULL,                           gen_helper_vamomaxud_v_d_a_mask } },
+        /* no atomic operation */
+        { { gen_helper_vamoswapw_v_w_mask, gen_helper_vamoswapw_v_d_mask },
+          { gen_helper_vamoaddw_v_w_mask,  gen_helper_vamoaddw_v_d_mask },
+          { gen_helper_vamoxorw_v_w_mask,  gen_helper_vamoxorw_v_d_mask },
+          { gen_helper_vamoandw_v_w_mask,  gen_helper_vamoandw_v_d_mask },
+          { gen_helper_vamoorw_v_w_mask,   gen_helper_vamoorw_v_d_mask },
+          { gen_helper_vamominw_v_w_mask,  gen_helper_vamominw_v_d_mask },
+          { gen_helper_vamomaxw_v_w_mask,  gen_helper_vamomaxw_v_d_mask },
+          { gen_helper_vamominuw_v_w_mask, gen_helper_vamominuw_v_d_mask },
+          { gen_helper_vamomaxuw_v_w_mask, gen_helper_vamomaxuw_v_d_mask },
+          { NULL,                          gen_helper_vamoswapd_v_d_mask },
+          { NULL,                          gen_helper_vamoaddd_v_d_mask },
+          { NULL,                          gen_helper_vamoxord_v_d_mask },
+          { NULL,                          gen_helper_vamoandd_v_d_mask },
+          { NULL,                          gen_helper_vamoord_v_d_mask },
+          { NULL,                          gen_helper_vamomind_v_d_mask },
+          { NULL,                          gen_helper_vamomaxd_v_d_mask },
+          { NULL,                          gen_helper_vamominud_v_d_mask },
+          { NULL,                          gen_helper_vamomaxud_v_d_mask } }
+    };
+#else
+    static gen_helper_vext_amo *const fns[2][9][2] = {
+        /* atomic operation */
+        { { gen_helper_vamoswapw_v_w_a_mask, NULL },
+          { gen_helper_vamoaddw_v_w_a_mask,  NULL },
+          { gen_helper_vamoxorw_v_w_a_mask,  NULL },
+          { gen_helper_vamoandw_v_w_a_mask,  NULL },
+          { gen_helper_vamoorw_v_w_a_mask,   NULL },
+          { gen_helper_vamominw_v_w_a_mask,  NULL },
+          { gen_helper_vamomaxw_v_w_a_mask,  NULL },
+          { gen_helper_vamominuw_v_w_a_mask, NULL },
+          { gen_helper_vamomaxuw_v_w_a_mask, NULL } },
+        /* no atomic operation */
+        { { gen_helper_vamoswapw_v_w_mask, NULL },
+          { gen_helper_vamoaddw_v_w_mask,  NULL },
+          { gen_helper_vamoxorw_v_w_mask,  NULL },
+          { gen_helper_vamoandw_v_w_mask,  NULL },
+          { gen_helper_vamoorw_v_w_mask,   NULL },
+          { gen_helper_vamominw_v_w_mask,  NULL },
+          { gen_helper_vamomaxw_v_w_mask,  NULL },
+          { gen_helper_vamominuw_v_w_mask, NULL },
+          { gen_helper_vamomaxuw_v_w_mask, NULL } }
+    };
+#endif
+    if (s->sew < 2) {
+        return false;
+    }
+
+    if (tb_cflags(s->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC64
+        fn = fns[0][seq][s->sew - 2];
+#else
+        gen_helper_exit_atomic(cpu_env);
+        s->base.is_jmp = DISAS_NORETURN;
+        return true;
+#endif
+    } else {
+        fn = fns[1][seq][s->sew - 2];
+    }
+    if (fn == NULL) {
+        return false;
+    }
+
+    return do_vext_amo_trans(a->rd, a->rs1, a->rs2, data, fn, s);
+}
+
+#define GEN_VEXT_AMO_TRANS(NAME, DO_OP, SEQ)                              \
+static bool trans_##NAME(DisasContext *s, arg_rwdvm* a)                   \
+{                                                                         \
+    vchkctx.check_misa = RVV | RVA;                                       \
+    if (a->wd) {                                                          \
+        vchkctx.check_overlap_mask.need_check = true;                     \
+        vchkctx.check_overlap_mask.reg = a->rd;                           \
+        vchkctx.check_overlap_mask.vm = a->vm;                            \
+    }                                                                     \
+    vchkctx.check_reg[0].need_check = true;                               \
+    vchkctx.check_reg[0].reg = a->rd;                                     \
+    vchkctx.check_reg[0].widen = false;                                   \
+    vchkctx.check_reg[1].need_check = true;                               \
+    vchkctx.check_reg[1].reg = a->rs2;                                    \
+    vchkctx.check_reg[1].widen = false;                                   \
+                                                                          \
+    if (!vext_check(s)) {                                                 \
+        return false;                                                     \
+    }                                                                     \
+    return DO_OP(s, a, SEQ);                                              \
+}
+
+GEN_VEXT_AMO_TRANS(vamoswapw_v, vext_amo_trans, 0)
+GEN_VEXT_AMO_TRANS(vamoaddw_v, vext_amo_trans, 1)
+GEN_VEXT_AMO_TRANS(vamoxorw_v, vext_amo_trans, 2)
+GEN_VEXT_AMO_TRANS(vamoandw_v, vext_amo_trans, 3)
+GEN_VEXT_AMO_TRANS(vamoorw_v, vext_amo_trans, 4)
+GEN_VEXT_AMO_TRANS(vamominw_v, vext_amo_trans, 5)
+GEN_VEXT_AMO_TRANS(vamomaxw_v, vext_amo_trans, 6)
+GEN_VEXT_AMO_TRANS(vamominuw_v, vext_amo_trans, 7)
+GEN_VEXT_AMO_TRANS(vamomaxuw_v, vext_amo_trans, 8)
+#ifdef TARGET_RISCV64
+GEN_VEXT_AMO_TRANS(vamoswapd_v, vext_amo_trans, 9)
+GEN_VEXT_AMO_TRANS(vamoaddd_v, vext_amo_trans, 10)
+GEN_VEXT_AMO_TRANS(vamoxord_v, vext_amo_trans, 11)
+GEN_VEXT_AMO_TRANS(vamoandd_v, vext_amo_trans, 12)
+GEN_VEXT_AMO_TRANS(vamoord_v, vext_amo_trans, 13)
+GEN_VEXT_AMO_TRANS(vamomind_v, vext_amo_trans, 14)
+GEN_VEXT_AMO_TRANS(vamomaxd_v, vext_amo_trans, 15)
+GEN_VEXT_AMO_TRANS(vamominud_v, vext_amo_trans, 16)
+GEN_VEXT_AMO_TRANS(vamomaxud_v, vext_amo_trans, 17)
+#endif
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 941851ab28..d6f1585c40 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -102,6 +102,11 @@ static uint32_t vext_vm(uint32_t desc)
     return (simd_data(desc) >> 8) & 0x1;
 }
 
+static uint32_t vext_wd(uint32_t desc)
+{
+    return (simd_data(desc) >> 12) & 0x1;
+}
+
 /*
  * Get vector group length [64, 2048] in bytes. Its range is [64, 2048].
  *
@@ -174,6 +179,21 @@ static void vext_clear(void *tail, uint32_t cnt, uint32_t tot)
     memset(tail, 0, tot - cnt);
 }
 #endif
+
+static void vext_clearl(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int32_t *cur = ((int32_t *)vd + H4(idx));
+    vext_clear(cur, cnt, tot);
+}
+
+#ifdef TARGET_RISCV64
+static void vext_clearq(void *vd, uint32_t idx, uint32_t cnt, uint32_t tot)
+{
+    int64_t *cur = (int64_t *)vd + idx;
+    vext_clear(cur, cnt, tot);
+}
+#endif
+
 /* common structure for all vector instructions */
 struct vext_common_ctx {
     uint32_t vlmax;
@@ -1006,3 +1026,275 @@ GEN_VEXT_LDFF(vlhuff_v_w, uint16_t, uint32_t, MO_LEUW)
 GEN_VEXT_LDFF(vlhuff_v_d, uint16_t, uint64_t, MO_LEUW)
 GEN_VEXT_LDFF(vlwuff_v_w, uint32_t, uint32_t, MO_LEUL)
 GEN_VEXT_LDFF(vlwuff_v_d, uint32_t, uint64_t, MO_LEUL)
+
+/* Vector AMO Operations (Zvamo) */
+/* data structure and common functions for load and store */
+typedef void vext_amo_noatomic_fn(void *vs3, target_ulong addr,
+        uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr);
+typedef void vext_amo_atomic_fn(void *vs3, target_ulong addr,
+        uint32_t wd, uint32_t idx, CPURISCVState *env);
+
+struct vext_amo_ctx {
+    struct vext_common_ctx vcc;
+    uint32_t wd;
+    target_ulong base;
+
+    vext_get_index_addr *get_index_addr;
+    vext_amo_atomic_fn *atomic_op;
+    vext_amo_noatomic_fn *noatomic_op;
+    vext_ld_clear_elem *clear_elem;
+};
+
+#ifdef TARGET_RISCV64
+GEN_VEXT_GET_INDEX_ADDR(vamoswapw_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoswapd_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoaddw_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoaddd_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoxorw_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoxord_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoandw_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoandd_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoorw_v_d,   int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamoord_v_d,   int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamominw_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamomind_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxw_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxd_v_d,  int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamominuw_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamominud_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxuw_v_d, int64_t, H8)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxud_v_d, int64_t, H8)
+#endif
+GEN_VEXT_GET_INDEX_ADDR(vamoswapw_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamoaddw_v_w,  int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamoxorw_v_w,  int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamoandw_v_w,  int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamoorw_v_w,   int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamominw_v_w,  int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxw_v_w,  int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamominuw_v_w, int32_t, H4)
+GEN_VEXT_GET_INDEX_ADDR(vamomaxuw_v_w, int32_t, H4)
+
+/* no atomic operation for vector atomic instructions */
+#define DO_SWAP(N, M) (M)
+#define DO_AND(N, M)  (N & M)
+#define DO_XOR(N, M)  (N ^ M)
+#define DO_OR(N, M)   (N | M)
+#define DO_ADD(N, M)  (N + M)
+#define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
+#define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
+
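+/*
+ * Each non-atomic element op loads the old memory value, stores
+ * DO_OP(old, vs3[idx]) back to the same address and, when wd is set,
+ * writes the old memory value back into the vs3 element.
+ */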
+#define GEN_VEXT_AMO_NOATOMIC_OP(NAME, ETYPE, MTYPE, H, DO_OP, SUF)      \
+static void vext_##NAME##_noatomic_op(void *vs3, target_ulong addr,      \
+        uint32_t wd, uint32_t idx, CPURISCVState *env, uintptr_t retaddr)\
+{                                                                        \
+    ETYPE ret;                                                           \
+    target_ulong tmp;                                                    \
+    int mmu_idx = cpu_mmu_index(env, false);                             \
+    tmp = cpu_ld##SUF##_mmuidx_ra(env, addr, mmu_idx, retaddr);          \
+    ret = DO_OP((ETYPE)(MTYPE)tmp, *((ETYPE *)vs3 + H(idx)));            \
+    cpu_st##SUF##_mmuidx_ra(env, addr, ret, mmu_idx, retaddr);           \
+    if (wd) {                                                            \
+        *((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;              \
+    }                                                                    \
+}
+
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_w, int32_t,  int32_t, H4, DO_SWAP, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_w,  int32_t,  int32_t, H4, DO_ADD,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_w,  int32_t,  int32_t, H4, DO_XOR,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_w,  int32_t,  int32_t, H4, DO_AND,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_w,   int32_t,  int32_t, H4, DO_OR,   l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_w,  int32_t,  int32_t, H4, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_w,  int32_t,  int32_t, H4, DO_MAX,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_w, uint32_t, int32_t, H4, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_w, uint32_t, int32_t, H4, DO_MAX,  l)
+#ifdef TARGET_RISCV64
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapw_v_d, int64_t,  int32_t, H8, DO_SWAP, l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoswapd_v_d, int64_t,  int64_t, H8, DO_SWAP, q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddw_v_d,  int64_t,  int32_t, H8, DO_ADD,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoaddd_v_d,  int64_t,  int64_t, H8, DO_ADD,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxorw_v_d,  int64_t,  int32_t, H8, DO_XOR,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoxord_v_d,  int64_t,  int64_t, H8, DO_XOR,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandw_v_d,  int64_t,  int32_t, H8, DO_AND,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoandd_v_d,  int64_t,  int64_t, H8, DO_AND,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoorw_v_d,   int64_t,  int32_t, H8, DO_OR,   l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamoord_v_d,   int64_t,  int64_t, H8, DO_OR,   q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominw_v_d,  int64_t,  int32_t, H8, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomind_v_d,  int64_t,  int64_t, H8, DO_MIN,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxw_v_d,  int64_t,  int32_t, H8, DO_MAX,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxd_v_d,  int64_t,  int64_t, H8, DO_MAX,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominuw_v_d, uint64_t, int32_t, H8, DO_MIN,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamominud_v_d, uint64_t, int64_t, H8, DO_MIN,  q)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxuw_v_d, uint64_t, int32_t, H8, DO_MAX,  l)
+GEN_VEXT_AMO_NOATOMIC_OP(vamomaxud_v_d, uint64_t, int64_t, H8, DO_MAX,  q)
+#endif
+
+/* atomic operation for vector atomic instructions */
+#ifndef CONFIG_USER_ONLY
+#define GEN_VEXT_ATOMIC_OP(NAME, ETYPE, MTYPE, MOFLAG, H, AMO)           \
+static void vext_##NAME##_atomic_op(void *vs3, target_ulong addr,        \
+        uint32_t wd, uint32_t idx, CPURISCVState *env)                   \
+{                                                                        \
+    target_ulong tmp;                                                    \
+    int mem_idx = cpu_mmu_index(env, false);                             \
+    tmp = helper_atomic_##AMO##_le(env, addr, *((ETYPE *)vs3 + H(idx)),  \
+            make_memop_idx(MO_ALIGN | MOFLAG, mem_idx));                 \
+    if (wd) {                                                            \
+        *((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;              \
+    }                                                                    \
+}
+#else
+#define GEN_VEXT_ATOMIC_OP(NAME, ETYPE, MTYPE, MOFLAG, H, AMO)           \
+static void vext_##NAME##_atomic_op(void *vs3, target_ulong addr,        \
+        uint32_t wd, uint32_t idx, CPURISCVState *env)                   \
+{                                                                        \
+    target_ulong tmp;                                                    \
+    tmp = helper_atomic_##AMO##_le(env, addr, *((ETYPE *)vs3 + H(idx))); \
+    if (wd) {                                                            \
+        *((ETYPE *)vs3 + H(idx)) = (target_long)(MTYPE)tmp;              \
+    }                                                                    \
+}
+#endif
+
+GEN_VEXT_ATOMIC_OP(vamoswapw_v_w, int32_t,  int32_t,  MO_TESL, H4, xchgl)
+GEN_VEXT_ATOMIC_OP(vamoaddw_v_w,  int32_t,  int32_t,  MO_TESL, H4, fetch_addl)
+GEN_VEXT_ATOMIC_OP(vamoxorw_v_w,  int32_t,  int32_t,  MO_TESL, H4, fetch_xorl)
+GEN_VEXT_ATOMIC_OP(vamoandw_v_w,  int32_t,  int32_t,  MO_TESL, H4, fetch_andl)
+GEN_VEXT_ATOMIC_OP(vamoorw_v_w,   int32_t,  int32_t,  MO_TESL, H4, fetch_orl)
+GEN_VEXT_ATOMIC_OP(vamominw_v_w,  int32_t,  int32_t,  MO_TESL, H4, fetch_sminl)
+GEN_VEXT_ATOMIC_OP(vamomaxw_v_w,  int32_t,  int32_t,  MO_TESL, H4, fetch_smaxl)
+GEN_VEXT_ATOMIC_OP(vamominuw_v_w, uint32_t, int32_t,  MO_TEUL, H4, fetch_uminl)
+GEN_VEXT_ATOMIC_OP(vamomaxuw_v_w, uint32_t, int32_t,  MO_TEUL, H4, fetch_umaxl)
+#ifdef TARGET_RISCV64
+GEN_VEXT_ATOMIC_OP(vamoswapw_v_d, int64_t,  int32_t,  MO_TESL, H8, xchgl)
+GEN_VEXT_ATOMIC_OP(vamoswapd_v_d, int64_t,  int64_t,  MO_TEQ,  H8, xchgq)
+GEN_VEXT_ATOMIC_OP(vamoaddw_v_d,  int64_t,  int32_t,  MO_TESL, H8, fetch_addl)
+GEN_VEXT_ATOMIC_OP(vamoaddd_v_d,  int64_t,  int64_t,  MO_TEQ,  H8, fetch_addq)
+GEN_VEXT_ATOMIC_OP(vamoxorw_v_d,  int64_t,  int32_t,  MO_TESL, H8, fetch_xorl)
+GEN_VEXT_ATOMIC_OP(vamoxord_v_d,  int64_t,  int64_t,  MO_TEQ,  H8, fetch_xorq)
+GEN_VEXT_ATOMIC_OP(vamoandw_v_d,  int64_t,  int32_t,  MO_TESL, H8, fetch_andl)
+GEN_VEXT_ATOMIC_OP(vamoandd_v_d,  int64_t,  int64_t,  MO_TEQ,  H8, fetch_andq)
+GEN_VEXT_ATOMIC_OP(vamoorw_v_d,   int64_t,  int32_t,  MO_TESL, H8, fetch_orl)
+GEN_VEXT_ATOMIC_OP(vamoord_v_d,   int64_t,  int64_t,  MO_TEQ,  H8, fetch_orq)
+GEN_VEXT_ATOMIC_OP(vamominw_v_d,  int64_t,  int32_t,  MO_TESL, H8, fetch_sminl)
+GEN_VEXT_ATOMIC_OP(vamomind_v_d,  int64_t,  int64_t,  MO_TEQ,  H8, fetch_sminq)
+GEN_VEXT_ATOMIC_OP(vamomaxw_v_d,  int64_t,  int32_t,  MO_TESL, H8, fetch_smaxl)
+GEN_VEXT_ATOMIC_OP(vamomaxd_v_d,  int64_t,  int64_t,  MO_TEQ,  H8, fetch_smaxq)
+GEN_VEXT_ATOMIC_OP(vamominuw_v_d, uint64_t, int32_t,  MO_TEUL, H8, fetch_uminl)
+GEN_VEXT_ATOMIC_OP(vamominud_v_d, uint64_t, int64_t,  MO_TEQ,  H8, fetch_uminq)
+GEN_VEXT_ATOMIC_OP(vamomaxuw_v_d, uint64_t, int32_t,  MO_TEUL, H8, fetch_umaxl)
+GEN_VEXT_ATOMIC_OP(vamomaxud_v_d, uint64_t, int64_t,  MO_TEQ,  H8, fetch_umaxq)
+#endif
+
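+/*
+ * The first loop probes every active element address for read and write
+ * access, so that any trap is taken before memory is modified; the second
+ * loop then performs the atomic accesses, and the tail elements are cleared.
+ */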
+static void vext_amo_atomic_mask(void *vs3, void *vs2, void *v0,
+        CPURISCVState *env, struct vext_amo_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i;
+    target_long addr;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_read_access(env, ctx->get_index_addr(ctx->base, i, vs2),
+                s->msz, ra);
+        probe_write_access(env, ctx->get_index_addr(ctx->base, i, vs2),
+                s->msz, ra);
+    }
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        addr = ctx->get_index_addr(ctx->base, i, vs2);
+        ctx->atomic_op(vs3, addr, ctx->wd, i, env);
+    }
+    ctx->clear_elem(vs3, s->vl, s->vl * s->esz, s->vlmax * s->esz);
+}
+
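+/* Same two-pass structure as above, using the non-atomic element ops. */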
+static void vext_amo_noatomic_mask(void *vs3, void *vs2, void *v0,
+        CPURISCVState *env, struct vext_amo_ctx *ctx, uintptr_t ra)
+{
+    uint32_t i;
+    target_long addr;
+    struct vext_common_ctx *s = &ctx->vcc;
+
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        probe_read_access(env, ctx->get_index_addr(ctx->base, i, vs2),
+                s->msz, ra);
+        probe_write_access(env, ctx->get_index_addr(ctx->base, i, vs2),
+                s->msz, ra);
+    }
+    for (i = 0; i < s->vl; i++) {
+        if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
+            continue;
+        }
+        addr = ctx->get_index_addr(ctx->base, i, vs2);
+        ctx->noatomic_op(vs3, addr, ctx->wd, i, env, ra);
+    }
+    ctx->clear_elem(vs3, s->vl, s->vl * s->esz, s->vlmax * s->esz);
+}
+
+#define GEN_VEXT_AMO(NAME, MTYPE, ETYPE, CLEAR_FN)                    \
+void HELPER(NAME##_a_mask)(void *vs3, target_ulong base, void *v0,    \
+        void *vs2, CPURISCVState *env, uint32_t desc)                 \
+{                                                                     \
+    static struct vext_amo_ctx ctx;                                   \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                     \
+        sizeof(MTYPE), env->vext.vl, desc);                           \
+    ctx.wd = vext_wd(desc);                                           \
+    ctx.base = base;                                                  \
+    ctx.atomic_op = vext_##NAME##_atomic_op;                          \
+    ctx.get_index_addr = vext_##NAME##_get_addr;                      \
+    ctx.clear_elem = CLEAR_FN;                                        \
+                                                                      \
+    vext_amo_atomic_mask(vs3, vs2, v0, env, &ctx, GETPC());           \
+}                                                                     \
+                                                                      \
+void HELPER(NAME##_mask)(void *vs3, target_ulong base, void *v0,      \
+        void *vs2, CPURISCVState *env, uint32_t desc)                 \
+{                                                                     \
+    static struct vext_amo_ctx ctx;                                   \
+    vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE),                     \
+        sizeof(MTYPE), env->vext.vl, desc);                           \
+    ctx.wd = vext_wd(desc);                                           \
+    ctx.base = base;                                                  \
+    ctx.noatomic_op = vext_##NAME##_noatomic_op;                      \
+    ctx.get_index_addr = vext_##NAME##_get_addr;                      \
+    ctx.clear_elem = CLEAR_FN;                                        \
+                                                                      \
+    vext_amo_noatomic_mask(vs3, vs2, v0, env, &ctx, GETPC());         \
+}
+
+#ifdef TARGET_RISCV64
+GEN_VEXT_AMO(vamoswapw_v_d, int32_t,  int64_t, vext_clearq)
+GEN_VEXT_AMO(vamoswapd_v_d, int64_t, int64_t, vext_clearq)
+GEN_VEXT_AMO(vamoaddw_v_d,  int32_t,  int64_t, vext_clearq)
+GEN_VEXT_AMO(vamoaddd_v_d,  int64_t, int64_t, vext_clearq)
+GEN_VEXT_AMO(vamoxorw_v_d,  int32_t,  int64_t, vext_clearq)
+GEN_VEXT_AMO(vamoxord_v_d,  int64_t, int64_t, vext_clearq)
+GEN_VEXT_AMO(vamoandw_v_d,  int32_t,  int64_t, vext_clearq)
+GEN_VEXT_AMO(vamoandd_v_d,  int64_t, int64_t, vext_clearq)
+GEN_VEXT_AMO(vamoorw_v_d,   int32_t,  int64_t, vext_clearq)
+GEN_VEXT_AMO(vamoord_v_d,   int64_t, int64_t, vext_clearq)
+GEN_VEXT_AMO(vamominw_v_d,  int32_t,  int64_t, vext_clearq)
+GEN_VEXT_AMO(vamomind_v_d,  int64_t, int64_t, vext_clearq)
+GEN_VEXT_AMO(vamomaxw_v_d,  int32_t,  int64_t, vext_clearq)
+GEN_VEXT_AMO(vamomaxd_v_d,  int64_t, int64_t, vext_clearq)
+GEN_VEXT_AMO(vamominuw_v_d, uint32_t,  uint64_t, vext_clearq)
+GEN_VEXT_AMO(vamominud_v_d, uint64_t, uint64_t, vext_clearq)
+GEN_VEXT_AMO(vamomaxuw_v_d, uint32_t, uint64_t, vext_clearq)
+GEN_VEXT_AMO(vamomaxud_v_d, uint64_t, uint64_t, vext_clearq)
+#endif
+GEN_VEXT_AMO(vamoswapw_v_w, int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamoaddw_v_w,  int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamoxorw_v_w,  int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamoandw_v_w,  int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamoorw_v_w,   int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamominw_v_w,  int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamomaxw_v_w,  int32_t, int32_t, vext_clearl)
+GEN_VEXT_AMO(vamominuw_v_w, uint32_t, uint32_t, vext_clearl)
+GEN_VEXT_AMO(vamomaxuw_v_w, uint32_t, uint32_t, vext_clearl)
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/5] target/riscv: add vector unit stride load and store instructions
  2020-02-10  7:42   ` LIU Zhiwei
@ 2020-02-12  6:38     ` Richard Henderson
  -1 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2020-02-12  6:38 UTC (permalink / raw)
  To: LIU Zhiwei, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

On 2/9/20 11:42 PM, LIU Zhiwei wrote:
> +/*
> + * As simd_desc supports at most 256 bytes, and in this implementation,
> + * the max vector group length is 2048 bytes. So split it into two parts.
> + *
> + * The first part is floor(maxsz, 64), encoded in maxsz of simd_desc.
> + * The second part is (maxsz % 64) >> 3, encoded in data of simd_desc.
> + */
> +static uint32_t maxsz_part1(uint32_t maxsz)
> +{
> +    return ((maxsz & ~(0x3f)) >> 3) + 0x8; /* add offset 8 to avoid return 0 */
> +}
> +
> +static uint32_t maxsz_part2(uint32_t maxsz)
> +{
> +    return (maxsz & 0x3f) >> 3;
> +}

I would much rather adjust simd_desc to support 2048 bytes.

I've just posted a patch set that removes an assert in target/arm that would
trigger if SIMD_DATA_SHIFT was increased to make room for a larger oprsz.

Or, since we're not going through tcg_gen_gvec_* for ldst, don't bother with
simd_desc at all, and just pass vlen, unencoded.
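
For example (sketch only -- the field layout below is arbitrary, and I'm
assuming the translator has the vector length at hand as s->vlen):

    uint32_t data = s->vlen | (s->mlen << 16)
                    | (a->vm << 24) | (a->nf << 25);
    desc = tcg_const_i32(data);

The 32-bit word is entirely yours to define once simd_desc is out of the
picture.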

> +/* define check conditions data structure */
> +struct vext_check_ctx {
> +
> +    struct vext_reg {
> +        uint8_t reg;
> +        bool widen;
> +        bool need_check;
> +    } check_reg[6];
> +
> +    struct vext_overlap_mask {
> +        uint8_t reg;
> +        uint8_t vm;
> +        bool need_check;
> +    } check_overlap_mask;
> +
> +    struct vext_nf {
> +        uint8_t nf;
> +        bool need_check;
> +    } check_nf;
> +    target_ulong check_misa;
> +
> +} vchkctx;

You cannot use a global variable.  The data must be thread-safe.

If we're going to do the checks this way, with a structure, it needs to be on
the stack or within DisasContext.

> +#define GEN_VEXT_LD_US_TRANS(NAME, DO_OP, SEQ)                            \
> +static bool trans_##NAME(DisasContext *s, arg_r2nfvm* a)                  \
> +{                                                                         \
> +    vchkctx.check_misa = RVV;                                             \
> +    vchkctx.check_overlap_mask.need_check = true;                         \
> +    vchkctx.check_overlap_mask.reg = a->rd;                               \
> +    vchkctx.check_overlap_mask.vm = a->vm;                                \
> +    vchkctx.check_reg[0].need_check = true;                               \
> +    vchkctx.check_reg[0].reg = a->rd;                                     \
> +    vchkctx.check_reg[0].widen = false;                                   \
> +    vchkctx.check_nf.need_check = true;                                   \
> +    vchkctx.check_nf.nf = a->nf;                                          \
> +                                                                          \
> +    if (!vext_check(s)) {                                                 \
> +        return false;                                                     \
> +    }                                                                     \
> +    return DO_OP(s, a, SEQ);                                              \
> +}

I don't see the improvement from a pointer.  Something like

    if (vext_check_isa_ill(s) &&
        vext_check_overlap(s, a->rd, a->rm) &&
        vext_check_reg(s, a->rd, false) &&
        vext_check_nf(s, a->nf)) {
        return DO_OP(s, a, SEQ);
    }
    return false;

seems just as clear without the extra data.
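
For instance (a sketch -- the exact DisasContext field names are
assumptions):

    static bool vext_check_isa_ill(DisasContext *s)
    {
        return !s->vill;
    }

    static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
    {
        int legal = widen ? 2 << s->lmul : 1 << s->lmul;
        return !((s->lmul == 0x3 && widen) || (reg % legal));
    }

Each check then reads naturally at the call site, with no shared state.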

> +#ifdef CONFIG_USER_ONLY
> +#define MO_SB 0
> +#define MO_LESW 0
> +#define MO_LESL 0
> +#define MO_LEQ 0
> +#define MO_UB 0
> +#define MO_LEUW 0
> +#define MO_LEUL 0
> +#endif

What is this for?  We already define these unconditionally.


> +static inline int vext_elem_mask(void *v0, int mlen, int index)
> +{
> +    int idx = (index * mlen) / 8;
> +    int pos = (index * mlen) % 8;
> +
> +    return (*((uint8_t *)v0 + idx) >> pos) & 0x1;
> +}

This is a little-endian indexing of the mask.  Just above we talk about using a
host-endian ordering of uint64_t.

Thus this must be based on uint64_t instead of uint8_t.
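
Something along these lines (sketch):

    static inline int vext_elem_mask(void *v0, int mlen, int index)
    {
        int idx = (index * mlen) / 64;
        int pos = (index * mlen) % 64;

        return (((uint64_t *)v0)[idx] >> pos) & 1;
    }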

> +/*
> + * This function checks watchpoint before really load operation.
> + *
> + * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
> + * In user mode, there is no watchpoint support now.
> + *
> + * It will triggle an exception if there is no mapping in TLB
> + * and page table walk can't fill the TLB entry. Then the guest
> + * software can return here after process the exception or never return.
> + */
> +static void probe_read_access(CPURISCVState *env, target_ulong addr,
> +        target_ulong len, uintptr_t ra)
> +{
> +    while (len) {
> +        const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
> +        const target_ulong curlen = MIN(pagelen, len);
> +
> +        probe_read(env, addr, curlen, cpu_mmu_index(env, false), ra);

The return value here is non-null when we can read directly from host memory.
It would be a shame to throw that work away.
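
The pointer could be kept and used as a fast path, e.g. (sketch; apart
from probe_read these are just the generic cpu_ldst/host-load helpers):

    static uint32_t ldl_elem(CPURISCVState *env, target_ulong addr,
                             void *host, uintptr_t ra)
    {
        if (host) {                     /* page is directly accessible */
            return ldl_p(host);
        }
        return cpu_ldl_data_ra(env, addr, ra);
    }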


> +/* data structure and common functions for load and store */
> +typedef void vext_ld_elem_fn(CPURISCVState *env, target_ulong addr,
> +        uint32_t idx, void *vd, uintptr_t retaddr);
> +typedef void vext_st_elem_fn(CPURISCVState *env, target_ulong addr,
> +        uint32_t idx, void *vd, uintptr_t retaddr);
> +typedef target_ulong vext_get_index_addr(target_ulong base,
> +        uint32_t idx, void *vs2);
> +typedef void vext_ld_clear_elem(void *vd, uint32_t idx,
> +        uint32_t cnt, uint32_t tot);
> +
> +struct vext_ldst_ctx {
> +    struct vext_common_ctx vcc;
> +    uint32_t nf;
> +    target_ulong base;
> +    target_ulong stride;
> +    int mmuidx;
> +
> +    vext_ld_elem_fn *ld_elem;
> +    vext_st_elem_fn *st_elem;
> +    vext_get_index_addr *get_index_addr;
> +    vext_ld_clear_elem *clear_elem;
> +};

I think you should pass these elements directly, as needed, rather than putting
them all in a struct.

This would allow the main helper function to be inlined, which in turn allows
the mini helper functions to be inlined.
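
I.e. something shaped like (sketch):

    static inline void
    vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env,
                 uint32_t desc, vext_ld_elem_fn *ld_elem,
                 vext_ld_clear_elem *clear_elem,
                 uint32_t esz, uint32_t msz, uintptr_t ra);

where each outermost helper makes a single call with compile-time constant
arguments, so the compiler can specialize and inline the whole loop.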


r~


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/5] target/riscv: add vector unit stride load and store instructions
  2020-02-12  6:38     ` Richard Henderson
@ 2020-02-12  8:55       ` LIU Zhiwei
  -1 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-12  8:55 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768

Hi, Richard

Thanks for comments.

On 2020/2/12 14:38, Richard Henderson wrote:
> On 2/9/20 11:42 PM, LIU Zhiwei wrote:
>> +/*
>> + * As simd_desc supports at most 256 bytes, and in this implementation,
>> + * the max vector group length is 2048 bytes. So split it into two parts.
>> + *
>> + * The first part is floor(maxsz, 64), encoded in maxsz of simd_desc.
>> + * The second part is (maxsz % 64) >> 3, encoded in data of simd_desc.
>> + */
>> +static uint32_t maxsz_part1(uint32_t maxsz)
>> +{
>> +    return ((maxsz & ~(0x3f)) >> 3) + 0x8; /* add offset 8 to avoid return 0 */
>> +}
>> +
>> +static uint32_t maxsz_part2(uint32_t maxsz)
>> +{
>> +    return (maxsz & 0x3f) >> 3;
>> +}
> I would much rather adjust simd_desc to support 2048 bytes.
>
> I've just posted a patch set that removes an assert in target/arm that would
> trigger if SIMD_DATA_SHIFT was increased to make room for a larger oprsz.
Do you mean "assert(maxsz % 8 == 0 && maxsz <= (8 << SIMD_MAXSZ_BITS));" 
in tcg-op-gvec.c?
If it is removed, I can pass 2048 bytes by setting maxsz to 256.
> Or, since we're not going through tcg_gen_gvec_* for ldst, don't bother with
> simd_desc at all, and just pass vlen, unencoded.
Vlen is not enough; lmul is also needed in the helpers.
>
>> +/* define check conditions data structure */
>> +struct vext_check_ctx {
>> +
>> +    struct vext_reg {
>> +        uint8_t reg;
>> +        bool widen;
>> +        bool need_check;
>> +    } check_reg[6];
>> +
>> +    struct vext_overlap_mask {
>> +        uint8_t reg;
>> +        uint8_t vm;
>> +        bool need_check;
>> +    } check_overlap_mask;
>> +
>> +    struct vext_nf {
>> +        uint8_t nf;
>> +        bool need_check;
>> +    } check_nf;
>> +    target_ulong check_misa;
>> +
>> +} vchkctx;
> You cannot use a global variable.  The data must be thread-safe.
>
> If we're going to do the checks this way, with a structure, it needs to be on
> the stack or within DisasContext.
>> +#define GEN_VEXT_LD_US_TRANS(NAME, DO_OP, SEQ)                            \
>> +static bool trans_##NAME(DisasContext *s, arg_r2nfvm* a)                  \
>> +{                                                                         \
>> +    vchkctx.check_misa = RVV;                                             \
>> +    vchkctx.check_overlap_mask.need_check = true;                         \
>> +    vchkctx.check_overlap_mask.reg = a->rd;                               \
>> +    vchkctx.check_overlap_mask.vm = a->vm;                                \
>> +    vchkctx.check_reg[0].need_check = true;                               \
>> +    vchkctx.check_reg[0].reg = a->rd;                                     \
>> +    vchkctx.check_reg[0].widen = false;                                   \
>> +    vchkctx.check_nf.need_check = true;                                   \
>> +    vchkctx.check_nf.nf = a->nf;                                          \
>> +                                                                          \
>> +    if (!vext_check(s)) {                                                 \
>> +        return false;                                                     \
>> +    }                                                                     \
>> +    return DO_OP(s, a, SEQ);                                              \
>> +}
> I don't see the improvement from a pointer.  Something like
>
>      if (vext_check_isa_ill(s) &&
>          vext_check_overlap(s, a->rd, a->rm) &&
>          vext_check_reg(s, a->rd, false) &&
>          vext_check_nf(s, a->nf)) {
>          return DO_OP(s, a, SEQ);
>      }
>      return false;
>
> seems just as clear without the extra data.
I am not quite sure which is clearer. In my opinion, setting the data is
easier than calling different interfaces.
>> +#ifdef CONFIG_USER_ONLY
>> +#define MO_SB 0
>> +#define MO_LESW 0
>> +#define MO_LESL 0
>> +#define MO_LEQ 0
>> +#define MO_UB 0
>> +#define MO_LEUW 0
>> +#define MO_LEUL 0
>> +#endif
> What is this for?  We already define these unconditionally.
Yes. I missed the header file "exec/memop.h". When I compiled in user mode,
some make errors appeared.
I will remove this code in the next patch.
>
>> +static inline int vext_elem_mask(void *v0, int mlen, int index)
>> +{
>> +    int idx = (index * mlen) / 8;
>> +    int pos = (index * mlen) % 8;
>> +
>> +    return (*((uint8_t *)v0 + idx) >> pos) & 0x1;
>> +}
> This is a little-endian indexing of the mask.  Just above we talk about using a
> host-endian ordering of uint64_t.
>
> Thus this must be based on uint64_t instead of uint8_t.
>
>> +/*
>> + * This function checks watchpoint before really load operation.
>> + *
>> + * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
>> + * In user mode, there is no watchpoint support now.
>> + *
>> + * It will triggle an exception if there is no mapping in TLB
>> + * and page table walk can't fill the TLB entry. Then the guest
>> + * software can return here after process the exception or never return.
>> + */
>> +static void probe_read_access(CPURISCVState *env, target_ulong addr,
>> +        target_ulong len, uintptr_t ra)
>> +{
>> +    while (len) {
>> +        const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
>> +        const target_ulong curlen = MIN(pagelen, len);
>> +
>> +        probe_read(env, addr, curlen, cpu_mmu_index(env, false), ra);
> The return value here is non-null when we can read directly from host memory.
> It would be a shame to throw that work away.
Yes. These host addresses can be useful. I just ignored them, because using
them would require adding some local variables, and cpu_*_mmuidx_ra will
just search the TLB by tlb_hit anyway.
I am not quite sure whether I should keep the host addresses in an array.

Do you think it is necessary?
>
>> +/* data structure and common functions for load and store */
>> +typedef void vext_ld_elem_fn(CPURISCVState *env, target_ulong addr,
>> +        uint32_t idx, void *vd, uintptr_t retaddr);
>> +typedef void vext_st_elem_fn(CPURISCVState *env, target_ulong addr,
>> +        uint32_t idx, void *vd, uintptr_t retaddr);
>> +typedef target_ulong vext_get_index_addr(target_ulong base,
>> +        uint32_t idx, void *vs2);
>> +typedef void vext_ld_clear_elem(void *vd, uint32_t idx,
>> +        uint32_t cnt, uint32_t tot);
>> +
>> +struct vext_ldst_ctx {
>> +    struct vext_common_ctx vcc;
>> +    uint32_t nf;
>> +    target_ulong base;
>> +    target_ulong stride;
>> +    int mmuidx;
>> +
>> +    vext_ld_elem_fn *ld_elem;
>> +    vext_st_elem_fn *st_elem;
>> +    vext_get_index_addr *get_index_addr;
>> +    vext_ld_clear_elem *clear_elem;
>> +};
> I think you should pass these elements directly, as needed, rather than putting
> them all in a struct.
>
> This would allow the main helper function to be inlined, which in turn allows
> the mini helper functions to be inlined.
The structure is meant to reduce the main helper function's code size and
the number of arguments of the mini helper functions.
I passed these elements directly in an earlier version of this patch; it
was more confusing, with so many scattered variables and arguments.

I'm not quite sure about the efficiency improvement. If you are sure
about it, could you explain in more detail how to achieve it?

Best Regards,
Zhiwei

>
>
> r~



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/5] target/riscv: add vector unit stride load and store instructions
@ 2020-02-12  8:55       ` LIU Zhiwei
  0 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-12  8:55 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, wxy194768, qemu-devel, qemu-riscv

Hi, Richard

Thanks for comments.

On 2020/2/12 14:38, Richard Henderson wrote:
> On 2/9/20 11:42 PM, LIU Zhiwei wrote:
>> +/*
>> + * As simd_desc supports at most 256 bytes, and in this implementation,
>> + * the max vector group length is 2048 bytes. So split it into two parts.
>> + *
>> + * The first part is floor(maxsz, 64), encoded in maxsz of simd_desc.
>> + * The second part is (maxsz % 64) >> 3, encoded in data of simd_desc.
>> + */
>> +static uint32_t maxsz_part1(uint32_t maxsz)
>> +{
>> +    return ((maxsz & ~(0x3f)) >> 3) + 0x8; /* add offset 8 to avoid return 0 */
>> +}
>> +
>> +static uint32_t maxsz_part2(uint32_t maxsz)
>> +{
>> +    return (maxsz & 0x3f) >> 3;
>> +}
> I would much rather adjust simd_desc to support 2048 bytes.
>
> I've just posted a patch set that removes an assert in target/arm that would
> trigger if SIMD_DATA_SHIFT was increased to make room for a larger oprsz.
Do you mean "assert(maxsz % 8 == 0 && maxsz <= (8 << SIMD_MAXSZ_BITS));" 
in tcg-op-gvec.c?
If it is removed, I can pass 2048 bytes by setting maxsz to 256.
> Or, since we're not going through tcg_gen_gvec_* for ldst, don't bother with
> simd_desc at all, and just pass vlen, unencoded.
Vlen is not enough; lmul is also needed in the helpers.
>
>> +/* define check conditions data structure */
>> +struct vext_check_ctx {
>> +
>> +    struct vext_reg {
>> +        uint8_t reg;
>> +        bool widen;
>> +        bool need_check;
>> +    } check_reg[6];
>> +
>> +    struct vext_overlap_mask {
>> +        uint8_t reg;
>> +        uint8_t vm;
>> +        bool need_check;
>> +    } check_overlap_mask;
>> +
>> +    struct vext_nf {
>> +        uint8_t nf;
>> +        bool need_check;
>> +    } check_nf;
>> +    target_ulong check_misa;
>> +
>> +} vchkctx;
> You cannot use a global variable.  The data must be thread-safe.
>
> If we're going to do the checks this way, with a structure, it needs to be on
> the stack or within DisasContext.
>> +#define GEN_VEXT_LD_US_TRANS(NAME, DO_OP, SEQ)                            \
>> +static bool trans_##NAME(DisasContext *s, arg_r2nfvm* a)                  \
>> +{                                                                         \
>> +    vchkctx.check_misa = RVV;                                             \
>> +    vchkctx.check_overlap_mask.need_check = true;                         \
>> +    vchkctx.check_overlap_mask.reg = a->rd;                               \
>> +    vchkctx.check_overlap_mask.vm = a->vm;                                \
>> +    vchkctx.check_reg[0].need_check = true;                               \
>> +    vchkctx.check_reg[0].reg = a->rd;                                     \
>> +    vchkctx.check_reg[0].widen = false;                                   \
>> +    vchkctx.check_nf.need_check = true;                                   \
>> +    vchkctx.check_nf.nf = a->nf;                                          \
>> +                                                                          \
>> +    if (!vext_check(s)) {                                                 \
>> +        return false;                                                     \
>> +    }                                                                     \
>> +    return DO_OP(s, a, SEQ);                                              \
>> +}
> I don't see the improvement from a pointer.  Something like
>
>      if (vext_check_isa_ill(s) &&
>          vext_check_overlap(s, a->rd, a->rm) &&
>          vext_check_reg(s, a->rd, false) &&
>          vext_check_nf(s, a->nf)) {
>          return DO_OP(s, a, SEQ);
>      }
>      return false;
>
> seems just as clear without the extra data.
I am not quite sure which is clearer. In my opinion, setting the data is
easier than calling several different interfaces.
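To make the comparison concrete, one of those small check helpers might look
like this; a sketch only, assuming lmul is kept in DisasContext, which is not
something this patch shows:

    static bool vext_check_reg(DisasContext *s, uint32_t reg, bool widen)
    {
        /* a vector register group must be aligned to its LMUL
         * (doubled LMUL for a widened operand) */
        uint32_t legal = (widen ? 2 : 1) << s->lmul;
        return (reg % legal) == 0;
    }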
>> +#ifdef CONFIG_USER_ONLY
>> +#define MO_SB 0
>> +#define MO_LESW 0
>> +#define MO_LESL 0
>> +#define MO_LEQ 0
>> +#define MO_UB 0
>> +#define MO_LEUW 0
>> +#define MO_LEUL 0
>> +#endif
> What is this for?  We already define these unconditionally.
Yes. I missed the header file "exec/memop.h". When I compiled in user mode,
some make errors appeared.
I will remove this code in the next patch.
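For the record, the user-mode build failure is probably solved by the missing
include itself rather than by redefining the constants (an assumption based on
the error described above):

    #include "exec/memop.h"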
>
>> +static inline int vext_elem_mask(void *v0, int mlen, int index)
>> +{
>> +    int idx = (index * mlen) / 8;
>> +    int pos = (index * mlen) % 8;
>> +
>> +    return (*((uint8_t *)v0 + idx) >> pos) & 0x1;
>> +}
> This is a little-endian indexing of the mask.  Just above we talk about using a
> host-endian ordering of uint64_t.
>
> Thus this must be based on uint64_t instead of uint8_t.
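A uint64_t-based variant could look like this; a minimal sketch, assuming the
mask register is kept as host-endian uint64_t words:

    static inline int vext_elem_mask(void *v0, int mlen, int index)
    {
        int idx = (index * mlen) / 64;
        int pos = (index * mlen) % 64;

        return (((uint64_t *)v0)[idx] >> pos) & 0x1;
    }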
>
>> +/*
>> + * This function checks watchpoint before really load operation.
>> + *
>> + * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
>> + * In user mode, there is no watchpoint support now.
>> + *
>> + * It will triggle an exception if there is no mapping in TLB
>> + * and page table walk can't fill the TLB entry. Then the guest
>> + * software can return here after process the exception or never return.
>> + */
>> +static void probe_read_access(CPURISCVState *env, target_ulong addr,
>> +        target_ulong len, uintptr_t ra)
>> +{
>> +    while (len) {
>> +        const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
>> +        const target_ulong curlen = MIN(pagelen, len);
>> +
>> +        probe_read(env, addr, curlen, cpu_mmu_index(env, false), ra);
> The return value here is non-null when we can read directly from host memory.
> It would be a shame to throw that work away.
Yes. These host addresses can be useful. I just ignored them, because using
them would require adding some local variables, and cpu_*_mmuidx_ra will just
search the TLB table via tlb_hit anyway.
I am not quite sure whether I should keep the host addresses in an array.

Do you think it is necessary?
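If the host pointers were kept, a variant of probe_read_access could record
them for the load loop; a sketch only, the final shape would depend on how the
loop is organized:

    /*
     * Sketch (hypothetical): record the host pointer that probe_read()
     * already returns for each probed page, so the element loop can read
     * through it directly when the page is plain RAM.
     */
    static void probe_read_access(CPURISCVState *env, target_ulong addr,
            target_ulong len, void **hosts, uintptr_t ra)
    {
        unsigned i = 0;

        while (len) {
            const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
            const target_ulong curlen = MIN(pagelen, len);

            /* non-NULL means the page is directly addressable RAM */
            hosts[i++] = probe_read(env, addr, curlen,
                                    cpu_mmu_index(env, false), ra);
            addr += curlen;
            len -= curlen;
        }
    }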
>
>> +/* data structure and common functions for load and store */
>> +typedef void vext_ld_elem_fn(CPURISCVState *env, target_ulong addr,
>> +        uint32_t idx, void *vd, uintptr_t retaddr);
>> +typedef void vext_st_elem_fn(CPURISCVState *env, target_ulong addr,
>> +        uint32_t idx, void *vd, uintptr_t retaddr);
>> +typedef target_ulong vext_get_index_addr(target_ulong base,
>> +        uint32_t idx, void *vs2);
>> +typedef void vext_ld_clear_elem(void *vd, uint32_t idx,
>> +        uint32_t cnt, uint32_t tot);
>> +
>> +struct vext_ldst_ctx {
>> +    struct vext_common_ctx vcc;
>> +    uint32_t nf;
>> +    target_ulong base;
>> +    target_ulong stride;
>> +    int mmuidx;
>> +
>> +    vext_ld_elem_fn *ld_elem;
>> +    vext_st_elem_fn *st_elem;
>> +    vext_get_index_addr *get_index_addr;
>> +    vext_ld_clear_elem *clear_elem;
>> +};
> I think you should pass these elements directly, as needed, rather than putting
> them all in a struct.
>
> This would allow the main helper function to be inlined, which in turn allows
> the mini helper functions to be inlined.
The structure is meant to reduce the main helper function's code size and the
number of arguments of the mini helper functions.
I did pass these elements directly before this patch; it was more confusing,
with so many scattered variables and arguments.

I'm not quite sure about the efficiency improvements. If you are confident
about that, could you explain in more detail how to achieve them?
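If I understand the suggestion, the gain would come from something like the
pattern below, where the common loop is static inline and each generated
helper passes compile-time-constant arguments, so the compiler can inline
ld_elem into every instantiation. This is a sketch with made-up names
(vext_ld_us, vext_mlen, vlb_v_b_mask); the vm=1 case, nf segments and tail
clearing are omitted for brevity:

    static inline void
    vext_ld_us(void *vd, target_ulong base, void *v0, CPURISCVState *env,
               uint32_t desc, uint32_t esz, uint32_t msz,
               vext_ld_elem_fn *ld_elem, uintptr_t ra)
    {
        uint32_t i;

        for (i = 0; i < env->vl; i++) {
            if (!vext_elem_mask(v0, vext_mlen(desc), i)) {
                continue;
            }
            ld_elem(env, base + i * msz, i, vd, ra);
        }
    }

    void HELPER(vlb_v_b_mask)(void *vd, target_ulong base, void *v0,
                              CPURISCVState *env, uint32_t desc)
    {
        /* every extra argument here is a constant the compiler can fold */
        vext_ld_us(vd, base, v0, env, desc, sizeof(int8_t), sizeof(int8_t),
                   vext_vlb_v_b_ld_elem, GETPC());
    }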

Best Regards,
Zhiwei

>
>
> r~



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/5] target/riscv: add vector unit stride load and store instructions
  2020-02-12  6:38     ` Richard Henderson
@ 2020-02-19  8:57       ` LIU Zhiwei
  -1 siblings, 0 replies; 18+ messages in thread
From: LIU Zhiwei @ 2020-02-19  8:57 UTC (permalink / raw)
  To: Richard Henderson, alistair23, chihmin.chao, palmer
  Cc: wenmeng_zhang, qemu-riscv, qemu-devel, wxy194768


Hi, Richard
Thanks for your informative comments. I'm addressing them now, but I'm still
a little confused by some of them.
On 2020/2/12 14:38, Richard Henderson wrote:
> On 2/9/20 11:42 PM, LIU Zhiwei wrote:
>> +/*
>> + * As simd_desc supports at most 256 bytes, and in this implementation,
>> + * the max vector group length is 2048 bytes. So split it into two parts.
>> + *
>> + * The first part is floor(maxsz, 64), encoded in maxsz of simd_desc.
>> + * The second part is (maxsz % 64) >> 3, encoded in data of simd_desc.
>> + */
>> +static uint32_t maxsz_part1(uint32_t maxsz)
>> +{
>> +    return ((maxsz & ~(0x3f)) >> 3) + 0x8; /* add offset 8 to avoid return 0 */
>> +}
>> +
>> +static uint32_t maxsz_part2(uint32_t maxsz)
>> +{
>> +    return (maxsz & 0x3f) >> 3;
>> +}
> I would much rather adjust simd_desc to support 2048 bytes.
>
> I've just posted a patch set that removes an assert in target/arm that would
> trigger if SIMD_DATA_SHIFT was increased to make room for a larger oprsz.
>
> Or, since we're not going through tcg_gen_gvec_* for ldst, don't bother with
> simd_desc at all, and just pass vlen, unencoded.
>
>> +/* define check conditions data structure */
>> +struct vext_check_ctx {
>> +
>> +    struct vext_reg {
>> +        uint8_t reg;
>> +        bool widen;
>> +        bool need_check;
>> +    } check_reg[6];
>> +
>> +    struct vext_overlap_mask {
>> +        uint8_t reg;
>> +        uint8_t vm;
>> +        bool need_check;
>> +    } check_overlap_mask;
>> +
>> +    struct vext_nf {
>> +        uint8_t nf;
>> +        bool need_check;
>> +    } check_nf;
>> +    target_ulong check_misa;
>> +
>> +} vchkctx;
> You cannot use a global variable.  The data must be thread-safe.
>
> If we're going to do the checks this way, with a structure, it needs to be on
> the stack or within DisasContext.
>
>> +#define GEN_VEXT_LD_US_TRANS(NAME, DO_OP, SEQ)                            \
>> +static bool trans_##NAME(DisasContext *s, arg_r2nfvm* a)                  \
>> +{                                                                         \
>> +    vchkctx.check_misa = RVV;                                             \
>> +    vchkctx.check_overlap_mask.need_check = true;                         \
>> +    vchkctx.check_overlap_mask.reg = a->rd;                               \
>> +    vchkctx.check_overlap_mask.vm = a->vm;                                \
>> +    vchkctx.check_reg[0].need_check = true;                               \
>> +    vchkctx.check_reg[0].reg = a->rd;                                     \
>> +    vchkctx.check_reg[0].widen = false;                                   \
>> +    vchkctx.check_nf.need_check = true;                                   \
>> +    vchkctx.check_nf.nf = a->nf;                                          \
>> +                                                                          \
>> +    if (!vext_check(s)) {                                                 \
>> +        return false;                                                     \
>> +    }                                                                     \
>> +    return DO_OP(s, a, SEQ);                                              \
>> +}
> I don't see the improvement from a pointer.  Something like
>
>      if (vext_check_isa_ill(s) &&
>          vext_check_overlap(s, a->rd, a->rm) &&
>          vext_check_reg(s, a->rd, false) &&
>          vext_check_nf(s, a->nf)) {
>          return DO_OP(s, a, SEQ);
>      }
>      return false;
>
> seems just as clear without the extra data.
>
>> +#ifdef CONFIG_USER_ONLY
>> +#define MO_SB 0
>> +#define MO_LESW 0
>> +#define MO_LESL 0
>> +#define MO_LEQ 0
>> +#define MO_UB 0
>> +#define MO_LEUW 0
>> +#define MO_LEUL 0
>> +#endif
> What is this for?  We already define these unconditionally.
>
>
>> +static inline int vext_elem_mask(void *v0, int mlen, int index)
>> +{
>> +    int idx = (index * mlen) / 8;
>> +    int pos = (index * mlen) % 8;
>> +
>> +    return (*((uint8_t *)v0 + idx) >> pos) & 0x1;
>> +}
> This is a little-endian indexing of the mask.  Just above we talk about using a
> host-endian ordering of uint64_t.
>
> Thus this must be based on uint64_t instead of uint8_t.
>
>> +/*
>> + * This function checks watchpoint before really load operation.
>> + *
>> + * In softmmu mode, the TLB API probe_access is enough for watchpoint check.
>> + * In user mode, there is no watchpoint support now.
>> + *
>> + * It will triggle an exception if there is no mapping in TLB
>> + * and page table walk can't fill the TLB entry. Then the guest
>> + * software can return here after process the exception or never return.
>> + */
>> +static void probe_read_access(CPURISCVState *env, target_ulong addr,
>> +        target_ulong len, uintptr_t ra)
>> +{
>> +    while (len) {
>> +        const target_ulong pagelen = -(addr | TARGET_PAGE_MASK);
>> +        const target_ulong curlen = MIN(pagelen, len);
>> +
>> +        probe_read(env, addr, curlen, cpu_mmu_index(env, false), ra);
> The return value here is non-null when we can read directly from host memory.
> It would be a shame to throw that work away.
>
>
>> +/* data structure and common functions for load and store */
>> +typedef void vext_ld_elem_fn(CPURISCVState *env, target_ulong addr,
>> +        uint32_t idx, void *vd, uintptr_t retaddr);
>> +typedef void vext_st_elem_fn(CPURISCVState *env, target_ulong addr,
>> +        uint32_t idx, void *vd, uintptr_t retaddr);
>> +typedef target_ulong vext_get_index_addr(target_ulong base,
>> +        uint32_t idx, void *vs2);
>> +typedef void vext_ld_clear_elem(void *vd, uint32_t idx,
>> +        uint32_t cnt, uint32_t tot);
>> +
>> +struct vext_ldst_ctx {
>> +    struct vext_common_ctx vcc;
>> +    uint32_t nf;
>> +    target_ulong base;
>> +    target_ulong stride;
>> +    int mmuidx;
>> +
>> +    vext_ld_elem_fn *ld_elem;
>> +    vext_st_elem_fn *st_elem;
>> +    vext_get_index_addr *get_index_addr;
>> +    vext_ld_clear_elem *clear_elem;
>> +};
> I think you should pass these elements directly, as needed, rather than putting
> them all in a struct.
>
> This would allow the main helper function to be inlined, which in turn allows
> the mini helper functions to be inlined.
1. What is the main helper function, and what are the mini helper functions
here?

I guess the main helper function is code like this:

    #define GEN_VEXT_LD_UNIT_STRIDE(NAME, MTYPE, ETYPE)                   \
    void HELPER(NAME##_mask)(void *vd, target_ulong base, void *v0,       \
                             CPURISCVState *env, uint32_t desc)           \
    {                                                                     \
        static struct vext_ldst_ctx ctx;                                  \
        vext_common_ctx_init(&ctx.vcc, sizeof(ETYPE), sizeof(MTYPE),      \
                             env->vl, desc);                              \
        ctx.nf = vext_nf(desc);                                           \
        ctx.base = base;                                                  \
        ctx.ld_elem = vext_##NAME##_ld_elem;                              \
        ctx.clear_elem = vext_##NAME##_clear_elem;                        \
                                                                          \
        vext_ld_unit_stride_mask(vd, v0, env, &ctx, GETPC());             \
    }

And the mini helper function is code like this:

    static void vext_ld_unit_stride_mask(void *vd, void *v0,
            CPURISCVState *env, struct vext_ldst_ctx *ctx, uintptr_t ra)
    {
         uint32_t i, k;
         struct vext_common_ctx *s = &ctx->vcc;

         if (s->vl == 0) {
             return;
         }
         /* probe every access */
         for (i = 0; i < s->vl; i++) {
             if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
                 continue;
             }
             probe_read_access(env, ctx->base + ctx->nf * i * s->msz,
                     ctx->nf * s->msz, ra);
         }
         /* load bytes from guest memory */
         for (i = 0; i < s->vl; i++) {
             k = 0;
             if (!s->vm && !vext_elem_mask(v0, s->mlen, i)) {
                 continue;
             }
             while (k < ctx->nf) {
                 target_ulong addr = ctx->base + (i * ctx->nf + k) * s->msz;
                 ctx->ld_elem(env, addr, i + k * s->vlmax, vd, ra);
                 k++;
             }
         }
         /* clear tail elements */
         for (k = 0; k < ctx->nf; k++) {
             ctx->clear_elem(vd, s->vl + k * s->vlmax, s->vl * s->esz,
                     s->vlmax * s->esz);
         }
    }

Is that right?
2. The number of parameters grows a lot when they are passed directly to the
mini helper functions.

For example, the number of parameters increases from 5 to 11.

    static void vext_ld_unit_stride(void *vd, target_ulong base, void *v0,
            CPURISCVState *env, vext_ld_elem_fn *ld_elem,
            vext_ld_clear_elem *clear_elem, uint32_t vlmax,
            uint32_t nf, uint32_t esz, uint32_t msz, uintptr_t ra)


As vlmax and nf can be extracted from desc, another form of the mini helper
would increase the number of parameters from 5 to 10.

Is that OK?
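One possible middle ground is to keep desc as the single extra scalar argument
and recover nf/vlmax inside the inline helper, so the argument count stays
close to the current five. A sketch only; vext_vlmax is a made-up accessor,
and the loop body is the same as in vext_ld_unit_stride_mask above:

    static inline void
    vext_ld_unit_stride(void *vd, target_ulong base, void *v0,
            CPURISCVState *env, uint32_t desc,
            vext_ld_elem_fn *ld_elem, vext_ld_clear_elem *clear_elem,
            uintptr_t ra)
    {
        uint32_t nf = vext_nf(desc);        /* decoded from desc */
        uint32_t vlmax = vext_vlmax(desc);  /* decoded from desc */
        /* ... probe/load/clear loops as in vext_ld_unit_stride_mask ... */
    }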

BTW: In this patchset, I use a lot of macros to generate code. Is that OK?

Best Regards,
Zhiwei
>
> r~



^ permalink raw reply	[flat|nested] 18+ messages in thread


end of thread, other threads:[~2020-02-19  8:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-10  7:42 [PATCH v3 0/5] target/riscv: support vector extension part 2 LIU Zhiwei
2020-02-10  7:42 ` LIU Zhiwei
2020-02-10  7:42 ` [PATCH v3 1/5] target/riscv: add vector unit stride load and store instructions LIU Zhiwei
2020-02-10  7:42   ` LIU Zhiwei
2020-02-12  6:38   ` Richard Henderson
2020-02-12  6:38     ` Richard Henderson
2020-02-12  8:55     ` LIU Zhiwei
2020-02-12  8:55       ` LIU Zhiwei
2020-02-19  8:57     ` LIU Zhiwei
2020-02-19  8:57       ` LIU Zhiwei
2020-02-10  7:42 ` [PATCH v3 2/5] target/riscv: add vector " LIU Zhiwei
2020-02-10  7:42   ` LIU Zhiwei
2020-02-10  7:42 ` [PATCH v3 3/5] target/riscv: add vector index " LIU Zhiwei
2020-02-10  7:42   ` LIU Zhiwei
2020-02-10  7:42 ` [PATCH v3 4/5] target/riscv: add fault-only-first unit stride load LIU Zhiwei
2020-02-10  7:42   ` LIU Zhiwei
2020-02-10  7:42 ` [PATCH v3 5/5] target/riscv: add vector amo operations LIU Zhiwei
2020-02-10  7:42   ` LIU Zhiwei
